Jack is a clinical trial patient who has been battling leukemia for 12 years. One day, Jack complains of abdominal and chest pains and is rushed to the closest hospital. The emergency physicians decide that they would like to send Jack to surgery. They look for his electronic health record but cannot access it due to ownership restrictions put in place by Jack’s healthcare providers. The lack of timely information will impact the decisions of the treating physicians and surgical team.


The lack of data flow in the health system negatively impacts decision-making in the treatment development pipeline.

Data has value only if it is used and circulated. Clinical research makes new discoveries daily, yet our fragmented healthcare system lacks the capability to determine the best treatment for each patient. We have 50 times more data today than all the data in the last 50 years combined, yet we lack the infrastructure to transport this data to, and between, decision-makers. Data ownership is the elephant in the room: we disagree on whether health data should be valued, exchanged, bought, or sold. Without a system-wide intervention, data in our healthcare system will never be able to inform the treatment development pipeline, and the pipeline will never be able to inform the healthcare system in return.


A human-centered data ecosystem with clear ownership rules to guide the current treatment development pipeline.

Treatment development is only as strong as the data it uses. To bring the right treatment to the right patient at the right time, the pipeline must draw data from, and contribute data to, a larger health data ecosystem. Generating real-time, pragmatic evidence is not enough; a human-centered data ecosystem will employ all tiers of biological and non-biological data, across therapeutic areas and stakeholders, to better respond to individual and population-based health needs. Future data flows within the ecosystem must go beyond ‘data sharing and exchange’ channels to include ‘data commoditization’ highways, enabled by a well-trained workforce and appropriately designed regulatory pillars.


Data has value only if it is used and circulated.

Jack’s data will inform the best, most timely treatment for him. Insights from the data ecosystem will influence future decision-making.


Designing a human-centered data ecosystem will require multi-stakeholder collaboration along five main pillars.

Data types

E.g., genomic, clinical, epidemiological, psychological, patient feedback, environmental, socio-economic, insurance, regulatory
As calls for non-frequentist approaches grow, evidence generation within the treatment development system, and the implementation of those treatments, must consider all human activities, around the clock, from birth to death. Treating human biology as a system also means recognizing that therapeutic areas are not silos and that disease pathogenesis can occur within and across specialties.

Data infrastructure

E.g., data standards, quality, transfer, management, collection, aggregation
Getting data to users will require assessing data demands at the individual, city, national, and global levels. Data service providers must develop ways to measure the value of data, as well as corresponding standards that inform the quality, collection, storage, access, transfer, and reporting of data for consumption by data users.

Data agency

E.g., patients, companies, countries, research consortia
A human-centered data ecosystem will contain diverse data-generating sources and subjects. Empowering these stakeholders to collect and analyze data, and to decide how it is used on the basis of their ownership rights, will facilitate multiple channels of data transfer, including data donation, exchange, and commoditization.

Data workforce

E.g., standard curricula, data rights, certification, artificial intelligence, harmonization of ontology
A properly trained workforce (providers, insurers, data scientists, clinical researchers, regulators, and patients) will be needed to deliver on the promise of data analytics and artificial intelligence in health. All stakeholders in the health system must be equipped with the creative ability and know-how to analyze and use new and old forms of data for overall patient and societal benefit.

Data laws and regulations

E.g., ethics, privacy, security, interoperability, pricing
In order for the data economy to grow, legislators must provide clarity on what type of data can be owned, by whom, when, where, and why. This task will require adequately balancing public and private interests to resolve legal, ethical, and regulatory issues concerning data ownership and sharing.

Putting patient needs first in the 21st century means being able to have difficult conversations about data. Already, efforts like the Global Oncology Big Data Alliance, Vivli, and Project Baseline are collecting, aggregating, and analyzing various types and magnitudes of data. Coordinated, focused efforts like these are useful in advancing knowledge and collaboration within and across therapeutic areas. However, this status quo is not enough to capture the promise of big data. Our generation faces unimagined levels of information generated every second by new and existing actors. To avoid unintended consequences like Jack’s, we must overcome existing inertia to design a human-centered data ecosystem with clear ownership rules. This is a critical first step in reaching an adaptive treatment development pipeline that seamlessly anticipates and responds to individual and population-based health needs, producing the right drugs for the right patients at the right time and saving researchers, clinicians, and policy-makers time and money.


Washington, V., DeSalvo, K., Mostashari, F., & Blumenthal, D. (2017). The HITECH Era and the Path Forward. New England Journal of Medicine, 377(10), 904-906.

Khozin, S., Kim, G., & Pazdur, R. (2017). Regulatory watch: From big data to smart data: FDA’s Informed initiative. Nature Reviews Drug Discovery, 16(5), 306.

Co-Research Leads:

Sauleh Siddiqui
Jen Bernstein