Promise of personalized omics to precision medicine
The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review. WIREs Syst Biol Med 2013, 5:73–82. doi: 10.1002/wsbm.1198
Conflict of interest: M.S. serves as founder and consultant for Personalis, a member of the scientific advisory board of GenapSys, and a consultant for Illumina.
Personalized or precision medicine is expected to become the paradigm of future health care, owing to the substantial improvement of high-throughput technologies and systems approaches in the past two decades.1,2 Conventional symptom-oriented disease diagnosis and treatment has a number of significant limitations: for example, it focuses only on late/terminal symptoms and generally neglects preclinical pathophenotypes or risk factors; it generally disregards the underlying mechanisms of the symptoms; the disease descriptions are often so broad that they may actually include multiple diseases with shared symptoms; and the reductionist approach to identifying therapeutic targets in traditional medicine may oversimplify the complex nature of most diseases.3 Advances in the ability to perform large-scale genetic and molecular profiling are expected to overcome these limitations by addressing individualized differences in diagnosis and treatment in unprecedented detail.
The rapid development of high-throughput technologies also drives modern biological and medical research from traditional hypothesis-driven designs toward data-driven studies. Modern high-throughput technologies, such as high-throughput DNA sequencing and mass spectrometry, have enabled the facile monitoring of thousands of molecules simultaneously instead of just the few components analyzed in traditional research, thus generating a huge amount of data to document the real-time molecular details of a given biological system. Ultimately, when enough knowledge is gained, these molecular signatures, as well as the biological networks they form, may be associated with the physiological state/phenotype of the biological system at the very moment when the sample is taken.
Future personalized health care is expected to benefit from the combined personal omics data, which should include genomic information as well as longitudinal documentation of all possible molecular components. This combined information not only determines the genetic susceptibility of the person, but also monitors his/her real-time physiological states, as our integrative Personal Omics Profile (iPOP) study exemplified.4 In this review we will cover recent advances in systems biology and personalized medicine. We will also discuss limitations and concerns in applying omics approaches to individualized, precision health care.
GENOMICS IN DISEASE-ORIENTED MEDICINE
The revolution of omics profiling technologies significantly benefited disease-oriented studies and health care, especially in disease mechanism elucidation, molecular diagnosis, and personalized treatment. These new technologies greatly facilitated the development of genomics, transcriptomics, proteomics, and metabolomics, which have become powerful tools for disease studies. Today, molecular disease analyses using large-scale approaches are pursued by an increasing number of physicians and pathologists.5,6
Initially, genome-wide association studies (http://gwas.nih.gov/) were launched in search of associations between common genetic variants and certain phenotypes of interest. These studies typically assayed more than 500,000 single nucleotide polymorphisms (SNPs) and/or copy number variations (CNVs) with DNA microarrays in thousands to hundreds of thousands of participants.7 To date, 1,355 publications are listed in the National Human Genome Research Institute (NHGRI) GWAS Catalog reporting the association of 7,226 SNPs with 710 complex traits.7 The studied complex traits vary vastly, from cancers (e.g., prostate cancer and breast cancer) and complex diseases (e.g., type 1 and type 2 diabetes (T2D), Crohn’s Disease) to common traits (e.g., height and body mass index). These findings greatly broadened our knowledge of disease loci, and can potentially benefit disease risk prediction and drug treatments (as discussed in the section Integrative Omics in Preventative Medicine). Although powerful, GWAS have proven difficult for most complex diseases, as typically a large number of loci are identified, each contributing only a small fraction of the genetic risk. These studies have many limitations, including the small fraction of the genome that is analyzed and the failure to account for gene-gene interactions, epistasis and environmental factors.8
Whole genome sequencing (WGS) and whole exome sequencing (WES) have become more and more affordable for genomic studies and are rapidly replacing DNA microarrays. Single-base analysis of a genome/exome is achieved, which allows scientists to investigate the genetic basis of health and disease in unprecedented detail. Assigning variants to paternal and maternal chromosomes, i.e., ‘phasing’, can be achieved through the analysis of families9 or other methods.1,10,11 With the generation of massive amounts of whole genome and exome data from diseased and healthy populations, understanding of both human population variation and genetic diseases, especially complex diseases, has been brought to a new level.1,12
One field that significantly benefited from WGS technologies is cancer-related research. A large number of cancer genomes have been sequenced through individual or collaborative efforts, such as the International Cancer Genome Consortium (http://www.icgc.org/) and the Cancer Genome Atlas (http://cancergenome.nih.gov/). The DNA from many types of cancer has been sequenced, including breast cancer,13–15 chronic lymphocytic leukaemia,16 hepatocellular carcinoma,17 pediatric glioblastoma,13 melanoma,18 ovarian cancer,19 small-cell lung cancer,20 and Sonic-Hedgehog medulloblastoma,21 and databases have been established, such as the cancer cell line encyclopedia.22 In addition, cancer genomes have also been investigated at the single-cell level by WES for clear cell renal cell carcinoma23 and JAK2-negative myeloproliferative neoplasm.24 Somatic mutations and subtyping molecular markers were identified from these genomes. These different studies have revealed that nearly every tumor is different, with distinct types of potential ‘driver’ mutations. Importantly, cancer genome sequencing often reveals potential targets that may suggest precision cancer treatment for the specific patients. As an example, a novel spontaneous germline mutation in the p53 gene was identified by WGS in a female patient, which accounted for the three types of cancers she developed in merely 5 years.25 An attempt has been made recently to treat a female patient with T Cell Lymphoma based on the target gene, CTLA4, identified by whole genome sequencing.26 The patient’s cancer was suppressed for two months with the anti-CTLA4 drug ipilimumab, although she died of recurrence soon after.
Whole genome and exome sequencing can also facilitate the identification of possible causal genes for hereditary genetic diseases, and is increasingly used in attempts to understand the basis of these ‘mystery diseases’ once obvious candidates are ruled out. In one successful example, whole genome sequencing of a fraternal twin pair with dopa (3,4-dihydroxyphenylalanine)-responsive dystonia helped the identification of one pair of personalized compound heterozygous mutations in the gene SPR, which accounted for the disease in both individuals.27 Importantly, based on the genome information the authors supplemented the l-dopa therapy with 5-hydroxytryptophan (SPR-dependent serotonin precursor) and significantly improved the health of both patients. In another example, Roach et al. sequenced the whole genomes of a family quartet and identified rare mutations in the genes DHODH and DNAH5 responsible for the two recessive disorders in both children—Miller syndrome and primary ciliary dyskinesia.28
Pharmacogenomics is another important application of genomic sequencing. It is known that the same drug may have different effects on different individuals due to their personal genomic background and living habits.8,29 Genetic information can be used to assign drug doses and reduce side effects. For example, genetic variants are known to affect patients’ response to antipsychotic drugs.30 Based on pharmacogenomic trials, genetic tests for four drugs are required by the US Food and Drug Administration (FDA) before the administration of these drugs to patients, including the anti-cancer drugs cetuximab, trastuzumab, and dasatinib, and the anti-HIV drug maraviroc, and more are recommended such as the anticoagulant drug Warfarin and the anti-HIV drug Abacavir.8
OTHER OMICS TECHNOLOGIES AND MEDICINE
Other omics technologies are also likely to impact medicine. High-throughput sequencing technologies have enabled whole transcriptome (cDNA) sequencing, abbreviated as RNA-Seq.31 RNA-Seq has become a powerful tool for disease-related studies, as it has great accuracy and sensitivity relative to microarray technology and it can also detect splicing isoforms.32 As RNA profiles reflect actual gene activity, they are closer to the real phenotype than the genomic sequence is. With RNA-Seq, Shah et al. discovered varied clonal preference and allelic abundance in 104 cases of primary triple-negative breast cancers, and observed that ∼36% of the genomic mutations were actually expressed.33 Combining such information with genomic information may be valuable in treatment of cancer and other diseases. Moreover, RNA-Seq also captures more complex aspects of the transcriptome, such as splicing isoforms34 and editing events,35 which are generally overlooked by hybridization-based methods. Splicing variants have now been associated with several distinct types of cancer and cancer prognosis.36–40
Although proteins have long been deemed the executors of most biological functions, clinical proteomics is still a relatively young field because technological limitations have made it difficult to profile the complexity of the proteome with high sensitivity and accuracy. Since the development of new soft desorption methods that enabled the analysis of biological macromolecules with mass spectrometry, proteomics has advanced significantly in the past decade.41,42 With current mass spectrometry technology, one can now quantify thousands of proteins in a single sample. For example, we were able to reliably detect 6,280 proteins in the human peripheral blood mononuclear cell proteome.4 Mass spectrometry also allows the detection of expressed mutations, allele-specific sequences and editing events in the human proteome,4,43 as well as profiling of the phosphoproteome.44 Also of note is the MALDI-TOF (matrix-assisted laser desorption/ionization-time of flight) mass spectrometry-based imaging technology (MALDI-MSI) developed by Cornett et al., which allows spatial proteome profiling in defined two-dimensional laser-shot areas using tissue sections.45 Using MALDI-MSI, Kang et al. identified immunoglobulin heavy constant α2 as a novel potential marker for breast cancer metastasis.46
The field of metabolomics has also advanced significantly with the improvement of mass spectrometry. Both hydrophilic and hydrophobic metabolites can be profiled in specific samples.4,47 As the metabolome reflects the real-time energy status as well as metabolism of the living organism, it is expected that certain metabolome profiles may be associated with different diseases.48 Therefore, metabolomic profiles become an important aspect for personalized medicine.49,50 Jamshidi et al. profiled the metabolome of a female patient with Hereditary Hemorrhagic Telangiectasia (HHT) along with four healthy controls, and identified differences which highlighted the nitric oxide synthase pathway.51 The authors then treated the patient with bevacizumab and shifted her metabolomic profile toward those of the healthy controls and improved the patient’s health. In addition, branched-chain amino acids such as isoleucine have been associated with T2D and may ultimately prove to be valuable biomarkers.52 Finally, since some metabolites bind and directly regulate the activity of other biomolecules (e.g., kinases),53 there is significant potential to modulate cellular pathways using diet and metabolic analogs that serve as agonist or antagonist of protein function.
INTEGRATIVE OMICS IN PREVENTATIVE MEDICINE
The concept of personalized medicine emphasizes not only personalized diagnosis and treatment, but also personalized disease susceptibility assessment, health monitoring and preventative medicine. Because a disease is easier to manage prior to its onset or at its early stages, risk assessment and early detection will be transformative in personalized medicine. Systems biology has the potential to capture real-time molecular phenotypes of a biological system, which enables the detection of subtle network perturbations that precede the actual development of clinical symptoms.
Disease susceptibility and drug response can be assessed with a person’s genomic information.8 This information may serve as a guideline for monitoring the health of a particular patient to achieve personalized health care, as showcased by Ashley et al.54 Whole genome sequencing revealed variants for both high-penetrance Mendelian disorders, such as HTT (Huntington’s disease55) and PAH (Phenylketonuria56), and common, complex diseases, such as the disease-associated genetic variants reported in GWAS.57 Disease risks can be evaluated for a given person, and an increase or decrease in disease risk compared with the population risk (of the same ethnicity, age, and gender) can be estimated (Figure 1). In the study of Ashley et al., the genome of a patient was analyzed and increased post-test probability risks for myocardial infarction and coronary artery disease were estimated.54 Their estimation matched the fact that the patient, although generally healthy, had a family history of vascular disease as well as early sudden death.58 Genetic variants associated with heart-related morbidities as well as drug response were identified in the patient’s genome; this information, as the authors stated, may direct the future health care for this particular patient. Similarly, Dewey et al. further extended this work by analysing a family quartet using a major allele reference sequence, and identified high-risk genes for familial thrombophilia, obesity, and psoriasis.59
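To make the notion of a post-test probability concrete, the following is a minimal sketch of one common way to update a population (pre-test) risk with genetic evidence: convert the probability to odds, multiply by one likelihood ratio per risk variant, and convert back. It assumes the variants act independently, and the function name and all numbers are hypothetical illustrations rather than values or code from the studies cited above.

```python
def post_test_probability(pre_test_prob, likelihood_ratios):
    """Update a pre-test probability with a list of likelihood ratios.

    Assumes each likelihood ratio is independent of the others,
    which is a simplification for real genetic variants.
    """
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    # Convert probability to odds.
    odds = pre_test_prob / (1.0 - pre_test_prob)
    # Each variant scales the odds by its likelihood ratio.
    for lr in likelihood_ratios:
        odds *= lr
    # Convert odds back to a probability.
    return odds / (1.0 + odds)


# Hypothetical example: 10% population risk and three made-up variants.
example_pre_test = 0.10
example_lrs = [1.2, 1.1, 0.9]
example_post_test = post_test_probability(example_pre_test, example_lrs)
print(f"pre-test {example_pre_test:.3f} -> post-test {example_post_test:.3f}")
```

Working in odds keeps the update multiplicative, so variants that raise risk (ratio above 1) and variants that lower it (ratio below 1) are handled in the same way.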
To further explore variation and power of the full human genome, projects and databases (such as the Personal Genome Project60) are being launched to help advance this field. However, genomic information alone usually is not adequate to predict disease onset, and other factors such as environment are expected to play a critical role in this process.61,62 The predictive capability of whole genome sequence was assessed by Roberts et al. through modeling 24 disease risks in monozygotic twins.63 For each disease, the authors modeled the genotype distribution in the twin population according to the observed concordance/discordance, and discovered that for most individuals and most diseases, the relative risk would be tested negative compared to the population, and in the best-case scenario, only one disease or more could be forewarned for any individual. The results of Roberts et al. are not surprising, as disease manifestation is probabilistic and not deterministic. Nonetheless, whole genome information by itself is expected to have partial value in disease prediction for complex diseases. In addition, from a systems point of view, peripheral components of the biological network would be more likely to contribute to complex diseases, as perturbation of the main nodes, which are usually essential genes, would be lethal.64 Therefore it is more difficult to identify the exact contributors of complex diseases. Moreover, as stated above, non-genomic factors may also exist and further complicate the situation. As an example of this, multiple sclerosis is known to have genetic components; however, Baranzini et al. failed to identify genomic, epigenomic or transcriptomic contributors in discordant monozygotic twins, which may indicate the existence of other factors, such as the environment.65
Current technologies, especially high-throughput sequencing and mass spectrometry, enable the monitoring of at least 10^5 molecular components, including DNA, RNA, protein, and metabolites in the human body. Therefore it is now feasible to identify the profiles of these components that correlate with various physiological states of the body, and profile alterations as a result of physiological state changes and diseases. Compared with genomic sequences alone, the profiles of the transcriptome, proteome and metabolome are closer indicators of the real-time phenotype; collecting this omics information in a longitudinal manner would therefore allow monitoring of an individual’s physiological states. To test this concept, we implemented a study by following a generally healthy participant for 14 (now 32) months with integrated Personal Omics Profile (iPOP) analysis, incorporating information from the participant’s genome with longitudinal data from the person’s transcriptome, proteome, metabolome, and autoantibodyome.4 As blood constantly circulates through the human body, exchanges biological material with local tissues, and is presently analyzed in medical tests, we chose to monitor the participant’s physiological states by profiling the blood components (PBMCs, serum and plasma) with iPOP analysis. The genome of this individual was sequenced with two WGS (Illumina and Complete Genomics) and three WES (Agilent, Roche Nimblegen, and Illumina) platforms to achieve high accuracy, and was further analyzed for disease risk and drug efficiency. The identified elevated risks included coronary artery disease, basal-cell carcinoma, hypertriglyceridemia and T2D, and the participant was estimated to have a favorable response to rosiglitazone and metformin, both of which are antidiabetic medications.
Although the participant has a known family history for some of the high-risk diseases (but not T2D), he was free from most of them (except for hypertriglyceridemia, for which he used medication) and had a normal Body Mass Index at the start of our study. Nonetheless, these elevated disease risks served as a guideline to monitor his personal health with iPOP analysis. We profiled the transcriptome, proteome and metabolome from 20 time points in the 14 months, and monitored molecular profile changes for physiological state change events during our study, including two viral infections. The subject also acquired T2D during the study, immediately after one of the viral (respiratory syncytial virus) infections. Two types of changes were observed from our iPOP data: the autocorrelated trends that reflect chronic changes, and the spikes which include significantly up/down-regulated genes and pathways especially at the onset of each event. With our iPOP approach, we acquired a comprehensive picture of detailed molecular differences between different physiological states, as well as during disease onset. In particular, interesting changes in glucose and insulin signaling pathways were observed during the onset of T2D. We also obtained other important information from our omics data, such as dynamic changes in allele-specific expression and RNA-editing events, as well as personalized autoantibody profiles. Overall, this study revealed an important application of the use of genomics and other omics profiling for personalized disease risk estimation and precision medicine, as we discovered the increased T2D risk, monitored its early onset, and helped the participant effectively control and eventually reverse the phenotype by proactive interventions (diet change and physical exercise).
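The distinction between autocorrelated trends and spikes can be illustrated with a small sketch. The code below is not the analysis pipeline used in the iPOP study; it is a simplified illustration that assumes evenly spaced time points, uses a lag-1 autocorrelation coefficient as a stand-in for "trend-like" behavior, and flags spikes with a plain z-score threshold. The function names, the threshold, and the toy series are all hypothetical.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of an evenly sampled series.

    Values near 1 suggest a slowly changing (trend-like) profile.
    """
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum(
        (values[i] - m) * (values[i + 1] - m) for i in range(len(values) - 1)
    )
    return num / denom


def spike_indices(values, z_threshold=2.0):
    """Indices of time points whose z-score exceeds the threshold."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) / sd > z_threshold]


# Toy profiles for one hypothetical molecule across eight time points.
trend_series = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
spike_series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9]

for name, series in [("trend", trend_series), ("spike", spike_series)]:
    print(name, round(lag1_autocorrelation(series), 3), spike_indices(series))
```

In this toy setting the steadily rising series has a high lag-1 autocorrelation and no flagged points, whereas the series with a single excursion has a low autocorrelation and one flagged time point. Real longitudinal omics data are unevenly sampled and noisy, so a production analysis would need methods that account for both.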
Another important feature of our study is that samples are collected in a longitudinal fashion so that aberrant/disease states can be compared to healthy states of the same individual. One other advantage of our iPOP approach is its modularity, as other omics and quantifiable information can also be included in the iPOP profile, which can be readily tailored to monitor any biological or pathological event of interest (Figure 2). Examples of other information are: epigenome,66 gut microbiome,67 microRNA profiles68 and immune receptor repertoire.69 Moreover, quantifiable behavioral parameters such as nutrition, exercise, stress control and sleep may also be added to the profile.70
THE IMPORTANCE OF DATA MINING AND RE-MINING
One important aspect of systems biology is data mining. Data management and access can become a daunting task given the tremendous amount of data generated with current high-throughput technologies, and the data size is constantly increasing with time.71 Computational challenges exist in each step: handling, processing and annotating high-throughput data, integrating data from different sources and platforms, and pursuing clinical interpretation of the data.72 These steps can be quite computationally intensive and require significant computational hardware; for example, mapping short reads to achieve 30× coverage of the human genome typically requires 13 CPU days,72 although these times are rapidly decreasing. Moreover, as biological systems act as more than just the sum of their individual parts, knowledge from multiple levels (such as epistasis, interaction, localization, and activation status) should be considered to capture the underlying highly organized networks for functional annotations.73 Ultimately it will be important to have a comprehensive database that contains electronic health records (including treatment information), genome sequences with variant calls and as much molecular information as possible. In principle, with appropriate algorithms, such a database could be mined by physicians to make data-driven medical decisions.
Currently, many high-throughput datasets of similar types (e.g., expression and genome-wide association data collected from different populations with the same disease) have been created as smaller, separate studies. Combining these publicly available datasets bioinformatically may therefore provide more statistical power and lead to clearer conclusions than could be achieved in the individual studies. The work by Roberts et al. mentioned above serves as one example.63 In order to test the capacity of whole genome information, the authors combined monozygotic twin pair data from a total of five sources in 13 publications to obtain a much larger dataset for their test. Similarly, Butte and colleagues combined the results of 130 functional microarray experiments for T2D and re-mined the data for recurring candidate genes.74 They identified CD44 as the top candidate gene associated with T2D. In a related effort, by analyzing curated data of 2,510 individuals from 74 populations, the group led by Butte also discovered that T2D risk alleles were unevenly distributed across different human populations, with the risk higher in African and lower in Asian populations.75
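As a rough sense of scale for the 30× figure, average sequencing depth is conventionally the number of reads times the read length divided by the genome size, so the number of reads to be mapped follows directly. The sketch below works through that arithmetic; the genome size of about 3.1 billion bases and the 100 bp read length are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def reads_needed(target_coverage, genome_size_bp, read_length_bp):
    """Number of reads needed for a given average depth.

    Uses coverage = reads * read_length / genome_size and ignores
    unmappable reads, duplicates and uneven coverage.
    """
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed values for illustration only.
ASSUMED_GENOME_SIZE_BP = 3_100_000_000  # approximate human genome size
ASSUMED_READ_LENGTH_BP = 100

n_reads = reads_needed(30, ASSUMED_GENOME_SIZE_BP, ASSUMED_READ_LENGTH_BP)
print(f"{n_reads:,} reads for 30x average depth")
```

Under these assumptions close to a billion reads must be aligned for a single genome, which helps explain why read mapping alone can consume days of CPU time.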
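The idea of re-mining many small experiments for recurring candidates can be sketched as a simple counting exercise: for each gene, count the number of experiments in which it was reported, then rank genes by that count. This is only an illustration of the general idea and not the method of the cited studies; the function name and the placeholder gene labels are hypothetical.

```python
from collections import Counter


def rank_recurring_genes(experiments):
    """Rank genes by the number of experiments that report them.

    `experiments` is a list of gene lists, one per experiment. A gene
    is counted at most once per experiment. Ties are broken
    alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for hits in experiments:
        counts.update(set(hits))  # set() removes within-experiment repeats
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input: three hypothetical experiments with placeholder gene names.
toy_experiments = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_D", "GENE_A"],
]
ranking = rank_recurring_genes(toy_experiments)
for gene, n_experiments in ranking:
    print(gene, n_experiments)
```

A gene that recurs across independent experiments is less likely to be an artifact of any single platform or cohort, although a real analysis would also need to account for differences in study size and design.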
CONCERNS AND LIMITATIONS
Personalized health monitoring and precision medicine are accelerating at a rapid pace because of the development of systems biology. As noted above, multiple efforts in both technology development and biological application have occurred, and an increasing number of researchers and physicians alike are sharing this vision. Hood et al. termed this approach ‘P4 Medicine’, for predictive, preventive, personalized and participatory medicine.12
Nevertheless, many concerns also exist, and guidelines on translational omics research have been recommended by the Institute of Medicine.76 Khoury et al. suggested ‘a fifth P’, that is, the population perspective be added to personalized medicine77 and population validation of systems results with strong evidence should be achieved before its clinical application. Many disease-associated genetic variants discovered in GWAS still need to be functionally validated.78 In addition, Khoury et al. raised concerns that restricted health care resources might be wasted if unneeded disease screening/subclassification with systems approaches were conducted rather than lowering health care costs. However, with the rapid drop in technology costs and carefully designed pilot studies, the optimal screening frequencies/levels of subclassification necessary for precision medicine could be determined and costs maintained at affordable levels. It is worth noting that generating personalized omics data with appropriate interpretation can greatly benefit our understanding of physiological events for health and disease, and precision health care as we gain more knowledge in this field. In addition to personalized diagnosis and treatment, the future of precision medicine with omics approaches should emphasize personalized health monitoring, molecular symptom, early detection and preventative medicine, a paradigm shift from traditional health care.
As the human body is a highly organized, complex system with multiple organs and tissues, it is important to select the correct sample type for understanding a specific biological problem. However, as many sample types are unavailable (e.g., brain tissue) or not regularly accessible (e.g., biopsy samples from internal organs) from living individuals, our scope for personalized health monitoring is thus restricted. Therefore systems biology results, especially iPOP results, should not be over-interpreted. Although iPOP data from blood components may indicate changes in the other parts of the human body, the actual profiles for the tissue of interest might be underrepresented in blood or delayed in phase.
It is still not clear who is to develop and deliver personalized treatments for personalized medicine if they are not available as conventional medication. The cost for developing personalized drugs may become prohibitive to accurately address personal specificity, and may face other difficulties such as Food and Drug Administration approval. However, advances in high-throughput drug discovery will help accelerate this field.
In addition, personalized medicine using omics approaches relies heavily on technology development for biological research. This includes advances in both research instrumentation and computational frameworks. For example, it is still not possible to accurately determine the entire sequence of a genome due to limitations of current WGS/WES methods,79,80 even after computational improvement of signal-to-noise ratio.81,82 A low sequencing error rate was claimed by both the Illumina HiSeq (for 2 × 100 bp reads, more than 80% of the bases have a quality score above Q30, or 99.9% accuracy, http://www.illumina.com/documents//products/datasheets/datasheet_hiseq_systems.pdf) and the Complete Genomics platform (1 × 10^-5 at the time of our study80 and 2 × 10^-6 as of October 8th, 2012, www.completegenomics.com); however, the per-variant error rate is still high (15.50% and 9.08% for Illumina and Complete Genomics respectively with no filter, and 1.01% and 1.12% after multiple filters) as reported by Reumers et al.,81 which agreed with our observation that only 88.1% of the SNP calls overlapped when the same genome was sequenced with the two platforms.80 Thus possible disease-associated variants in these platform-specific regions might be overlooked or misinterpreted. Another issue lies in storage and processing of the omics data, as petabytes of data can easily be generated for a small iPOP study of 200 participants, and demanding computing resources will be needed for data analysis. Therefore, interdisciplinary efforts from biologists, computer scientists and hardware engineers should be organized to ensure the continued improvement of this field.
The era of personalized precision medicine is about to emerge. The steady improvement of high-throughput technologies greatly facilitates this process by enabling profiling of various omes such as the whole genome, epigenome, transcriptome, proteome and metabolome, which convey detailed information about the human body. Integrated profiles of these omes should reflect the physiological status of the host at the time the samples are collected. The personalized omics approach catalyzes precision medicine at two levels: for diseases and biological processes whose mechanisms are still unclear, the omics approach will facilitate research that would greatly advance our understanding; and when the mechanisms are clarified, individualized health care can be provided through health monitoring, preventative medicine, and personalized treatment. This would be especially helpful for complex diseases such as autism83 and Alzheimer’s disease,84 where multiple factors are responsible for the phenotypes. Furthermore, the omics approach also facilitates the development of other less-emphasized but important health-related fields, such as nutritional systems biology, which studies personalized diet and its relationship to health from a systems point of view.85 With the rapid decrease in the cost of omics profiling, we anticipate an increased number of personalized medicine applications in many aspects of health care besides our proof-of-principle study. This will significantly improve the health of the general public and cut down health care costs. Scientists, governments, pharmaceutical companies and patients should work closely together to ensure the success of this transformation.86
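Two of the quantities in this paragraph can be made concrete with a short sketch. A Phred quality score Q corresponds to a per-base error probability of 10 to the power of minus Q over 10, which is why Q30 equals 99.9% accuracy. Cross-platform agreement can be summarized as the fraction of variant calls shared by two call sets. The code below assumes that the overlap is measured as shared calls divided by the union of calls, which may differ from the exact definition used in the cited study; the function names and the toy variant calls are hypothetical.

```python
import math


def phred_to_error_probability(q):
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score implied by a per-base error probability."""
    return -10 * math.log10(p)


def call_overlap_fraction(calls_a, calls_b):
    """Shared calls divided by the union of calls from two platforms.

    Assumption: overlap is defined as intersection over union.
    """
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a & calls_b) / len(union)


q30_error = phred_to_error_probability(30)
print(f"Q30 -> error {q30_error:.4f}, accuracy {1 - q30_error:.1%}")

# Toy SNP calls as (chromosome, position, alternate allele) tuples.
platform_a = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G")}
platform_b = {("chr1", 100, "A"), ("chr2", 50, "G"), ("chr3", 75, "C")}
overlap = call_overlap_fraction(platform_a, platform_b)
print(f"overlap fraction: {overlap:.2f}")
```

The contrast between the two measures is the point of the paragraph: a very small per-base error rate, multiplied over billions of bases, can still leave a noticeable fraction of variant calls that the two platforms do not share.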
This work is supported by funding from the Stanford University Department of Genetics and the National Institutes of Health. We thank Drs. George I. Mias and Hogune Im for their help in proof-reading the article and the insightful discussions.