Verification of systems biology research in the age of collaborative competition

Nature Biotechnology
Published online

Collaborative competitions in which communities of researchers compete to solve challenges may facilitate more rigorous scrutiny of scientific results.


Systems biology aims to provide a mechanistic understanding of biological systems from high-throughput data. Besides its intrinsic scientific value, this understanding will accelerate product design and development, facilitate health policy decisions and may reduce the need for long-term clinical trials. For this to happen, the knowledge generated by systems biology has to become sufficiently trustworthy for the empirical approach underlying long-term clinical trials to be supplanted by an approach in which mechanism and mechanistic understanding is a driver for decisions. This raises fundamental questions of how to evaluate the veracity of predictions from systems biology models and how to construct mechanistic models that best reflect biological phenomena—questions that are of interest to both academia and industry.

High-throughput verification of systems biology

In 2009, a report1 from the US National Academy of Sciences (Washington, DC) highlighted four areas where biology could make major contributions: food production, improvement of human health, optimized biofuels and ecosystem restoration. Addressing these challenges requires not only multidisciplinary teams to analyze high-throughput quantitative data, but also verification of the conclusions from such analyses.

One of the obvious steps in raising the confidence of high-throughput data sets is to have better experimental and analytical techniques that yield accurate and reproducible data with known error rates. For example, verification of mass spectrometry proteomic measurements has proven difficult because the measurements can depend strongly on sample preparation, the method of detection and the biological context in which the measurements were made. One approach to address this issue has been the creation of databases such as Peptide Atlas2, a genome-mapped library of peptides derived from liquid chromatography tandem mass spectrometry proteomics experiments in multiple organisms that lends itself to easy navigation using software tools.

Another example of recent efforts to ensure data quality and reproducibility is the area of genome-wide association studies (GWAS), where researchers take an unbiased survey of common single-nucleotide polymorphisms (SNPs) across the genome and look for alleles whose presence correlates with phenotypes such as disease. Hundreds of gene candidates have been found in just a few years, although most have only a modest effect3. The difficulty is that slight differences in the genetic backgrounds of different populations or unknown pairs of relatives in a sample introduce tiny statistical shifts that pose the risk of appearing significant for some of the millions of SNPs analyzed. In response to these difficulties, researchers have adopted a well-defined, quality-control process that can be applied to new data using readily available software tools. Also, many journals are starting to require replication of results for publication of GWAS papers, and in the best scenario, another research group replicates the association study in a different cohort with a similar phenotype4.
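The scale of the multiple-testing problem described above can be made concrete with a little arithmetic. The sketch below assumes one million independent SNP tests and uses the Bonferroni correction as an illustrative safeguard; both the test count and the choice of correction are assumptions made here, not details of the quality-control process the text refers to.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing the threshold by chance alone,
    assuming every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at or below family_alpha."""
    return family_alpha / n_tests


n_snps = 1_000_000   # assumed number of SNPs tested (illustrative)
alpha = 0.05         # conventional single-test threshold

naive_hits = expected_false_positives(n_snps, alpha)
per_snp_alpha = bonferroni_threshold(n_snps, alpha)

print(f"Chance hits at p < {alpha}: about {naive_hits:,.0f}")
print(f"Bonferroni per-SNP threshold: {per_snp_alpha:.0e}")
```

With an uncorrected threshold, tens of thousands of SNPs would look significant by chance, which is why small systematic shifts from population structure or cryptic relatedness are so easily mistaken for real associations.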

The complex networks that translate genotype into phenotype are also highly sensitive to biological context and environmental influences. Typically, context and environment are mediated by signaling networks, for example, through the action of protein kinases. To verify predictions, it is necessary to understand how a network functions and to analyze its dynamical changes under certain conditions. A reliable source of quantitative data allows such predictions through the probabilistic integration of different sources of evidence, as in the case of NetPhorest5 (which creates an index to measure the specificity of protein kinases) or NetworKIN6 (which predicts the interactions between kinases and the substrate proteins they phosphorylate using cellular contextual information). When assessing the performance of such biological classifiers or predictions from models, it is essential to design experiments that reproduce the biological context as closely as possible, and to make use of independent data to corroborate the predictions. This is also the case for the underlying proteomics data.

Although computational methods and high-throughput experiments can be used to map interactions at the genome-wide level, they are often characterized by substantial error rates. Thus, many of the predicted interactions may be incorrect. Critically, these errors and their sources can be identified, quantified and corrected as our knowledge of the underlying system grows. Notably, for many applications it is not crucial that all predicted interactions be correct. For example, for the purpose of identifying master regulators—genes that orchestrate regulatory programs in transcriptional regulatory networks—it does not matter whether the researcher knows which transcription factor–target interactions are correct because, if a sufficiently high percentage of the interactions are correct, then in all likelihood the correct regulator will be predicted7.

Traditional approaches to validation are not particularly amenable to testing hundreds or thousands of potential interactions. However, verifying all the detailed mechanisms conjectured to underlie a biological system may be unnecessary until the model predicts something biologically important. Moreover, a hierarchical validation could be possible, where many predictions are validated at low resolution, and a few of them then investigated in greater detail.

Limitations of peer review for validation

Traditional peer review is widely considered to be one of the most important mechanisms for quality control of scientific papers. Nevertheless, as the number of published papers increases, the peer review system is under increasing strain. Indeed, it has been estimated from PubMed that in the past decade the growth rate of scientific publications was 5.6% per year, or equivalently, a doubling time of 13 years8. This results in increased burdens on peer reviewers who get little reward for their efforts. Furthermore, it is questionable whether the peer review system can objectively assess the quality of the high-throughput data and the validity of the sophisticated analyses and interpretations that nowadays pervade systems biology.
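The correspondence between a 5.6% annual growth rate and a doubling time of about 13 years follows from compound growth. A minimal check, assuming the rate is constant and compounds once per year (the function name is illustrative):

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant
    compound annual growth rate (0.056 means 5.6% per year)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


years = doubling_time(0.056)
print(f"Doubling time at 5.6% per year: {years:.1f} years")
```

The result is a little under 13 years, consistent with the rounded figure quoted above.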

Web-based publishing has created new mechanisms for gauging the reactions to a paper in the same journal in which it is published. The discussions and opposing opinions about an article considerably enrich it, or at least they could if they were more frequently used. In general, participation in open discussions of papers has not attracted the interest of many scientists, except for a few controversial papers. One way to improve reader feedback on journal websites may be to use a unique author identifier9 that is assigned to researchers early in their career so that their online comments and reviews can be taken into account during evaluations, in addition to their reviewed publications.

The proliferation of publications, which is a sign of the faster pace of discovery, may also dilute important discoveries as they may be split across several papers. One of the responses to this reality has been the creation of annotated biological databases (pioneered by SwissProt for over 20 years) based on the peer-reviewed literature. For example, Biobase annotates literature data, having processed some 150,000 references on the human proteome; curation is done by trained and paid curators. A similar effort involving massive human curation is being pursued by Ingenuity. But, as with some of the subjectivity that exists in the peer review process, personal biases can intrude and different conclusions may be drawn from the same paper, even by highly skilled, rigorously selected curators who follow standard operating procedures.

As the explosive growth of biomedical data strains the capacity of human curation, computational methods to mine the literature are becoming increasingly important10. But automatic text mining has its own weaknesses, such as the difficulty of extracting information from figures or tables, and the ambiguities of interpretation inherent in natural language. Biological databases, whose information is usually subjected to some human curation, contain data and annotations that should be scrutinized for accuracy. In genomic databases, inadvertent annotation errors can be propagated when the putative function of a gene is inferred based on sequence homology. For example, current methods for biochemical annotation of metabolic pathways, especially for microbes, are primarily based on sequence homology and can be inaccurate because most annotations do not provide a quantification of confidence of the homology. New methods need to be developed to police these errors in databases and avoid the propagation of incorrect information11.

Naturally, it would be better to prevent mistakes from entering into the literature or the databases in the first place. But peer review and human curation can only address some of the inaccuracies that are contained in papers or databases. Peer reviewers mainly judge the suitability of data collection protocols, accuracy of inferences and ideas, innovation, the logic of the argument and the consistency of the material being reviewed. However, it is often unfeasible or difficult for reviewers to assess the quality of data itself or the performance of the analytical methodologies described in a manuscript. This is in part due to the lack of a rigorous characterization of error rates and data quality in manuscripts. Thus, a methodology is needed for verifying the results and claims in systems biology.

The power of crowds

In this respect, crowdsourcing—engaging an interested community to collaboratively solve a problem—may be a fruitful strategy for assessing the quality of analyses and predictions from high-throughput data. As one example of such an approach, blogs and tweets can gather sizable amounts of comments on controversial papers12. For instance, as the popular media were covering a paper that identified genomic loci predicting human lifespan with 77% accuracy, scientific bloggers were already raising doubts about the methodology of the paper and the lack of rigor of its results, which blunted enthusiasm for the publication and led to its eventual retraction13.

Crowdsourcing has also been used to assess the validity of research in academic efforts such as CASP (The Critical Assessment of Protein Structure Prediction14), CAPRI (The Critical Assessment of Prediction of Interactions), BioCREATIVE and DREAM (The Dialogue on Reverse Engineering Assessments and Methods15), as well as commercially organized assessments like Kaggle and Innocentive. These undertakings are organized around ‘challenges’ in which an interested community competes to verify methodologies against carefully chosen benchmarks. We briefly discuss below two projects that illustrate the power of this approach, CASP (a pioneering project on collaborative competition) and DREAM (which deals with the assessment of methods for systems biology).

CASP is a contest begun in 1994 to rank the performance of methods for predicting the three-dimensional structure of proteins based on their amino acid sequence. It is the first of the many biological community-based assessment efforts to emerge. The nine CASP competitions to date have uncovered significant stumbling blocks in the field of protein structure prediction, and they have enabled notable progress.

DREAM is a project designed to assess model predictions and pathway inference algorithms in systems biology. Like CASP, DREAM is structured in the form of challenges presented to the community, comprising open problems whose solutions (the ‘gold standards’) are known to the organizers but not to the participants. Participants submit their predictions of the solutions, which are evaluated by the organizers and eventually discussed in a conference. After the conference, all the data, predictions and gold standards are openly available to the community. This experience has shown that a rigorous scrutiny of scientific research based on community involvement is possible. The outcomes of the DREAM challenges highlight areas in which clear advances in systems biology have been made or need to be made.
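The blind-evaluation step described above, in which organizers score submissions against a hidden gold standard, can be sketched in a few lines. The example below is a minimal illustration, not the scoring procedure of any particular challenge: the choice of the area under the ROC curve (AUROC) as the metric, the team names and all data values are assumptions made for the example.

```python
def auroc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    true item receives a higher score than a randomly chosen false item
    (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("gold standard needs both true and false items")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hidden from participants: 1 = true item, 0 = false item (hypothetical).
gold_standard = [1, 0, 1, 0, 0, 1]

# Confidence scores submitted by two hypothetical teams.
submissions = {
    "team_a": [0.9, 0.2, 0.8, 0.4, 0.1, 0.7],
    "team_b": [0.3, 0.8, 0.6, 0.4, 0.9, 0.5],
}

# Score every submission and rank teams from best to worst.
leaderboard = sorted(
    ((team, auroc(scores, gold_standard)) for team, scores in submissions.items()),
    key=lambda entry: entry[1],
    reverse=True,
)
for team, score in leaderboard:
    print(f"{team}: AUROC = {score:.3f}")
```

Because the gold standard is never shown to participants, a score such as this measures how well a method generalizes rather than how well it was tuned to the answer.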

Several of the challenges posed by DREAM address the problem of ‘network inference’. In these challenges, teams of researchers try to infer gene-regulatory or signaling networks from gene expression or phosphoproteomic profiles undergoing various perturbations. This is a difficult problem because currently no true gold standard exists for real biological networks. Data simulated with mathematical models that are designed to be as biologically plausible as possible can be used, as simulated data assure a systematic, rigorous assessment16. But the use of simulated data does not ensure that the challenge is necessarily realistic17. Many different methods, including regression, mutual information, correlation, Bayesian networks and others18, have been used to address this challenge. Importantly, combining individual predictions results in a solution that is highly robust and usually the most accurate, demonstrating the need for tackling complex problems as a community18, 19.
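One simple way to combine individual predictions into a community prediction is to average the rank that each team assigns to every candidate interaction. The sketch below assumes this rank-averaging scheme purely for illustration; the text does not specify how predictions were combined, and the team scores and edge names are hypothetical.

```python
def to_ranks(scores):
    """Convert a {item: confidence} mapping into {item: rank},
    where rank 1 is the most confident prediction."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


def combine_by_average_rank(predictions):
    """Return items ordered by their mean rank across all predictions
    (lowest mean rank first). Every prediction must score the same items."""
    all_ranks = [to_ranks(p) for p in predictions]
    items = predictions[0].keys()
    mean_rank = {
        item: sum(r[item] for r in all_ranks) / len(all_ranks) for item in items
    }
    return sorted(mean_rank, key=mean_rank.get)


# Hypothetical confidence scores from three teams for four candidate edges.
team_predictions = [
    {"A->B": 0.9, "A->C": 0.1, "B->C": 0.8, "C->A": 0.3},
    {"A->B": 0.6, "A->C": 0.7, "B->C": 0.5, "C->A": 0.1},
    {"A->B": 0.4, "A->C": 0.2, "B->C": 0.9, "C->A": 0.3},
]

consensus = combine_by_average_rank(team_predictions)
print(consensus)
```

Working with ranks rather than raw scores means that teams using different confidence scales contribute equally, and an edge that only one team ranks highly (here A->C) is pulled down by the others.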

For both CASP and DREAM, as well as for most similar efforts, the goal is not to find a single best method but to reach a better understanding of the strengths and weaknesses of these methods, and so enable progress in their respective disciplines.

Meeting the needs of industry

In view of the limited ability of peer review to assure the validity of complex scientific results in the area of systems biology (Fig. 1), and recognizing the power of communities to assess methodological aspects of scientific research, researchers at IBM and Philip Morris International (PMI; Neuchâtel, Switzerland) have been collaborating on a vision for quality assurance in systems biology research. Although industry shares many of the same needs for validation as academia, a methodology for verifying research is needed in the industrial setting that recognizes both speed and protection of proprietary data constraints, as well as the importance of market considerations and consumer protection. IBM and PMI have proposed a scheme called IMPROVER (industrial methodology for process verification of research; Box 1).

Box 1: IMPROVER as a means of assessing complex processes in industrial research

Figure 1: Current approaches to systems biology verification.

Different paths for reaching systems biology verification are represented, both for academia (blue) and for industry (red). Black represents pathways common to industry and academia. The colors of the rectangles represent the grounds on which the assessment of systems biology results is based: mostly on innovation (green), mostly on robustness (orange) and on both innovation and robustness (yellow). The thickness of the arrows represents the current predominant pathway.

Applying this methodology first requires identifying the building blocks of a research workflow. Building blocks are basically small pieces of a big research program. Some might involve generating biological measurements, others analyzing data. The validity of these measurements can be assured with quality-assessment and best-practice processes that are familiar to industry. The idea behind IMPROVER is to test each key method at crucial junctures of a research workflow by posing challenges designed to see whether or not the process works at the necessary level of accuracy (Box 1 and Fig. 2). The challenges can be internal to a company, or if they are of interest to a broader community that may benefit from its participation, they can be public, similar to DREAM or CASP. For such external challenges, the organizers will need to establish the same sorts of conditions that have made existing programs successful. In particular, for independent researchers to participate, they will need incentives, which could be recognition or co-authorship, monetary incentives (for instance, as used in the Netflix Prize20) or access to high-value data or experimental validation efforts. The workflow model underlying this methodology reflects an engineering mindset more common in industry, where research is aimed at a concrete fixed goal, but not particularly adapted for completely open-ended discovery, as in academia.

Figure 2: Example application of IMPROVER for verification of a plausible research workflow.

IMPROVER could start a trend by which eventually even the academic community would ask for independent verification of its core technologies and methods. Today, independent verification relies mainly on government agencies whose criteria for assessment are not always transparent to the general public.

Finding a robust signature for disease diagnosis is an example of a challenge that might be of interest to the wider biomedical research community, as well as being essential in many industrial research workflows for stratification of populations, early detection of disease and personalized medicine. Pioneering work21 suggested that molecular classification of tumors, and by extension other diseases, could be more accurate than morphological classification. Similarly, success in predicting survival, disease progression and response to drugs could aid in stratifying patients and choosing treatments. For a signature to be robust, it will probably need to be more than just a single biomarker or gene signature. For example, it could be a set of master regulators of a tumor type7 or a combination of clinical data, gene expression data, pathway information and genomic structural variants22, 23. Integrative network markers and network structures and dynamics will increasingly become a primary focus for both detection and treatment of complex diseases. In this type of challenge, participants would be assessed on their ability to identify disease phenotypes based on gene expression data and, possibly, clinical information. The training set would perhaps not be given explicitly, with participants needing to rely on vast publicly available gene expression databases, such as the National Center for Biotechnology Information’s Gene Expression Omnibus.

A second challenge that may find many industrial applications would be the exploration of the limits of translation of data and conclusions from rodents to humans. The main scientific question here is how accurately observations from in vivo and in vitro rodent models can be translated to a human context. Participants would be given proteomics or expression profile changes in cultured cells, from a particular tissue from both rodents and humans, in response to an agent such as a drug. The challenge would be to predict the response in human cells to a new agent, on the basis of expression changes in rodent cells. One essential element in designing such a challenge is choosing the agent and the cell lines to give a diversity of perturbations that sufficiently cover the maximum number of biological processes. In addition to finding useful methodologies, the goal of this challenge would be to provide insight and understanding regarding the range of applicability of the translation concept.

These challenges, as well as many others that can be envisioned, address the core interests of many industries. These industries could benefit from the power of crowds to find strategies to address their problems. In turn, the community will have the opportunity to try their methods on new data, participate in studies that address grand challenges in biomedical research and close the sometimes open loop between academia and industry.


The abundance of high-throughput and quantitative data in systems biology creates both opportunities and difficulties. In particular, although thousands of predictions may be generated, most are often left unverified. How worthwhile can these predictions be without methods for high-throughput verification? Several avenues to verification of systems biology results exist or are emerging in both academia and industry (Fig. 1).

In this article, we have proposed that systems biology results can be verified using community-wide challenges that test specific methodologies using the power of crowds. Assessing the results of these challenges needs to be done under a rigorous statistical framework. It is curious that little has been done in the area of verification of industrial research, especially as statistical quality control has enabled considerable improvements to industrial manufacturing. If challenge-based verification processes, such as IMPROVER, CASP and DREAM, become routine, it is likely that industrial research workflows will see increased efficiency in the generation of applied scientific results and decreased expense per verified result.

Challenge-based verification processes may also help cope with the explosive growth of scientific publications. This growth taxes the peer review system, especially in systems biology where assessments of the robustness of a complex methodology and the sanity of large data sets are often not performed during the peer review process. We argue that challenge-based verification of scientific results should ideally be done before submission to a reviewer. This could provide better scrutiny of results, because blind tests tend to eliminate some of the subjective bias of interpretation of results during peer review.

Finally, we should stress that an overcrowded field of scientific publications and a lack of systematic verification of systems biology predictions, although problematic, are consequences of something fundamentally positive, because they reflect the fact that science is moving at a fast pace. We hope that some of the specific solutions we have outlined will help avoid false steps in our path toward a more predictive, quantitative and mechanistic understanding of biological systems.


  1. The US National Academy of Sciences. A New Biology for the 21st Century (National Academies Press, Washington, DC, 2009).
  2. Zhang, Q. et al. Genome Biol. 9, R93 (2008).
  3. McCarthy, M.I. et al. Nat. Rev. Genet. 9, 356–369 (2008).
  4. Chanock, S.J. et al. Nature 447, 655–660 (2007).
  5. Miller, M.L. et al. Sci. Signal. 1, ra2 (2008).
  6. Linding, R. et al. Cell 129, 1415–1426 (2007).
  7. Carro, M.S. et al. Nature 463, 318–325 (2010).
  8. Larsen, P.O. & von Ins, M. Scientometrics 84, 575–603 (2010).
  9. Wolinsky, H. EMBO Rep. 9, 1171–1174 (2008).
  10. Harmston, N., Filsell, W. & Stumpf, M.P. Hum. Genomics 5, 17–29 (2010).
  11. Hsiao, T.L. et al. Nat. Chem. Biol. 6, 34–40 (2010).
  12. Mandavilli, A. Nature 469, 286–287 (2011).
  13. Sebastiani, P. et al. Science 333, 404 (2010).
  14. Moult, J. Curr. Opin. Struct. Biol. 15, 285–289 (2005).
  15. Stolovitzky, G., Monroe, D. & Califano, A. Ann. NY Acad. Sci. 1115, 1–22 (2007).
  16. Marbach, D. et al. J. Comput. Biol. 16, 229–239 (2009).
  17. Cantone, I. et al. Cell 137, 172–181 (2009).
  18. Marbach, D. et al. Proc. Natl. Acad. Sci. USA 107, 6286–6291 (2010).
  19. Prill, R.J. et al. PLoS ONE 5, e9202 (2010).
  20. Tuzhilin, A. & Koren, Y. ACM Digital Library (ACM Press, New York, 2008).
  21. Golub, T.R. et al. Science 286, 531–537 (1999).
  22. Erler, J.T. & Linding, R. J. Pathol. 220, 290–296 (2010).
  23. Tamayo, P. et al. J. Clin. Oncol. 29, 1415–1423 (2011).


This paper was the result of vibrant discussions during the recent symposium “Critical Assessment of Systems Biology: Research Verification in the Age of Collaborative-Competition,” which took place in Zurich, Switzerland, on March 23 and 24, 2011. We thank S. Stadler, C. Haettenschwiler and C. Warmer for the organization of the symposium, D. Monroe for a careful proofreading of the manuscript, R. Aebersold, B. Schwikowski, S. Corthesy, K. Kozikis, L. Schilli and all the attendees who are not authors for their contributions as speakers or during the discussion sessions.

Author information


  1. IBM Computational Biology Center, Yorktown Heights, New York, USA.

    • Pablo Meyer,
    • Raquel Norel,
    • J Jeremy Rice,
    • Ajay Royyuru &
    • Gustavo Stolovitzky
  2. National Technical University of Athens, Athens, Greece.

    • Leonidas G Alexopoulos
  3. Philip Morris International R&D, Neuchâtel, Switzerland.

    • Thomas Bonk,
    • Julia Hoeng,
    • Nikolai V Ivanov,
    • Manuel C Peitsch &
    • Katrin Stolle
  4. Center for Computational Biology and Bioinformatics, Department of Biomedical Informatics, Columbia University, New York, USA.

    • Andrea Califano &
    • Dennis Vitkup
  5. Modeling and Simulation, Merck & Co., Rahway, New Jersey, USA.

    • Carolyn R Cho
  6. Center for Advanced Studies, Research and Development in Sardinia (CRS4), Laboratorio di Bioinformatica, Parco tecnologico della Sardegna, Pula, Italy.

    • Alberto de la Fuente
  7. Selventa, Cambridge, Massachusetts, USA.

    • David de Graaf
  8. Duke University, Durham, North Carolina, USA.

    • Alexander J Hartemink
  9. ETH Zurich, Zurich, Switzerland.

    • Heinz Koeppl
  10. Cellular Signal Integration Group (C-SIG), Center for Biological Sequence Analysis (CBS), Department of Systems Biology, Technical University of Denmark (DTU), Copenhagen, Denmark.

    • Rune Linding
  11. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT) and Broad Institute of MIT and Harvard, Cambridge, USA.

    • Daniel Marbach
  12. Biobase GmbH, Wolfenbuettel, Germany.

    • Frank Schacherer
  13. IBM Life Sciences Division, Zurich, Switzerland.

    • Joerg Sprengel

Competing financial interests

IBM and PMI authors performed this work under a joint research collaboration funded by PMI.

Benelux Bioinformatics Conference 2013

As well as providing a summary of the work alongside the poster or slide presentation, links to F1000 Faculty Member recommendations and related research articles from the authors are included, if applicable. Please use the information in these posters/slide presentations responsibly, and include the full citation if you wish to reuse any of the material. Please note that most work on this site is preliminary in nature and has not been reviewed.

  1. From microbial gene essentiality to

    From microbial gene essentiality to novel antimicrobial drug targets

    F Mobegi, SAFT van Hijum , P Burghout, HJ Bootsma, …, J Langereis, P Hermans, MI de Jonge, A ZomerF1000Posters 2015, 6: 234 (poster)

    Summary | Poster [40.68 MB] | Resulting articles

  2. Comparative metagenomics by cross-assembly

    Comparative metagenomics by cross-assembly

    BE DutilhF1000Posters 2015, 6: 209 (poster)

    Summary | Poster [1.09 MB] | Resulting articles

  3. Mining the garbage fragments of

    Mining the garbage fragments of methylation-specific enriched DNA

    K Mensaert, G Trooskens, S Denil, E Schuuring, …, B Wisman, W Van Criekinge, O Thas, T De MeyerF1000Posters 2014, 5: 69 (poster)

    Summary | Poster [13.17 MB]

  4. ORCAE, a wiki style genome

    ORCAE, a wiki style genome annotation portal for eukaryotic genomes

    L Sterck, K Billiau, T Abeel, P Rouzé, Y Van de PeerF1000Posters 2014, 5: 62 (poster)

    Summary | Poster [1.42 MB] | Resulting articles

  5. Data integration and stewardship centre

    Data integration and stewardship centre – tackling the big data challenge in life science research

    J Boiten, J Bouwman, B van Breukelen, L Eijssen, …, M Roos, G Sanchez Perez, R van Schaik, M SwertzF1000Posters 2014, 5: 47 (poster)

    Summary | Poster [1.11 MB]

  6. dbXP: investigating the future

    dbXP: investigating the future of integrative bioinformatics research infrastructures in Europe

    L Eijssen, J Bouwman, A Bohler, N Nunes, …, P de Groot, M Jaillard, B van Ommen, C EveloF1000Posters 2014, 5: 45 (poster)

    Summary | Poster [1.67 MB]

  7. Network deregulation analysis in complex

    Network deregulation analysis in complex diseases via the pairwise elastic net

    N Vlassis, E GlaabF1000Posters 2014, 5: 41 (poster)

    Summary | Poster [3.32 MB]

  8. Comparing fragmentation spectra from two

    Comparing fragmentation spectra from two parasitic worm species to discover unique peptides

    & Yılmaz, B Victor, N Hulstaert, G Gonnelli, P Dorny, M Palmblad, L MartensF1000Posters 2014, 5: 39 (poster)

    Summary | Poster [1.77 MB]

  9. Bellerophon: a hybrid method

    Bellerophon: a hybrid method for detecting interchromosomal rearrangements at base pair resolution using next-generation sequencing data

    M Hayes, J LiF1000Posters 2014, 5: 32 (poster)

    Summary | Poster [1.24 MB] | Resulting articles

  10. Comparative analysis of biome-specific microbial

    Comparative analysis of biome-specific microbial association networks

    K Faust, J RaesF1000Posters 2014, 5: 31 (poster)

    Summary | Poster [6.17 MB]

  11. Bioinformatics and systems biology Masters

    Bioinformatics and systems biology Masters: bridging the gap between heterogeneous student backgrounds

    S Abeln, D Molenaar, KA Feenstra, HJ Hoefsloot, B Teusink, J HeringaF1000Posters 2014, 5: 28 (poster)

    Summary | Poster [8.21 MB] | Resulting articles

  12. Identifying losses and expansions of

    Identifying losses and expansions of selected gene families in incomplete genomic datasets

    A Di Franco, M Hanikenne, D BaurainF1000Posters 2014, 5: 19 (poster)

    Summary | Poster [3.15 MB]

  13. Comparison of methods for pattern

    Comparison of methods for pattern recognition in toxicogenomics time series

    D Hendrickx, D Jennen, J Briede, R Cavill, T de Kok, J KleinjansF1000Posters 2014, 5: 14 (slide presentation)

    Summary | Slide Presentation [3.38 MB]

  14. Convert your favourite protein modeling

    Convert your favourite protein modeling program into a mutation predictor: “MODICT”

    I Tanyalcin, D Coomans, K Stouffs, W Lissens, AC JansenF1000Posters 2014, 5: 2 (poster)

    Summary | Poster [3.64 MB]

  15. Biomedical text mining for disease-gene

    Biomedical text mining for disease-gene discovery

    S ElShal, J Davis, Y MoreauF1000Posters 2013, 4: 1516 (poster)

    Summary | Poster [1.50 MB]

  16. Predicting tryptic cleavage from proteomics

    Predicting tryptic cleavage from proteomics data using decision tree ensembles

    T Fannes, E Vandermarliere, L Schietgat, S Degroeve, K De Grave, L Martens, J RamonF1000Posters 2013, 4: 1458 (poster)

    Summary | Poster [1.83 MB] | Resulting articles

  17. Delivering computational biology and bioinformatics

    Delivering computational biology and bioinformatics tools on PRACE/HPC Systems, using modules and EasyBuild

    F Georgatos, N Christian, K Hoste, C Laczny, …, R Schneider, G Tsouloupas, J Timmerman, VJ PromponasF1000Posters 2013, 4: 1457 (poster)

    Summary | Poster [2.32 MB]

  18. CAPRI: The diverse challenges

    CAPRI: The diverse challenges of computational protein-protein docking

    M Lensink. F1000Posters 2013, 4: 1456 (poster)

  19. CellMissy: a tool for

    CellMissy: a tool for management, storage, dissemination and analysis of cell migration data

    P Masuzzo, N Hulstaert, C Ampe, M Van Troys, L Martens. F1000Posters 2013, 4: 1455 (poster)

  20. A miRNA expression based diagnostic

    A miRNA expression based diagnostic tool for breast cancer using random forests

    S Wenric, P Freres, C Josse, V Bours, G Jerusalem. F1000Posters 2013, 4: 1454 (poster)

Systems biology in drug discovery


Nature Biotechnology 22, 1253 – 1259 (2004)
Published online: 6 October 2004 | doi:10.1038/nbt1017

Eugene C Butcher, Ellen L Berg & Eric J Kunkel

The hope of the rapid translation of ‘genes to drugs’ has foundered on the reality that disease biology is complex, and that drug development must be driven by insights into biological responses. Systems biology aims to describe and to understand the operation of complex biological systems and ultimately to develop predictive models of human disease. Although meaningful molecular level models of human cell and tissue function are a distant goal, systems biology efforts are already influencing drug discovery. Large-scale gene, protein and metabolite measurements (‘omics’) dramatically accelerate hypothesis generation and testing in disease models. Computer simulations integrating knowledge of organ and system-level responses help prioritize targets and design clinical trials. Automation of complex primary human cell–based assay systems designed to capture emergent properties can now integrate a broad range of disease-relevant human biology into the drug discovery process, informing target and compound validation, lead optimization, and clinical indication selection. These systems biology approaches promise to improve decision making in pharmaceutical development.

Drug discovery and systems biology began together: in traditional or ‘folk’ medicine, herbal drugs were discovered through direct if anecdotal observations in people with diseases, the most relevant complex biological systems there are. With the advent of chemistry in the late 1800s and early 1900s, derivatives of natural products and subsequently novel synthetic chemicals made their way into drug discovery pipelines; but screening was still in the setting of complex disease biology, with animals replacing patients as the primary ‘guinea pigs.’ Most of today’s pharmaceuticals (at least on a ‘doses per patient-year’ basis) derive directly or indirectly from such early ‘systems biology’-based drug discovery. In the interest of speed and the perceived advantages of mechanistic insight, however, animal models were successively replaced with tissue-level screens (e.g., vascular or tracheal muscle tone), simple cell-based pathway screens (proliferation, cytokine production) and finally with today’s ultra-high-throughput screens capable of interrogating individual molecular targets with hundreds of thousands of compounds a day.

Today’s ‘win-by-numbers’ approach is very powerful when applied to known, validated targets (which often means targets of historical drugs), but has led to disappointingly few new drugs when applied to targets that are biologically less well understood (e.g., genome-derived targets). The desire to mine the wealth of the genome has come face to face with the realization that knowing a target is not the same as knowing what the target does, let alone knowing the effects of a chemical inhibitor in diverse disease settings. In fact, despite the enormous investment in genomics and screening technologies over the past 20 years, the cost of new drug discovery continues to rise while approval rates fall1. The primary selection of drug targets and candidates has become divorced from the complexity of disease physiology. Reenter systems biology, in modern guise.

The goal of modern systems biology is to understand physiology and disease from the level of molecular pathways, regulatory networks, cells, tissues, organs and ultimately the whole organism. As currently employed, the term ‘systems biology’ encompasses many different approaches and models for probing and understanding biological complexity, and studies of many organisms from bacteria to man. Much of the academic focus is on developing fundamental computational and informatics tools required to integrate large amounts of reductionist data (global gene expression, proteomic and metabolomic data) into models of regulatory networks and cell behavior. Because biological complexity is an exponential function of the number of system components and the interactions between them, and escalates at each additional level of organization (Fig. 1), such efforts are currently limited to simple organisms or to specific minimal pathways (and generally in very specific cell and environmental contexts) in higher organisms2, 3, 4. Even if our ability to measure molecules and their functional states and interactions were adequate to the task, computational limitations alone would prohibit our understanding of cell and tissue behavior from the molecular level. Thus, methodologies that filter information for relevance, such as biological context and experimental knowledge of cellular and higher level system responses, will be critical for successful understanding of different levels of organization in systems biology research.
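To give a rough sense of the scaling argument above, the following sketch counts two quantities for a system of n components: the number of possible pairwise interactions and the number of joint on/off states. Treating each component as a binary switch is a simplifying assumption made here for illustration and is not taken from the review; the function names are likewise hypothetical.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Number of distinct unordered pairs among n components."""
    return comb(n, 2)


def binary_states(n: int) -> int:
    """Number of joint states if every component is simply 'on' or 'off' (an assumption)."""
    return 2 ** n


# Print how both counts grow as components are added.
for n in (5, 10, 20, 40):
    print(f"n={n:>2}  pairs={pairwise_interactions(n):>4}  states={binary_states(n)}")
```

Under this assumption the number of pairs grows only quadratically, whereas the joint state space doubles with every added component, which is one way to read the claim that exhaustive molecular-level simulation quickly becomes impractical.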

Figure 1: Approaches to systems biology in the pharmaceutical industry.

Omics (the bottom-up approach) focuses on the identification and global measurement of molecular components. Modeling (the top-down approach) attempts to form integrative (across scales) models of human physiology and disease, although with current technologies, such modeling focuses on relatively specific questions at particular scales, e.g., at the pathway or organ levels. An intermediate approach, with the potential to bridge the two, is to generate profiling data (e.g., biologically multiplexed activity profiling or BioMAP data) from high-throughput assays designed to incorporate biological complexity at multiple levels: multiple interacting active pathways, multiple intercommunicating cell types and multiple different environments. Such a complex cell systems approach addresses the need for data on cell responses to physiological stimuli and to pharmaceutical agents as an aid to modelers, and also as a practical approach to systems biology at the cell signaling network and cell-cell interaction scales.

This review focuses on recent advances in the practical applications of systems biology to drug discovery. Three principal approaches are discussed (Fig. 1): informatic integration of ‘omics’ data sets (a bottom-up approach); computer modeling of disease or organ system physiology from cell and organ response level information available in the literature (a top-down approach to target selection, clinical indication and clinical trial design); and the use of complex human cell systems themselves to interpret and predict the biological activities of drugs and gene targets (a direct experimental approach to cataloguing complex disease-relevant biological responses). These complementary approaches, which must ultimately be integrated in the quest for a hierarchical, molecule-to-systems level understanding of human disease, are already having an impact on the drug discovery process.

Omics: large-scale data generation and mining

It could be argued that a full understanding of the responses of a system requires knowledge of all of its component parts. Omics approaches to systems biology focus on the building blocks of complex systems (genes, proteins and metabolites). These approaches have been adopted wholeheartedly by the drug industry to complement traditional approaches to target identification and validation, for generating hypotheses and for experimental analysis in traditional hypothesis-based methods. For example, omics can be used to ask what genes, proteins or phosphorylation states of proteins are expressed or upregulated in a disease process, leading to the testable hypothesis that the regulated species are important to disease induction or progression (Table 1). Integration of genomics, proteomics and metabolite measurements within the context of controlled gene or drug perturbations of complex cell and animal models (and in the context of clinical data) is the basis of systems biology efforts at a number of drug companies, including Eli Lilly (Indianapolis, IN, USA), where they are accelerating the study of complex physiological processes such as bone metabolism5.

Omics classification of disease states can lead to more efficient targeting or even personalization of therapies by identifying the specific molecular pathways active in particular disease states and in individual patients6. Another valuable application of the technology is the identification of surrogate markers for disease detection, or for monitoring of therapies7, 8. Although omics approaches thus accelerate development of mechanistic hypotheses and clinical insights, a systems-level understanding does not automatically emerge.

Significant efforts are underway to understand key pathway and organism-level responses by relying on the emergent properties of global gene and protein expression data (that is, the properties of the system as a whole that cannot be predicted from the parts). In relatively simple organisms, studies incorporating analysis of time-series genome-wide mRNA expression data, large-scale perturbation analyses and identification of coregulated components, and protein-protein interaction studies have led to new insights into pathway functions and signaling network organization in specific biological processes, such as cell proliferation or the response to metabolic perturbation9, 10, 11, 12. Although the added levels of complexity in human disease, as well as economic and computational limitations, severely limit the utility of omics as a stand-alone approach for systems-level understanding, omics technologies will be important for constructing the ‘scaffolds’ that help define and limit the possible pathways and connectivities in top-down models of cell-signaling networks3.

Computer models: from pathways to disease physiology

The goal of modeling in systems biology is to provide a framework for hypothesis generation and prediction based on in silico simulation of human disease biology across the multiple distance and time scales of an organism (from molecular reactions to organism homeostasis and disease responses)2, 4. We are certainly a long way from achieving any general, integrated model of human cell behavior, let alone human organismal biology, but real progress is being made in developing and testing computational and experimental methods for in silico systems biology at different scales (Table 2). Moreover, we do not need a global synthesis for modeling and simulation to be useful for basic biological insights and drug development; highly focused, problem-directed models are already having an impact on target validation and clinical development decisions (Table 1).

Mathematical and, more recently, computational models have a rich history in human physiology4, 13, 14, 15. Modeling efforts useful for drug discovery and development must simulate responses at the scale of cell and tissue or organ complexity (that is, the scale at which disease manifests itself). At the same time, a sufficient level of detail must be included such that intervention points accessible to drug discovery are available and can be modulated in silico to predict an organ level readout. Thus, a model simulation of heart contractility must incorporate the connection between Na+/Ca2+ exchangers and contractility to be useful for predicting the effect of drugs targeting these channels14. Difficulty arises in developing models that can effectively integrate the molecular, cellular and organ levels. In addition to pure computational issues, limitations in bottom-up knowledge and in our understanding of pathway and network architecture and interactions, as well as a general lack of standardized knowledge of cell- and tissue-level responses to bioactive stimuli that could be used to validate models (see below), are fundamental, long-term problems that have to be addressed before models integrating complexity at multiple scales can be considered.

A practical approach to address the computational issues is to put in place an organ-level framework and add increasing complexity in a modular format. For example, one can begin with models of inflammation that examine cell-cell communication through cytokine networks and then start replacing the ‘black box’ cells with simulations of cell behavior (Table 2) modeled from network modules (e.g., models of cytoskeleton motility, proliferative or cytokine responses), ultimately replacing ‘black box’ pathway modules with bottom-up approaches4.

Entelos (Foster City, CA, USA) has developed complex simulations of disease physiology using a framework of deterministic differential equations based on empirical data in humans16 (Table 2). In these models, internal signaling pathways are not modeled explicitly; cells or even tissues are represented as black boxes that respond to inputs by giving specified outputs that vary with time. Using such an organ level ‘disease physiology’ framework, Stokes et al.17 have developed a computational model of chronic asthma that incorporates interactions among cells and some of the complexity of their responses to each other and their environment. Model parameters can be modified to reach a particular steady state reference point, for example, the state of chronic asthma (including chronic eosinophilic inflammation, chronic airway obstruction, airway hyperresponsiveness and elevated IgE levels) or the state of exercise-induced airway obstruction. Simulated ‘asthmatics’ respond as expected to various drugs, including beta2 agonists, glucocorticoids and leukotriene antagonists17. Moreover, by simulating an antibody-dependent reduction in interleukin (IL)-5 protein (a driver of eosinophilia during asthma), this model predicts a decrease in airway eosinophilia but little therapeutic improvement in airway conductance18, predictions that are consistent with the results of a clinical trial testing a humanized anti-IL-5 antibody in asthmatics19.
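The black-box idea can be illustrated with a deliberately small sketch. It is not the Entelos model or the asthma model of Stokes et al.; the saturating input-output rule, all parameter values and the function names are assumptions chosen only to show the mechanics. A single 'cell' converts a stimulus into a mediator production rate without any internal pathway detail, one deterministic differential equation tracks the mediator level M over time, and an `inhibition` parameter mimics a simulated drug that blocks a fraction of production.

```python
from typing import Callable, List


def black_box_cell(stimulus: float, gain: float = 2.0, half_max: float = 1.0) -> float:
    """Hypothetical input-output rule: saturating mediator production rate."""
    return gain * stimulus / (half_max + stimulus)


def simulate(stimulus: Callable[[float], float], decay: float = 0.5,
             inhibition: float = 0.0, t_end: float = 40.0, dt: float = 0.01) -> List[float]:
    """Forward-Euler integration of dM/dt = (1 - inhibition) * cell(S(t)) - decay * M."""
    m = 0.0
    trace = [m]
    for i in range(int(round(t_end / dt))):
        t = i * dt
        production = (1.0 - inhibition) * black_box_cell(stimulus(t))
        m += dt * (production - decay * m)
        trace.append(m)
    return trace


# Constant stimulus, without and with a simulated 80% block of production.
untreated = simulate(lambda t: 1.0)
treated = simulate(lambda t: 1.0, inhibition=0.8)
print(f"steady-state mediator: untreated={untreated[-1]:.3f}, treated={treated[-1]:.3f}")
```

In this toy setting the steady state is simply production divided by decay, so the effect of the simulated intervention can be checked by hand. Models of the kind described above couple many such black boxes, which is where hand calculation stops being practical.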

Similar cell- and organ-scale models of glucose metabolism and homeostasis have a long history, evolving from simple relationships between glucose and insulin levels in circulation20 to more complex models involving integrated multiple tissue responses and their involvement in glucose metabolism21. A presentation of Entelos’ diabetes ‘PhysioLab’ at a recent conference (In Silico Biology Conference, San Diego, California, USA, June 2–3, 2002; C. Wallwork, personal communication) described how such a computational model has been used in the design of phase 1 trials for an unspecified drug treatment for type 2 diabetes. The results suggested that computational modeling enabled the experimental dosing arms and the number of patients required for the trial to be decreased, thus potentially reducing costs and increasing the probability of clinical success.

More detailed understanding of the systems behavior of intercellular signaling pathways, such as the identification of key nodes or regulatory points in networks or better understanding of crosstalk between pathways, can also help predict drug target effects and their translation to organ and organism level physiology. To this end, a very large number (more than can be fairly cited) of efforts have been focused at the scale of signaling pathways within cells (e.g., see Table 2). These models benefit from the large amount of literature data and the promise that omics efforts can provide constraints on the pathways (see previous ‘Omics: large-scale data generation and mining’ section). As for cell- and organ-level models, simulations of mammalian signaling networks usually rely on time-dependent differential equations and model the pathway in isolation and under very specific (and simple) conditions3, 22. A next level of detail that enhances the utility of such pathway models is the crosstalk between pathways. Bhalla et al.23 modeled signaling modules and found that combinations of simple modules lead to nonlinear responses or ‘emergent properties’ of the system. These nonobvious results based on pathway nonlinearity hold promise for identification and prioritization of intervention points within signaling networks.
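As a hedged illustration of how combining simple modules can produce a nonobvious response, the sketch below couples two first-order modules with a saturating positive feedback. It is a toy model written for this purpose, not the model of Bhalla et al.; the equations, parameter values and names are all assumptions. Module A is driven by an external stimulus plus feedback from module B, and module B simply tracks A. Neither module alone retains a signal, but the coupled pair can stay switched on after a sufficiently strong transient stimulus has ended.

```python
def feedback(b: float, strength: float = 2.5, half_max: float = 1.0) -> float:
    """Assumed saturating (Hill-type, exponent 2) activation of module A by module B."""
    return strength * b * b / (half_max * half_max + b * b)


def final_activity(pulse_height: float, pulse_end: float = 5.0,
                   t_end: float = 40.0, dt: float = 0.01) -> float:
    """Integrate dA/dt = S(t) + feedback(B) - A and dB/dt = A - B with forward Euler.

    S(t) equals pulse_height until pulse_end and is zero afterwards.
    Returns the activity of module A at t_end.
    """
    a = b = 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        stimulus = pulse_height if t < pulse_end else 0.0
        da = stimulus + feedback(b) - a
        db = a - b
        a += dt * da
        b += dt * db
    return a


weak = final_activity(0.1)    # small transient stimulus
strong = final_activity(1.0)  # larger transient stimulus of the same duration
print(f"activity long after a weak pulse: {weak:.3f}, after a strong pulse: {strong:.3f}")
```

With these assumed parameters the coupled system has two stable steady states, so the response to the stimulus is a switch with memory and not a proportional change. Behavior of this kind is difficult to anticipate from inspection of the individual modules.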

Interestingly, the architecture of signaling pathways displays significant conservation during evolution, an insight that is being used to help define and understand mammalian cell signaling pathways based on homology with well-defined pathways in lower organisms, and between evolutionarily duplicated pathways in man (e.g., the PathBlast tool24). However, although pathway homologies may suggest conservation of key points for chemical intervention in signaling, divergence of pathway functions and regulatory interactions is the norm, so that ultimately there can be no substitute for studies in complex human systems.

No matter how successful current attempts at predictive modeling turn out to be, such models raise the challenge of experimental validation (theoretically, only possible with human data) and the cycles of improvement inherent to the modeling effort3 (Fig. 2). From a drug discovery point of view, any of the successes to date could be considered anecdotal and until a given model shows a track record of successful prediction in humans, it will be risky to rely on it for development decisions. For the foreseeable future, modeling predictions will likely be one of many inputs into the decision making process in the pharmaceutical industry.

Figure 2: Development cycle of integrated in silico models using component level and system response data.

Integrated models of disease can be generated using data from the literature as well as protein expression and interaction data sets, potentially informed by predictions of functional network organization and cell responses based ideally on complex human cell-based assays (e.g., see Fig. 3). Models are iteratively tested and improved by comparison of predictions with systems (cell, tissue or organism) level responses measured experimentally through traditional assays or from profiles generated from complex, activated human cell mixtures under a set of different environmental conditions. Component level ‘omics’ data can provide a scaffold, limiting the range of possible models at the molecular level.
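The test-and-improve cycle can be sketched in miniature as repeated comparison of model predictions with measured responses followed by a parameter update. Everything in the example is an assumption made for illustration: the one-parameter saturating response model, the synthetic 'measurements' (generated from the model itself with a half-maximal dose of 2.0) and the zooming grid search used as the update rule.

```python
from typing import List, Tuple


def predict(dose: float, ec50: float, emax: float = 1.0) -> float:
    """Hypothetical system-level response model with one unknown parameter (ec50)."""
    return emax * dose / (ec50 + dose)


def sum_sq_error(ec50: float, observations: List[Tuple[float, float]]) -> float:
    """Mismatch between model predictions and measured responses."""
    return sum((predict(dose, ec50) - response) ** 2 for dose, response in observations)


def refine(observations: List[Tuple[float, float]], low: float = 0.01, high: float = 100.0,
           rounds: int = 6, points: int = 21) -> float:
    """Each round scores a grid of candidate values and zooms in around the best one."""
    best = low
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        grid = [low + i * step for i in range(points)]
        best = min(grid, key=lambda value: sum_sq_error(value, observations))
        low, high = max(best - step, 1e-9), best + step
    return best


# Synthetic measurements generated with ec50 = 2.0 (illustrative, not real data).
observations = [(dose, predict(dose, 2.0)) for dose in (0.5, 1.0, 2.0, 4.0, 8.0)]
estimate = refine(observations)
print(f"refined ec50 estimate: {estimate:.4f}")
```

Real integrated models have many parameters and noisy data, so the update step would be a proper fitting or inference procedure, but the loop structure of predict, compare and revise is the same.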

Using complex cell systems to assay and model biology

Pathway modeling as yet remains too disconnected from systemic disease biology to have a significant impact on drug discovery. Top-down modeling at the cell-to-organ and organism scale shows promise, but is extremely dependent on contextual cell response data. Moreover, to bridge the gap between omics and modeling, we need to collect a different type of cell biology data—data that incorporate the complexity and emergent properties of cell regulatory systems and yet ideally are reproducible and amenable to storing in databases, sharing and quantitative analysis.

At one extreme, responses of human tissues themselves can be probed ex vivo, an approach that, even with limitations in terms of availability and reproducibility of human tissues, has proven useful for validating selected compounds and targets25. Highly reproducible or even automated approaches to cell biology, however, seem more likely to contribute to the large-scale compound and gene function analyses desired by industry and required as a basis for modeling efforts. Indeed, high-throughput cell-based screening systems, often relying on reporter assays and cell lines, are being used effectively by many companies to identify components of pathways26, screen for active compounds27 and even to profile drugs based on their effects on pathway or simple stimulus-response readouts28, 29. However, these assays are generally designed to isolate individual pathways and to minimize biological complexity and thus neither take advantage of, nor provide insight into, emergent properties of cell systems. This ‘systematic biology’ focus on simplified pathways is thus to be distinguished from the ‘systems biology’ focus on complexity and emergent properties.

At the same time, some groups are beginning to appreciate the importance of emergent properties in drug development. For instance, researchers at CombinatoRx (Boston, MA, USA) search for novel combination therapies by taking advantage of two stimuli (phorbol myristate acetate, an activator of the protein kinase C cascade, and ionomycin, a stimulator of Ca2+-dependent signaling) that turn on multiple pathways in primary cells to search for pairs of compounds that exhibit antagonism (e.g., to tumor necrosis factor (TNF)–alpha secretion from activated T cells) when combined, but not when used singly28. Elsewhere, Rosetta Inpharmatics (Seattle, WA, USA) has measured thousands of output genes in yeast, using the gene response profiles resulting from genetic or chemical (drug) perturbations to determine how genes that affect growth fit into pathways12 and to reveal the mechanism(s) of action of compounds29. These experimental approaches have begun to harness the power of systems biology, but the systems studied remain intentionally simple, focusing on only a few inputs or outputs (CombinatoRx) or a single physiologic state in a model organism (Rosetta). Complexity is a byproduct, not a product of design of these approaches.

Complexity and emergent properties in biology derive from several features: first, complex inputs that stimulate multiple pathways; second, multiple outputs that are integrated network responses to the inputs; third, interactions between multiple cell types; and fourth, multiple contexts and environments for each cell type or combination of cell types. The drug discovery industry has invested billions of dollars in technologies to evaluate outputs, but to incorporate disease-relevant complexity into drug discovery, intentional efforts must also be made to study cells in combination to mimic cell-cell interactions critical to in vivo regulatory networks and to assay cells in different complex environmental contexts (in which different combinations of pathways are activated). Parallel context or ‘multisystem’ analysis is important because proteins and pathways have evolved to integrate inputs and outputs from multiple contexts, so that to understand the effects of a drug (or target), data must be derived from cell responses in multiple environments.

Our group at BioSeek (Burlingame, CA, USA) has developed human cell–based assays that intentionally incorporate complexity at multiple levels, using parallel interrogation of standardized cell ‘systems’ (cells plus environments) designed to mimic physiological complexity by including one or more primary cell types as well as combinations of cells and active pathways (Fig. 3a). Cell systems are engineered to embody disease-relevant responses for biological function analyses, modeling and drug discovery. For example, a panel of just four cell systems (combinations of endothelial cells and blood mononuclear cells in four different complex inflammatory environments) was found to embody complex biology reflecting distinctive contributions of many pharmacologic targets relevant to inflammation30, 31. Profiles made up of as few as 24–40 protein readouts (including cytokines, chemokines, adhesion receptors and other inflammatory mediators) used to assess the responses of these complex systems are able to discriminate and classify most of the pathways and mechanisms affected by known modulators of inflammation, as well as a surprising array of other drugs and pathways tested30, 31 (Fig. 3b). Importantly, the profiles generated from these complex, activated cell mixtures are reproducible, allowing archiving in databases and automated searching and analyses by profile similarity or other characteristics (e.g., effects on key disease-relevant parameters).
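A minimal sketch of searching a profile database by similarity follows. It assumes that each stored profile is a fixed-order list of readout changes (for example, log ratios relative to control, concatenated across cell systems) and uses Pearson correlation as the similarity measure. The review does not specify the statistic or the data layout, and the profile names and values below are made up for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_similarity(query: Sequence[float],
                       database: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (name, correlation) pairs, most similar stored profile first."""
    scores = [(name, pearson(query, profile)) for name, profile in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical archived profiles: six readouts each, arbitrary illustrative values.
database = {
    "reference_A": [-1.0, -0.8, 0.1, 0.0, -0.5, 0.2],
    "reference_B": [0.3, 0.1, -0.9, -1.1, 0.0, 0.4],
}
query = [-0.9, -0.7, 0.0, 0.1, -0.6, 0.3]

ranked = rank_by_similarity(query, database)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

Because correlation compares the shape of a profile and ignores its overall magnitude, a design like this would match a compound to a reference across a range of concentrations. A distance-based measure would be the alternative if magnitude should count.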

Figure 3: Leveraging complexity in cell systems biology for drug discovery: biologically multiplexed activity profiling (BioMAP) applied to gene function, network architecture and drug activity relationships.

(a) Primary cells (e.g., endothelial cells and/or blood lymphocytes) are combined and exposed to stimuli (e.g., cytokines, growth factors or chemical mediators) in combinations relevant to the disease biology of interest (e.g., inflammation). Readouts used to measure system responses can be proteins, activated states of proteins, genes or other cellular constituents or properties selected for disease relevance (e.g., cytokines, growth factors, adhesion receptors, which are the ultimate mediators of cellular communication and function in disease) and for responsiveness to environmental and pharmacologic inputs (information content). Perturbations to the parallel systems define the biological activity profiles of interrogating drugs or genes. The combination of multiple cell types and multiple pathways activated elicits complex network regulation and emergent properties that enhance the sensitivity and ability of the systems to discriminate unique drug and gene effects. (b) Several complex human cell ‘systems’ (cells or cell combinations in disease-relevant environments) are interrogated with genes (via overexpression or siRNA) or drugs of interest and the effects on the levels of selected protein readouts are determined, generating a profile that serves as a multisystem signature of the function of the test agent. Statistical measures of profile similarity (i.e., do particular agents induce the same multisystem response?) can be used to cluster genes or drugs by function, and to generate graphical representations of their functional relationships with each other28, 29. As examples, clustering of profiles induced by gene overexpression (bottom left) reveals key pathway relationships (e.g., Ras/MAPK, phosphatidyl inositol 3-kinase (PI3K), interferon-gamma (IFN-gamma), and NF-kappaB-associated clusters) as well as pathway–pathway interactions in signaling networks controlling endothelial cell responses in the context of different inflammatory cytokines32. 
Clustering of drug-induced profiles from inflammatory model systems (comprising activated combinations of endothelial cells and peripheral blood mononuclear cells) detects and discriminates the activities of most known modulators of inflammation as well as a surprising array of other drug targets and pathways, including for example glucocorticoids, cytokine antagonists, and inhibitors of HMG-CoA reductase, calcineurin, inosine monophosphate dehydrogenase, phosphodiesterases, nuclear hormone receptors, phosphatidyl inositol 3 kinases, heat shock protein 90, casein kinase 2, Janus-activated kinases, and p38 MAPK among others (illustrated in upper right; drugs are colored by mechanistic class)28, 30. Drugs specific for a common target (circled in black) or for targets in a common pathway (circled in red) cluster together, but compounds having different off-target activities are readily detected (e.g., the profiles of three JAK inhibitors with known secondary activities; asterisks). Clustering of activity profiles from lead chemical series can define compound-specific structure-activity relationships for lead optimization (lower right; different analogs are color coded; circle size reflects concentration). In the example shown, BioMAP clustering defines two functional activity classes among structurally related p38 MAPK inhibitors.

This approach, termed biologically multiplexed activity profiling (BioMAP), has been successfully employed in model studies suggesting its applicability to several stages of the drug discovery process (Table 1). For target identification and validation, informatics approaches based on the similarity of database-stored multisystem profiles have been shown to rapidly associate gene or drug activities with known (or novel) pathways, and to predict functional pathways and network interactions32 (Fig. 3b). Multisystem profiles induced by gene overexpression in endothelial cells in four different cytokine environments (in essence, multisystem signatures of gene function) automatically clustered into groups that reflected known pathway relationships with surprising fidelity32. Moreover, graphical representation of function similarity relationships (Fig. 3b, lower left panel) points to unique roles for two gene products, MyD88 and IRAK, in mediating interactions between the nuclear factor (NF)-kappaB and Ras/mitogen-activated protein kinase (MAPK) pathways. MyD88, previously known to signal via NF-kappaB, was subsequently confirmed in biochemical studies to trigger the MAPK pathway as well, which in turn inhibited NF-kappaB activation in a negative feedback loop activated by IL-1beta but not TNF-alpha32. Clustering multisystem response profiles, in which the systems are designed to capture emergent properties, can thus help define the functional architecture of signaling networks, information important (in conjunction with conventional data sets) for designing and testing computational models.
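One simple way to turn pairwise profile similarity into groups is to link any two profiles whose correlation exceeds a threshold and report the connected components, which also corresponds to drawing a similarity graph. The sketch below does this with a small union-find structure. The clustering method, the 0.8 threshold, and the gene labels and values are assumptions for illustration and are not the procedure or data of the cited studies.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = [a - mx for a in x], [b - my for b in y]
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm else 0.0


def cluster_profiles(profiles: Dict[str, List[float]], threshold: float = 0.8) -> List[List[str]]:
    """Link profiles with correlation >= threshold; return connected groups (single linkage)."""
    parent = {name: name for name in profiles}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical multisystem signatures for four overexpressed genes (illustrative values).
profiles = {
    "gene_1": [1.0, 0.8, -0.2, 0.1],
    "gene_2": [0.9, 0.9, -0.1, 0.0],
    "gene_3": [-0.3, 0.0, 1.0, 0.9],
    "gene_4": [-0.2, 0.1, 0.8, 1.1],
}
clusters = cluster_profiles(profiles)
print(clusters)
```

Single linkage is the most permissive choice, since one strong link is enough to merge two groups. A stricter linkage or a full hierarchical clustering would be a reasonable alternative where chaining between loosely related profiles is a concern.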

For compound characterization, the limited data sets, automation and broad functional coverage may make profiles generated from complex, activated cell mixtures an efficient way to screen focused libraries for effects on complex, disease-relevant biology and, more importantly, to prioritize hits from conventional high-throughput screening. In model studies, we have used profiles in four systems to classify hits and leads by their biological activities, to identify compounds with off-target activities (which may be desirable or undesirable), to distinguish ‘well-behaved’ lead series displaying consistent biological responses and to monitor structure-function relationships as a guide to lead optimization31 (Fig. 3b, lower right panel).

An additional strength of the multisystem approach is that parallel systems can be designed to capture a wide range of elicited (disease-relevant) biological and pathway activities; thus, the effects of drugs or genes can be assessed simultaneously for complex biological responses relevant to many different diseases and can be used to screen for novel therapeutic indications. (This contrasts with most modeling efforts and even animal or clinical trials, which are typically designed to address a single disease target.) Complex cell systems models of inflammation (Fig. 3), for example, readily detect the activities of 3-hydroxy-3-methyl-glutaryl-CoA (HMG-CoA) reductase inhibitors (e.g., statins) on inflammatory signaling30. This prompts the interesting question of whether inclusion of complex biological systems analyses in the development of statins could have accelerated the discovery of their potent role in autoimmune and inflammatory disorders33.

Omics could and certainly should be applied to cell systems designed to incorporate meaningful biological complexity. However, as indicated by studies by our group, highly informative functional signatures for gene and drug effects can be generated using very small numbers (tens) of biologically significant parameters, when these are assayed within several different complex cell and environment combinations. This appears to bear out the prediction that biological complexity encodes useful information about drug and protein function, and suggests that it can be leveraged for ‘smarter, faster, cheaper’ industrial-scale functional profiling.

From the practical near-term perspective, these approaches present an opportunity to integrate systems biology more efficiently and cost-effectively throughout the drug discovery process. From a fundamental perspective, databases of such quantitative human cell biological responses to drugs and gene alterations, under standardized and reproducible conditions designed to embody disease-relevant complexity and capture emergent properties, are likely to be useful in predicting the functional architecture of complex regulatory networks. They will also provide an essential bridge for integration of omics data into in silico models of cell systems behavior, as well as a testing ground for these models as they develop (Fig. 2).


During drug development, million-dollar decisions are (and must be) routinely made using flawed criteria based on incomplete biological knowledge: for example, targets are prioritized because they are upregulated at the gene level in disease (even though many of our best historical targets are not); compounds are selected to be biochemically specific (though many of our most effective drugs are not); animal models are considered essential (although these are known to be poor predictors of clinical success). Better biology, preferably more relevant to human disease and capable of being integrated into the drug discovery process, is sorely needed to inform decision-making. Although the systems biology approaches outlined here are in their infancy, they are already contributing to meaningful drug development decisions by accelerating hypothesis-driven biology, by modeling specific physiologic problems in target validation or clinical physiology and by providing rapid characterization and interpretation of disease-relevant cell and cell system level responses.

Although these approaches are currently being pursued by separate laboratories and companies, it is clear that they are complementary and that ultimately they must be integrated for systems biology to achieve its potential. An analogy can be drawn to the genome project, in which multiple individual efforts contributed technology and informatics approaches that eventually enabled a concerted ‘big science’ push to sequence the genome. However, whereas the linear output of the genome project was easily standardized and archived, the multidimensional and multivariate nature of biological function and cell biology studies presents an extraordinary informatics and even social challenge, since standardization of experimental design and data is essential before a ‘big science’ approach to systems biology can be envisioned. Markup languages for gene expression data, emerging ontologies for sharing and integrating different kinds of omic and conventional biological data4 and the introduction of standardized high-throughput systems biology and associated informatics approaches represent important first steps on this path.



Writing of this review was supported in part by SBIR grants (R44 AI048255 and R43 AI049048) to BioSeek, Inc., and by NIH grants to E.C.B. The authors thank Evangelos Hytopoulos and Ivan Plavec for thoughtful criticism and input.

Competing interests statement:

The authors declare competing financial interests.



  1. DiMasi, J.A., Hansen, R.W. & Grabowski, H.G. The price of innovation: new estimates of drug development costs. J. Health Econ. 22, 151–185 (2003).
  2. Ideker, T., Galitski, T. & Hood, L. A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343–372 (2001).
  3. Ideker, T. & Lauffenburger, D. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends Biotechnol. 21, 255–262 (2003).
  4. Hunter, P.J. & Borg, T.K. Integration from proteins to organs: the Physiome Project. Nat. Rev. Mol. Cell Biol. 4, 237–243 (2003).
  5. Kulkarni, N.H. et al. Gene expression profiles classify different classes of bone therapies: PTH, Alendronate and SERMs, Poster 307, 31st European Symposium on Calcified Tissue, June 5, 2004, Nice, France.
  6. Weston, A.D. & Hood, L. Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J. Proteome. Res. 3, 179–196 (2004).
  7. Clish, C.B. et al. Integrative biological analysis of the APOE*3-leiden transgenic mouse. Omics 8, 3–13 (2004).
  8. Kantor, A.B. et al. Biomarker discovery by comprehensive phenotyping for autoimmune diseases. Clin. Immunol. 111, 186–195 (2004).
  9. Davidson, E.H. et al. A genomic regulatory network for development. Science 295, 1669–1678 (2002).
  10. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934 (2001).
  11. Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J. & Palsson, B.O. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429, 92–96 (2004).
  12. Hughes, T.R. et al. Functional discovery via a compendium of expression profiles. Cell 102, 109–126 (2000).
  13. Crampin, E.J. et al. Computational physiology and the Physiome Project. Exp. Physiol. 89, 1–26 (2004).
  14. Noble, D. Modeling the heart—from genes to cells to the whole organ. Science 295, 1678–1682 (2002).
  15. Bassingthwaighte, J.B. & Vinnakota, K.C. The computational integrated myocyte: a view into the virtual heart. Ann. NY Acad. Sci. 1015, 391–404 (2004).
  16. Musante, C.J., Lewis, A.K. & Hall, K. Small- and large-scale biosimulation applied to drug discovery and development. Drug Discov. Today 7, S192–S196 (2002).
  17. Stokes, C.L. et al. A computer model of chronic asthma with application to clinical studies: example of treatment of exercise-induced asthma. J. Allergy. Clin. Immunol. 107, 933 (2001).
  18. Lewis, A.K. et al. The roles of cells and mediators in a computer model of chronic asthma. Inter. Arch. Allergy Immunol. 124, 282–286 (2001).
  19. Leckie, M.J. et al. Effects of an interleukin-5 blocking monoclonal antibody on eosinophils, airway hyper-responsiveness, and the late asthmatic response. Lancet 356, 2144–2148 (2000).
  20. Bergman, R.N., Ider, Y.Z., Bowden, C.R. & Cobelli, C. Quantitative estimation of insulin sensitivity. Am. J. Physiol. 236, E667–E677 (1979).
  21. Kansal, A.R. Modeling approaches to type 2 diabetes. Diabetes Technol. Ther. 6, 39–47 (2004).
  22. Eungdamrong, N.J. & Iyengar, R. Modeling cell signaling networks. Biol. Cell 96, 355–362 (2004).
  23. Bhalla, U.S. & Iyengar, R. Emergent properties of networks of biological signaling pathways. Science 283, 381–387 (1999).
  24. Kelley, B.P. et al. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res. 32, W83–W88 (2004).
  25. Coleman, R.A., Bowen, W.P., Baines, I.A., Woodrooffe, A.J. & Brown, A.M. Use of human tissue in ADME and safety profiling of development candidates. Drug Discov. Today 6, 1116–1126 (2001).
  26. Chanda, S.K. et al. Genome-scale functional profiling of the mammalian AP-1 signaling pathway. Proc. Natl. Acad. Sci. USA 100, 12153–12158 (2003).
  27. Haggarty, S.J., Koeller, K.M., Wong, J.C., Butcher, R.A. & Schreiber, S.L. Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol. 10, 383–396 (2003).
  28. Borisy, A.A. et al. Systematic discovery of multicomponent therapeutics. Proc. Natl. Acad. Sci. USA 100, 7977–7982 (2003).
  29. Marton, M.J. et al. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat. Med. 4, 1293–1301 (1998).
  30. Kunkel, E.J. et al. An integrative biology approach for analysis of drug action in models of human vascular inflammation. FASEB J. 18, 1279–1281 (2004).
  31. Kunkel, E.J. et al. Rapid structure-activity and selectivity analysis of kinase inhibitors by BioMAP analysis in complex human primary cell-based models. Assay Drug Dev. Technol. 2, 431–441 (2004).
  32. Plavec, I. et al. Method for analyzing signaling networks in complex cellular systems. Proc. Natl. Acad. Sci. USA 101, 1223–1228 (2004).
  33. Mach, F. Statins as novel immunomodulators: from cell to potential clinical benefit. Thromb. Haemost. 90, 607–610 (2003).
  34. Christopher, R. et al. Data-driven computer simulation of human cancer cell. Ann. NY Acad. Sci. 1020, 132–153 (2004).
  35. Wiley, H.S., Shvartsman, S.Y. & Lauffenburger, D.A. Computational modeling of the EGF-receptor system: a paradigm for systems biology. Trends Cell Biol. 13, 43–50 (2003).
  36. Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D. & Muller, G. Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat. Biotechnol. 20, 370–375 (2002).
  37. Eker, S. et al. Pathway logic: symbolic analysis of biological signaling. Pac. Symp. Biocomput. 7, 400–412 (2002).
  38. Cho, K.H., Shin, S.Y., Lee, H.W. & Wolkenhauer, O. Investigations into the analysis and modeling of the TNF alpha-mediated NF-kappa B-signaling pathway. Genome Res. 13, 2413–2422 (2003).
  39. Hoffmann, A., Levchenko, A., Scott, M.L. & Baltimore, D. The IkappaB-NF-kappaB signaling module: temporal control and selective gene activation. Science 298, 1241–1245 (2002).

  1. Laboratory of Immunology and Vascular Biology, Department of Pathology (5324), Stanford University School of Medicine, Stanford, California 94305-5324, USA.
  2. The Veterans Affairs Palo Alto Health Care System, Palo Alto, California 94304, USA.
  3. BioSeek Inc., 863-C Mitten Rd., Burlingame, California 94010, USA.

Correspondence to: Eugene C Butcher1,2 e-mail:



ICSBBE 2027 : 29th International Conference on Systems Biology and Biomedical Engineering

Venice, Italy
June 22 – 23, 2027

Conference Venue

NH Laguna Palace
Viale Ancona, 2 30172
Mestre, Venezia, Italy
Tel: +39 0418296005
Fax: +39 0418296033

International Conference Committee

Ahmed Sowedan   Swansea University, UK
Mohammad Al – Amri   University of Surrey, UK
Huseyin Seker   De Montfort University, UK
Yulin Song   Memorial Sloan-Kettering Cancer Center, US
Gabriela Alexe   Dana-Farber Cancer Institute/Harvard Medical School, US
Gwo-Yu Chuang   National Institutes of Health, US
Marco Schoen   Idaho State University, US
Ehsan Kamrani   Harvard Medical School, US
Naga Srinivas Korivi   Louisiana State University, US
Mona Elshinawy   Howard University, US
Lei Yang   Iowa State University, US
Lalit Ponnala   Cornell University, US
Adarsh Ramakumar   Armed Forces Radiobiology Research Institute, US
Yun Zhang   Pioneer Hi-Bred International Inc., US
Jing Hu   Franklin & Marshall College, US
Erich Baker   Baylor University, US
Boojala Vijay B Reddy   Queens College – City University of New York, US
Zhi Wei   New Jersey Institute of Technology, US
Jacqueline Fairley   Emory University, US
Jingwu He   Georgia State University, US
Chandra Sekhar Pedamallu   New England Biolabs, US
Harmesh Kumar   Panjab University, Chandigarh, IN
Sonika Bhatnagar   NSIT, IN
Brajesh Kumar Jha   Institute of Technology, Nirma University, Ahmedabad, IN
Ajitkumar Patil   Shri Bhagubhai Mafatlal Polytechnic, IN
Tsung-Lu Michael Lee   Kun Shan University, TW
Vinod Kumar Yata   Dr B R Ambedkar National Institute of Technology, IN
Min Soo Kim   Keimyung University, KR
Mohd Maroof Siddiqui   Integral University, IN
Meteb Altaf   King Abdulaziz City for Science and Technology (KACST), SA
Boniface Otieno Kwach   Bondo University, KE

ICSBBE 2015 : 17th International Conference on Systems Biology and Biomedical Engineering

Berlin, Germany

September 14 – 15, 2015
