Bibliography of Author: Karp, P.D.

  1. Karp, P.D. and Paley, S.M. "Integrated access to metabolic and genomic data." J Comput Biol. 3 (1). 1996. pp. 191-212.

    The EcoCyc system consists of a knowledge base (KB) that describes the genes and intermediary metabolism of Escherichia coli, and a graphical user interface (GUI) for accessing that knowledge. This paper addresses two problems: How can we create a GUI that provides integrated access to metabolic and genomic data? We describe the design and implementation of visual presentations that closely mimic those found in the biology literature, and that offer hypertext navigation among related entities, and multiple views of the same entity. We employ a frame knowledge representation system (FRS) called HyperTHEO to manage the EcoCyc knowledge base. Among the advantages of FRSs are an expressive data model for capturing the complexities of biological information, and schema-evolution capabilities that facilitate the constant schema changes that biological databases tend to undergo. HyperTHEO also includes rule-based inference facilities that are the foundation of expert systems, a constraint language for maintaining data integrity, and a declarative query language. A graphic KB editor and browser allow the EcoCyc developers to interactively inspect and modify this evolving KB.

    Keywords: *Artificial Intelligence ; Computer Communication Networks ; Computer Graphics ; Computers ; *Database Management Systems ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; *Genome Bacterial ; Programming Languages ; Systems Integration ; User-Computer Interface


  2. Karp, P.D., Paley, S.M., and Romero, P. "The Pathway Tools software." Bioinformatics. 18 (Suppl 1). 2002. pp. S225-32.

    Motivation: Bioinformatics requires reusable software tools for creating model-organism databases (MODs). Results: The Pathway Tools is a reusable production-quality software environment for creating a type of MOD called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc (see http://ecocyc.org) integrates our evolving understanding of the genes, proteins, metabolic network, and genetic network of an organism. This paper provides an overview of the four main components of the Pathway Tools: The PathoLogic component supports creation of new PGDBs from the annotated genome of an organism. The Pathway/Genome Navigator provides query, visualization, and Web-publishing services for PGDBs. The Pathway/Genome Editors support interactive updating of PGDBs. The Pathway Tools ontology defines the schema of PGDBs. The Pathway Tools makes use of the Ocelot object database system for data management services for PGDBs. The Pathway Tools has been used to build PGDBs for 13 organisms within SRI and by external users. Availability: The software is freely available to academics and is available for a fee to commercial institutions. Contact ptools-support


  3. Karp, P.D., Riley, M., Paley, S.M., and Pellegrini-Toole, A. "The MetaCyc Database." Nucleic Acids Res. 30 (1). 2002. pp. 59-61.

    MetaCyc is a metabolic-pathway database that describes 445 pathways and 1115 enzymes occurring in 158 organisms. MetaCyc is a review-level database in that a given entry in MetaCyc often integrates information from multiple literature sources. The pathways in MetaCyc were determined experimentally and are labeled with the species in which they are known to occur based on literature references examined to date. MetaCyc contains extensive commentary and literature citations. Applications of MetaCyc include pathway analysis of genomes, metabolic engineering and biochemistry education. MetaCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. MetaCyc is available via the World Wide Web at http://ecocyc.org/ecocyc/metacyc.html, and is available for local installation as a binary program for the PC and the Sun workstation, and as a set of flatfiles. Contact metacyc-info

    Keywords: Comparative Study ; Database Management Systems ; *Databases Protein ; Enzymes_chemistry ; Enzymes_*metabolism ; Genome ; Human ; Information Storage and Retrieval ; Internet ; *Metabolism


  4. Karp, P.D., Riley, M., Paley, S.M., and Pellegrini-Toole, A. "EcoCyc: an encyclopedia of Escherichia coli genes and metabolism." Nucleic Acids Res. 24 (1). 1996. pp. 32-9.

    The encyclopedia of Escherichia coli genes and metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of E.coli. It describes 2034 genes, 306 enzymes encoded by these genes, 580 metabolic reactions that occur in E.coli and the organization of these reactions into 100 metabolic pathways. The EcoCyc graphical user interface allows query and exploration of the EcoCyc database using visualization tools such as genomic map browsers and automatic layouts of metabolic pathways. EcoCyc spans the space from sequence to function to allow investigation of an unusually broad range of questions. EcoCyc can be thought of as both an electronic review article, because of its copious references to the primary literature, and as an in silico model of E.coli that can be probed and analyzed through computational means.

    Keywords: Computer Communication Networks ; *Databases Factual ; Enzymes_metabolism ; Escherichia coli_enzymology ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; *Genome Bacterial ; Information Storage and Retrieval ; Software ; User-Computer Interface


  5. Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., and Krummenacker, M. "EcoCyc: Encyclopedia of Escherichia coli Genes and Metabolism." Nucleic Acids Res. 25 (1). 1997. pp. 43-51.

    The Encyclopedia of Genes and Metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of Escherichia coli. It describes 2970 genes of E.coli, 547 enzymes encoded by these genes, 702 metabolic reactions that occur in E.coli and the organization of these reactions into 107 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc spans the space from sequence to function to allow scientists to investigate an unusually broad range of questions. EcoCyc can be thought of as both an electronic review article because of its copious references to the primary literature, and as an in silico model of E.coli metabolism that can be probed and analyzed through computational means.

    Keywords: Amino Acid Sequence ; Base Sequence ; *Databases Factual ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; *Genes Bacterial ; User-Computer Interface


  6. Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., and Krummenacker, M. "EcoCyc: Encyclopedia of Escherichia coli genes and metabolism." Nucleic Acids Res. 26 (1). 1998. pp. 50-3.

    The encyclopedia of Escherichia coli genes and metabolism (EcoCyc) is a database that combines information about the genome and the intermediary metabolism of E.coli. The database describes 3030 genes of E.coli, 695 enzymes encoded by a subset of these genes, 595 metabolic reactions that occur in E.coli, and the organization of these reactions into 123 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc can be thought of as an electronic review article because of its copious references to the primary literature, and as a (qualitative) computational model of E.coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/

    Keywords: Computer Graphics ; *Databases Factual_trends ; Encyclopedias ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; *Genes Bacterial ; User-Computer Interface


  7. Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., and Krummenacker, M. "EcoCyc: encyclopedia of Escherichia coli genes and metabolism." Nucleic Acids Res. 27 (1). 1999. pp. 55-8.

    The EcoCyc database describes the genome and gene products of Escherichia coli, its metabolic and signal-transduction pathways, and its tRNAs. The database describes 4391 genes of E.coli, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions that occur in E.coli, and the organization of these reactions into 129 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc has many references to the primary literature, and is a (qualitative) computational model of E.coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/

    Keywords: Classification ; *Databases Factual ; Enzymes_genetics ; Enzymes_metabolism ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; *Genes Bacterial ; Genome Bacterial ; Information Storage and Retrieval ; Internet ; Signal Transduction ; User-Computer Interface


  8. Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Paley, S.M., and Pellegrini-Toole, A. "The EcoCyc and MetaCyc databases." Nucleic Acids Res. 28 (1). 2000. pp. 56-9.

    EcoCyc is an organism-specific Pathway/Genome Database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, and (a new addition) its transport proteins. MetaCyc is a new metabolic-pathway database that describes pathways and enzymes of many different organisms, with a microbial focus. Both databases are queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc and MetaCyc are available at http://ecocyc.PangeaSystems.com/ecocyc/

    Keywords: Database Management Systems ; *Databases Factual ; Escherichia coli_genetics ; Genome Bacterial


  9. Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Collado-Vides, J., Paley, S.M., Pellegrini-Toole, A., Bonavides, C., and Gama-Castro, S. "The EcoCyc Database." Nucleic Acids Res. 30 (1). 2002. pp. 56-8.

    EcoCyc is an organism-specific pathway/genome database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes, its transport proteins and its mechanisms of transcriptional control of gene expression. EcoCyc is queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc is available at http://ecocyc.org/.

    Keywords: Database Management Systems ; *Databases Genetic ; Escherichia coli_*genetics ; Escherichia coli_*metabolism ; Escherichia coli Proteins_*genetics ; Escherichia coli Proteins_*physiology ; Gene Expression Regulation Bacterial ; *Genome Bacterial ; Information Storage and Retrieval ; Internet ; Protein Transport ; Signal Transduction


  10. Karp, P.D. "Pathway databases: a case study in computational symbolic theories." Science. 293 (5537). 2001. pp. 2040-4.

    A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes. The EcoCyc pathway DB (see http://ecocyc.org) describes the metabolic, transport, and genetic-regulatory networks of Escherichia coli. EcoCyc is an example of a computational symbolic theory, which is a DB that structures a scientific theory within a formal ontology so that it is available for computational analysis. It is argued that by encoding scientific theories in symbolic form, we open new realms of analysis and understanding for theories that would otherwise be too large and complex for scientists to reason with effectively.

    Keywords: Artificial Intelligence ; *Computational Biology ; Culture Media ; *Databases Factual ; Escherichia coli_enzymology ; Escherichia coli_*genetics ; Escherichia coli_growth and development ; Escherichia coli_*metabolism ; *Genome Bacterial ; Internet ; Software


  11. Yeh, I., Karp, P.D., Noy, N.F., and Altman, R.B. "Knowledge acquisition, consistency checking and concurrency control for Gene Ontology (GO)." Bioinformatics. 19 (2). 2003. pp. 241-8.

    Motivation: A critical element of the computational infrastructure required for functional genomics is a shared language for communicating biological data and knowledge. The Gene Ontology (GO; http://www.geneontology.org) provides a taxonomy of concepts and their attributes for annotating gene products. As GO increases in size its ongoing construction and maintenance becomes more challenging. In this paper, we assess the applicability of a Knowledge Base Management System (KBMS), Protege-2000, to the maintenance and development of GO. Results: We transferred GO to Protege-2000 in order to evaluate its suitability for GO. The graphical user interface supported browsing and editing of GO. Tools for consistency checking identified minor inconsistencies in GO and opportunities to reduce redundancy in its representation. The Protege Axiom Language proved useful for checking ontological consistency. The PROMPT tool allowed us to track changes to GO. Using Protege-2000, we tested our ability to make changes and extensions to GO to refine the semantics of attributes and classify more concepts. Availability: Gene Ontology in Protege-2000 and the associated code are located at http://smi.stanford.edu/projects/helix/gokbms/. Protege-2000 is available from http://protege.stanford.edu. Contact: russ.altman