0-centered scale: implies that the numerical data of the Omics Viewer data file can contain positive and negative values. The value 0 is considered to be the center of the numerical values provided in the data file.
1-Centered Scale, Omics Viewer
1-centered scale: implies that any negative or zero values in the data file should be skipped. Moreover, the data is centered around the value 1 using a log scale. For example, the value 0.1 is considered to be at the same distance to 1 as the value 10. So, a logarithm of base 10 is applied to the data before the linear coloring mapping is applied.
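The 1-centered mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Omics Viewer's own code; the function name `one_centered_positions` is hypothetical.

```python
import math

def one_centered_positions(values):
    """Return log10 positions for the positive values only.

    A value of 1 maps to position 0 (the center). Zero and negative
    values are skipped, as the 1-centered scale requires.
    """
    return [math.log10(v) for v in values if v > 0]

# 0.1 and 10 land at equal distances on opposite sides of the center.
positions = one_centered_positions([0.1, 1, 10, 0, -5])
```

A linear color mapping would then be applied to these positions rather than to the raw values.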
A term that identifies one end of a single-stranded nucleic acid molecule. The 3' end is that end of the molecule which terminates in a 3' hydroxyl group. The 3' direction is the direction toward the 3' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation. See the Figure at NHGRI. See also:
A term that identifies one end of a single-stranded nucleic acid molecule. The 5' end is that end of the molecule which terminates in a 5' phosphate group. The 5' direction is the direction toward the 5' end. Nucleic acid sequences are written with the 5' end to the left and the 3' end to the right, in reference to the direction of DNA synthesis during replication (from 5' to 3'), RNA synthesis during transcription (from 5' to 3'), and the reading of mRNA sequence (from 5' to 3') during translation. See the Figure at NHGRI. See also:
A detailed sequence of actions to perform to accomplish some task. Technically, an algorithm must reach a result after a finite number of steps, thus ruling out brute force search methods for certain problems, though some might claim that brute force search was also a valid (generic) algorithm. The term is also used loosely for any sequence of actions (which may or may not terminate).
A molecule of the general formula NH2-CHR-COOH, where "R" is one of a number of different side chains. Amino acids are the building blocks of proteins. The sixty-four codons of the genetic code allow the use of twenty different amino acids (the primary amino acids) in the synthesis of proteins. Other nonprimary amino acids occur in proteins by enzymatic modification of amino acids in mature proteins, and as metabolic intermediates. See the Figure at NHGRI. For Figures showing the structure of each of the twenty primary amino acids, see Figure 1 and Figure 2 from "Molecular Biology of the Cell" by Alberts et al. at Access Excellence.
A term that identifies one end of a protein molecule. The amino terminus is that end of the molecule which terminates in a free amino group. See the Figure at NHGRI. See also:
1. n. Database entries that provide supplementary information
about a biological entity, such as annotation of pathways or regulatory sites.
2. v. The analysis process used to create annotations.
See also Sequence Annotation.
American National Standards Institute. ANSI is a private, non-profit organization that administers and coordinates the U.S. voluntary standardization and conformity assessment system. The Institute's mission is to enhance both the global competitiveness of U.S. business and the U.S. quality of life by promoting and facilitating voluntary consensus standards and conformity assessment systems, and safeguarding their integrity. For further information, see the web site for ANSI.
In molecular biology, that strand of a DNA molecule whose sequence is complementary to the strand represented in mRNA.
In molecular biology, an RNA molecule complementary to the strand normally processed into mRNA and translated.
Programmed cell death, that is, the death of cells by a specific sequence of events triggered in the course of normal development (e.g., cells between digits in the limb bud) or as a means of preserving normal function (e.g., in response to viral infection).
Individual primary recombinant clones (hosted in phage, cosmid, YAC, or other vector) that are placed in two-dimensional arrays in microtiter dishes. Each primary clone can be identified by the identity of the plate and the clone location (row and column) on that plate. Information gathered on individual clones from various experimental techniques is entered into a relational database and used to construct physical and genetic linkage maps simultaneously; clone identifiers serve to interrelate the multilevel maps. See also library.
American Standard Code for Information Interchange. The basis of character sets used in almost all present-day computers.
In general, the qualitative or quantitative analysis of a substance. In MGI, an assay is a type of experiment that is designed to detect the level of gene expression of a particular gene, or to determine the pattern of expression of a gene among different tissue types, anatomical structures, or developmental stages. The assay may detect one or more of the RNA transcripts of a gene or one or more of its protein products. Assay types in MGI include:
The sequence defined by an interval along a chromosome in the mouse genome assembly. In MGI, assembly sequences are defined as the interval between the start position of the first exon and the end position of the last exon of a gene as annotated by the NCBI or Ensembl gene models. See NCBI Gene Model and Ensembl Gene Model for further information.
American Type Culture Collection. A large collection of microbial stocks, including microbes containing mammalian DNA segments. See the ATCC Home Page for further information.
A property inherent in an entity or associated with that entity for the purpose of managing a database.
BAC/YAC end refers to sequences at the end of foreign DNA inserts in a BAC or YAC. These sequences are a source of STSs to determine the extent of overlap between BACs or YACs and to aid in the alignment of sequence contigs.
Biomolecular Interaction Network Database. A collection of records documenting molecular interactions. The contents of BIND include high-throughput data submissions and hand-curated information gathered from the scientific literature.
This Pathway Tools ontology class consists of reactions in which no covalent modification of the substrates takes place, but in which the net effect is that one substrate noncovalently binds to or unbinds from another molecule via weak bonds (e.g. hydrogen bonds).
BioCyc is a collection of hundreds of Pathway/Genome Databases (PGDBs) at URL BioCyc.org created by the group of Dr. Peter D. Karp at SRI International. BioCyc includes two very highly curated PGDBs: MetaCyc, a PGDB containing more than 1,200 experimentally elucidated metabolic pathways from more than 1,500 organisms, and EcoCyc, a PGDB for Escherichia coli K-12.
These pathways are primarily involved in biomass conversion and biofuels production.
The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases and algorithms to facilitate and expedite biological research, particularly in genomics.
Refers to a broad category of biological tasks accomplished via one or more ordered assemblies of molecular functions. Usually there is some temporal aspect to it, although a process event may be essentially instantaneous. It often involves transformation, in the sense that something goes into a process and something different comes out of it. Examples of biological processes included in this category are cell growth and maintenance, signal transduction, pyrimidine metabolism, and cAMP biosynthesis. In the GO Project vocabularies, Biological Process is a primary class of terms. See the GO Consortium site for further information.
BioPAX is an OWL RDF/XML-based format for exchange of pathway data.
Synthesis of chemical compounds by enzymatic processes in living organisms.
BioVelo is a concise query language that has been developed to query biological databases created with Pathway Tools. BioVelo is based on a simple mathematical concept (list comprehension) and is simpler than the concepts used by SQL (Structured Query Language).
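The list-comprehension concept that BioVelo is built on can be illustrated with Python's own list comprehensions. The snippet below is not BioVelo syntax, and the gene records are hypothetical toy data; it only shows the "take each item from a collection, keep those that satisfy a condition, return a chosen value" pattern.

```python
# Hypothetical toy records; not drawn from any real PGDB.
genes = [
    {"name": "geneA", "length": 900},
    {"name": "geneB", "length": 1500},
    {"name": "geneC", "length": 2400},
]

# For each gene g in genes, keep g if it is longer than 1000 bases,
# and return its name.
long_gene_names = [g["name"] for g in genes if g["length"] > 1000]
```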
Basic Local Alignment Search Tool. A sequence comparison algorithm, optimized for speed, that is used to search sequence databases for optimal local alignments to a query sequence. There is a description of the specific algorithm used, and additional information, at NCBI.
Refers to an expression that must evaluate to a value of true or false, named for the British mathematician George Boole. In MGI and other databases, Boolean refers to the kind of logical relationship among search terms. Boolean operators include AND, OR, and NOT. For example, searching for all markers of the type "Gene" on Chromosome 2 is equivalent to identifying the intersection of the two sets:
all markers of the type "Gene" (Type Gene?=true) AND
all markers on Chromosome 2 (Chromosome 2=true).
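In set terms, AND corresponds to intersection, OR to union, and AND NOT to set difference. A minimal Python sketch, using hypothetical marker records rather than real MGI data:

```python
# Hypothetical marker records for illustration only.
markers = [
    {"symbol": "m1", "type": "Gene", "chromosome": "2"},
    {"symbol": "m2", "type": "Gene", "chromosome": "5"},
    {"symbol": "m3", "type": "Pseudogene", "chromosome": "2"},
]

genes = {m["symbol"] for m in markers if m["type"] == "Gene"}
on_chr2 = {m["symbol"] for m in markers if m["chromosome"] == "2"}

gene_and_chr2 = genes & on_chr2      # AND: in both sets
gene_or_chr2 = genes | on_chr2       # OR: in either set
gene_not_chr2 = genes - on_chr2      # AND NOT: genes that are not on Chromosome 2
```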
In NCBI, the end result of processes used to assemble genomic sequence data, annotate features, and provide a dataset of assembled genomic sequence, RNAs, and proteins. Such processes are complex; NCBI continues to refine them so that regions not fully represented will improve with subsequent builds. See also: assembly sequence.
Cyclic AMP. A form of the nucleotide adenosine monophosphate that serves as a signaling molecule within and between cells.
A term that identifies one end of a protein molecule. The carboxyl terminus is that end of the molecule which terminates in a free carboxyl group. See the Figure at NHGRI. See also:
Refers to subcellular structures, locations, and macromolecular complexes. Some examples are nucleus, telomere and origin recognition complex. In the GO Project vocabularies, Cellular Component is a primary class of terms. See the GO Consortium site for further information.
The Cellular Overview diagram is a representation of all metabolic pathways and reactions, signaling pathways, membrane proteins, and transporters defined for the current organism.
In mammalian genetics, the primary constriction of a chromosome separating it into the short arm (p) and the long arm (q). The centromere is the chromosomal region over which the kinetochore is organized. See the Figure at NHGRI. Mouse chromosomes have centromeres close to one end and have essentially no short arm. See the idiogram of the mouse karyotype at the Department of Pathology at the University of Washington.
Chemically Induced Mutation
A mutation induced by treatment with a chemical mutagen, for example, ENU (ethyl nitrosourea) or chlorambucil.
This Pathway Tools ontology class defines reactions for which at least one substrate molecule is chemically modified, meaning that either a chemical bond (covalent, ionic or coordination) is formed and/or broken, or that a redox modification has occurred.
In MGI, this term refers to terms in a hierarchical controlled vocabulary like ones containing Gene Ontology (GO) terms. A "child" of a term is a term any number of levels below it in the hierarchy that is a descendant of the term. For example, the GO term alcohol dehydrogenase [GO:0004022] is a child of the GO term enzyme [GO:0003824]. See also:
An animal formed from two different animals, that is from two different embryonic sources. In mouse genetics, targeted mutations produced in embryonic stem cells are recovered by breeding chimeric mice resulting from the mixture of ES cells with a genetically-distinct blastocyst.
A chimeric pathway comprises reactions from multiple organisms, and most commonly does not occur in its entirety in a single organism. Chimeric pathways are always superpathways. Chimeric pathways are intended to depict a set of related pathways across multiple organisms.
Choke Point Reaction Finder
Choke-point reaction finder is a tool to find choke point reactions in a PGDB.
A chokepoint reaction is a reaction that either uniquely consumes a specific reactant or uniquely produces a specific product in a metabolic network.
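The definition above can be turned into a short search over a reaction network. The following Python is a sketch under simplifying assumptions, not the Pathway Tools implementation: the function name `find_chokepoints` and the toy network are hypothetical, and a real tool may apply further filtering that is not described here.

```python
from collections import defaultdict

def find_chokepoints(reactions):
    """Return the ids of reactions that are the only consumer of some
    reactant or the only producer of some product.

    reactions maps a reaction id to a (reactants, products) pair.
    """
    consumers = defaultdict(set)   # metabolite -> reactions that consume it
    producers = defaultdict(set)   # metabolite -> reactions that produce it
    for rxn_id, (reactants, products) in reactions.items():
        for metabolite in reactants:
            consumers[metabolite].add(rxn_id)
        for metabolite in products:
            producers[metabolite].add(rxn_id)

    chokepoints = set()
    for rxn_ids in list(consumers.values()) + list(producers.values()):
        if len(rxn_ids) == 1:      # exactly one reaction uses or makes it
            chokepoints |= rxn_ids
    return chokepoints

# Toy network: A is consumed by two reactions, B is produced by two,
# but only R3 consumes B and only R3 produces C.
toy_network = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),
    "R3": (["B"], ["C"]),
}
toy_chokepoints = find_chokepoints(toy_network)
```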
A kind of mutation in which there is a change in the arrangement of the genome into chromosomes; this term usually applies to those changes that are visible cytogenetically. Classes of chromosome rearrangements include:
An organism derived from a founding individual by asexual means that is genetically identical to the founding individual.
A DNA clone whose structure does not accurately represent genomic or mRNA sequence, due to errors in the cloning process. For example, two noncontiguous genomic fragments may be joined by ligation prior to being incorporated into the cloning vector.
A DNA construct capable of replication within a bacterial or yeast host that can harbor foreign DNA, facilitating experimental manipulation of that DNA segment.
One of a series of terms applied to the phenotypic effect of a particular allele in reference to another allele (usually the standard wild-type allele) with respect to a given trait. An allele "a" is said to be codominant with respect to the wild-type allele "A" if the A/a heterozygote fully expresses both of the phenotypes associated with the a/a and A/A homozygotes. An example of codominance is the ABO blood type antigens in humans, where AA individuals are type A, BB individuals are type B, and AB individuals are type AB. See also:
A common name is the name that is used in BioCyc to refer to an object. In many cases, a single object may have many synonyms, but only one of those can be designated a common name.
A single-stranded nucleic acid that would bind to a given single-stranded nucleic acid by following base pairing rules (A pairs with T and C with G). The complementary sequence to GTAC for example, is CATG.
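The base pairing rules can be applied mechanically, as in this small Python sketch (the function names are illustrative). `complement` pairs each base in place, which reproduces the GTAC example above; `reverse_complement` additionally reverses the result so that the complementary strand is written in its own 5' to 3' direction.

```python
# DNA base pairing: A pairs with T, C pairs with G.
_PAIRS = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return seq.upper().translate(_PAIRS)

def reverse_complement(seq):
    """Return the complementary strand written 5' to 3'."""
    return complement(seq)[::-1]

paired = complement("GTAC")
opposite_strand = reverse_complement("GATTC")
```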
The appearance of a wild-type phenotype in an individual that is the hybrid offspring of two mutant individuals homozygous for recessive mutations. Complementation shows that the two parental mutant individuals have mutant alleles of different genes, even if they are phenotypically similar. For example, a cross between two light gray mice (ash/ash and d/d) would produce a black mouse.
In MGI, the marker type "Complex/Cluster/Region" refers to any of the following:
A segment of the mouse genome defined by comparison to an orthologous segment in the genome of another species, or by some specific characteristic, such as loss of heterozygosity.
A marker repository for information pertaining to a specific gene family, where such information lacks precise family member resolution.
A heterozygote where different mutant alleles are present at the two copies of a given locus. In MGI, a compound heterozygote occurs when the two opposing alleles are associated with different MGI markers.
A genotype that is dependent on the presence of some other factor (often a DNA sequence that expresses a protein functioning in specific recombination events). In MGI, conditional is most commonly used to represent Cre-mediated excision of genomic sequences flanked by loxP sites. In most instances, excision of the endogenous genomic sequence flanked by loxP sites is accompanied by simultaneous insertion of some selectable marker (e.g. neomycin). The excised genotype, which is often associated with a phenotype, is dependent (or conditional) on the presence of the Cre-expressing construct.
With respect to nucleic acids, "cross-hybridization" refers to the formation of double-stranded DNA, RNA, or DNA/RNA hybrids by complementary base pairing between two molecules that are not identical in sequence. Cross-hybridization may be observed between nucleic acids derived from orthologous or paralogous genes.
A person who performs curation on a database. See Curation.
The process of manually updating and refining a bioinformatics database. Literature-based curation involves updating the database based on information found in the scientific literature. Curation involves updating both structured database fields and English text, such as mini-review summaries, that captures information not captured by the highly structured sections of the database. For example, curators may use the free-text comment sections to capture information such as similarity to other proteins or data from functional complementation experiments. The comment section is also used to note cases in which the published reports present contradictory results. In such cases, both viewpoints are presented with proper attribution. This approach ensures that no information is lost.
Refers to the correlation of genetic and cytological information through the microscopic analysis of stained preparations of chromosomes, including those from individuals carrying mutations.
One of the subregions of a chromosome visible microscopically after special staining.
A data structure that stores metadata, i.e. data about data. More generally, an organized collection of information.
Database Management System (DBMS)
A collection of computer programs that allow storage, modification, and extraction of information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes. The following are examples of database applications:
The separation of the two strands of a double-stranded nucleic acid caused by treatments that overcome hydrogen bonding, e.g., heat.
A usually irreversible change in the conformation of a protein caused by treatments that overcome hydrogen bonding, hydrophobic interactions, or other chemical forces that maintain the structure of proteins, e.g., heat.
A monomer unit of DNA, consisting of a purine or pyrimidine base, a deoxyribose sugar molecule, and phosphate group(s).
Having two forms.
Having twice the chromosome number normally found in a gamete. Normal mice are diploid, having a chromosome set from the maternal gamete (the egg) and a chromosome set from the paternal gamete (the sperm). See also Haploid.
Indicates that the expression of the reporter gene was detected using a direct method, such as fluorescence or using an enzymatic substrate.
One of a series of terms applied to the phenotypic effect of a particular allele in reference to another allele (usually the standard wild-type allele) with respect to a given trait. An allele "A" is said to be dominant with respect to the allele "a" if the A/A homozygote and the A/a heterozygote are phenotypically identical and different from the a/a homozygote. See also:
Database of Transcribed Sequences. The DoTS human and mouse transcript index is created from all publicly available transcript sequences. Input sequences are clustered and assembled to form the DoTS Consensus Transcripts that make up the index.
The shape that two linear strands of DNA assume when hydrogen-bonded together.
A number assigned to a type of enzyme according to a scheme of standardized enzyme nomenclature developed by the Enzyme Commission of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). EC numbers may be found in ENZYME, the Enzyme nomenclature database, maintained at the ExPASy molecular biology server of the Geneva University Hospital and the University of Geneva, Switzerland.
A type of DNA construct containing a reporter gene sequence downstream of a promoter that is capable of integrating into random chromosomal locations. Integration of the enhancer trap near an enhancer allows the expression of a new mRNA encoding the reporter gene. The reporter gene is therefore expressed in the cells and developmental stages where the enhancer is active. See also Gene Trap.
Enrichment analysis aids in determining whether a set of entities is statistically over-represented for another set of entities of interest. For example, is a set of genes enriched in known metabolic pathways or Gene Ontology terms? Is a set of metabolites enriched for known metabolic pathways? Enrichment analysis can identify phenomena underlying gene-expression data, for example.
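The definition above does not name a statistical test. One common choice, assumed here purely for illustration, is the one-sided hypergeometric test: given a population of entities, some of which carry an annotation, it computes the probability of seeing at least the observed number of annotated entities in a sample drawn at random. The function name and its parameters are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of at least `hits` annotated entities in a random
    sample of size `sample`, drawn without replacement from
    `population` entities of which `annotated` carry the annotation.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total

# Toy numbers: 10 genes, 5 in the pathway, 2 genes sampled, both in the pathway.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=2, hits=2)
```

A small p-value suggests the sample is over-represented for the annotation. In practice, testing many pathways or GO terms at once also calls for a multiple-testing correction, which this sketch omits.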
Ensembl Gene Model
A description of a gene associated with a genome assembly from a genome sequencing project. The gene model includes the coordinates of each intron and exon, as well as the beginning and end of the transcript.
Frames in the class Enzymatic-Reactions describe attributes of an enzyme with respect to a particular reaction that it catalyzes.
A protein (or rarely, RNA) that catalyzes a chemical reaction.
Refers to factors affecting the development or function of an organism other than the primary sequence of the target genes. See also Imprinting.
Masking of a phenotypic trait through the action of a mutant allele. For example, albinism (absence of pigment) is epistatic to coat color genes that determine black vs. brown fur in animals.
Expressed Sequence Tag. A partial sequence of a randomly chosen cDNA, obtained from the results of a single DNA sequencing reaction. ESTs are used both to identify transcribed regions in genomic sequence and to characterize patterns of gene expression in the tissue that was the source of the cDNA.
Unique identifiers that describe the types of evidence that support and justify the inclusion of information within a database. In PGDBs, evidence codes typically justify the existence of biological entities such as genes and pathways. Evidence codes are denoted by visual icons in the upper-right corner of pages such as pathway and enzyme pages. For example, a flask icon denotes experimental evidence. Clicking on evidence icons will produce a more detailed explanation of the evidence type, and often a citation to the source of the evidence.
Change of the genes of a population over time, resulting in new species.
The presence of similar genes, portions of genes, or chromosome segments in different species, reflecting both the common origin of species and an important functional property of the conserved element.
The relative constancy of the phenotype of individuals of a given genotype. Mutations said to have variable expressivity show a relatively large amount of phenotypic variation among individuals having the same genotype. See also Penetrance.
An FBA (Flux Balance Analysis) model predicts the steady-state flux rates of metabolic reactions given a set of nutrients, secretions, and biomass metabolites (the end products of biosynthesis). The Pathway Tools MetaFlux module provides FBA model development and solving capabilities through the locally installed version of Pathway Tools on the Linux and Mac platforms.
FFAQP : Free Form Advanced Query Page
FFAQP is a form that allows a user to enter the text form of a BioVelo query and submit that query for execution. The FFAQP allows users to enter more advanced queries than does the Structured Advanced Query Page because the full BioVelo query language is accessible from the Free Form page.
In a relational database, an item of information, such as a chromosome number, or the centimorgan length on a genetic map. Some fields are numeric, while others are textual; some are long, while others are short. In addition, every field has a name, called the field name. In database management systems, a field can be required, optional, or calculated. A required field is one in which you must enter data, while an optional field is one you may leave blank. A calculated field is one whose value is derived from some formula involving other fields. You do not enter data into a calculated field; the system automatically determines the correct value. A collection of fields is called a record. See also parameter.
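The distinction between required, optional, and calculated fields can be mimicked in a Python dataclass. This is only an analogy to a database record, and the record type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MarkerRecord:
    symbol: str                 # required: must be supplied
    start_cm: float             # required
    end_cm: float               # required
    note: Optional[str] = None  # optional: may be left blank

    @property
    def length_cm(self) -> float:
        # calculated: derived from other fields, never entered directly
        return self.end_cm - self.start_cm

record = MarkerRecord(symbol="m1", start_cm=10.0, end_cm=12.5)
```

Omitting a required field such as `symbol` raises a `TypeError` when the record is created, while `note` silently defaults to `None`.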
Sequence of genomic DNA in which: a) bases are identified to an accuracy of no more than one error in 10,000 bases, b) there is no ambiguity about the order or orientation of any segment, and c) there are few if any gaps. See also Draft Genome Sequence.
A locus in the cytoplasmic or nuclear genome that is necessary and sufficient to express the complete complement of functional products derived from a unit of transcription.
A locus in the cytoplasmic or nuclear genome identified by hybridization to a nucleic acid segment derived from another species, where the segment used as probe represents some portion of a functional unit of transcription in the cytoplasmic or nuclear genome of the other species.
An exon-encoding segment of the germ-line nuclear genome located within a region that undergoes somatic rearrangement.
A locus in the cytoplasmic or nuclear genome that is within an intron of (but not, itself, an exon of) a unit of transcription, which gives rise to a functional product upon transcript processing of the host unit.
A type of nonreciprocal recombination event in which a recipient strand of DNA receives information from another strand having an allelic difference. The recipient strand has its original allele "converted" to the new allele as a consequence of the event.
This mode of MetaFlux predicts which genes or reactions are essential and which are not. Multiple simultaneous knockouts of reactions and genes (i.e., single, double, and so on) are supported.
A type of DNA construct containing a reporter gene sequence downstream of a splice acceptor site that is capable of integrating into random chromosomal locations. Integration of the gene trap into an intron allows the expression of a new mRNA containing one or more upstream exons followed by the reporter gene. The reporter gene is therefore expressed in the same cells and developmental stages as the gene into which the gene trap has inserted. See also Enhancer Trap.
The total genetic information of a cell or organelle. In eukaryotes, "genome" usually refers to nuclear DNA rather than to mitochondrial or chloroplast DNA.
The comprehensive study of whole sets of genes and their interactions rather than single genes or proteins.
A description of the genetic information carried by an organism. In the simplest case, genotype may refer to the information carried at a single locus, as in A/A, A/a, or a/a.
The genetic constitution of an organism, as distinguished from its physical appearance (its phenotype).
A genome browser is a graphical interface used to examine the layout of genes and other features within a replicon (chromosome or plasmid). The comparative genome browser can be used to examine several replicons (chromosomes or plasmids) simultaneously side by side. This allows easy visual comparison of related organisms to observe similarities and differences in their gene arrangements.
The Genome Overview shows in one screen all the genes in an organism's genome, as well as additional information about their transcription units and products.
A poster-size depiction of the genome map (replicons) of an organism, generated from a PGDB by the Pathway Tools genome browser.
The part of the genome characterized by relatively low gene density and the presence of highly repetitive sequences. Heterochromatin is more highly condensed than euchromatin.
The X chromosome that is highly condensed in a mammalian cell that has undergone X inactivation. The inactive X chromosome resembles heterochromatin as defined above with respect to its state of condensation and genetic inactivity, although there is no change in the DNA sequence as a consequence of inactivation.
Producing two types of euploid gametes with respect to chromosomal content. This term is applied to one of the sexes in species with chromosomal sex determination; in mammals, males are heterogametic. See also:
A polymer composed of different subunits. Some multimeric proteins are normally heteropolymers. Heteropolymers can also be made experimentally, using subunits derived from different species, as a test of homology. Formation of a functional multimeric protein product using subunits from different species is a demonstration of homology.
The state of a diploid locus in which different alleles are present at the two copies of that locus (usually one is normal and the other is abnormal).
A description of a structure in which things are organized into a hierarchy.
An organization with few things, or one thing, at the top and with several things below each other thing. An inverted tree structure. An example in computing is a directory hierarchy where each directory may contain files or other directories. It may refer to terms in a controlled vocabulary such as ones containing Gene Ontology (GO) terms.
The Pathway Tools Pathway Hole Filler provides a computational method for combining evidence from homology data, operon-based data, and pathway context to identify missing enzymes in a Pathway/Genome Database (see also: Pathway Hole).
Producing a single type of euploid gametes with respect to chromosomal content. This term is applied to one of the sexes in species with chromosomal sex determination; in mammals, females are homogametic. See also:
One of a pair of chromosomes that segregate from one another during the first meiotic division.
A gene related to a second gene by descent from a common ancestral DNA sequence. The term homolog may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog).
A morphological structure in one species related to that in a second species by descent from a common ancestral structure.
Reciprocal recombination between DNA sequences that have a high degree of similarity and that are located at corresponding positions on homologous chromosomes.
The relationship of any two characters that have descended from a common ancestor. This term can apply to a morphological structure, a chromosome or an individual gene or DNA segment. See the Figure at NCBI. See also:
Hypertext Markup Language. An authoring language for creating and sharing electronic documents over the Internet. You can view the HTML source code for a web page by selecting the Source view from one of your browser's menus.
Literally, "water-loving"; polar or charged compounds that are soluble in water.
Literally, "water-fearing"; nonpolar compounds that are immiscible with water. The side chains of some amino acids are nonpolar, and hence protein sequences rich in these amino acids tend to locate to the interior of the protein in its native state, away from the solvent.
A method of detecting the presence of specific proteins in cells or tissues. Fixed cells or tissue on a microscope slide, made permeable if necessary with a detergent, are reacted with a primary antibody to the specific protein to be assayed. The preparation is then treated with a secondary antibody that has been coupled to an enzyme and which is directed against the primary antibody (e.g., goat anti-rabbit antibody). The preparation is then treated with a chromogenic substrate. Microscopic examination reveals the presence of staining, and hence of the specific protein to be detected.
The binding of an antibody to a protein that is different from the protein against which the antibody was raised. This result demonstrates sequence or structural similarity between the two proteins and can be evidence of homology.
An epigenetic modification of genes that identifies a given gene as having been inherited from the maternal or paternal parent. In mammals, some genes are expressed primarily from the maternally-inherited or paternally-inherited alleles as a consequence of imprinting.
A strain that is essentially homozygous at all loci. In mice, a strain produced from brother-sister matings for at least 20 sequential generations. C57BL/6J is a widely-used inbred strain of mouse.
A cross between two identically homozygous individuals (A/A X A/A). See also:
The study of the application of computer and statistical techniques to the management of information. In genome projects, Informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.
A method of detecting the presence of specific nucleic acid sequences within a cytological preparation. A DNA or RNA probe is labeled radioactively or chemically and hybridized to a cytological preparation to detect RNA or to a denatured cytological preparation to detect DNA. The hybridization is detected by autoradiography (for radioactive probes) or by chromogenic reactions or fluorescence (for chemically-labeled probes). See also FISH.
A type of mutation in which a length of DNA is broken in two positions and repaired in such a way that the medial segment is now present in reverse order. Inversions range in size from those large enough to be visible cytogenetically to those involving only a few base pairs.
Literally, "in glass", meaning a reaction, process or experiment in a metaphorical test tube rather than in a living organism. See also:
In molecular biology, a "library" is a complex mixture of recombinant DNA molecules in a suitable cloning vector representing either the entire genome of an organism (a genomic library) or the messenger RNA population of a whole organism, cell type, or tissue type (a cDNA library).
Literally, "place". The location of a gene or set of genes on a chromosome.
A statistical estimate of whether two loci are likely to lie near each other on a chromosome and are therefore likely to be inherited together. LOD stands for "log of the odds ratio." In this case, the odds ratio is the likelihood that two markers are linked divided by the likelihood that they are not linked. A LOD score of three or more is generally taken to indicate that the two loci are close.
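The odds ratio described above can be illustrated with a minimal sketch. It assumes a simple phase-known model in which each meiosis is independently recombinant with probability theta; the function name and this model are illustrative assumptions, not something the glossary specifies.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Return log10 of L(linked at theta) / L(unlinked, theta = 0.5).

    Assumes a phase-known binomial model (an illustrative simplification).
    """
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Log-likelihood of the data if the loci are linked at distance theta.
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # Log-likelihood if the loci are unlinked (independent assortment).
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# 2 recombinants among 20 meioses, evaluated at theta = 0.1.
score = lod_score(recombinants=2, meioses=20, theta=0.1)
print(round(score, 2))  # about 3.2, above the conventional threshold of 3
```

Working in logarithms avoids underflow when the number of meioses is large.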
A pair of nuclear divisions forming gametes wherein the number of chromosomes is reduced from the diploid to the haploid number; resulting cells normally contain one member of each pair of homologous chromosomes.
A phospholipid bilayer that forms a hydrophobic barrier around and within cells.
That type of inheritance in which a specific trait is affected by a set of alleles of a single gene.
That type of inheritance in which genetic information is transmitted by one or more nuclear genes, as opposed to cytoplasmic or epigenetic mechanisms.
A metabolic cluster is a Pathway Tools term for a set of biochemical reactions that are biologically related, but are largely unconnected, and therefore do not constitute a pathway in the traditional sense of the word.
A poster-size depiction of the Cellular Overview diagram generated from a PGDB by Pathway Tools.
The Metabolite Tracing facility of Pathway Tools permits users to graphically trace the path of a metabolite through the metabolic network.
A Pathway/Genome Database describing
experimentally elucidated metabolic pathways and enzymes. MetaCyc is
both an online reference source on metabolic pathways and a
reference database for computational prediction of metabolic pathways.
MetaCyc is available at MetaCyc.org.
MetaCyc is a member of the BioCyc Database Collection.
Data about data. In data processing, metadata is definitional data providing information about or documentation of other data managed within an application or environment, for example, data about
data elements or attributes (name, size, data type, etc.),
records or data structures (length, fields, columns, etc.) and
data (where located, how associated, what ownership, and so on).
Metadata may include descriptive information about the context, quality and condition, or characteristics of the data.
An array of hundreds or thousands of spots containing specific DNA sequences for the analysis of gene expression by hybridization. Microarrays are used to detect changes in gene expression by comparing radioactively- or chemically-labeled cDNA prepared from the total mRNA of an experimental sample to that of a control sample. The relative intensity of the signal corresponding to each spot in the microarray reveals whether the expression of a particular gene is increased, decreased, or unchanged in the experimental mRNA sample compared to the control mRNA sample.
A cytoskeletal element of eukaryotic cells that is a long, generally straight, hollow tube with an external diameter of 24 nm, consisting of polymerized monomers of tubulin. Microtubules make up the bulk of the spindle.
The organelles that generate energy in eukaryotic cells. Mitochondria have their own genome encoding a subset of the proteins found in mitochondria; the mitochondrial genome uses an alternate genetic code.
A gene contained within the mitochondrial genome of a eukaryote, transmitted independently of the nuclear genome. The mitochondrial genome is transmitted maternally (from the female parent).
The division of the replicated chromosomes of a eukaryotic cell into two daughter nuclei that are genetically identical to that of the original cell. See the Figure at NHGRI.
This mode of MetaFlux generates an accurate and feasible FBA model for a PGDB (i.e., an organism).
Model Organism Database
A database that describes
the genome of an organism, plus other information. A Pathway/Genome Database
is a type of model-organism database.
The organism may
or may not be a model for studying some biological process --- the
term "organism-specific database" could be used equally well, but "model-organism database"
(MOD) has been used historically in bioinformatics. MODs provide a central
resource where new computational and experimental findings about a
genome can be integrated and reviewed by experts for that organism.
MODs serve as a distributed collaboration framework that allows a set
of experts to collaboratively develop a complex DB that serves as a
platform for disseminating the genome and its analyses to the
scientific community. MODs are indispensable resources in the
day-to-day work of experimentalists for a given organism, who use them
as reference sources for genome information, gene annotations,
pathways, and regulatory information.
Refers to the tasks or activities characteristic of particular gene products. For example, transcription factor refers to one of a number of proteins performing similar tasks. In the GO Project vocabularies, Molecular Function is a primary class of terms. See the GO Consortium site for further information.
An antibody produced by cultured cells that have their origin in a single antibody-producing cell and which is therefore of a single molecular type, in contrast to the polyclonal antibodies normally found in the serum of an immunized animal.
An individual consisting of cells of two or more genotypes. One example is that of a normal female mammal heterozygous for different alleles of X-chromosome genes; because of the process of X-inactivation, such females consist of two cell types, each with a different X chromosome inactivated. This is an unusual example because there is no actual difference in genotype between the two cell types, but rather there is an epigenetic difference.
An assay that detects specific RNA molecules using a DNA or RNA probe with sequence similarity. Samples are subjected to electrophoresis on a slab gel. A replica of the gel is then made on a membrane by capillary transfer. Specific RNA sequences are then detected on the membrane with a radioactively- or chemically-labeled probe. See also:
DNA or RNA. Each of these compounds consists of a backbone of sugar molecules (ribose for RNA and deoxyribose for DNA) linked by single phosphate groups. Attached to the sugars of the backbone are any of four nitrogenous bases: A, T, C or G for DNA and A, U, C or G for RNA. See the Figure at NHGRI.
A monomer unit of nucleic acid, consisting of a purine or pyrimidine base, a sugar molecule (ribose or deoxyribose), and phosphate group(s).
Nucleotide Repeat Expansion
A type of mutation in which a set of tandemly repeated sequences replicates inaccurately to increase the number of repeats. In humans, an example of this kind of mutation occurs in the FMR1 gene.
The organelle in a eukaryotic cell that contains the chromosomes. In most types of eukaryotic cells, the nucleus breaks down as chromosomes condense during cell division. See the Figure at NHGRI.
In mathematics, a set with no members or of zero magnitude. If a field has a value of null, it means that the value is unknown. A null value is not the same as a value of zero. (To appreciate the difference, consider the terms "free" and "priceless." If something is free, it has a price of zero. If something is priceless, it has no known price. The difference between null and zero can be crucial; for example, when calculating the average value of a field among many records where one row contains a zero, the zero gets factored into the average. If the field has a null value, it does not get factored into the average.)
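The averaging example above can be made concrete with a short sketch. Python's None stands in for a database null here; the price values are made up for illustration.

```python
from statistics import mean

# None plays the role of a null (unknown) value; 0.0 is a known price of zero.
prices = [10.0, 0.0, None, 20.0]

# Nulls are left out of the average; the zero is counted.
known_prices = [p for p in prices if p is not None]
avg_skipping_null = mean(known_prices)  # (10 + 0 + 20) / 3

# Treating the null as zero gives a different (and misleading) answer.
avg_null_as_zero = mean([0.0 if p is None else p for p in prices])  # 30 / 4

print(avg_skipping_null, avg_null_as_zero)  # 10.0 7.5
```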
The Pathway Tools omics viewer uses the Cellular, Regulatory and Genome Overviews to illustrate the results of high-throughput experiments in a global metabolic and genomic context by painting experimental data onto these diagrams.
Ontologies are used to structure biological knowledge such that the knowledge
can be manipulated computationally. One type of ontology is a controlled vocabulary,
meaning a list of biological terms, each of which contains an English definition,
and a unique identifier. Databases refer to that term by its unique identifier to
avoid ambiguity, such as the ambiguity that occurs when the same word has multiple meanings.
Another type of ontology is an "is-a" hierarchy in which terms in a controlled
vocabulary are arranged in a generalization hierarchy (or hierarchical classification) such that one term is a
parent of another term if the first term denotes a more general concept than the second term.
For example, the MetaCyc database contains a hierarchical classification of metabolic pathways.
Hierarchical classification systems allow users to retrieve
related sets of objects within the database, and to drill down from more general
to more specific classes of information.
Database schemas and knowledge representation systems are also types of ontologies.
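An "is-a" hierarchy can be sketched as a mapping from each term to its direct parents. The class names below are hypothetical placeholders and are not the actual MetaCyc pathway classification.

```python
# Hypothetical is-a hierarchy: each term maps to its direct parent terms.
IS_A = {
    "glycolysis": ["carbohydrate degradation"],
    "carbohydrate degradation": ["degradation"],
    "degradation": ["pathway"],
    "pathway": [],
}


def ancestors(term, is_a):
    """Return every term that is more general than `term`."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Climb one level further up the hierarchy.
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("glycolysis", IS_A)))
# ['carbohydrate degradation', 'degradation', 'pathway']
```

Storing parents as a list allows a term to have more than one parent, as happens in vocabularies such as GO.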
A unit of genetic material that is expressed in a coordinated manner by means of an operator, a promoter, and one or more structural genes that are transcribed together.
One of a number of different kinds of membrane-bound substructures within a eukaryotic cell. Examples include the nucleus, mitochondria, and chloroplasts.
One of a set of homologous genes that have diverged from each other as a consequence of speciation. For example, the alpha globin genes of mouse and chick are orthologs. See the Figure at NCBI. See also:
P1 Artificial Chromosome. A type of cloning vector derived from bacteriophage P1 that allows foreign DNA segments to be cloned in bacteria. The capacity of a PAC is up to 100 kb of foreign DNA.
The inference component of the Pathway Tools software. PathoLogic contains four predictors:
A predictor of metabolic pathways; a predictor of missing enzymes in metabolic pathways
(the pathway hole filler); an operon predictor; and a program that predicts transport
reactions from transporter functional descriptions. See the Pathway Tools Overview for more information.
An interconnected set of biochemical reactions, where reactions are connected by sharing common reactants and products. In metabolic pathways, reactants and products are typically low-molecular-weight chemical compounds. In signaling pathways, reactants and products are typically proteins.
A Pathway/Genome Database (PGDB) is a database managed by SRI's Pathway Tools software
that describes an information space ranging from genomes to pathways.
A PGDB such as EcoCyc describes the genome of an organism, the product of
each gene, the biochemical reaction(s) catalyzed by each gene product,
the substrates of each reaction, and the organization of reactions
into pathways. The schema of a PGDB can also describe the regulatory
network of an organism.
Pathway Tools Overviews
A genome-scale depiction of information within a PGDB. There are three different Overviews: the Cellular Overview depicts the metabolic network, the Regulatory Overview depicts the regulatory network, and the Genome Overview depicts the full genome.
Pathway Tools Posters
Pathway Tools can generate postscript and/or PDF files of a poster-size depiction of a genome map and of a metabolic map from a PGDB.
One of a set of homologous genes that have diverged from each other as a consequence of genetic duplication. For example, the mouse alpha globin and beta globin genes are paralogs. The relationship between mouse alpha globin and chick beta globin is also considered paralogous. See the Figure at NCBI. See also:
An item of information such as a name, a number, or a selected option passed to a program by a user or another program. Parameters affect the operation of the program receiving them. Parameters are values that you select or enter in the query form fields.
Pathway Evidence Report
PathoLogic can generate a “pathway evidence report” Web page that lists all pathways it has predicted in an organism and the evidence supporting each predicted pathway. This report provides a convenient way for a scientist to review the evidence for each pathway.
A pathway hole is a pathway reaction thought to occur in an organism for which no corresponding enzyme has been identified in the genome.
The SRI software system used to
construct, update, visualize, query, and analyze Pathway/Genome Databases (PGDBs).
Pathway Tools powers the BioCyc website and other similar websites. It can also be installed
locally as a desktop application.
Pathway Tools has four components: (1) The Pathway/Genome
Navigator supports querying, visualization, and analysis of
PGDBs. (2) The PathoLogic program supports
automated creation of a PGDB and performs several computational
inferences including prediction of the metabolic pathway
complement and operons of an organism.
(3) MetaFlux is used to develop steady-state metabolic flux models from
PGDBs using flux-balance analysis.
(4) The Pathway/Genome Editors support interactive updating
and refinement of PGDBs.
Pathway Tools was developed
by the group of Dr. Peter D. Karp at SRI International.
In BioCyc, this term refers to terms in a hierarchical controlled vocabulary such as those containing Gene Ontology (GO) terms. A "parent" of a term is one any number of levels above it in the hierarchy from which it is descended. For example, the GO term enzyme [GO:0003824] is a parent to the GO term alcohol dehydrogenase [GO:0004022]. See also:
Polymerase Chain Reaction. A method of amplifying specific DNA segments based on hybridization to a primer pair. A DNA sample is denatured by heating in the presence of a vast molar excess of short single-stranded DNA primers (around 20 nucleotides) whose sequence is chosen based on the target sequence. The reaction mixture also contains a thermostable DNA polymerase, dNTPs, and buffer. The primer sequences are selected so that they:
are derived from opposite strands of the target sequence,
are separated by a length of DNA that can be reliably synthesized in vitro.
The sample is then cooled to a temperature that allows primer annealing and in vitro replication. The sample is subjected to multiple cycles of denaturation and cooling to allow multiple rounds of replication. The quantity of the target sequence doubles during each cycle, causing the target sequence to be amplified, while other DNA sequences in the sample remain unamplified. See the Figure at Access Excellence.
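The doubling described above implies exponential growth in the number of target copies. The sketch below shows the idealized arithmetic; the efficiency parameter is an added assumption for illustration (1.0 means perfect doubling), and real reactions eventually plateau.

```python
def copies_after(cycles: int, start_copies: int = 1,
                 efficiency: float = 1.0) -> float:
    """Idealized target copy number after a given number of PCR cycles.

    efficiency is a hypothetical per-cycle yield: 1.0 doubles the target
    each cycle, lower values model incomplete replication.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return start_copies * (1.0 + efficiency) ** cycles


print(copies_after(30))  # 1073741824.0, i.e. 2**30 copies from one template
```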
A type of cloning vector derived from a bacteriophage, usually capable of carrying an amount of foreign DNA that is at the upper range of that carried by a plasmid.
A type of cloning vector derived from a phage and a plasmid. Phagemids are capable of carrying an amount of foreign DNA comparable to a plasmid, but have some special feature such as the ability to produce single-stranded DNA.
The condition of an individual that resembles the phenotype produced by a particular mutation but is caused by some experimental treatment (e.g., drug treatment) rather than by the presence of that mutation.
A description of the observable state of an individual with respect to some inherited characteristic. Often, individuals with different genotypes display the same phenotype. See dominant and recessive.
The detection of radioactivity using "phosphor" compounds that emit visible light when exposed to radiation. Phosphorimaging instruments produce images of, for example, Southern blots and Northern blots, that are comparable to those produced by autoradiography, with superior quantitation.
A type of cloning vector derived from autonomously-replicating extrachromosomal circular DNAs in bacteria. The amount of foreign DNA that can be carried in a plasmid is small, ranging up to about 20 kb.
The process by which a series of adenosine (A) ribonucleotides is added to the 3' end of a spliced RNA to make a mature mRNA. This addition to the RNA is sometimes referred to as a poly-A tail, and commonly contains several hundred bases.
An instance of genotypic variation within a population.
Metabolism is the term used to describe all of the chemical reactions and interactions
that take place in a biological system. Primary metabolism encompasses reactions involving
those compounds which are formed as a part of the normal anabolic and catabolic processes
which result in assimilation, respiration, transport, and differentiation. These processes
take place in most, if not all, cells of the organism. Common examples of primary compounds
are sugars, amino acids, nucleotides etc. Primary metabolism is simply defined as the metabolism of primary compounds.
A single-stranded nucleic acid that can "prime" replication of a template. More specifically, a single-stranded nucleic acid capable of hybridizing to a template single-stranded nucleic acid in such a way as to leave part of the template to the 3' end of the primer single-stranded. DNA polymerase can then synthesize a new strand starting from the 3' end of the primer and adding nucleotides to the growing strand by base complementarity to the template. See also PCR.
In molecular biology, a nucleic acid that has been labeled either radioactively or chemically that allows the detection of nucleic acids with sequence similarity in a sample by hybridization. Probes are used to detect DNA on membranes in Southern blots, to detect RNA on membranes in Northern blots, and either DNA or RNA in cytological preparations for in situ hybridization.
Cell or organism lacking a membrane-bound, structurally discrete nucleus and other subcellular compartments. Bacteria are prokaryotes. See also eukaryote.
A region of a protein responsible for a particular function, as recognized experimentally and by the occurrence of similar segments in other proteins sharing that function, e.g., a DNA binding domain.
A gene whose product is a protein.
A method of detecting a particular enzyme in a cell or tissue sample. A sample of cells or tissue is fixed, then treated with a chromogenic substrate for the enzyme to be detected. Microscopic examination reveals the presence of staining, and hence of the specific protein to be detected.
A heritable genetic region that affects a measurable characteristic of the animal (e.g., body weight or blood pressure).
The type of marker described by statistical association to quantitative variation in a particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci.
A request for information submitted to a computerized database. See also:
Electromagnetic energy: gamma rays, X rays, ultraviolet light, visible light, infrared light, microwaves and radio waves. In mouse genetics, this term generally refers to gamma rays and X rays.
Subatomic particles emitted by the decay of unstable isotopes: electrons (beta particles) and helium nuclei (alpha particles). Common unstable isotopes in molecular biology are tritium (3H), which emits low-energy beta particles, 35S, which emits beta particles of moderate energy, and 32P, which emits high-energy beta particles.
Subatomic particles from a particle accelerator, such as protons, neutrons, and electrons.
Radiation Hybrid Mapping
A type of genetic mapping providing resolution between relatively low-resolution linkage analysis and high-resolution physical mapping by the assembly of contiguous cloned DNA segments. The method consists of fusing irradiated cultured cells of one species with cultured cells of a different species. A panel of hybrid cells is then tested for the occurrence of pairs of markers. The closer two markers are to each other, the more likely that both are present in an individual hybrid cell.
Radiation Induced Mutation
A mutation induced by irradiation, in mouse usually gamma-ray or X-ray.
Given a starting set of metabolites (called the nutrients), the Pathway Tools Reachability Analysis tool determines which reactions can fire, and which other metabolites are produced as a result of this qualitative simulation, in an automated and iterative manner.
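The iterative idea can be sketched as a forward-propagation loop. This is a toy illustration, not the Pathway Tools implementation; the function name, the reaction names, and the three-reaction network are made up for the example.

```python
def reachable_metabolites(nutrients, reactions):
    """Fire reactions repeatedly until no new reaction can fire.

    reactions maps a reaction name to (reactants, products).
    Returns the set of reachable metabolites and the set of fired reactions.
    """
    reached = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            # A reaction fires once all of its reactants are available.
            if name not in fired and set(reactants) <= reached:
                fired.add(name)
                reached |= set(products)
                changed = True
    return reached, fired


# Hypothetical three-reaction network.
reactions = {
    "R1": (["glucose", "ATP"], ["glucose-6-phosphate", "ADP"]),
    "R2": (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
    "R3": (["lactose"], ["galactose", "glucose"]),
}
reached, fired = reachable_metabolites(["glucose", "ATP"], reactions)
print(sorted(fired))  # ['R1', 'R2']; R3 never fires because lactose is absent
```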
One of a series of terms applied to the phenotypic effect of a particular allele in reference to another allele (usually the standard wild-type allele) with respect to a given trait. An allele "a" is said to be recessive with respect to the allele "A" if the A/A homozygote and the A/a heterozygote are phenotypically identical and different from the a/a homozygote. An example is the nonagouti (a) allele of the mouse. A(+)/A(+) and (+)/a mice have identical agouti banding of individual hairs in the coat, while a/a mice have hairs of uniform color. See also:
Transfer of information from one DNA molecule to another. Recombination may be reciprocal, in which case the products are equivalent to breakage of the two DNA molecules and rejoining of the broken ends to form new molecules. Recombination may also be nonreciprocal, in which case the product is equivalent to transfer of information from the donor DNA molecule to the recipient DNA molecule, with no change in the donor DNA molecule. Reciprocal recombination events are also called crossovers.
Redox half reactions are elementary reactions in which explicitly stated electrons are reducing an oxidized molecular species. These reactions do not stand alone, because electrons do not occur freely. Instead, a half reaction must be paired with another half reaction to form a complete, overall transformation.
The numerical values in the data file include positive and negative values.
The process of synthesizing a copy of a DNA molecule from nucleotides using information contained within one strand of a template DNA molecule. The new strand of DNA is synthesized from the 5' end to the 3' end. See the Figure at NHGRI.
A gene whose product is easily detected and not ordinarily present in an organism or cell type under study that is expressed as part of a DNA construct introduced experimentally. Bacterial beta-galactosidase, whose activity can be detected using a staining reaction, is a commonly used reporter gene, as is green fluorescent protein. See also:
A mutation event that alters an allele conferring a mutant phenotype into one conferring a wild-type phenotype. The mutation need not restore the gene to its original nucleotide sequence to be considered a reversion event.
Ribonucleic acid. A nucleic acid that is the primary product of gene expression. Chemically, it differs from DNA by the substitution of ribose for deoxyribose in the sugar-phosphate backbone and by the substitution of the base uracil for thymine. See the Figure at NHGRI. See also:
A method of detecting the presence of a specific RNA in a sample. A radioactively-labeled RNA probe is prepared by transcribing the antisense strand of a DNA construct. The labeled probe is hybridized to the sample. The sample is then treated with RNAse, which is specific to single-stranded RNA. The sample is then subjected to electrophoresis and autoradiography. The presence of full-length probe that has not been cleaved by RNAse indicates the presence of the sense strand, and hence gene expression, in the sample.
A particular type of translocation in which the breakpoints in the two chromosomes occur at or near the centromere, followed by centric fusion such that the long arms now form a metacentric chromosome with a single centromere. Any small fragments generated in the exchange are usually lost. See also Translocation.
Ribosomal RNA. The RNA molecules that are a structural and catalytic component of the ribosome.
Reverse-Transcription PCR. A method of amplifying mRNA by first synthesizing cDNA with reverse transcriptase, then amplifying the cDNA using PCR. A positive result is evidence of a particular mRNA, and hence of gene expression, in a sample.
SAM Output File
The Omics Viewer can import gene expression data from a spreadsheet generated by the SAM (Significance Analysis of Microarrays) Microsoft Excel plug-in. This package combines multiple expression experiments to produce a list of statistically significant positively and negatively regulated genes. The Omics Viewer displays the positively regulated genes in one color, and the negatively regulated genes in another color.
SAQP: Structured Advanced Query Page
The SAQP is a graphical user interface to formulate a query to a PGDB without knowing the underlying query language (BioVelo).
An underlying organizational pattern or structure; conceptual framework.
A collection of items that model part or all of a real world object, particularly in the context of a database, i.e., a database schema.
The structure of a database system, described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure.
In computer science, a description of the logical organization, structure, and content of a database.
Secondary metabolism is the metabolism of secondary
compounds, defined simply as compounds other than primary compounds. A compound is
classified as a secondary metabolite if it does not seem to directly
function in the processes of growth and development. Even though
secondary compounds are a normal part of the metabolism of an
organism, they are often produced in specialized cells, and tend to be
more complex than primary compounds. Examples of secondary compounds
include antibiotics, and plant chemical defenses such as alkaloids and
The separation of different alleles of the same gene during meiosis.
One of a series of terms applied to the phenotypic effect of a particular allele in reference to another allele (usually the standard wild-type allele) with respect to a given trait. An allele "A" is said to be semidominant with respect to the allele "a" if the A/A homozygote has a mutant phenotype, the A/a heterozygote has a less severe phenotype, while the a/a homozygote is wild-type. An example is Pmp22(Tr-J) in mouse. Pmp22(Tr-J)/Pmp22(Tr-J) animals display a myelination defect associated with a "trembler" phenotype, while Pmp22(Tr-J)/Pmp22(+) animals are less severely affected, and Pmp22(+)/Pmp22(+) animals are wild-type. See also:
v. The analysis process used to create sequence annotations. The process relies heavily on the homology principle, whereby similarity to known genes is used to help identify new genes and propose functions for them.
Sequence ID (SeqID)
Sequence accession identifier. A unique alphanumeric character string that unambiguously identifies a sequence record in a database. Examples of genomic sequence providers are NCBI and Ensembl; examples of sequence IDs from these providers are 16590 and ENSMUSG00000053869, respectively. See also:
The sequencing of a large DNA segment through the sequencing of randomly-derived subsegments whose order and orientation within the large segment is unknown until the assembly of overlapping sequences. The method works if all positions in the large segment are covered by multiple overlapping subsegments. See also Whole-genome shotgun sequencing.
This term refers to terms in a hierarchical controlled vocabulary such as those containing Gene Ontology (GO) terms.
A "sibling" of a term is a term at the same level of the hierarchy sharing at least one ancestor. For example, the GO term
alcohol dehydrogenase [GO:0004022] is a sibling to the GO term aldehyde oxidase [GO:0004031]; they share the ancestor term
A protein component of RNA polymerase that determines the specific site on DNA where transcription begins.
Signal transduction pathway
Pathways that describe the chain of events, such as protein phosphorylation, that occurs during the propagation of a signal in a cell. These pathways start with the binding of ligand by a trans-membrane receptor, and proceed through a series of intermediate molecules until final regulatory molecules, such as transcription factors, are modified in response.
In comparison of nucleic acid sequences, the extent to which two nucleic acid sequences have identical bases at equivalent positions, usually expressed as a percentage.
In comparison of protein sequences, the extent to which the amino acid sequences of two proteins have identical or functionally similar amino acids at equivalent positions, usually expressed as a percentage.
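For the nucleic acid case, the percentage of identical bases can be computed from two aligned sequences. The sketch below assumes the sequences are already aligned to equal length and skips any column that contains a gap character; gap conventions vary between tools, so that choice is an assumption. Scoring "functionally similar" amino acids would additionally need a substitution table, which is not shown.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, ungapped positions with identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns (an assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5 (7 of 8 positions match)
```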
This mode of MetaFlux determines the correct fluxes of reactions given a set of nutrients, secretions, and biomass metabolites to produce a feasible FBA Model.
Cells in an animal other than those that constitute the germ line.
Somatic Cell Hybrid
A type of mapping experiment permitting the assignment of markers to chromosomes. The method consists of fusing cultured cells of one species with cultured cells of a different species. The hybrid cells are unstable in karyotype during growth, with most chromosomes from one species typically being lost. Among clonal populations of hybrid cells following growth, different chromosomes are retained from one species. A panel of hybrid cell cultures can be assayed for which mouse chromosomes (for example) are retained, and simultaneously assayed for the presence of particular markers. The correlation of the presence of a particular marker across the panel with the presence of a particular mouse chromosome allows that marker to be assigned to that chromosome. See also Radiation Hybrid Mapping.
An assay that detects specific DNA molecules using a DNA or RNA probe with sequence similarity. Samples are subjected to electrophoresis on a slab gel. A replica of the gel is then made on a membrane by capillary transfer following denaturation. Specific DNA sequences are then detected on the membrane with a radioactively- or chemically-labeled probe. See the Figure from Alberts, et al., Molecular Biology of the Cell. See also:
As a type of mutation, one that has occurred in the absence of any experimental mutagenic treatment, such as irradiation or treatment with chemical mutagens.
Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems.
SQL statements perform tasks such as updating data in or retrieving data from a
database. Some common relational database management systems that use SQL are:
Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most
database systems use SQL, most of them also have their own additional
proprietary extensions that are usually only used on their system.
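As a brief illustration of updating and retrieving data with SQL, the sketch below uses SQLite through Python's standard sqlite3 module. The table, column names, and rows are hypothetical and chosen only for the example.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT)")
conn.executemany(
    "INSERT INTO gene (symbol, chromosome) VALUES (?, ?)",
    [("geneA", "1"), ("geneB", "unknown"), ("geneC", "2")],
)

# Updating data in the database.
conn.execute("UPDATE gene SET chromosome = ? WHERE symbol = ?", ("2", "geneB"))

# Retrieving data from the database.
rows = conn.execute(
    "SELECT symbol FROM gene WHERE chromosome = ? ORDER BY symbol", ("2",)
).fetchall()
conn.close()

print(rows)  # [('geneB',), ('geneC',)]
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting errors.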
Simple Sequence Length Polymorphism, a type of polymorphism that results from variation in the length of an SSR.
Simple Sequence Repeat, a DNA sequence consisting largely of a tandem repeat of a specific k-mer (such as (CA)15). Many SSRs are polymorphic and have been widely used in genetic mapping.
Strain is a low-level taxonomic rank used in three related ways. In microbiology, a strain is a genetic variant or subtype of a microorganism
(e.g., a virus, bacterium, or fungus). In plants, a strain is a designated group of offspring that have descended from a modified plant,
produced either by conventional breeding or by biotechnological means, or resulting from genetic mutation. In rodents, a strain is a group of
animals that is genetically uniform.
A protein that functions as a structural element of cells rather than as an enzyme, for example, collagen.
Structured data are data that have been represented in a manner that allows computation
with those data. Data become structured when they are carefully dissected and
assigned to distinct fields of a database with clearly defined meanings,
so that the data are independently queryable and computable. Therefore, we can
ask questions across the data such as "find all enzymes that use magnesium as a
cofactor" or "find all pathways in which pyruvate is an input substrate". See also Unstructured Data.
Structured Query Language
Structured Query Language (SQL) is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements perform tasks such as updating data in or retrieving data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. The Query Forms at MGI extract information from databases by generating instructions in SQL.
Sequence Tagged Site. A short segment of unique sequence derived from genomic DNA. A large collection of STSs can be used to assemble a physical map of the genome from a collection of genomic clones (e.g., BACs or YACs) by testing each clone for the presence of each STS. Two clones that contain one or more STSs in common must overlap. For examples, see the physical maps of the mouse genome at MGI.
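The overlap rule above (two clones sharing at least one STS must overlap) can be sketched as follows. The clone names, STS names, and the helper `overlapping_pairs` are hypothetical and serve only as an illustration.

```python
from itertools import combinations

# Hypothetical STS content: clone name -> set of STSs detected in that clone.
sts_content = {
    "BAC_A": {"sts1", "sts2"},
    "BAC_B": {"sts2", "sts3"},
    "BAC_C": {"sts4"},
}

def overlapping_pairs(sts_content):
    """Return clone pairs that share at least one STS, with the shared STSs."""
    return {
        (a, b): sts_content[a] & sts_content[b]
        for a, b in combinations(sorted(sts_content), 2)
        if sts_content[a] & sts_content[b]
    }

overlaps = overlapping_pairs(sts_content)
print(overlaps)
```

Chaining such pairwise overlaps is what allows clones to be ordered into a physical map.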
Superpathways are a class of PGDB metabolic pathways that are constructed by combining and connecting individual pathways (which can be shown separately) to depict relationships between them. In some cases those individual pathways start from a common precursor, or produce a common product, but they can have other relationships as well. Superpathways can have individual reactions as their components in addition to other pathways. Superpathways can be defined recursively; that is, a component pathway of a superpathway can be a base pathway or can itself be a superpathway. Most superpathways will have an additional parent class within the pathway ontology to define their biological role.
A curated protein sequence database. See the SWISS-PROT site for more details.
A synonym is one of several names that are, or have been used, in the scientific
literature or in public databases to refer to one object. For example,
2-phosphoglyceric acid, 2-PGA, and glycerate 2-phosphate are all synonyms of the
compound 2-phosphoglycerate. In BioCyc, one of the synonyms is designated as the
The state of being on the same chromosome. A gene is also said to be syntenic to a particular chromosome if it is known to be located on that chromosome but is otherwise unmapped. See also Conserved Synteny.
The data dictionary of a DBMS. The system catalog stores metadata including the schemas of the databases. It is a mini-database, and is usually stored using the DBMS itself in special tables called system tables. It may be referred to as being "on line", as it is active, and users can query it like any other table.
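As one concrete illustration, SQLite keeps its catalog in a table named `sqlite_master`, which can be queried with ordinary SQL. The sketch below assumes SQLite via Python's `sqlite3` module; other DBMSs expose their catalogs under different names. The `marker` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user table, created only so the catalog has something to report.
conn.execute("CREATE TABLE marker (symbol TEXT, chromosome TEXT)")

# sqlite_master is SQLite's catalog table; it is queried like any other table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog)
conn.close()
```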
A text file that uses tabs to separate adjacent fields. It is a common format for downloading information into a spreadsheet.
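A tab-delimited file can be read with Python's standard `csv` module by setting the delimiter to a tab. The sketch below is illustrative only: the column names and rows are hypothetical, and the text is held in memory rather than read from a real file.

```python
import csv
import io

# Hypothetical file contents; a real file would be opened with open(path, newline="").
text = "symbol\tchromosome\nKit\t5\nPax6\t2\n"

# DictReader uses the first line as field names and splits each row on tabs.
rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
print(rows[0]["symbol"], rows[0]["chromosome"])
```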
Refers to data arranged in rows and columns. A spreadsheet, for example, is a table. In relational databases, all information is stored in the form of tables.
A type of mutation in which a chromosomal gene is altered by the substitution of a DNA construct assembled in vitro.
The constructs are usually designed to eliminate gene function; such targeted mutations are often casually referred to as knock outs. Some DNA constructs are designed to alter gene function; such targeted mutations are often casually referred to as knock ins.
A specialized structure at the ends of linear chromosomes in eukaryotes. Telomeres confer stability on chromosome ends. Chromosome ends lacking telomeres, such as those generated from interstitial sites by chromosome breaks, are reactive, often fusing with other broken ends to generate chromosome rearrangements. Telomeres also permit the ends of linear chromosomes to replicate fully. See the Figure at NHGRI.
A DNA sequence that signals the end of transcription.
A type of cross in which individuals whose genotype
with respect to one or more genes is unknown are crossed to a test strain
homozygous for a recessive allele at the genes under study. For example, a cross of an individual that
was A/A or A/a (identical in phenotype) to a/a would reveal the genotype of the
individual being tested, because if the individual being tested were A/A, all of the progeny would show the
dominant phenotype, while if the individual being tested were A/a, half of the
progeny would show the dominant phenotype and half would show the
The location at the 5′ end of a gene, adjacent to the promoter, at which the RNA polymerase complex binds to the DNA and initiates the process of transcription of that gene into mRNA. The precise context of the TSS depends on the gene, its host organism, the type of polymerase involved, and other factors.
A transcription unit is a sequence of nucleotides in DNA that codes for a single RNA molecule, along with the sequences necessary for its transcription; normally a transcription unit contains a promoter, an RNA-coding sequence, and a terminator. Transcription units are similar to operons; however, an operon that contains multiple promoters and/or terminators corresponds to multiple transcription units.
Any DNA sequence or combination of sequences that has been introduced via a construct into the germ line of the animal by random integration.
A mouse that contains a stably inherited DNA which has been inserted randomly into the genome.
The inserted gene sequence (the transgene) may or may not be derived from mouse sequence.
A type of mutation in which two nonhomologous chromosomes are each broken and then repaired in such a way that:
the resulting chromosomes each contain material from the other chromosome (a reciprocal translocation, see the Figure at NHGRI),
one of the chromosomes contains an insertion of material from the other chromosome, with the other chromosome containing a deletion (an insertional translocation, see the Figure at NHGRI) or
the two chromosomes, each with breaks near the centromere, fuse to form a single chromosome with a single centromere (a Robertsonian translocation).
This Pathway Tools ontology class defines reactions in which at least one species is transported (passively or actively) across a membrane. The species may or may not be chemically modified in the course of the reaction. A transport reaction is assumed to occur physiologically in the direction written; if it proceeds in the reverse direction, this fact should be indicated in the enzymatic-reaction for a given transporter.
A type of mobile genetic element consisting of DNA that moves to new genomic locations conservatively (without replicating itself) or replicatively (moving a copy of itself).
A type of point mutation in which a purine is substituted for a pyrimidine or a pyrimidine for a purine. These substitutions include C or T for A, C or T for G, A or G for C, and A or G for T. See also Transition.
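The purine/pyrimidine rule can be expressed as a short Python sketch. The function name `substitution_type` is hypothetical; the logic simply checks whether the two bases belong to the same chemical class.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same class (purine to purine, or pyrimidine to pyrimidine) is a transition;
    # a change of class is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

print(substitution_type("A", "C"))
print(substitution_type("A", "G"))
```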
A protein sequence database that contains all the translations of EMBL nucleotide sequences. See the TrEMBL site for more details.
The condition of having three chromosomes of a particular type. Down Syndrome in humans is a trisomy for chromosome 21. See also Monosomy.
A try-set is a mechanism used within the Pathway Tools MetaFlux module to allow MetaFlux to explore potential modifications to an FBA model. A try-set defines a set of reactions or metabolites that can be added to a base model that is considered incomplete. Try-sets can be specified for reactions, nutrients, secretions, and biomass metabolites.
A type of database link that links an object in one database to an object
in another database that represents the same biological object.
An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location. See the UniGene page at NCBI.
The inheritance, in a diploid organism, of both copies of a single chromosome from one parent. This may result from the union of a gamete bearing two copies of one chromosome with a gamete bearing no copy of that chromosome, or from the union of a gamete bearing two copies of one chromosome with a normal gamete, followed by the loss of one chromosome through an error in mitosis. Because of imprinting, uniparental disomy can have phenotypic consequences in mammals. See, for example, Prader-Willi Syndrome.
Data that are not structured, such as the free-text comments within a database. Such
comments are not structured because the computer cannot
compute with the data. Computers cannot interpret free text, so they cannot
extract individual data elements from large text blocks such that the meanings of those data elements
are reliably known. See also Structured Data.
Uniform Resource Locator. An Internet address giving the protocol to be used for obtaining resources on the Internet such as "ftp:" for an FTP site or "http:" for a World Wide Web page. It also includes the server name and sometimes the path to the resource. The URL for BioCyc is "http://www.biocyc.org".
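The parts of a URL named above (protocol, server name, path) can be separated with Python's standard `urllib.parse` module. In this sketch the path `/some/path` is a hypothetical addition for illustration and is not a real BioCyc page.

```python
from urllib.parse import urlsplit

# The path component here is hypothetical; only the server name comes from the text.
parts = urlsplit("http://www.biocyc.org/some/path")

print(parts.scheme)  # protocol
print(parts.netloc)  # server name
print(parts.path)    # path to the resource
```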
Vertebrate Genome Annotation. The VEGA database is a central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence. VEGA was developed within Ensembl as a joint project between EMBL-EBI and the Sanger Institute.
A noncellular biological entity that requires a host cell for reproduction. Viruses consist of a nucleic acid genome that is either DNA or RNA (as in retroviruses, for example). The viral genome is covered with a protein coat; some viruses have a host-derived membrane over the protein coat.
A suite of programs and databases for comparative analysis of genomic sequences. Users can either submit sequences and alignments for analysis or examine precomputed whole-genome alignments of different species. See http://genome.lbl.gov/vista/index.shtml.
An assay that detects specific proteins within a protein mixture. Samples are subjected to electrophoresis on a slab gel. A replica of the gel is then made on a membrane by electrophoretic transfer. Specific proteins are then detected on the membrane using antibody staining. See also:
The sequencing of the entire genome of an organism through the sequencing of randomly-derived subsegments whose order and orientation are unknown until the assembly of overlapping sequences is performed computationally. The method works if all positions in the genome are covered by multiple overlapping subsegments. See also:
The phenotype with respect to a given inherited characteristic that is considered to be the "normal" type commonly found in natural populations.
The allele of a particular gene that confers the phenotype considered to be the "normal" type commonly found in natural populations. N.B.: Because some DNA sequence polymorphisms do not produce different phenotypes, there can be multiple "wild-type" alleles of a gene.
Wild Type Allele
One of many possible versions of a gene that functions normally, as opposed to versions of a gene that are functionally abnormal (i.e., mutant alleles).
With respect to gene nomenclature, a withdrawn symbol or name was once the approved symbol or name for a marker; there is currently a different approved symbol or name for that marker.
One of a pair of chromosomes that is sexually dimorphic in mammals. Normal female mammals have two X chromosomes, while normal male mammals have an X chromosome and a Y chromosome.
The condensation of all but one of the X chromosomes of a mammal into a heterochromatic state, eliminating gene expression from all but the active X chromosome. This process ensures that male and female mammals have the same level of gene activity of X-chromosome genes.
Yeast Artificial Chromosome.
One of a pair of chromosomes that is sexually dimorphic in mammals. Normal female mammals have two X chromosomes, while normal male mammals have an X chromosome and a Y chromosome.