modENCODE

Model Organisms and Modern Biology

An educational portal from the modENCODE Project

The Modules

Module 1: Model Organisms in Biology

Why we study flies and worms to learn about humans -- and what we've learned so far. [More info]

Drosophila >>

The lowly fruit fly -- easy to grow and maintain, with giant chromosomes and some surprising genetic similarities to humans -- has been at the core of biological research for more than 100 years.

C. elegans >>

Adult C. elegans hermaphrodites have precisely 959 somatic cells, complete their life cycle rapidly, are transparent at every stage, and share many similarities with humans -- which has made them ideal for studies of cell death, development, and aging.

Module 2: Model Organisms and Gene Expression

How studies of the fly and the worm, and a large scientific project, have shed light on the ways genes drive biological processes. [More info]

Transcription Regulation >>

How do genes "know" when they're needed -- when to turn on and when to turn off? Part of the answer lies in a class of proteins called transcription factors.

Chromatin >>

To fit into the nucleus, DNA must be packaged up with proteins called histones into chromatin, and that packaging creates different patterns of modifications of the genome.

Module 3: The Role of Computers and Technology

Increasingly, biology is not just about doing benchtop experiments: scientists rely on computers and cutting-edge techniques to generate and make sense of massive data sets. [More info]

Genomics >>

Sequencing a genome used to take years of painstaking work. New "parallel-processing" and "high-throughput" techniques have slashed that time to months or even days -- and dramatically expanded the horizons of molecular biology.

Bioinformatics >>

The volumes of data churned out by modern sequencing techniques require sophisticated computer analysis to find patterns in the data. That's spawned a new field combining biology and computer science -- bioinformatics.

[Image credits]

Welcome to a look at how some rather common invertebrate organisms can lead to some very uncommon insights into human biology, health, and disease.

By drilling down into the three modules on this Web site, you will:

  • Explore the significance of two classic model organisms, the fruit fly (Drosophila melanogaster) and the roundworm (Caenorhabditis elegans), in studies of human biology and conditions ranging from alcoholism to aging.
  • Investigate how genes are activated, transcribed, and translated into proteins that do the biological heavy lifting in cells and systems -- and how the six feet of DNA in each of your cells is packed to fit into a microscopic nucleus.
  • Find out how new technologies and the use of computers to crunch massive amounts of data are creating new frontiers and growth areas in the biological sciences.

The site examines these topics through the lens of a highly successful recent example of international scientific collaboration -- the Model Organism Encyclopedia of DNA Elements (modENCODE) project.

About modENCODE

The instructions to create and maintain a living organism are recorded by the sequence of bases in its DNA. In recent years scientists have decoded the sequences of the genomes (all the DNA within each cell) of many organisms, including humans. But what are the instructions encoded in the sequence of a genome and how do these thousands of genetic instructions work together to create a multicellular organism? Our understanding of the language of DNA is only rudimentary, and we can only partly read its instructions. What are our genes and how do they result in cells as different as a muscle cell and a nerve cell? The answer lies in differences in gene expression -- the amazingly intricate work of turning genes on and off and regulating their action within cells.

Understanding the instructions in a genome and how they are used to make an organism is the core purpose of the ENCODE (Encyclopedia of DNA Elements) and modENCODE (model organism ENCODE) projects, which aim to define all of the functional elements in the human and the worm and fly genomes. This includes precisely defining the genes that code for proteins and non-coding RNAs, and identifying functionally important elements that direct gene expression, DNA replication, and chromosome inheritance. The two model organisms, the fruit fly Drosophila melanogaster and the nematode worm Caenorhabditis elegans, that are the focus of modENCODE have been central to understanding the biology of multicellular organisms.

The use of worms and flies in modENCODE has allowed the study of tens of thousands of individual genes in the living organisms, as well as cells in tissue culture. The ability to manipulate these animals experimentally allows any conclusions to be tested directly. The project has relied on some cutting-edge lab methods to examine those many genes in many different stages of the life cycle and has exploited powerful computing techniques to make sense of it all. That makes this project a great place to start looking at how and why scientists use simple organisms to study the intricate world of gene expression, how technology is shaping and reshaping biology as a science (and as a career), and how this work ultimately informs the study of complex human traits and diseases.

Let's Get Started!

Within each section of this site, you will find case studies, laboratory results, videos, exercises, and assessment questions. Work through the two sections in each of the three modules, completing the assessments before moving on to the next section. (Your teacher will let you know whether these will be graded assignments or self-assessment exercises.) The modules can be completed in any order, but we strongly suggest that you move through them in the order presented. You'll find in-line pop-ups to give further information on unfamiliar terms, as well as a comprehensive glossary you can consult at any point.

So let's dive in, and start by finding out what fruit flies and roundworms can tell us about ourselves . . .

Contributors

Following are some of the people who contributed to this Web site.

  • Sarah C R Elgin, Washington University in St. Louis, MO
  • Cheryl Bailey, University of Nebraska - Lincoln and Howard Hughes Medical Institute
  • James Bedard, University of the Fraser Valley, BC
  • Martin Burg, Grand Valley State - Michigan
  • Jennifer Roecklein-Canfield, Simmons College, MA
  • Christopher Jones, Moravian College, PA
  • Nighat Kokan, Cardinal Stritch University, WI
  • Susan Parrish, McDaniel College, MD
  • Cathy Silver Key, North Carolina Central University
  • Philip Meneely, Haverford, PA
  • Bob Waterston, University of Washington
  • Michelle Smith, University of Maine
  • Valerie Reinke, Yale, CT
  • Roger Alexander, Yale, CT
  • Gidi Shemer, University of North Carolina
  • Kelly Hogan, University of North Carolina
  • Peter Park, Harvard, Boston, MA
  • Ivy McDaniel, UC Berkeley
  • Sue Celniker, UC Berkeley
  • Ben Brown, UC Berkeley
  • Rebecca Lowdon, NIH/ MD
  • Trupti Kawil, Stanford
  • Barry Starr, Stanford
  • Caroline Kelly, NHGRI
  • Laura Zahn, AAAS/Science, San Diego, CA
  • Melissa McCarthy, AAAS/Science, Washington, DC

More Info Terms

Model Organisms

Model organisms are animals (or plants, or fungi, or bacteria) that scientists have found convenient to use in the lab. Animals make good model organisms if they:

  • are easy and inexpensive to keep,
  • eat cheap food,
  • are small and hardy,
  • have well-characterized mutations (for example, mutations that lead to certain disease symptoms) for genetic studies, and
  • breed easily, with short generation times.
A few organisms meeting these criteria are now used by many scientists.

In this module, we explore two model organisms that were used for modENCODE studies -- the fruit fly Drosophila melanogaster and the worm Caenorhabditis elegans. Both organisms have contributed to fundamental discoveries on how animals develop from a single cell, how that cell knows which genes it should express, how the different cells in the body communicate with each other, and how the animal behaves in different situations. This knowledge has been vital to improving our understanding of human biology in health and disease.

Transcription, Chromatin, and Gene Expression

All of the instructions for a living organism are recorded in its DNA. All cells need some of that information -- for example, information on how to maintain an energy balance, how to generate RNA and proteins, how to replicate when appropriate, and other "housekeeping" functions. But many genes provide information that's needed by only a subset of cell types or at specific stages. This information needs to be made available only to the right cells at the right time. Genes are transcribed (expressed) by a protein called an RNA polymerase to produce an RNA molecule. One mechanism for regulating which genes are transcribed in different cell types is via other proteins, called transcription factors. These proteins bind near a gene and activate the RNA polymerase to transcribe RNA copies of the gene. If the gene codes for a protein, this transcript will be processed to produce mRNA, which in turn is translated into protein. We explore the different steps in this process, and how recent work with the fly and worm has shed light on it, in this module.

Our genomes are very large -- and all of the DNA within a cell must be packaged into chromatin to fit inside a very small nucleus. This packaging can also function in regulating gene expression, and plays a key role in determining which DNA sequences are accessible to the transcription apparatus. The same DNA can be packaged in different ways in different cell types, and the form of packaging can be passed on. The "memory" of the distinctive chromatin packaging, which is superimposed on DNA, is one form of what is termed "epigenetics." Chromatin packing can be altered in response to environmental factors, such as diet; as a consequence, epigenetics is sometimes described as "where the environment meets the genome"! You'll learn more about this kind of gene expression regulation in this module.

Genomics and Bioinformatics

When scientists first began to study our genes at a molecular level, they were limited to looking at one gene at a time. But humans have ~22,000 genes, and changes in the expression of one gene (for example, by a mutation), or a change in the environment of a cell, can cause changes in the expression of thousands of genes! Thus, we need to look at what is happening to all of the genes at the same time. To do this, scientists have developed "high-throughput" and "next-generation" genomic technologies that allow analysis of most or all of the genome at the same time. To deal with the multitude of genes and their complex interactions and the volumes of data required to track them, a whole new discipline of computational biology, or bioinformatics, has developed and grown dramatically in recent years. This module looks at the modENCODE project as an example of the impact of new technologies and the computational methods that have been developed.

Image Credits

Fly: André Karwath/Wikimedia Commons

Worm: WormClassroom

Transcription: European Bioinformatics Institute/Wikimedia Commons

Chromatin: P. J. Horn and C. L. Peterson, Science 297, 1824-1827 (2002)

Genomics: modENCODE Consortium et al., Science 330, 1787-1797 (2010)

Bioinformatics: Fort Collins Museum & Discovery Science Center

Glossary Terms

Monocyte

A cell of the immune system, which circulates through the blood, bone marrow, and spleen, and plays an important role in antimicrobial defense.
More on Wikipedia

Agar

A gelatinous substance used to grow, or culture, microorganisms. Agar provides the microorganisms a surface to grow on and contains nutrients.
More on Wikipedia

Antibody

A protein in the immune system that recognizes a specific target, or antigen, on a foreign molecule, allowing these two structures to bind together with precision.
More on Wikipedia

Autosome

A chromosome for which there is an equal number of copies in males and females -- that is, a chromosome that is not a sex chromosome.

Activator

A protein which, by binding to an operator sequence, promotes transcription of a gene.

Base Pairing

The "genetic code" in a DNA molecule is written in four bases -- adenine (A), cytosine (C), guanine (G) and thymine (T) -- that are arrayed along each strand of the twisted, two-stranded molecule (the famous "double helix"). Each base is chemically tuned to pair, via hydrogen bonds, with a corresponding base on the opposite strand -- A with T, G with C. The size of an organism's genome is usually given by the number of base pairs.
More on Wikipedia
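
The pairing rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the modENCODE materials; the names `PAIRS`, `complement`, and `reverse_complement` are hypothetical helpers chosen for this example.

```python
# Pairing rules from the definition above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the opposite strand, written in its own reading direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Because each base has exactly one partner, knowing one strand is enough to reconstruct the other, which is why genome size can be quoted in base pairs.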

ChIP-Seq

A technique for analyzing interactions of proteins with DNA, consisting of chromatin immunoprecipitation, followed by massively parallel DNA sequencing.
More on Wikipedia

Chromatin

The combination of DNA and proteins found in the nucleus of a cell, which makes up chromosomes. Chromatin helps fold DNA so it will fit into the cell and is involved in both gene expression and DNA replication.
More on Wikipedia

Chromosome

A structure of coiled DNA and proteins that organizes the genetic material in the cell's nucleus.
More on Wikipedia

Differentiation

The scientific term for cells becoming more specialized throughout development.
More on Wikipedia

DNA

DNA encodes the genetic blueprint of an organism. This genetic material is composed of deoxyribonucleotides -- individual units combining a sugar, a base, and a phosphate group -- that each have different chemical properties, and are referred to by the different base names adenine (A), cytosine (C), guanine (G) and thymine (T). Combinations of these bases "spell out" the code of a given gene.

DNA-Binding Protein

Proteins with a specific or general affinity for DNA.
More on Wikipedia

DNA Sequencing

The process of determining the specific order of nucleotide bases in a DNA molecule.

Enzymes

Biological molecules (mainly proteins) that catalyze, or increase the rate of, a chemical reaction.

Epigenetics

The study of changes in gene expression resulting from mechanisms other than changes in the underlying DNA sequence.
More on Wikipedia

Eukaryotes

The class of organisms, composed of one or more cells, containing a membrane-enclosed nucleus and packaging its DNA with histones in a nucleosome array. Eukaryotic cells typically have complex organelles, such as mitochondria.
More on Wikipedia

Exon/Intron

An exon is a contiguous segment of a gene found both in the initial transcript and in the final product; the introns are those segments found in the initial transcript which are removed during processing, and so are not found in the finished product.

Gene

In molecular biology, a gene is the molecular unit of inheritance for a single function or phenotype -- or, more precisely, the full sequence of bases within a section of the genome that is necessary and sufficient for the synthesis of a functional product. Usually that product is a polypeptide (a section of a protein), but in some cases it is an RNA molecule.

Genome

The full complement of genetic information recorded in the chromosomal DNA (or, for some organisms, RNA).
More on Wikipedia

Genome Annotation

Within a genome sequencing project, annotation is the process of identifying biologically relevant elements within the genome sequence (e.g., genes), and adding information to the sequence on how those elements function.
More on Wikipedia

Genotype

The specific genetic encoding, or allele of a gene, that leads to an observable characteristic in an organism.
More on Wikipedia

Green Fluorescent Protein (GFP)

A protein first isolated from the jellyfish Aequorea victoria that exhibits bright green fluorescence when exposed to light in the blue to ultraviolet range. The GFP gene can be introduced into organisms and used by scientists to "see" gene expression.
More on Wikipedia

Hermaphrodite

An organism containing both male and female reproductive organs within the same individual.
More on Wikipedia

Histones

The small, basic proteins used to package the DNA in chromatin. The core histones (H2A, H2B, H3, and H4) are highly conserved over evolution, while histone H1 is more variable.
More on Wikipedia

Histone Code Hypothesis

The hypothesis that combinations of chemical modifications to histone proteins in the chromatin form a complex, separate mechanism for regulating transcription and, thus, gene expression.
More on Wikipedia

Histone Deacetylase

An enzyme that removes acetyl groups from the ends of histone proteins.
More on Wikipedia

Homolog

A gene found in an organism that shares an ancestral sequence with that of another organism. Homologs are often identified based on the retention of shared genetic or protein-level identities between two different species that share a common evolutionary history.

High-Occupancy Target (HOT) Region

Genomic regions where 15 or more independent transcription factors bind.
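
As a rough illustration of this definition, the sketch below counts how many distinct factors bind each region and keeps the regions that reach the threshold of 15. The input format (pairs of region ID and factor name), the function name `find_hot_regions`, and the sample data are all assumptions made for this example; they do not describe how the modENCODE analysis was actually carried out.

```python
from collections import defaultdict

HOT_THRESHOLD = 15  # from the glossary definition: 15 or more factors


def find_hot_regions(binding_events, threshold=HOT_THRESHOLD):
    """Return the regions bound by at least `threshold` distinct factors.

    `binding_events` is an iterable of (region_id, factor_name) pairs.
    """
    factors_by_region = defaultdict(set)
    for region, factor in binding_events:
        # A set ignores repeated observations of the same factor.
        factors_by_region[region].add(factor)
    return sorted(
        region
        for region, factors in factors_by_region.items()
        if len(factors) >= threshold
    )


if __name__ == "__main__":
    # Made-up data: region "r1" is bound by 15 factors, "r2" by only 3.
    events = [("r1", f"TF{i}") for i in range(15)]
    events += [("r2", f"TF{i}") for i in range(3)]
    print(find_hot_regions(events))  # ['r1']
```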

Immunoprecipitation

The process of isolating and concentrating a specific protein of interest by trapping an antibody that binds to that protein, using any of a number of lab techniques.
More on Wikipedia

Instar

The intermediate developmental stages that an insect (such as Drosophila) undergoes between molts until it reaches sexual maturity.

Metabolome

The full complement of metabolites -- small molecules produced by cellular or organismal metabolism -- that characterize a cell, cell population, tissue, or organism.
More on Wikipedia

Micro-RNA (miRNA)

Post-transcriptional regulators that bind to complementary sequences on target messenger RNAs, usually leading to gene silencing.
More on Wikipedia

Model Organism

A non-human species used in experimental biology to study biological processes that might illuminate workings of the same processes in other organisms, for which the same experiments might be infeasible or unethical.
More on Wikipedia

Molting

Molting, or ecdysis, is the periodic shedding of the outer skeleton, or exoskeleton, that accompanies the growth of most arthropods, including insects.
More on Wikipedia

Morphology

The form and structure of an organism.
More on Wikipedia

Next-Generation Sequencing (NGS)

A family of techniques for DNA sequencing that rely on massively parallel processing of many millions of DNA fragments, followed by analysis and re-assembly of those fragments using computer techniques.
More on Wikipedia

Nanometer

A unit of length equal to one billionth of a meter.

Nucleosome

The basic unit of DNA packaging, consisting of DNA wound around histones.
More on Wikipedia and in detailed glossary

Null Hypothesis

The null hypothesis generally corresponds to what we expect if nothing "interesting" is happening. If you flip a coin many times, and generally get roughly 50% heads and 50% tails, that is consistent with the null hypothesis that the coin is fair. If you flip a coin many times and get 99% heads, the coin may be unfair, and hence you may have cause to reject the null hypothesis that it is fair.
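
The coin example can be made concrete with an exact binomial calculation. The sketch below is one simple way to do it, using only the Python standard library; the function name `two_sided_p_value` is a hypothetical helper. It computes how likely a fair coin is to give a result at least as lopsided as the one observed.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Exact binomial p-value under the null hypothesis of a fair coin.

    Adds up the probabilities of every outcome that is at least as far
    from flips / 2 as the observed number of heads.
    """
    observed_gap = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    # Each of the 2 ** flips head/tail sequences is equally likely for a fair coin.
    return extreme_outcomes / 2 ** flips


if __name__ == "__main__":
    print(two_sided_p_value(50, 100))  # 1.0: fully consistent with a fair coin
    print(two_sided_p_value(99, 100))  # vanishingly small: reject the null hypothesis
```

A small p-value means the observed result would be very surprising if the null hypothesis were true, which is the usual basis for rejecting it.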

Ortholog

A gene that has a similar sequence in each species in which it is found because those species share a common ancestor. For example, the alcohol dehydrogenase and Malic Enzyme 1 genes are similar in both Drosophila melanogaster and Homo sapiens. Normally, orthologous genes have the same function in each species in which they are found; therefore, studying the function of a gene in a model organism can provide good evidence for the function of the orthologous gene in humans.
More on Wikipedia

Petri Dish

A shallow cylindrical dish, made of glass or plastic, used to grow, or culture, cells, bacteria, and other microorganisms.
More on Wikipedia

Phenotype

The observable characteristics or traits of an organism, resulting from the interaction of the expression of the organism's genes with the influence of environmental factors.
More on Wikipedia

Pheromone

A type of nonverbal communication, usually a chemical or hormone secreted by an animal, which often influences the behavior of other members of the same species. Pheromones are used to establish territory and attract mates.
More on Wikipedia

Polytene Chromosomes

Giant chromosomes formed by some cells that undergo multiple rounds of DNA replication without actual cell division. The salivary glands of Drosophila contain examples of such chromosomes. Their size makes them especially convenient for work in the lab.
More on Wikipedia

Posttranslational Modification

Any of a variety of additional changes to a protein after translation that can modify its behavior and thus affect gene expression.
More on Wikipedia

Progeny

A scientific term for offspring.
More on Wikipedia

Prokaryotes

The class of single-cell organisms, including the eubacteria and archaea, that lack a true membrane-limited nucleus and other organelles.
More on Wikipedia

Protein

Biological compounds made up of one or more polypeptides (a chain of amino acids) typically folded into a 3-D form. The sequence of amino acids in a protein is defined by the sequence of a gene.

Protein Purification

A variety of processes used to isolate a particular protein from a biological tissue or culture, and thereby to allow the further characterization of the protein's structure and function.
More on Wikipedia

Proteome

The full complement of proteins expressed by a genome, cell, tissue, or organism.
More on Wikipedia

Repressor

A protein which, by binding to an operator sequence, prevents transcription of a gene.

Reference Genome

A genome sequence assembled from the experimentally obtained sequences of a number of individuals in a species, designed to serve as a representative example of the "typical" gene sequence of that species.
More on Wikipedia

Regulatory Region

Segments of DNA where transcription factors bind preferentially.
More on Wikipedia

Reverse Transcriptase

An enzyme that uses RNA as a template to transcribe single-stranded DNA -- thereby reversing the more familiar information flow from DNA to RNA. In addition to its use in the lab, RT has been extensively studied in retroviruses (particularly HIV) that have an RNA genome but must produce double-stranded DNA that becomes integrated into the host cell genome as part of their replication cycle.
More on Wikipedia

Ribosome

The complex molecules that catalyze protein synthesis within the cell.
More on Wikipedia

RNA

RNA is composed of nucleotides, just like DNA, but there are three major differences between the two: (1) RNA contains the sugar ribose, while DNA contains the slightly different sugar deoxyribose; (2) RNA has the nucleobase uracil, while DNA contains thymine; (3) unlike DNA, most RNA molecules are single-stranded.
More on Wikipedia

RNA Interference (RNAi)

The silencing or reduction of RNA expression (which generally corresponds to protein production) for a given gene in a cell or organism. It occurs as a natural process within living cells, but is also a powerful technique for studies of gene expression in the lab.
More on Wikipedia

RNA Polymerase I, II, and III

Enzymes in eukaryotic cells that manage the synthesis of a strand of RNA based on the sequence encoded in the DNA.
More in the Glossary

RNA-Seq

A high-throughput technique for sequencing an organism's "transcriptome" -- the RNA transcribed from the genome under investigation.
More on Wikipedia

Sequence Read

The sequence of a small fragment of DNA, obtained as part of a high-throughput sequencing experiment.

Sex Chromosomes

A pair of chromosomes, usually designated X or Y, in the germ cells of most animals and some plants, that combine to determine the sex and sex-linked characteristics of an individual.
More on Wikipedia

Small Interfering RNA (siRNA)

Short, 20-to-25-nucleotide, double-stranded RNA fragments that interfere with the expression of a specific gene.
More on Wikipedia

Single-nucleotide polymorphism (SNP)

A difference in a single base pair in a given gene sequence, between two or more individuals, or between an individual and a reference genome, that is associated with a difference in phenotype or expressed trait.
More on Wikipedia

Stasis

The state of equilibrium or inactivity, analogous to hibernation.

Transcription

During transcription, a DNA sequence is read by RNA polymerase and a complementary RNA copy of the DNA sequence is created.
More on Wikipedia
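
The "complementary RNA copy" can be sketched as a simple base-by-base lookup. This is a toy model under stated assumptions: the template strand is supplied 3'-to-5' so the RNA comes out 5'-to-3', and the names `DNA_TO_RNA` and `transcribe` are hypothetical. It ignores everything RNA polymerase actually has to do, such as finding a promoter.

```python
# RNA base that pairs with each DNA base on the template strand.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA complementary to a DNA template strand.

    The template is read 3'-to-5', so the RNA is built 5'-to-3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    print(transcribe("TACGGT"))  # AUGCCA
```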

TATA Box

A consensus sequence, TATA(A/T)A, found about 25 base pairs upstream from the start site of a group of eukaryotic genes encoding messenger RNA -- often those that can be transcribed at a high rate. The TATA box binds the TATA box binding protein (TBP), a subunit of TFIID, initiating the process of RNA polymerase II assembly at the promoter in vitro, and plays a key role as a recognition sequence for RNA polymerase II in eukaryotic organisms.
More on Wikipedia
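
A consensus sequence such as TATA(A/T)A maps naturally onto a regular expression, where `[AT]` means "either A or T". The sketch below scans a DNA string for matches; the function name `find_tata_boxes` and the example sequences are made up for illustration, and a real promoter search would also consider position relative to the start site and tolerate imperfect matches.

```python
import re

# TATA(A/T)A written as a regular expression. The lookahead (?=...) lets
# overlapping matches be reported as well.
TATA_BOX = re.compile(r"(?=TATA[AT]A)")


def find_tata_boxes(sequence):
    """Return the 0-based start position of every TATA-box match."""
    return [match.start() for match in TATA_BOX.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_tata_boxes("GGCTATAAAAGG"))  # [3]
    print(find_tata_boxes("GGGCCC"))        # []
```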

Transcription Factor

A protein that binds to a DNA sequence and controls (increases or decreases) the rate of transcription (the flow of genetic information from DNA to RNA).
More on Wikipedia

Transcriptome

The full complement of RNA molecules produced in a given cell or cell population.
More on Wikipedia

Transcription Preinitiation Complex

A group of proteins necessary for the start of transcription of protein-coding genes in eukaryotic organisms.
More on Wikipedia

Translation

The process in which RNA, produced during transcription, is decoded to produce an amino acid chain (polypeptide) that will then fold into an active protein.
More on Wikipedia

Wet Lab

Slang term for the domain of classic lab experiments that involve handling and analyzing actual biological materials, as opposed to experiments and work performed using computer analysis.

Wild-Type

This is the typical or most common form, appearance, or strain of an organism that exists in the wild, as opposed to the lab. It can also refer to the normal, non-mutated form of a gene that's common in nature.

Zygote

The earliest developmental stage of the embryo, occurring when two gamete cells are joined by means of sexual reproduction.
More on Wikipedia

© Copyright 2021 GSA.