We have seen how a human cell manages to get all 6 feet of its genome into its microscopic nucleus. But why is your genome six feet long?
The fact is, only 1.5% of your genome consists of protein-coding genes. Another 3.5% of your genome is made up of sequences that affect how active a gene is. These sequences are called regulatory regions.
50% of your genome consists of repeats - short, repetitive strings of nucleotides that do not code for protein. Some repetitive sequences have expanded due to replication errors, while others are what remains from ancient viruses or transposable elements that were inserted into our ancestors' genomes.
The DNA in your genome is not all equal. So how does your cell know what codes for protein and what does not? Why don't your genes get lost in the mix of sequence, like needles in a DNA haystack?
The answer lies in the proteins that are binding your genome - the chromatin! One way to organize the genome is to alter the chromatin structure. Cells can coil up and compact some DNA sequences, like repetitive elements, so they cannot be transcribed. Alternatively, the cell may keep the genes that need to be active in a decompacted state, referred to as an open chromatin structure. We call the compacted, inaccessible chromatin heterochromatin, and we call the open chromatin that contains most active genes euchromatin.
Flies organize their chromatin into large domains. Much of the noncoding DNA, including repetitive sequences and ancient viruses, is in the heterochromatin domains, which are broad regions of the genome near the centromeres on most fly chromosomes.
The four chromosomes of Drosophila melanogaster are pictured above. The "C" marks the centromere. The "L" and "R" delineate the left and right arms of the chromosome, which are on either side of the centromere. The regions near the centromeres are mostly heterochromatin (grey), and the euchromatin where most active genes are found tends to be near the ends of the arms. In flies the fourth chromosome (as well as the Y chromosome in males) is almost ENTIRELY heterochromatin.
There are many differences between euchromatin and heterochromatin. Euchromatin consists of the more gene-dense regions of the genome. In addition, euchromatin and heterochromatin have different proteins that bind to the DNA. Some histone modifications are more often associated with euchromatic sequences, and others are more common at heterochromatic parts of the genome. At active regions of the genome, an acetyl group will often be added to histones that bind that sequence. At heterochromatin, different types of chemical modifications are made to the histones -- for example, methylation of the lysine at position 9 in histone H3.
In the very bottom of the figure at left is a nucleosome – you can see that DNA is coiled around a core made of histone proteins. There are four different histones, each of which has a tail that can interact with the DNA.
You may remember from the previous page that these tails can be modified. Sometimes they are acetylated. This means an acetyl group is attached to an amino acid (usually a lysine) on the histone tail. This chemical modification affects the charge on the tail. Normally histone tails are very positively charged, but adding an acetyl group can reduce that. The tail binds less tightly to the DNA when it is acetylated, and the chromatin is more open.
The black discs on the figure at left represent acetyl groups – you can see that histones are decorated with acetyl groups more often in the open chromatin. The numbers on the discs at the bottom of the figure display the different amino acids that can be acetylated (for example, the 9th amino acid of histone H3, or the 16th amino acid of histone H4).
The chart on the right shows all of the different possible modifications that can be made to the tails of histones H2A, H2B, H3, and H4. There are many different types of modifications, but the ones that are best understood right now are acetylation (ac) and methylation (me).
When writing about histone modifications scientists use a specific nomenclature. For example: histone 3, di-methylated on lysine 9 is written as H3K9me2.
The first part (H3) refers to the histone that is modified. The second part (K9) tells you which amino acid is modified - in this case, it is the 9th amino acid of the tail, a lysine. Most modifications occur at lysines, which typically are positively charged. Lysine is normally abbreviated with a K (a chart with abbreviations for all the amino acids is available here). The last part of the nomenclature tells you which chemical group has been attached to the tail, and how many of them. In this case, two methyl groups have been attached (me2). The common abbreviations are: ac = acetylation; me = methylation; ph = phosphorylation; ub1 = ubiquitination.
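The naming scheme described above is regular enough to take apart with a short script. The following is a minimal sketch in Python, not an official tool: the function name `parse_histone_mark` and the `HistoneMark` record are hypothetical, and the sketch assumes that a mark written without a trailing number (such as H3K9ac) carries a single chemical group.

```python
import re
from typing import NamedTuple

# Abbreviations listed in the text. "ub" also covers the "ub1" form.
MODIFICATIONS = {
    "ac": "acetylation",
    "me": "methylation",
    "ph": "phosphorylation",
    "ub": "ubiquitination",
}


class HistoneMark(NamedTuple):
    """Hypothetical record holding the pieces of a name such as H3K9me2."""
    histone: str       # which histone is modified, e.g. "H3"
    residue: str       # one-letter amino acid code, e.g. "K" for lysine
    position: int      # position of that amino acid, e.g. 9
    modification: str  # full name of the attached chemical group
    count: int         # how many groups are attached


# histone, amino acid letter, position, modification, optional count
_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(ac|me|ph|ub)(\d?)$")


def parse_histone_mark(name: str) -> HistoneMark:
    """Split a histone modification name into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized histone mark: {name!r}")
    histone, residue, position, mod, count = match.groups()
    # Assumption: no trailing digit means one group is attached.
    return HistoneMark(histone, residue, int(position),
                       MODIFICATIONS[mod], int(count) if count else 1)
```

For example, `parse_histone_mark("H3K9me2")` returns a record with histone `"H3"`, residue `"K"`, position `9`, modification `"methylation"`, and count `2`.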
The modENCODE consortium characterized the chromatin over the entire genome to determine what types of chromatin were associated with different sequences. What histone modifications and chromatin-binding proteins are associated with repetitive DNA? How does chromatin change at any specific gene as the fly develops from embryo to adult, and is the gene expressed differently at different stages of development? Are the histone modifications over genes different than those near promoters, the sites where RNA polymerase begins to transcribe a gene?
The modENCODE consortium used techniques called ChIP-chip and ChIP-seq to answer these questions. This means they used antibodies to bind and pull down the protein they were interested in, bringing along the DNA bound to that protein. The microarrays and sequencing technology identified specific genes and DNA sequences bound by that protein (see Genomics page). This allowed the consortium to create "maps", which showed where and how frequently a protein was binding over the whole genome.
The picture at left shows the maps for two different histone modifications. One is for H3K9me2 (red). The other is for H3K4me2 (green).
Imagine that the genome sequence is laid out horizontally across the X axis. Different genes are encoded within the sequence, on both strands of the DNA - sometimes two different genes will overlap! The different genes are represented by the black bars near the bottom of the picture. This picture covers over 3 million basepairs of DNA sequence, so there are a lot of genes in this region.
The height of the peaks represents how likely it is that you will find the mapped histone modification (or protein) at that specific DNA sequence. Notice that there are a lot of histone proteins with the H3K4me2 mark spread over the gene-dense euchromatin. In contrast, the heterochromatin contains many fewer genes and the histones are most often decorated with the H3K9me2 modification.
The modENCODE consortium created a map for the whole Drosophila genome, of almost EVERY histone modification that has been characterized to date (about 25 so far).
They also created maps for proteins which bind to histones, recognizing the different marks and functioning to close or open the chromatin, as well as other DNA-binding proteins of interest, like transcription factors, and RNA Polymerase II -- about 50 chromosomal proteins and more transcription factors.
From this work, modENCODE has discovered that certain histone modifications and DNA-binding proteins are found at the same sequences and seem to work together, while other combinations are rarely seen. In fact, the combination of histone modifications and proteins found at a given sequence can tell you a lot about the function of that sequence. For example, the promoter regions of active genes carry different types of histone modifications than the exons or introns of genes.
The modENCODE consortium realized that certain histone modifications were usually found together at the same sequences, and that these sequences had many other characteristics in common. They realized that the genome could be subdivided into different chromatin states. Using statistical methods, the modENCODE consortium calculated which histone modifications and proteins were likely to be found together, identified associated proteins, and defined nine different states of Drosophila chromatin.
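To get a feel for what "found together" means, here is a toy sketch in Python. It is not the statistical method the consortium used, which is not described here. It only scores how often two marks appear in the same genomic bins, using the Jaccard overlap as a stand-in measure. The presence/absence values in `toy_bins` are invented for illustration and are not modENCODE data, and the function names are hypothetical.

```python
from itertools import combinations

# Invented presence/absence calls for five genomic bins (1 = mark detected).
# These numbers are made up for illustration; they are NOT modENCODE data.
toy_bins = {
    "H3K4me3": [1, 1, 0, 0, 0],
    "H3K9ac":  [1, 1, 0, 0, 0],
    "H3K9me2": [0, 0, 1, 1, 1],
    "H3K9me3": [0, 0, 0, 1, 1],
}


def jaccard(a, b):
    """Fraction of bins carrying both marks, out of bins carrying either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def co_occurrence(bins):
    """Score every pair of marks; keys are (mark1, mark2) in sorted order."""
    return {
        (m1, m2): jaccard(bins[m1], bins[m2])
        for m1, m2 in combinations(sorted(bins), 2)
    }


scores = co_occurrence(toy_bins)
```

In this toy data set the two marks H3K4me3 and H3K9ac always occur in the same bins, so their score is 1.0, while H3K4me3 and H3K9me2 never share a bin, so their score is 0.0. Marks with high scores would be candidates for belonging to the same chromatin state.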
The picture on the left shows the promoter of gene Xe.7 enclosed in a red box. The maps that modENCODE created with ChIP-chip tell us that this sequence has:
- low levels of H3K27ac (teal)
- high H3K4me3 (red)
- medium H3K4me2 and H3K9ac (rose and green)
The arrows point to a box, which has been colored to indicate how strongly that modified histone is associated with the Xe.7 promoter. Red boxes indicate high levels of enrichment of the protein, and purple indicate low levels of enrichment. Thus, if any modification or protein is shown with a purple box above the name, that means this histone modification or protein was very UNLIKELY to be found at promoters, compared to the rest of the genome. For example, towards the right side of the figure we can see that the linker histone H1 did not bind in this region. In fact, H1 (which binds to DNA between the beads) does not usually bind to promoter regions.
The chart on the right displays all of the 9 different states of Drosophila chromatin, including state 1 from above. Listed across the top are the notations representing the different histone modifications that were mapped on the genome and used to create this model. Each row shows the histone modifications present in one of the 9 different states which these marks helped characterize. (Note that the number of states is somewhat arbitrary; one can make a more detailed map with more states, but this map is sufficient to define the major domains.) The states show the following associations: 1 (red) transcription start sites; 2 (dark pink) transcript elongation; 3 and 4 (brown and light pink) regulatory regions; 5 (green) very active male X chromosome; 6 (dark grey) Polycomb; 7 (dark blue) centromeric heterochromatin; 8 (light blue) other heterochromatin; 9 (light grey) other regions.
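The color key above can also be written down as a small lookup table, which is handy if you want to label chromatin states in your own notes or scripts. This is a minimal sketch in Python. The names `CHROMATIN_STATES`, `describe_state`, and `is_heterochromatin` are hypothetical, and the colors and associations are copied from the list above rather than from any modENCODE file format.

```python
# state number -> (color in the chart, association), as listed in the text
CHROMATIN_STATES = {
    1: ("red", "transcription start sites"),
    2: ("dark pink", "transcript elongation"),
    3: ("brown", "regulatory regions"),
    4: ("light pink", "regulatory regions"),
    5: ("green", "very active male X chromosome"),
    6: ("dark grey", "Polycomb"),
    7: ("dark blue", "centromeric heterochromatin"),
    8: ("light blue", "other heterochromatin"),
    9: ("light grey", "other regions"),
}


def describe_state(state: int) -> str:
    """Return a readable label for one of the nine chromatin states."""
    color, association = CHROMATIN_STATES[state]
    return f"state {state} ({color}): {association}"


def is_heterochromatin(state: int) -> bool:
    """States 7 and 8 are the two heterochromatin states in this model."""
    return state in (7, 8)
```

For example, `describe_state(7)` gives `"state 7 (dark blue): centromeric heterochromatin"`, and `is_heterochromatin(1)` is `False`.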
The boxes for H3K9me2 and H3K9me3 are bright red in states 7 and 8 only. The sequences in these states are mostly heterochromatic sequences. The box for acetylation on the 9th lysine of H3 (H3K9ac) is purple in state 7 - that means that genome sequences in state 7 are very unlikely to be bound by nucleosomes with this modification. Since acetylation is associated with ACTIVE genes, it makes sense that this mark is not often found in the heterochromatin.
Almost all Drosophila chromatin over the whole genome can be classified into one of the 9 different states. Of course, the state of a given sequence varies across different cell types and different stages of development. A gene which is silent in early development may be in state 6 at first - however, when it becomes active, the chromatin state of that gene may change to states 1, 2, and 3!
The modENCODE consortium has mapped the different states onto the four chromosomes of Drosophila. Again, this map changes from cell type to cell type, and in different stages of development - but in one type of cell, it is an accurate depiction of what types of chromatin are found at different regions of the genome. For a reminder of which state is represented by the different colors, see the above picture.
In fact, the modENCODE consortium has created tools so that you can see which of the nine chromatin states is present at any region of the genome you want!
One way to use the modENCODE data is through an online database called Flybase. Flybase is a very useful resource for studies using the fly - from that site, you can access all that is known about any Drosophila gene, including what modENCODE has learned through its experiments. A helpful video on YouTube, which shows you how to use Flybase, is available here.
A quick tutorial on how to find the "chromatin state" of your favorite gene is pictured below!
First, go to the Flybase page for the gene you are interested in. For this example, we will go to the page for the Drosophila homolog of the gene related to Fragile X Syndrome. To find this gene's page, you can search for Fmr1 on the Flybase homepage, or just click here.
Click on the button titled "GBrowse 2" (see green arrow on the image above), which is to the left of the Gbrowse picture that is displayed on every Flybase gene page.
Now you should be at the Flybase Gbrowse, which can be used to view different annotated elements of the whole Drosophila genome. You should start zoomed in on the gene you are interested in, but you can adjust to zoom in and out as much as you want.
Now you should be at the modENCODE Gbrowse, which can be used to view different annotated elements of the whole Drosophila genome. You should start zoomed in on the gene you are interested in, but you can adjust to zoom in and out as much as you want.
You need to specify what data you want displayed on Gbrowse. To do this, click on "Select Tracks."
There are many different data sets and annotated genome elements that you can view on this Gbrowse. Since we are interested in the different chromatin states, open "Chromatin Structure" and then "Chromatin States" from the submenu.
Select the cell type that you want to view the chromatin states for. There is an alternate model, which has 30 very specific types of chromatin states. The 9-state model is simpler and still provides enough information for most research purposes.
The 9-state model should now appear as a separate track in your Gbrowse window. You can see here that the promoter and 5' UTR region of Fmr1 is in chromatin state 1 (dark red) in this cell type, while the body of the gene is in chromatin states 2 and 3 (pink and brown, respectively).
Which chromatin state is your favorite gene in? Is it different in a different cell type?
- Do you think transcription factors typically bind more frequently in the heterochromatin, or the euchromatin? Justify your answer.
- Your chromatin structure can change over development. Do you think there are any stages where your entire genome is heterochromatic? Justify your answer.
- Your friend believes his chromatin structure was set when he was a fertilized egg, and it has been the same in every cell ever since. Is he right? If chromatin structure never changed, what might that mean for the genes expressed in all our cells?
- Expert Bonus: Your friend is now trying to tell you that there is not a SINGLE active gene in the heterochromatin domains of Drosophila. Is there any kind of evidence you could search the modENCODE chromatin state data set for to indicate he might be wrong?