Bacterial genome size & base composition


Browsing through the microbial genomes currently available in GenBank, I noticed how easy it is to make some highly illustrative graphs. For example, plotting the G+C percentage of each complete genome against its total size reveals a striking trend.
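Both quantities in such a plot are easy to compute from a FASTA file. Below is a minimal Python sketch. It assumes the complete genome was downloaded from GenBank in FASTA format and that "total size" means every record in the file (chromosome plus any plasmids). The function names `read_fasta` and `genome_stats` are hypothetical, and ambiguous bases such as N are left out of the G+C denominator.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def genome_stats(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (total length in bp, G+C percentage) summed over all records.

    Only A, C, G and T count towards the G+C denominator (assumption:
    ambiguous bases such as N are ignored).
    """
    counts = {base: 0 for base in "ACGT"}
    length = 0
    for _, seq in read_fasta(lines):
        length += len(seq)
        for base in "ACGT":
            counts[base] += seq.count(base)
    acgt = sum(counts.values())
    gc = 100.0 * (counts["G"] + counts["C"]) / acgt if acgt else 0.0
    return length, gc
```

Typical use would be `with open("genome.fna") as fh: size, gc = genome_stats(fh)`, where `genome.fna` is a placeholder filename. Repeating this for every complete genome gives one (size, G+C) point per organism.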
In the early sixties it was first noted that “Among bacteria the mean G+C content of DNA varies approximately from 25 to 75%, and this range extends over the range of the mean G+C content of DNA of higher organisms” (Sueoka, 1962).
Currently, the mean G+C content of the sequenced bacterial genomes deposited in GenBank is 48.3%, ranging from 74.9% (Anaeromyxobacter dehalogenans 2CP-C) down to 16.6% (Candidatus Carsonella ruddii PV).
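Once each genome has a G+C value, these summary figures are simple to reproduce. A small sketch, assuming the values sit in a dictionary keyed by organism name (a hypothetical layout). It computes an unweighted mean, so each genome counts once regardless of its size; that this is what the mean above refers to is an assumption.

```python
from statistics import mean
from typing import Dict, Tuple


def gc_summary(gc_by_genome: Dict[str, float]) -> Tuple[float, Tuple[str, float], Tuple[str, float]]:
    """Unweighted mean G+C (%) over genomes, plus the most and least GC-rich genome."""
    if not gc_by_genome:
        raise ValueError("no genomes given")
    highest = max(gc_by_genome.items(), key=lambda item: item[1])
    lowest = min(gc_by_genome.items(), key=lambda item: item[1])
    return mean(gc_by_genome.values()), highest, lowest
```

Called with only the two extremes named above, it returns a mean of 45.75 together with the Anaeromyxobacter and Carsonella entries. That two-genome call only illustrates usage; it does not reproduce the 48.3% figure, which needs the full set of genomes.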
The figure above shows the roughly “linear” correlation between bacterial genome size and base composition: larger genomes tend to have a higher G+C content.
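One way to put a number on the “linear” trend is an ordinary least-squares fit together with Pearson's correlation coefficient. The sketch below computes both by hand so it runs on any Python 3 version. In Python 3.10 and later, `statistics.linear_regression` and `statistics.correlation` give the same quantities. The function name and the use of megabases as the size unit are assumptions.

```python
from math import sqrt
from typing import Sequence, Tuple


def size_gc_trend(sizes_mb: Sequence[float], gc_percent: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line gc = slope * size + intercept, plus Pearson's r."""
    n = len(sizes_mb)
    if n != len(gc_percent) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx = sum(sizes_mb) / n
    my = sum(gc_percent) / n
    # Centred sums of squares and cross-products
    sxx = sum((x - mx) ** 2 for x in sizes_mb)
    syy = sum((y - my) ** 2 for y in gc_percent)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes_mb, gc_percent))
    if sxx == 0 or syy == 0:
        raise ValueError("sizes and G+C values must both vary")
    slope = sxy / sxx  # percentage points of G+C per Mb
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r
```

With made-up toy values that lie exactly on a line, `size_gc_trend([1.0, 2.0, 3.0, 4.0], [30.0, 40.0, 50.0, 60.0])` returns a slope of 10.0, an intercept of 20.0 and r = 1.0. These numbers are not real genome data. Pearson's r only measures linear association. If the trend is monotone but curved, a rank correlation is a more robust check; from Python 3.12, `statistics.correlation(x, y, method="ranked")` gives Spearman's coefficient.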
One adaptive explanation for this apparent correlation is that the synthesis of GTP and CTP is energetically more expensive than that of ATP and TTP (Rocha & Danchin, 2002). In addition, because ATP plays a central role in cellular metabolism, it is more readily available than the other nucleotides. Under resource-limiting conditions, this could push microbial genomes towards AT-richness. Because many of the smallest genomes belong to obligate symbionts and pathogens that compete with their host for nucleotides, this pressure might be strongest precisely in the smallest genomes.
Another possible explanation involves DNA repair. Small (reduced) genomes often lack genes of the DNA repair machinery (Moran, 2002), which allows point mutations to accumulate. Experiments have shown that the most frequent spontaneous mutation in cells is C to T (or G to A on the complementary strand). It is caused by the deamination of cytosine to uracil. Uracil pairs with adenine during replication, so the original C:G pair ends up as a T:A pair (Glass et al., 2000). Thus, in the absence of DNA repair, genomes might become increasingly AT-rich.
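The mutational argument can be made quantitative with a deliberately simple two-state model, in the spirit of the mutation-pressure idea (Sueoka, 1962). If G:C pairs mutate to A:T at rate u and A:T pairs to G:C at rate v, the expected G+C fraction p evolves as p' = p(1 - u) + (1 - p)v and settles at p* = v / (u + v). The sketch ignores selection, drift and the details of repair. Losing repair is modelled only as a larger u, and the function names and rates are hypothetical.

```python
def gc_trajectory(gc0: float, u: float, v: float, generations: int) -> float:
    """Iterate the expected G+C fraction under a two-state mutation model.

    u: per-site rate of G:C -> A:T changes (e.g. unrepaired C -> T deamination)
    v: per-site rate of A:T -> G:C changes
    Selection, drift and repair are ignored (simplifying assumption).
    """
    p = gc0
    for _ in range(generations):
        p = p * (1 - u) + (1 - p) * v
    return p


def gc_equilibrium(u: float, v: float) -> float:
    """Fixed point of the recursion above: p* = v / (u + v)."""
    return v / (u + v)
```

With u = v the equilibrium is 0.5. If the G:C to A:T rate is three times the reverse rate (u = 3v), it drops to 0.25, i.e. 25% G+C. The exaggerated rates u = 3e-3 and v = 1e-3 only make `gc_trajectory` converge quickly; real per-site rates are far smaller.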

References
Glass JI, Lefkowitz EJ, Glass JS, Heiner CR, Chen EY, Cassell GH. 2000. The complete sequence of the mucosal pathogen Ureaplasma urealyticum. Nature 407:757-762.
Moran NA. 2002. Microbial minimalism: Genome reduction in bacterial pathogens. Cell 108:583-586.
Rocha EPC, Danchin A. 2002. Base composition bias might result from competition for metabolic resources. Trends Genet 18:291-294.
Sueoka N. 1962. On the genetic basis of variation and heterogeneity of DNA base composition. PNAS 48:582-592.

2 comments:

Thomieh said...

Interesting plot. I have been wondering about the GC content of bacterial genomes for quite some time now. Yesterday I found another interesting paper on the matter:
http://www.biomedcentral.com/1471-2164/11/464/abstract

Guus said...

Hi Thomieh,
Thanks for the link.
I think the GC distribution within a genome is very interesting indeed. The fact that there is often a "GC skew" that changes sign at the origin of replication (and again at the terminus) might actually be useful in the process of genome assembly.
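GC skew is usually computed as (G - C) / (G + C) in windows along the chromosome. In many bacteria the cumulative skew reaches its minimum near the origin of replication and its maximum near the terminus, which may be what makes it useful when orienting or checking an assembled chromosome. A minimal sketch follows. The function names and the default window size of 1,000 bp are hypothetical, and windows containing no G or C get a skew of 0.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_skew(seq: str, window: int = 1000) -> List[float]:
    """(G - C) / (G + C) for consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_extremes(skews: List[float]) -> Tuple[int, int]:
    """Window indices of the minimum and maximum cumulative skew.

    In many bacterial chromosomes these lie near the origin and the
    terminus of replication, respectively. Raises ValueError if empty.
    """
    cum = list(accumulate(skews))
    return cum.index(min(cum)), cum.index(max(cum))
```

Multiplying the returned window index by the window size gives an approximate position in bp. Non-overlapping windows keep the sketch simple; sliding windows would give a smoother curve at extra cost.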