Understanding how genetic information flows from DNA to RNA to protein, and the clinical consequences when this process goes awry.
The question of how hereditary information is stored, copied, and executed at the molecular level has driven some of the most transformative discoveries in biomedical science. Before the mid-twentieth century, most biologists assumed that proteins—with their twenty amino acid building blocks—must be the hereditary molecule, because DNA's four-nucleotide alphabet seemed too simple to encode the complexity of life. The series of experiments that overturned this assumption established molecular genetics as a discipline and ultimately gave rise to the central dogma of molecular biology—the directional flow of genetic information from DNA → RNA → protein.
These milestones collectively framed the central question that molecular genetics seeks to answer: How does a linear sequence of nucleotides in DNA direct the synthesis of the thousands of distinct proteins required for cellular function, and what happens clinically when any step in this process is disrupted? For healthcare students preparing for the USMLE, understanding each step—from DNA replication through transcription, RNA processing, and translation—is essential for interpreting genetic disorders, pharmacologic interventions, and diagnostic tests.
Gene expression encompasses the entire pathway by which the information encoded in a segment of DNA is converted into a functional product—usually a protein, but sometimes a functional RNA. This pathway is governed by a set of foundational principles that apply across prokaryotic and eukaryotic systems, though the mechanistic details differ substantially. The following concepts constitute the intellectual scaffold upon which all of molecular genetics rests.
The diagram above captures the unidirectional flow of genetic information that defines molecular genetics. Notice that DNA replication is a self-referential loop: the molecule reproduces itself faithfully using the same base-pairing rules that underlie transcription. The pre-mRNA stage is unique to eukaryotes; in prokaryotes, transcription and translation are coupled in the cytoplasm, with no nuclear envelope to compartmentalize the two processes. The transcription machinery itself also differs, which has direct pharmacological relevance: antibiotics such as rifampin target bacterial RNA polymerase without affecting eukaryotic RNA polymerases, while α-amanitin (from Amanita mushrooms) inhibits eukaryotic RNA Pol II, causing hepatotoxicity.
Eukaryotic transcription by RNA polymerase II proceeds through three phases: initiation, elongation, and termination. Initiation requires the assembly of the pre-initiation complex at the promoter. The TATA box (consensus TATAAA, located approximately 25 bp upstream of the transcription start site) is recognized by TFIID via its TBP (TATA-binding protein) subunit. Sequential binding of TFIIA, TFIIB, RNA Pol II (with TFIIF), TFIIE, and TFIIH completes the complex. TFIIH possesses helicase and kinase activity—the kinase phosphorylates the C-terminal domain (CTD) of RNA Pol II at serine 5, triggering promoter clearance and the transition to elongation. Elongation proceeds as RNA Pol II reads the template strand 3′→5′, synthesizing the nascent RNA chain 5′→3′ by adding ribonucleoside triphosphates complementary to the template. Termination involves cleavage of the transcript downstream of the polyadenylation signal (AAUAAA) and addition of the poly-A tail by poly-A polymerase.
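The strand polarity described above can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the polymerase: the function name `transcribe` and the six-nucleotide template are hypothetical, and the template is assumed to be written 3′→5′ so that the output reads 5′→3′.

```python
# Base-pairing rules for copying a DNA template into RNA (A pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') for a DNA template written 3'->5'.

    Because the template is supplied in the direction the polymerase reads it,
    the RNA is built left to right with no reversal step.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


# Hypothetical 6-nt template, for illustration only.
template = "TACGGT"          # 3'-TACGGT-5'
rna = transcribe(template)   # 5'-AUGCCA-3'
print(rna)
```

The result is identical to the coding (non-template) strand, 5′-ATGCCA-3′, with U in place of T, which is why the coding strand is the one usually quoted for a gene.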
| Modification | Mechanism | Function | Clinical Correlation |
|---|---|---|---|
| 5′ Capping | Guanylyltransferase adds guanosine via a 5′-to-5′ triphosphate linkage; a methyltransferase then methylates it at N7 to form 7-methylguanosine | Protects from 5′ exonucleases; required for ribosome recognition and translation initiation | mRNA vaccines (e.g., COVID-19) incorporate synthetic 5′ caps to enhance stability and translatability |
| Splicing | Spliceosome (U1, U2, U4, U5, U6 snRNPs) removes introns at GU (5′ splice site) and AG (3′ splice site) and joins exons | Allows alternative splicing → protein diversity from a single gene; removes non-coding sequence | β-thalassemia can result from splice site mutations; spinal muscular atrophy treated by nusinersen (modifies SMN2 splicing) |
| 3′ Polyadenylation | Transcript is cleaved downstream of the AAUAAA signal; poly-A polymerase then adds ~200 adenosine residues | Facilitates nuclear export; stabilizes mRNA; enhances translation | Oligo-dT primers exploit poly-A tails to isolate mRNA for cDNA library construction and RT-PCR |
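Two of the modifications in the table lend themselves to a toy string model. The sketch below is an assumption-laden illustration: the function names, the intron coordinates, and the example sequences are hypothetical, the 5′ cap is not modeled, and the `cleavage_offset` default is only a placeholder for the short distance between the AAUAAA signal and the cleavage site.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as (start, end) half-open index pairs.

    Each intron must begin with GU (5' splice site) and end with AG
    (3' splice site); otherwise a ValueError is raised.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end
    exons.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(exons)


def polyadenylate(rna: str, tail_length: int = 200, cleavage_offset: int = 20) -> str:
    """Cut downstream of the first AAUAAA signal and append a poly-A tail.

    cleavage_offset is an illustrative parameter, not a measured value.
    """
    signal = rna.find("AAUAAA")
    if signal == -1:
        raise ValueError("no polyadenylation signal found")
    cut = min(len(rna), signal + len("AAUAAA") + cleavage_offset)
    return rna[:cut] + "A" * tail_length


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2.
pre = "AUGGCC" + "GUAAGUCCAG" + "UUUUAA"
mature = splice(pre, [(6, 16)])
print(mature)  # AUGGCCUUUUAA
```

A splice-site mutation can be mimicked by changing the leading GU or trailing AG of the intron, in which case `splice` raises an error instead of joining the exons.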
Translation occurs in the cytoplasm (or on the rough ER for secreted/membrane proteins). During initiation, the small ribosomal subunit (40S in eukaryotes, 30S in prokaryotes) binds the mRNA and scans for the start codon AUG in optimal Kozak context (GCC(A/G)CCAUGG). The initiator tRNA (Met-tRNAiMet) occupies the P site, and the large subunit (60S/50S) joins to form the functional ribosome. During elongation, aminoacyl-tRNAs enter the A site, peptide bond formation is catalyzed by the peptidyl transferase center (a ribozyme within the 23S/28S rRNA), and translocation shifts the ribosome one codon along the mRNA (catalyzed by EF-G/eEF-2 using GTP hydrolysis). Termination occurs when a stop codon (UAA, UAG, or UGA) enters the A site and is recognized by release factors (RF1/RF2 in prokaryotes; eRF1 in eukaryotes), triggering hydrolysis of the completed polypeptide from the tRNA.
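The decoding logic of initiation, elongation, and termination can be sketched as a codon lookup. This is a simplified illustration under stated assumptions: the function `translate` is hypothetical, it starts at the first AUG without checking Kozak context, it uses the standard genetic code, and it ignores tRNAs, elongation factors, and release factors.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes; '*' in the table marks a stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA in the A site
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' UTR (GG), AUG GUG CAC CUG, stop (UAA), 3' UTR (GG).
print(translate("GGAUGGUGCACCUGUAAGG"))  # MVHL
```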
Gene expression is regulated at multiple levels, from the chromatin architecture down to post-translational modification. At the epigenetic level, DNA methylation (particularly of CpG islands near promoters) silences gene transcription, while histone acetylation by histone acetyltransferases (HATs) loosens chromatin and promotes transcription. Histone deacetylases (HDACs) reverse this effect. At the transcriptional level, enhancers and silencers modulate RNA polymerase activity through transcription factor binding. Post-transcriptionally, microRNAs (miRNAs) bind complementary sequences in the 3′ UTR of target mRNAs, leading to translational repression or mRNA degradation. At the post-translational level, ubiquitination tags proteins for proteasomal degradation, providing a final checkpoint on gene product abundance.
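The miRNA step above is essentially a complementarity search, which can be sketched as follows. Several details here are assumptions not stated in the text: the sketch uses the common simplification that targeting is driven by perfect pairing to the miRNA "seed" (taken here as nucleotides 2 to 8), it ignores G:U wobble and site context, and the function names and sequences are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (both written 5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based positions in the 3' UTR that pair perfectly with the seed.

    The seed is taken as miRNA nucleotides 2-8 (an assumed simplification).
    """
    seed = mirna[1:8]
    site = reverse_complement(seed)  # what the UTR must contain to pair with the seed
    return [
        i for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


# Hypothetical miRNA fragment and 3' UTR, for illustration only.
mirna = "UGAGGUAGUAGG"
utr = "AAACUACCUCAAA"
print(seed_sites(mirna, utr))  # [3]
```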
The following example traces a single-nucleotide change in the β-globin gene through each stage of gene expression to its clinical phenotype, integrating the concepts of transcription, splicing, translation, and protein function.
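One way to make such a trace concrete is the sickle cell (HbS) missense change, sketched below in Python. This is an illustration under assumptions, not necessarily the example the text has in mind: the opening codons of the β-globin coding sequence are given as commonly cited and should be checked against a reference sequence, the names `translate_cds`, `NORMAL`, and `SICKLE` are hypothetical, and splicing is skipped because only the start of the coding sequence is used.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(coding_dna: str) -> str:
    """Transcribe a coding-strand DNA sequence (T -> U) and translate it in frame."""
    mrna = coding_dna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Assumed first nine codons of the beta-globin coding sequence (illustrative).
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# A>T change at the 20th coding nucleotide turns codon GAG into GTG.
SICKLE = NORMAL[:19] + "T" + NORMAL[20:]

print(translate_cds(NORMAL))  # MVHLTPEEK
print(translate_cds(SICKLE))  # MVHLTPVEK
```

The single nucleotide change alters one mRNA codon (GAG to GUG) and therefore one amino acid (glutamate to valine). The initiator methionine is removed from the mature chain, so this residue is conventionally numbered position 6 of β-globin.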
While the fundamental logic of the central dogma applies universally, the implementation details diverge substantially between prokaryotes and eukaryotes. These differences have profound clinical significance because they provide selective drug targets—antibiotics that inhibit bacterial gene expression machinery without affecting human cells. The following table highlights the most USMLE-relevant distinctions.
| Feature | Prokaryotes | Eukaryotes |
|---|---|---|
| RNA Polymerase | Single RNA polymerase (core: α₂ββ′ω; holoenzyme adds the σ factor for promoter recognition) | Three: RNA Pol I (rRNA), Pol II (mRNA), Pol III (tRNA, 5S rRNA) |
| Coupling | Transcription and translation are coupled (simultaneous) | Spatially separated: transcription in nucleus, translation in cytoplasm |
| mRNA Processing | None (no introns in most genes, no 5′ cap, no poly-A tail) | 5′ capping, splicing (intron removal), 3′ polyadenylation |
| Ribosomes | 70S (30S + 50S subunits) | 80S (40S + 60S subunits) |
| Start Codon Context | Shine-Dalgarno sequence (purine-rich) upstream of AUG | Kozak consensus sequence around AUG; 5′ cap-dependent scanning |
| Gene Organization | Operons (polycistronic mRNA encoding multiple proteins) | Monocistronic mRNA (one gene → one mRNA → one protein) |
| Initiator Amino Acid | N-formylmethionine (fMet) | Methionine (Met) |
The foundational principles of molecular genetics connect directly to several advanced topics that appear on USMLE Step 1 and form the basis for understanding modern therapeutics and diagnostics. The table below maps core gene expression concepts to their advanced extensions.
| Core Concept | Advanced Extension | Clinical Application |
|---|---|---|
| DNA replication fidelity | Mismatch repair (MLH1, MSH2), nucleotide excision repair, base excision repair | Lynch syndrome (HNPCC) from mismatch repair defects; xeroderma pigmentosum from NER deficiency |
| Transcription factors | Oncogenes (c-myc, c-fos) and tumor suppressors (p53, Rb) | Gain-of-function mutations in oncogenes and loss-of-function in tumor suppressors drive carcinogenesis |
| RNA splicing | Alternative splicing, antisense oligonucleotide therapy | Nusinersen (Spinraza) for SMA; eteplirsen for DMD (exon skipping) |
| Translation regulation | mTOR pathway, eIF4E phosphorylation, IRES elements | mTOR inhibitors (sirolimus/everolimus) suppress translation in transplant rejection and certain cancers |
| Epigenetic regulation | Genomic imprinting, X-inactivation, CpG island hypermethylation | Prader-Willi / Angelman syndromes (imprinting); aberrant methylation in cancer |
Beyond these classical extensions, the advent of CRISPR-Cas9 gene editing has transformed molecular genetics from a descriptive science into a therapeutic platform. CRISPR exploits a bacterial adaptive immune mechanism: a guide RNA directs the Cas9 endonuclease to a specific genomic locus where it creates a double-strand break. The cell's repair machinery (non-homologous end joining or homology-directed repair) can then be harnessed to knock out, correct, or insert genes. The first FDA-approved CRISPR therapy—exagamglogene autotemcel (Casgevy)—treats sickle cell disease by editing hematopoietic stem cells to reactivate fetal hemoglobin production, demonstrating how understanding gene expression at the molecular level translates directly into curative medicine.
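The targeting step can be sketched as a sequence search. The details below go beyond the text and are stated as assumptions: the sketch models the commonly used Streptococcus pyogenes Cas9, which requires an NGG motif (the PAM) immediately 3′ of a 20-nucleotide target and cuts about 3 bp upstream of that motif. The function name `find_targets` and the example sequence are hypothetical, and only the strand as given is searched.

```python
import re


def find_targets(dna: str, guide_len: int = 20) -> list[dict]:
    """Find candidate Cas9 sites: guide_len bases followed by an NGG PAM.

    cut_index is the 0-based index of the first base after the predicted
    double-strand break (3 bp upstream of the PAM).
    """
    pattern = r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len  # lookahead allows overlapping hits
    hits = []
    for match in re.finditer(pattern, dna.upper()):
        start = match.start()
        pam_start = start + guide_len
        hits.append({
            "protospacer": match.group(1),
            "pam": dna.upper()[pam_start:pam_start + 3],
            "cut_index": pam_start - 3,
        })
    return hits


# Hypothetical locus: 2 flanking bases, a 20-nt target, an AGG PAM, 2 flanking bases.
locus = "TT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for hit in find_targets(locus):
    print(hit)
```

A practical design tool would also scan the reverse strand and score off-target matches elsewhere in the genome; both are omitted here for brevity.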
Molecular genetics describes the flow of genetic information dictated by the central dogma. DNA replication copies the genome semiconservatively using DNA polymerases. Transcription by RNA Pol II reads the template strand 3′→5′ and synthesizes pre-mRNA 5′→3′. Eukaryotic post-transcriptional processing adds a 5′ cap, removes introns via splicing (the spliceosome recognizes GU–AG boundaries), and adds a poly-A tail. Translation at the ribosome (80S eukaryotic, 70S prokaryotic) decodes mRNA codons into amino acids, beginning at AUG (methionine) and ending at a stop codon (UAA, UAG, or UGA).
Mutations disrupt this pipeline: silent mutations change the codon without altering the amino acid; missense mutations substitute one amino acid for another (e.g., sickle cell HbS); nonsense mutations create premature stop codons; and frameshift mutations alter every downstream codon. Gene expression is regulated at the epigenetic (DNA methylation, histone modification), transcriptional (enhancers, silencers, transcription factors), post-transcriptional (miRNAs, alternative splicing), and post-translational (ubiquitination, phosphorylation) levels. Differences between prokaryotic and eukaryotic gene expression provide selective targets for antibiotics—rifampin (bacterial RNA polymerase), aminoglycosides/tetracyclines (30S ribosomal subunit), and macrolides/chloramphenicol (50S ribosomal subunit)—and emerging gene therapies like CRISPR-Cas9 now enable direct correction of disease-causing mutations at the DNA level.
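The four mutation classes named above can be told apart by a short comparison of reference and variant coding sequences. The sketch below is a simplified illustration: the function `classify` is hypothetical, it reports only the first altered codon, it assumes both sequences are in-frame mRNA starting at the first codon, and the "in-frame indel" and "stop-loss" labels are additions not discussed in the text.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(reference: str, variant: str) -> str:
    """Classify a variant mRNA coding sequence relative to a reference."""
    length_change = abs(len(reference) - len(variant))
    if length_change:
        # Insertions/deletions not divisible by 3 shift every downstream codon.
        return "frameshift" if length_change % 3 else "in-frame indel"
    for i in range(0, len(reference) - 2, 3):
        ref_codon, var_codon = reference[i:i + 3], variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
        if ref_aa == var_aa:
            return "silent"
        if var_aa == "*":
            return "nonsense"
        if ref_aa == "*":
            return "stop-loss"
        return "missense"
    return "no change"


reference = "AUGGAGUAA"                   # Met-Glu-stop (hypothetical)
print(classify(reference, "AUGGAAUAA"))   # silent   (GAG -> GAA, still Glu)
print(classify(reference, "AUGGUGUAA"))   # missense (GAG -> GUG, Glu -> Val)
print(classify(reference, "AUGUAGUAA"))   # nonsense (GAG -> UAG, stop)
print(classify(reference, "AUGGGAGUAA"))  # frameshift (1-nt insertion)
```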