USMLE STEP 1 • GENETICS

Molecular Genetics And Gene Expression

Understanding how genetic information flows from DNA to RNA to protein, and the clinical consequences when this process goes awry.

SECTION 1

Historical Context & Motivation

The question of how hereditary information is stored, copied, and executed at the molecular level has driven some of the most transformative discoveries in biomedical science. Before the mid-twentieth century, most biologists assumed that proteins—with their twenty amino acid building blocks—must be the hereditary molecule, because DNA's four-nucleotide alphabet seemed too simple to encode the complexity of life. The series of experiments that overturned this assumption established molecular genetics as a discipline and ultimately gave rise to the central dogma of molecular biology—the directional flow of genetic information from DNA → RNA → protein.

1944
Avery–MacLeod–McCarty Experiment
Oswald Avery and colleagues demonstrated that purified DNA, not protein, was the transforming principle in Streptococcus pneumoniae, providing the first strong evidence that DNA carries genetic information.
1953
Watson–Crick Double Helix
James Watson and Francis Crick, aided by Rosalind Franklin's X-ray diffraction data, proposed the double-helical structure of DNA with antiparallel strands joined by complementary base pairing (A–T, G–C), immediately suggesting a mechanism for replication.
1958
The Central Dogma
Francis Crick articulated the central dogma: sequence information flows from nucleic acid to protein but not in reverse. This framework became the conceptual backbone of molecular genetics.
1961
Cracking the Genetic Code
Marshall Nirenberg and Heinrich Matthaei used cell-free translation systems to assign the first codon (UUU → phenylalanine), launching the effort that deciphered all 64 codons by 1966.
1977
Discovery of Introns and Splicing
Philip Sharp and Richard Roberts independently discovered that eukaryotic genes are interrupted by non-coding sequences (introns), revealing the critical post-transcriptional step of RNA splicing.

These milestones collectively framed the central question that molecular genetics seeks to answer: How does a linear sequence of nucleotides in DNA direct the synthesis of the thousands of distinct proteins required for cellular function, and what happens clinically when any step in this process is disrupted? For healthcare students preparing for the USMLE, understanding each step—from DNA replication through transcription, RNA processing, and translation—is essential for interpreting genetic disorders, pharmacologic interventions, and diagnostic tests.

SECTION 2

Core Principles of Molecular Genetics

Gene expression encompasses the entire pathway by which the information encoded in a segment of DNA is converted into a functional product—usually a protein, but sometimes a functional RNA. This pathway is governed by a set of foundational principles that apply across prokaryotic and eukaryotic systems, though the mechanistic details differ substantially. The following concepts constitute the intellectual scaffold upon which all of molecular genetics rests.

1

Complementary Base Pairing

Adenine pairs with thymine (or uracil in RNA) via two hydrogen bonds, while guanine pairs with cytosine via three hydrogen bonds. This complementarity underlies replication, transcription, and translation fidelity.
2

Semiconservative Replication

Each daughter DNA double helix retains one parental strand and one newly synthesized strand, as demonstrated by the Meselson–Stahl experiment (1958). DNA polymerase III (prokaryotes) and DNA polymerases δ and ε (eukaryotes) extend the new strand exclusively in the 5′→3′ direction.
3

Transcription: DNA → RNA

RNA polymerase reads the template (antisense) strand 3′→5′ and synthesizes mRNA in the 5′→3′ direction. In eukaryotes, RNA polymerase II transcribes mRNA precursors; RNA pol I and III handle rRNA and tRNA/5S rRNA, respectively.
4

Post-Transcriptional Processing

Eukaryotic pre-mRNA undergoes 5′ capping (7-methylguanosine), splicing (removal of introns by the spliceosome), and 3′ polyadenylation. These modifications protect mRNA from degradation, facilitate nuclear export, and regulate translation efficiency.
5

Translation: RNA → Protein

Ribosomes decode mRNA codons into amino acid sequences. Initiation begins at AUG (methionine), elongation adds amino acids via peptidyl transferase activity (in the large subunit rRNA), and termination occurs at stop codons (UAA, UAG, UGA) recognized by release factors.
✦ KEY TAKEAWAY
Think of gene expression as a manufacturing pipeline in a factory. The DNA is the master blueprint locked in the vault (nucleus). You never send the original to the factory floor; instead, you make a working copy—the mRNA—that is edited, quality-checked (processing), and carried to the assembly line (ribosome), where raw materials (amino acids delivered by tRNA) are assembled into the finished product (protein). A defect at any station shuts down or corrupts the final output—the molecular basis of genetic disease.
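Principles 1 and 2 above can be sketched in a few lines of Python. This is only an illustrative toy: the function names and the example sequence are assumptions made for the sketch, and real replication involves primers, leading and lagging strands, and proofreading that are not modeled here.

```python
# Watson-Crick partners and hydrogen-bond counts, as stated in Principle 1.
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand_5to3):
    """Return the antiparallel partner strand, also written 5'->3'."""
    # Reversing first keeps both strands in 5'->3' notation.
    return "".join(DNA_PARTNER[base] for base in reversed(strand_5to3))


def hydrogen_bonds(strand_5to3):
    """Count hydrogen bonds between this strand and its perfect complement."""
    return sum(H_BONDS[base] for base in strand_5to3)


def replicate(duplex):
    """Semiconservative replication: each daughter keeps one parental strand."""
    top, bottom = duplex
    daughter_1 = (top, complementary_strand(top))        # old top + new bottom
    daughter_2 = (complementary_strand(bottom), bottom)  # new top + old bottom
    return [daughter_1, daughter_2]


# Hypothetical 6-bp duplex chosen only for illustration.
parent = ("ATGGCC", complementary_strand("ATGGCC"))
daughters = replicate(parent)

print(parent)                    # ('ATGGCC', 'GGCCAT')
print(hydrogen_bonds("ATGGCC"))  # 16  (2 A/T pairs x 2 + 4 G/C pairs x 3)
print(daughters[0] == parent and daughters[1] == parent)  # True
```

Both daughters come out identical to the parent, which is the point of templated copying: the base-pairing rule alone is enough to regenerate the missing strand.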
SECTION 3

The Central Dogma: A Visual Overview

[Figure: Central Dogma of Molecular Biology. DNA (double helix; A-T, G-C base pairs; replication) → transcription (RNA Pol II) → pre-mRNA (introns + exons; single-stranded) → processing → mature mRNA (5′ cap + poly-A tail; exons only) → translation (ribosomes) → protein (polypeptide chain; folds → function).
Key enzymes and factors at each step:
Replication: helicase, primase, DNA Pol III/δ/ε, ligase
Transcription: RNA Pol II, TFs (TFIID), Mediator complex
Processing: spliceosome (snRNPs), poly-A polymerase, CstF
Translation: 40S/60S subunits, eIFs, aminoacyl-tRNA synthetase]
The central dogma illustrated from left to right: DNA replication copies the genome, transcription produces pre-mRNA, processing generates mature mRNA, and translation yields a functional protein. The lower panel summarizes key enzymes at each step.

The diagram above captures the unidirectional flow of genetic information that defines molecular genetics. Notice that DNA replication is a self-referential loop: the molecule reproduces itself faithfully using the same base-pairing rules that underlie transcription. The pre-mRNA stage is unique to eukaryotes. In prokaryotes, transcription and translation are coupled in the cytoplasm, because there is no nuclear envelope to compartmentalize the two processes. This distinction has direct pharmacological relevance: antibiotics such as rifampin target bacterial RNA polymerase without affecting eukaryotic RNA Pol II, while α-amanitin (from Amanita mushrooms) specifically inhibits eukaryotic RNA Pol II, causing hepatotoxicity.

SECTION 4

Mechanistic Deep Dive: Transcription and Translation

Transcription in Detail

Eukaryotic transcription by RNA polymerase II proceeds through three phases: initiation, elongation, and termination. Initiation requires the assembly of the pre-initiation complex at the promoter. The TATA box (consensus TATAAA, located approximately 25 bp upstream of the transcription start site) is recognized by TFIID via its TBP (TATA-binding protein) subunit. Sequential binding of TFIIA, TFIIB, RNA Pol II (with TFIIF), TFIIE, and TFIIH completes the complex. TFIIH possesses helicase and kinase activity—the kinase phosphorylates the C-terminal domain (CTD) of RNA Pol II at serine 5, triggering promoter clearance and the transition to elongation. Elongation proceeds as RNA Pol II reads the template strand 3′→5′, synthesizing the nascent RNA chain 5′→3′ by adding ribonucleoside triphosphates complementary to the template. Termination involves cleavage of the transcript downstream of the polyadenylation signal (AAUAAA) and addition of the poly-A tail by poly-A polymerase.
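The strand-direction bookkeeping in the elongation step is a common source of errors, so a minimal sketch may help. It assumes the template is supplied already written 3′→5′; the function names and the 12-nucleotide template are hypothetical, and initiation factors, capping, and termination are not modeled.

```python
# Template DNA base -> complementary RNA base (A pairs with U in RNA).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    # The template is read 3'->5' while RNA grows 5'->3', so reading the
    # 3'->5' string left to right already yields the mRNA in 5'->3' order.
    return "".join(RNA_PARTNER[base] for base in template_3to5)


def coding_strand(template_3to5):
    """Coding (sense) strand, 5'->3': same as the mRNA but with T for U."""
    return transcribe(template_3to5).replace("U", "T")


template = "TACGAACTCATT"  # hypothetical template, written 3'->5'
print(transcribe(template))     # AUGCUUGAGUAA
print(coding_strand(template))  # ATGCTTGAGTAA
```

The mRNA matches the coding strand (with U replacing T), which is why the coding strand is also called the sense strand.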

RNA Processing: The Three Modifications

The three post-transcriptional modifications of eukaryotic mRNA
| Modification | Mechanism | Function | Clinical Correlation |
| --- | --- | --- | --- |
| 5′ Capping | 7-methylguanosine added via 5′→5′ triphosphate linkage by guanylyltransferase | Protects from 5′ exonucleases; required for ribosome recognition and translation initiation | mRNA vaccines (e.g., COVID-19) incorporate synthetic 5′ caps to enhance stability and translatability |
| Splicing | Spliceosome (U1, U2, U4, U5, U6 snRNPs) removes introns at GU (5′ splice site) and AG (3′ splice site) and joins exons | Allows alternative splicing → protein diversity from a single gene; removes non-coding sequence | β-thalassemia can result from splice site mutations; spinal muscular atrophy treated by nusinersen (modifies SMN2 splicing) |
| 3′ Polyadenylation | Poly-A polymerase adds ~200 adenosine residues after cleavage downstream of the AAUAAA signal | Facilitates nuclear export; stabilizes mRNA; enhances translation | Oligo-dT primers exploit poly-A tails to isolate mRNA for cDNA library construction and RT-PCR |
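As a rough illustration of the splicing and polyadenylation rows above, the sketch below removes an intron from a toy pre-mRNA and appends a poly-A tail. Everything here is an assumption made for the example: the sequence is invented, intron coordinates are supplied by hand rather than recognized by snRNPs, the cap is represented only by a text label, and the GU/AG check ignores the branch point and other splicing signals.

```python
def has_canonical_sites(intron):
    """True if the intron starts with GU (donor) and ends with AG (acceptor)."""
    return intron.startswith("GU") and intron.endswith("AG")


def splice(pre_mrna, introns):
    """Join exons after removing introns given as (start, end) index pairs.

    Indices are 0-based and half-open, like Python slices.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not has_canonical_sites(intron):
            raise ValueError("non-canonical splice sites in intron " + intron)
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])  # final exon after the last intron
    return "".join(exons)


def process(pre_mrna, introns, tail_length=200):
    """Toy mature mRNA: cap label + spliced exons + poly-A tail."""
    return "m7G-" + splice(pre_mrna, introns) + "A" * tail_length


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2.
pre = "AUGGCC" + "GUAAGUCCUUAG" + "UUCUAA"
print(splice(pre, [(6, 18)]))                  # AUGGCCUUCUAA
print(has_canonical_sites("AUAAGUCCUUAG"))     # False (donor GU -> AU)
print(process(pre, [(6, 18)], tail_length=5))  # m7G-AUGGCCUUCUAAAAAAA
```

The failed check for the AU donor mirrors the splice site mutations listed in the clinical column: without the consensus dinucleotide, the intron is not removed correctly.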

Translation: Initiation, Elongation, and Termination

Translation occurs in the cytoplasm (or on the rough ER for secreted/membrane proteins). During initiation, the small ribosomal subunit (40S in eukaryotes, 30S in prokaryotes) binds the mRNA and scans for the start codon AUG in optimal Kozak context (GCC(A/G)CCAUGG). The initiator tRNA (Met-tRNAiMet) occupies the P site, and the large subunit (60S/50S) joins to form the functional ribosome. During elongation, aminoacyl-tRNAs enter the A site, peptide bond formation is catalyzed by the peptidyl transferase center (a ribozyme within the 23S/28S rRNA), and translocation shifts the ribosome one codon along the mRNA (catalyzed by EF-G/eEF-2 using GTP hydrolysis). Termination occurs when a stop codon (UAA, UAG, or UGA) enters the A site and is recognized by release factors (RF1/RF2 in prokaryotes; eRF1 in eukaryotes), triggering hydrolysis of the completed polypeptide from the tRNA.
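The initiation, elongation, and termination steps described above can be mimicked with a short decoder. This is a simplified sketch, not a model of the ribosome: it starts at the first AUG it finds (ignoring Kozak context and Shine-Dalgarno sequences), uses one-letter amino acid codes, and the example mRNA is invented. The codon table is built from the standard genetic code written in the conventional U, C, A, G order.

```python
from itertools import product

# Standard genetic code: codons enumerated UUU, UUC, UUA, UUG, UCU, ...
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon per step
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":                    # termination: stop codon in A site
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical mRNA with a short 5' leader, AUG, two sense codons, and a stop.
print(translate("GCCACCAUGGAGGUGUAAGG"))  # MEV
print(CODON_TABLE["UUU"])                 # F (phenylalanine)
```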

💡 High-Yield: Wobble Base Pairing
The third position of a codon can tolerate non-standard pairing with the first position (5′ end) of the anticodon, explaining why fewer than 61 tRNAs are needed to read all 61 sense codons. Inosine (formed by deamination of adenosine) in the anticodon wobble position can pair with U, C, or A in the codon, further reducing the number of required tRNA species.
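To see how one anticodon can read several codons, the sketch below enumerates the codons recognized by a given anticodon. The inosine rule (I pairs with U, C, or A) comes from the text above; the G and U wobble entries follow Crick's classical wobble rules and are an addition not spelled out in this section. Other modified bases are ignored, and the function name is hypothetical.

```python
# Strict Watson-Crick pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon 3' (third-position) bases readable by each anticodon 5' base.
WOBBLE = {
    "C": "G",
    "A": "U",
    "U": "AG",   # classical wobble: U reads A or G
    "G": "UC",   # classical wobble: G reads U or C
    "I": "UCA",  # inosine reads U, C, or A
}


def codons_read(anticodon_5to3):
    """List the codons (5'->3') that an anticodon (5'->3') can decode."""
    wobble_base, middle, last = anticodon_5to3
    # Pairing is antiparallel: anticodon position 3 pairs with codon position 1.
    first = WATSON_CRICK[last]
    second = WATSON_CRICK[middle]
    return [first + second + third for third in WOBBLE[wobble_base]]


print(codons_read("IGC"))  # ['GCU', 'GCC', 'GCA']  (all alanine)
print(codons_read("GAA"))  # ['UUU', 'UUC']         (both phenylalanine)
```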
SECTION 5

Mutations, Regulation, and Clinical Correlations

Types of Mutations and Their Consequences

[Figure: Types of Point Mutations & Their Effects on Protein]
Silent (synonymous): DNA ...GCA... → ...GCG...; codon changes but amino acid stays the same (Ala → Ala); no phenotypic effect.
Missense: codon GAG → GUG; single nucleotide change → different amino acid (Glu → Val); e.g., sickle cell disease (HbS).
Nonsense: codon CGA → UGA; creates a premature stop codon (Arg → STOP); truncated or absent protein.
Frameshift (insertion/deletion): normal AUG | GCA | UUC | GAA | ...; insert A: AUG | GAC | AUU | CGA | A...; insertion or deletion of a number of nucleotides not divisible by 3 shifts the reading frame; all downstream amino acids altered; e.g., Duchenne muscular dystrophy (dystrophin).
Splice site mutation: normal Exon1-GU...intron...AG-Exon2; mutant Exon1-AU...intron...AG-Exon2; disrupts consensus splice sequences → intron retention or exon skipping; aberrant mRNA / truncated protein; e.g., some forms of β-thalassemia.
Remember: Silent ≈ no effect | Missense = amino acid swap | Nonsense = premature STOP | Frameshift = reading frame destroyed
Classification of the major point mutations and their downstream effects on protein structure. Note how frameshift mutations are generally the most deleterious, altering every amino acid downstream of the insertion or deletion.
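The classification in the figure reduces to a comparison of the amino acids encoded before and after the change, plus a divisibility check for insertions and deletions. The sketch below is one way to express that logic; the function names are hypothetical, the "stop-loss" branch goes beyond what the figure covers, and splice site mutations are not handled because they depend on intron context rather than on a codon.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"  # not covered in the figure; included for completeness
    return "missense"


def classify_indel(length_nt):
    """Indels whose length is not a multiple of 3 shift the reading frame."""
    return "frameshift" if length_nt % 3 != 0 else "in-frame"


print(classify_substitution("GCA", "GCG"))  # silent   (Ala -> Ala)
print(classify_substitution("GAG", "GUG"))  # missense (Glu -> Val)
print(classify_substitution("CGA", "UGA"))  # nonsense (Arg -> stop)
print(classify_indel(1))                    # frameshift
print(classify_indel(3))                    # in-frame
```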

Regulation of Gene Expression

Gene expression is regulated at multiple levels, from the chromatin architecture down to post-translational modification. At the epigenetic level, DNA methylation (particularly of CpG islands near promoters) silences gene transcription, while histone acetylation by histone acetyltransferases (HATs) loosens chromatin and promotes transcription. Histone deacetylases (HDACs) reverse this effect. At the transcriptional level, enhancers and silencers modulate RNA polymerase activity through transcription factor binding. Post-transcriptionally, microRNAs (miRNAs) bind complementary sequences in the 3′ UTR of target mRNAs, leading to translational repression or mRNA degradation. At the post-translational level, ubiquitination tags proteins for proteasomal degradation, providing a final checkpoint on gene product abundance.

🏥 Clinical Pearl
HDAC inhibitors (e.g., vorinostat) are used in certain cancers such as cutaneous T-cell lymphoma. By preventing histone deacetylation, these drugs maintain an open chromatin state that reactivates silenced tumor suppressor genes. Similarly, DNA methyltransferase inhibitors (e.g., azacitidine) are used in myelodysplastic syndromes to de-repress silenced genes.
SECTION 6

Worked Example: Tracing a Mutation to Disease

The following example traces a single-nucleotide change in the β-globin gene through each stage of gene expression to its clinical phenotype, integrating the concepts of transcription, splicing, translation, and protein function.

Sickle Cell Disease: From Nucleotide to Phenotype

Step 1 — Identify the Mutation

The β-globin gene (HBB, chromosome 11) has a point mutation in the sixth codon of exon 1. The normal template DNA strand reads 3′-CTC-5′ (coding strand: 5′-GAG-3′). In sickle cell disease, the template strand becomes 3′-CAC-5′ (coding strand: 5′-GTG-3′). This is a single adenine-to-thymine transversion on the coding strand.
A → T transversion at codon 6 of HBB

Step 2 — Determine the mRNA Codon Change

Transcription reads the template strand 3′→5′ and synthesizes mRNA 5′→3′. The normal template 3′-CTC-5′ yields mRNA codon GAG. The mutant template 3′-CAC-5′ yields mRNA codon GUG. Since this mutation is within an exon and does not disrupt splice sites, RNA processing proceeds normally—the 5′ cap, splicing, and polyadenylation are all unaffected.
mRNA codon: GAG → GUG

Step 3 — Translate the Mutant Codon

Using the standard genetic code table: GAG codes for glutamic acid (Glu, E)—a hydrophilic, negatively charged amino acid. GUG codes for valine (Val, V)—a hydrophobic, nonpolar amino acid. This is a missense mutation because one amino acid is substituted for another.
Glu⁶ → Val⁶ (E6V) — missense mutation

Step 4 — Predict the Protein Consequence

The substitution of a charged residue (Glu) with a nonpolar residue (Val) at position 6 creates a hydrophobic patch on the surface of the β-globin subunit. Under low-oxygen conditions, this patch promotes intermolecular polymerization of deoxygenated hemoglobin S (HbS) into rigid fibers, distorting the red blood cell into a sickle shape.
HbS polymerization → sickling → vaso-occlusive crises, hemolytic anemia

Step 5 — Connect to Clinical Presentation

Patients homozygous for HbS (SS genotype) present with chronic hemolytic anemia, episodic pain crises, acute chest syndrome, splenic infarction (functional asplenia by age 5), and increased susceptibility to encapsulated organisms (Streptococcus pneumoniae, Haemophilus influenzae). Heterozygous carriers (AS, sickle cell trait) are generally asymptomatic but carry a selective advantage against Plasmodium falciparum malaria. Treatment includes hydroxyurea (increases fetal hemoglobin, HbF), penicillin prophylaxis, and potentially gene therapy or bone marrow transplant.
One nucleotide change → one amino acid substitution → HbS polymerization → systemic disease
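Steps 1 through 3 of the worked example can be retraced in code. The sketch below is deliberately narrow: it contains only the two codons and the two amino acid descriptions given in the text, not a full genetic code table, and the function and variable names are hypothetical.

```python
# Template DNA base -> complementary RNA base.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons needed for this example (from Step 3 above).
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}
AA_PROPERTY = {
    "Glu": "hydrophilic, negatively charged",
    "Val": "hydrophobic, nonpolar",
}


def trace(template_3to5):
    """Follow one template codon (written 3'->5') to its mRNA codon and residue."""
    mrna_codon = "".join(RNA_PARTNER[base] for base in template_3to5)
    amino_acid = CODON_TO_AA[mrna_codon]
    return mrna_codon, amino_acid, AA_PROPERTY[amino_acid]


normal = trace("CTC")  # codon 6 of HBB, normal template
mutant = trace("CAC")  # codon 6 of HBB, sickle template

print(normal)  # ('GAG', 'Glu', 'hydrophilic, negatively charged')
print(mutant)  # ('GUG', 'Val', 'hydrophobic, nonpolar')
print("missense" if normal[1] != mutant[1] else "silent")  # missense
```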
SECTION 7

Prokaryotic vs. Eukaryotic Gene Expression

While the fundamental logic of the central dogma applies universally, the implementation details diverge substantially between prokaryotes and eukaryotes. These differences have profound clinical significance because they provide selective drug targets—antibiotics that inhibit bacterial gene expression machinery without affecting human cells. The following table highlights the most USMLE-relevant distinctions.

Key differences between prokaryotic and eukaryotic gene expression
| Feature | Prokaryotes | Eukaryotes |
| --- | --- | --- |
| RNA Polymerase | Single RNA polymerase (core: α₂ββ′ω; the σ factor joins to form the holoenzyme) | Three: RNA Pol I (rRNA), Pol II (mRNA), Pol III (tRNA, 5S rRNA) |
| Coupling | Transcription and translation are coupled (simultaneous) | Spatially separated: transcription in nucleus, translation in cytoplasm |
| mRNA Processing | None (no introns in most genes, no 5′ cap, no poly-A tail) | 5′ capping, splicing (intron removal), 3′ polyadenylation |
| Ribosomes | 70S (30S + 50S subunits) | 80S (40S + 60S subunits) |
| Start Codon Context | Shine-Dalgarno sequence (purine-rich) upstream of AUG | Kozak consensus sequence around AUG; 5′ cap-dependent scanning |
| Gene Organization | Operons (polycistronic mRNA encoding multiple proteins) | Monocistronic mRNA (one gene → one mRNA → one protein) |
| Initiator Amino Acid | N-formylmethionine (fMet) | Methionine (Met) |
💊 PHARMACOLOGIC EXPLOITATION
Differences between prokaryotic and eukaryotic gene expression are the basis for selective antimicrobial therapy. Rifampin inhibits bacterial RNA polymerase (used in TB treatment and in meningococcal prophylaxis). Chloramphenicol blocks peptidyl transferase at the 50S subunit. Tetracyclines prevent aminoacyl-tRNA from binding the 30S A site. Macrolides (erythromycin, azithromycin) block translocation at the 50S subunit. To remember each drug's target, use the mnemonic "Buy AT 30, CELL at 50": aminoglycosides and tetracyclines hit the 30S subunit; chloramphenicol, erythromycin (macrolides), lincosamides (lincomycin, clindamycin), and linezolid hit the 50S subunit.
SECTION 8

Connections to Advanced Topics

The foundational principles of molecular genetics connect directly to several advanced topics that appear on USMLE Step 1 and form the basis for understanding modern therapeutics and diagnostics. The table below maps core gene expression concepts to their advanced extensions.

Core molecular genetics concepts and their advanced clinical extensions
| Core Concept | Advanced Extension | Clinical Application |
| --- | --- | --- |
| DNA replication fidelity | Mismatch repair (MLH1, MSH2), nucleotide excision repair, base excision repair | Lynch syndrome (HNPCC) from mismatch repair defects; xeroderma pigmentosum from NER deficiency |
| Transcription factors | Oncogenes (c-myc, c-fos) and tumor suppressors (p53, Rb) | Gain-of-function mutations in oncogenes and loss-of-function in tumor suppressors drive carcinogenesis |
| RNA splicing | Alternative splicing, antisense oligonucleotide therapy | Nusinersen (Spinraza) for SMA; eteplirsen for DMD (exon skipping) |
| Translation regulation | mTOR pathway, eIF4E phosphorylation, IRES elements | mTOR inhibitors (sirolimus/everolimus) suppress translation in transplant rejection and certain cancers |
| Epigenetic regulation | Genomic imprinting, X-inactivation, CpG island hypermethylation | Prader-Willi / Angelman syndromes (imprinting); aberrant methylation in cancer |

Beyond these classical extensions, the advent of CRISPR-Cas9 gene editing has transformed molecular genetics from a descriptive science into a therapeutic platform. CRISPR exploits a bacterial adaptive immune mechanism: a guide RNA directs the Cas9 endonuclease to a specific genomic locus where it creates a double-strand break. The cell's repair machinery (non-homologous end joining or homology-directed repair) can then be harnessed to knock out, correct, or insert genes. The first FDA-approved CRISPR therapy—exagamglogene autotemcel (Casgevy)—treats sickle cell disease by editing hematopoietic stem cells to reactivate fetal hemoglobin production, demonstrating how understanding gene expression at the molecular level translates directly into curative medicine.

SECTION 9

Practice Problems

PROBLEM 1 — CONCEPTUAL
A pharmaceutical researcher is designing a drug that selectively inhibits eukaryotic mRNA synthesis without affecting rRNA or tRNA production. Which RNA polymerase should the drug target, and why would this selectivity matter clinically?
PROBLEM 2 — BASIC
A segment of template (antisense) DNA reads: 3′-TACGGCAAATTTGCAACT-5′. Write the corresponding mRNA sequence and identify the amino acid sequence encoded by the first four codons. Assume standard codon usage.
PROBLEM 3 — INTERMEDIATE
A patient has a mutation in the β-globin gene that changes the splice donor site at the exon 1/intron 1 boundary from GT to AT. Predict the molecular consequence of this mutation and explain why the resulting protein is non-functional.
PROBLEM 4 — APPLIED
A 3-year-old child presents with progressive muscle weakness. Genetic testing reveals a frameshift deletion in the dystrophin gene (exon 50 deleted, disrupting the reading frame). The physician discusses eteplirsen, an antisense oligonucleotide that induces skipping of exon 51. Explain the molecular rationale for this treatment.
PROBLEM 5 — CRITICAL THINKING
A researcher discovers that a cancer cell line has hypermethylation of the promoter CpG island of the MLH1 gene and also shows microsatellite instability (MSI-high). The researcher treats the cells with azacitidine (5-azacytidine, a DNA methyltransferase inhibitor). Predict the effects on MLH1 expression, microsatellite stability, and tumor growth. What limitations might this approach have in vivo?
SUMMARY

Summary: Molecular Genetics and Gene Expression

Molecular genetics describes the flow of genetic information as dictated by the central dogma: DNA replication copies the genome semiconservatively using DNA polymerases. Transcription by RNA Pol II produces pre-mRNA from the template strand (3′→5′), synthesizing RNA 5′→3′. Eukaryotic post-transcriptional processing adds a 5′ cap, removes introns via splicing (spliceosome recognizing GU–AG boundaries), and adds a poly-A tail. Translation at the ribosome (80S eukaryotic, 70S prokaryotic) decodes mRNA codons into amino acids, beginning at AUG (methionine) and ending at stop codons (UAA, UAG, UGA).

Mutations disrupt this pipeline: silent mutations change the codon without altering the amino acid; missense mutations substitute one amino acid for another (e.g., sickle cell HbS); nonsense mutations create premature stop codons; and frameshift mutations alter every downstream codon. Gene expression is regulated at the epigenetic (DNA methylation, histone modification), transcriptional (enhancers, silencers, transcription factors), post-transcriptional (miRNAs, alternative splicing), and post-translational (ubiquitination, phosphorylation) levels. Differences between prokaryotic and eukaryotic gene expression provide selective targets for antibiotics—rifampin (bacterial RNA polymerase), aminoglycosides/tetracyclines (30S ribosomal subunit), and macrolides/chloramphenicol (50S ribosomal subunit)—and emerging gene therapies like CRISPR-Cas9 now enable direct correction of disease-causing mutations at the DNA level.

Varsity Tutors • USMLE Step 1 • Molecular Genetics And Gene Expression