Notes from MCB 104 Pt.2 Genomics Lectures

High-level Concepts

Sequencing techniques

Genome sequencing

Fragment DNA using
1. Restriction enzyme digestion (e.g. EcoR1)
  - If a restriction enzyme recognizes a greater # of bases, it’s harder to find matches in the sequence, and they generate larger fragments (travel less distance in gel electrophoresis)
2. Sonication
  - More sonication cycles gives us smaller fragments
- Then do gel electrophoresis
Clone DNA fragments into plasmid vector backbone (could also use PCR to amplify sequence)
- Creates a library of plasmids
Sequence the entire library using paired-end sequencing: sequences of both ends of a piece of DNA
- Primers are used on both ends, with our DNA fragment in the middle; the primers are the same sequence on all the plasmid backbones
- Used to assemble contigs (see next step)
- Sequencing process: when ddNTP (fluorescent base) added, it terminates the read, so the reads are different lengths ????
  - DNA to sequence → denature DNA (into single strands) and anneal primer ⇒ primer extension to copy and label DNA: DNA pol, nucleotides ATCG extend the primer, and ddNTPs terminate the extension
- Detect the sequence on a gel to separate fragments based on size; could verify the efficiency of digestion/sonication, and extract DNA fragments of a specific size for further cloning/sequencing/fragmenting
Align overlapping paired-end reads to make contigs: a set of overlapping DNA segments/sequences that overlap in a way to provide a contiguous representation of a genomic region
- After reaching 1x coverage, if we keep increasing coverage, then the number of contigs decreases because we get more overlapping regions, so the contigs assemble into longer ones, but there are fewer distinct contigs
- Before reaching 1x coverage, if we keep increasing coverage, the # of contigs increases
Use paired-end information to group contigs into scaffolds (overlapping contigs create scaffolds): large stretch of sequence built from multiple contigs; may contain gaps

Relationship between scaffolds and contigs

RNA-Seq

High-throughput shotgun cDNA sequencing

Purpose: to quantify gene expression by sequencing RNA present in a sample; used to identify which parts of a genome are being actively transcribed into RNA
RNA-Seq also allows for
- Locations of genes and exons
- Can reveal coding sequence polymorphism

Steps

Harvest mRNA - we will only be able to see mRNA transcripts and cDNA library from the specific cell type we’re isolating the mRNA from
Reverse transcribe mRNA to cDNA using reverse transcriptase
- Also use DNA polymerase to make single-stranded cDNA → double-stranded cDNA
Clone cDNA into vectors
Sequence the 5’ and 3’ ends of each cDNA insert/clone/plasmid (aka paired-end reads) to generate Expressed Sequence Tags (ESTs): a short sub-sequence of cDNA transcript sequence
- cDNA and ESTs reveal exons (no introns!) and transcribed portions of the genome
Align reads to assemble genome

Relationship between genomic DNA, mRNA, cDNA, and ESTs Genomic DNA is transcribed into mRNA mRNA is reverse transcribed into cDNA ESTs are a sub-sequence of cDNA

ATAC-Seq (Assay for Transposase-Accessible Chromatin)

Purpose: determine chromatin accessibility across genome by sequencing regions of open chromatin
- We can use this information to do further testing to reveal putative enhancer regions using ChIP-Seq, for e.g.

Steps

Tn5 transposase (simultaneously fragments chromatin regions and tags the regions with sequencing adapters) performs tagmentation on open chromatin
- Tn5 is a transposase, so it cuts and pastes (vs. restriction enzyme, which only cuts but doesn’t paste)
- Tn5 attaches sequencing adapters so the DNA fragments can be recognized later

Amplify fragmented DNA with adapters using PCR

Use paired-end sequencing to align the fragments to the genome to find out where Tn5 cut, revealing where chromatin is accessible

ChIP-Seq (Chromatin Immunoprecipitation)

Purpose: identifies binding sites** (e.g. enhancer region) of DNA-associated proteins (e.g. transcription factors) and can be used to map global binding sites for a given protein
- Can also tell us the identity of the protein (e.g. tf, histone) binding to the DNA region (bc we have to use a protein-specific antibody)

Steps

Crosslink DNA and proteins (TFs) using formaldehyde - to secure the interactions between the DNA and the TFs, to make sure they’re linked for all the following steps
Sonicate (fragment) DNA
Add antibody specific to protein (TF) and co-immunoprecipitate the TF-bound DNA using the TF-specific antibody
- The mixture would look like:
- The co-immunoprecipitate includes 3 molecules:
  1. Antibody that is specific to the target protein, i.e. the antibody only recognizes the target protein (e.g. TF) linked to DNA
  2. The target protein (e.g. a TF)
  3. DNA (that the target protein is bound to; remember, target protein and DNA were crosslinked in the first step!)
- The co-immunoprecipitate is precipitated out, and we can dump the supernatant
Purify the DNA - isolate the DNA and dissociate the TF and antibodies
Sequence the DNA - purpose of ChIP-Seq is to identify binding sites of DNA-associated proteins (e.g. TFs, histones), so sequencing DNA fragments

Genome arrangement

2 main mechanisms that lead to chromosomal rearrangements

DNA DSBs (double-strand breaks)
Repeated sequences and transposons Ways chromosomes interact with each other
On 1 chromsome
Between homologous chromosomes
Between non-homologous chromosomes

Gain of genetic material

Duplication
- DSB - Between 2 homologous chromosomes - 2 DNA DSBs, but the ends of 1 strand of a chromosome fuse with the other strand of chromosome instead of with itself - leads to simultaneous deletion and duplication
- Repeated sequences in homologous chromosomes - simultaneous deletion and duplication
  - Crossing over in wrong site in chromosome (i.e. if there is another site in the chromosome that happens to have the same sequence)
Extra chromosome

Loss of genetic material

Deletion
- DNA DSBs
  - On 1 chromosome
    - 1 DNA DSB: repairing the DSB via NHEJ (fusing both ends of break while omitting the segment in the middle) leads to deletion
  - Between 2 homologous chromosomes - 2 DNA DSBs, but the ends of 1 strand of a chromosome fuse with the other strand of chromosome instead of with itself - leads to simultaneous deletion and duplication
- Repeated sequences
  - in homologous chromosomes - simultaneous deletion and duplication
    - Crossing over in wrong site in chromosome (i.e. if there is another site in the chromosome that happens to have the same sequence)
  - in 1 chromosome: aberrant crossover event leads to loss of genetic material

Relocation of genetic material

Translocation
- DNA DSB: Between 2 nonhomologous chromosomes: both strands have DNA DSBs, and fuse with each other
- Repeated sequences: crossing over between 2 nonhomologous chromosomes
Inversion
- DNA DSB: on 1 chromosome, repairing the DSB via NHEJ, but the segment in the middle generated by the DSB gets inverted in the process of repair
- Repeated sequences: looping event creates inversion

Transposons

2 classes Like repeated sequences, can lead to chromosomal rearrangements

1. Retrotransposons (copy and paste)

Moves (copies) DNA via reverse transcription into an RNA intermediate
Is flanked by long terminal repeats (LTRs) on left and right ends
Requires reverse transcriptase to produce RNA intermediate

Steps

Retrotransposon (DNA) gets transcribed, giving us mRNA intermediate
mRNA intermediate is reverse transcribed by reverse transcriptase; we obtain ds-cDNA copy of retrotransposon
Meanwhile, endonuclease cleaves the identifying site in the new location in genome where retrotransposon is supposed to be inserted
Insertion of ds-cDNA into new site in genome

2. DNA Transposon (cut and paste)

Moves DNA directly (cuts and pastes DNA)
Is flanked by inverted repeats (IRs) on left and right ends
Requires transposase to recognize IRs and excise the transposon

Steps

Transposase recognizes IRs to locate the DNA transposon to know where to cut
Transposase cuts the DNA transposon, and the DNA transposon
1. can stay in place
2. or can be inserted into new site in genome (and OG position can be repaired by copying homologous chromosome

Promoters

Def: region of DNA upstream of a gene where relevant proteins (e.g. RNA pol, transcription factors) bind to to initiate transcription of that gene; resulting transcription produces an RNA molecule
Location: 5’ adjacent (upstream) to gene’s transcription start site, on the same chromosome (aka “cis-acting”)

Enhancers

aka “regulatory DNA”

Def: region of DNA that positively regulates transcription, often in a spatial and/or temporal specific manner (via differential accessibility and transcription factor binding, also combinatorial logic of multiple enhancers)
Transcription factors (e.g. activators and repressors are classes of tfs) bind to enhancers to regulate transcription
Potential locations: anywhere (upstream 5’ or downstream 3’, close to or far away from the gene) as long as its on the same chromosome

Transcription factors regulate enhancers: activators & repressors

Activators and repressors are categories of transcription factors and are only cell-specific (e.g. activators and repressors specific to the skin or eye)
Enhancer activity is space-specific and time-specific

Reporter assay

Purpose: tools to test enhancer activity, to understand gene regulation
Answers question: Is a specific region of DNA an enhancer or not?

Steps

Clone putative enhancer (or other regulatory DNA sequence of interest, e.g. promoter) into DNA vector with a general basal promoter (gives basal/minimal level of gene expression) and a reporter gene (e.g. GFP, LacZ) whose product is easy to measure
Inject recombinant constructs into host embryos (make transgenic by insertion into germ line)
We assess the enhancer’s (or other regulatory DNA sequence of interest’s) activity by measuring expression level of the reporter gene’s product (e.g. GFP, visualize and quantify gene expression levels using fluorescence microscopy/flow cytometry)

Genetic screens

Genetic screen = procedure used to create and detect a mutant organism

Complementation Testing

Complementation: the ability of 2 mutants in combination to restore a normal phenotype (heterozygosity for loss-of-function mutant recessive alleles for 2 different genes produces a normal phenotype)
Complementation test: a test to determine if 2 mutations…
- complement (are in different genes, so can restore normal phenotype) ”+“
- or don’t complement (are in same gene) ”-” → mutations that don’t complement each other are in the same complementation group
Saturation mutagenesis: systemic approach for finding all the genes in the genome that can mutate to affect a biological process
- We know a genetic screen is saturated when we stop finding new loci/genes, but continue to find more alleles (mutants) of the same loci; i.e. until all identified complementation groups each have multiple alleles, the screen is not saturated

Example problems

Example problem 1
- Ones with (-) in the same row = in the same complementation group
- For each problem, before approaching, identify if a gene is dominant first! If it is, its crosses won’t give us much info about complementation groups
Example problem 2

Forward Genetics

Def: start with phenotype of interest, then stimulate random mutagenesis (with e.g. chemicals, x-ray), finally determine which genes are responsible for phenotype of interest
Application of forward genetic screens in model organisms: they can inform upon human disease, bc when we id key genes and pathways in model organism, it can give us info on conserved functions in human biology

Reverse Genetics

Def: start with gene of interest, then do targeted mutagenesis (only perturb our gene of interest) via e.g. CRISPR-Cas9, then screen in model organism for what the gene’s phenotype is
Strategies for reverse genetics
- Acts on RNA (thus makes temporary changes bc only affects gene expression)
  1. RNAi: manipulates RNA and affects gene expression
- Acts on DNA (thus makes permanent changes in genome)
  1. Homologous Recombination (HR)
  2. CRISPR, ZFNs, TALENs - perturb gene of interest, then screen for phenotype

Genome editing

Double-Strand Break Repair - 2 repair pathways

HR (involves strand invasion etc.)
- Involves using homologous sequence (in nature) or using plasmid with homology domains (same sequence as our endogenous gene locus)
NHEJ (straight up ligate the ends)
- Downside: generates a high frequency of indels at the repair junctions

left off Anki

Programmable Nucleases Allow Genome Engineering

Zinc Finger Nucleases (ZFNs), TALENs, and CRISPR-Cas9 all have a programmable, DNA-sequence-specific binding domain coupled to an endonuclease activity (i.e. they can all induce DSBs in DNA, leading to alterations in genome thru either repair mechanism, HR or NHEJ)
Their differences (extra info kinda)
- ZFNs and TALENs: ZFNs and TALENs are proteins that directly bind to DNA sequences
- CRISPR: uses RNA (e.g. gRNA, tracrRNA, crRNA) to direct the Cas9 enzyme to the target DNA sequence

CRISPR-Cas9

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a segment of DNA containing short repetitions of base sequences (spacers and repeats)
Origin: found in bacterial immune system

In nature (bacteria)

Components

CRISPR, a region of DNA which consists of spacers and repeats
- 🦠 Spacers: all different sequences, each serves as a genetic memory of past infections; when prev encountered virus tries to infect the bacterium again, spacers help guide Cas protein to the matching DNA to destroy the invader
- Repeats: short, repetitive, palindromic DNA sequences interspersed between spacers; involved in precr-RNA processing: tracrRNA aligns with repeats, allowing RNase III to cut in correct location
- CRISPR (DNA) is then transcribed into crRNA (CRISPR RNA): contains region complementary to target DNA sequence; crRNA is designed based on DNA sequence we want to edit, and complementarity lets crRNA bind to the target DNA sequence so Cas9 can cut there
tracrRNA (trans-activating CRISPR RNA): binds to both the crRNA and the Cas9 protein (enzyme), holds the tracrRNA-crRNA-Cas9 complex together
- Involved in pre-crRNA processing (brings Cas9 cleave long strands of pre-crRNA, resulting in release of individual crRNA units)
- Involved in the actual editing of the invader’s genome
Cas9 gene
- Then transcribed into Cas9 mRNA
- Then translated into Cas9 protein (an endonuclease)
RNase III: processes pre-crRNA into mature crRNA by cleaving the long strand of pre-crRNA at specific sites, resulting in release of individual crRNA units

Steps

Spacer acquisition: bacterium is invaded by a virus, and a small piece of the virus’ DNA is captured and incorporated into the CRISPR locus in the bacterium’s genome. This new DNA fragment is called a new spacer
crRNA biogenesis: CRISPR DNA locus is transcribed into pre-crRNA, which undergoes pre-crRNA processing to turn into short crRNA molecules (1 spacer sequence + partial repeat sequence)
- pre-crRNA processing: involves RNase III, Cas9, and tracrRNA, tracrRNA-Cas9 complex lines up with pre-crRNA, and RNase III cuts the pre-crRNA in the right place
Formation of surveillance complex: crRNA + Cas9 protein + tracrRNA complex forms. The crRNA is the guide that directs the crRNA-Cas 9 protein-tracrRNA complex to the target DNA sequence matching the crRNA’s spacer sequence.
Target DNA recognition and cleavage: once target DNA is recognized (using crRNA’s spacer sequence), Cas9 creates a DSB at the target DNA site
Target DNA repair (for the invader)

In biotech

Components

Instead of needing CRISPR (the whole DNA region with spacers and repeats) and processing pre-crRNA to crRNA, we can generate crRNA directly
Instead of having 2 separate molecules, crRNA and tracrRNA, we can create sgRNA (single-guide RNA) containing crRNA + linker loop + tracrRNA
- Purpose of linker loop: stabilizes tracrRNA to crRNA to provide more efficient editing

Steps (in biotech)

Design sgRNA: sgRNA is complementary to the specific target sequence in genome we want to edit and also includes a sequence that binds to the Cas9 enzyme
Formation of CRISPR-Cas9 complex (sgRNA + Cas9 protein): sgRNA and Cas9 nuclease are introduced into the cell. Inside the cell, sgRNA binds to Cas9 protein.
Target DNA recognition and cleavage: CRISPR-Cas9 complex finds sequence in DNA that matches the sgRNA. sgRNA base pairs with complementary DNA sequence. Cas9 makes a DSB.
DNA DSB repair: 2 main pathways for DSB repair
1. NHEJ
2. HDR

Applications of CRISPR

Treating disease: target disease loci by either repairing a mutated gene or inactivating a dominant-negative (dominant, disease-causing) allele
For a genome-wide screen (forward genetic screen): CRISPR libraries of sgRNAs (each designed to target a different, specific location in the genome) can be used for mutagenesis; after mutations have been induced, we select cells from the population based on our phenotype of interest; then, we sequence the selected cells’ genomes

Linking genetic variation with phenotypic variation

The purpose of QTL and GWAS is to identify genetic variations associated with complex traits. Some definitions:

Linkage equilibrium: random association of alleles
Linkage disequilibrium: non-random association of alleles; i.e. if the occurrence of alleles A and B in a haplotype are dependent on each other/non-random events
- Significance: makes finding linkage in human GWAS more likely (since local SNPs vary in correlation with other SNPs) and tell us where recombination hotspots are
- Higher linkage disequilibrium between alleles = higher likelihood of alleles staying together during crossing over

Genome-wide linkage maps

Linkage maps show the relative positions of genes/genetic markers along chromosomes. Before, we could sequence entire genomes, their full sequences were unknown. But, we had genome-wide linkage maps w/ molecular markers (e.g. SNPs - variations at a single nucleotide position in DNA among individuals)

Genome-wide linkage maps were constructed based on the concept of linkage (tendency of genes close together on a chromosome to be inherited together). Scientists then measure recombination frequency during meiosis. Gene order and recombination frequencies are used to construct the linkage map. (no need to know the details).

Quantitative Trait Loci (QTL)

Definitions

Quantitative genetics: genetic analysis of complex characteristics
Quantitative trait (most traits in nature are quantitative)
- Continuous: varies continuously, falls on a distribution (e.g. human height)
- Meristic: measured in whole numbers (e.g. litter size)
- Threshold: measured by presence or absence (e.g. after accumulating a certain # of mutations)
QTL: chromosome segments containing genes that influence a quantitative trait
QTL mapping: locating genes that affect quantitative traits For quantitative traits, each genotype may produce a range of possible phenotypes. AA and Aa could both show blue eyes, for example, and we wouldn’t be able to tell what genotype someone is (AA vs. Aa) if they have blue eyes.

Steps

Pick 2 parents, both homozygous, but homozygous for contrasting alleles at the same locus (e.g. AA and aa), so they have different phenotypes. Cross the 2 parents to produce F1 (all heterozygous).
Interbreed F1s (both heterozygous so as to give us a spectrum of genotypes and phenotypes in F2) to create F2 progeny.
Phenotype F2 for interested trait.
Genotype each F2 for molecular markers (e.g. sequence SNPs throughout the genome).
Test if any SNPs show linkage to interest trait:
1. Create a linkage map using the genotyping data from F2.
2. Analyze the genotypic data from the markers and phenotypic data from the traits. Goal: identify markers statistically associated with variations in the trait.
3. Identify QTLs: regions of genome significantly associated with the trait variation are identified as QTLs

Genome-wide Association Studies (GWAS)

GWAS def: a test of the association between SNPs across the genome (genotype) and a quantitative trait (phenotype)
GWAS applications
- Provided new possible therapeutic targets to treat/cure disease
- Revealed new biology underlying important traits
- Found common alleles with large effects

Steps

Collect DNA from many (often thousands of) cases (affected by disease) and controls (unaffected by disease)
Genotype everyone for tagSNPs (minimal set of SNPs that tag most common haplotypes) throughout the genome
Test for associations of each SNP genotype with disease status

QTL vs. GWAS

Method	Study Group	Generations of Meioses Used	Probability of Finding Linkage	Genomic Resolution (once linkage found)
QTL	A cross or pedigree (more controlled breeding scheme)	A few (often 2 gens)	High (bc we control and see all phenotypes in our experiments)	Poor (Hundreds of kilobases to megabases of linkage disequilibrium regions)
GWAS	populations of affected and unaffected individuals	Thousands of historical meioses (bc looking at population)	Low (bc big population)	Great (~10 kilo bases of linkage disequilibrium regions) i.e. generates a tight linkage between marker and trait

Developmental genomics

Visualizing gene and protein expression

In situ hybridization

Purpose: visualize endogenous mRNA expression within tissue samples/embryos without the need for sequencing, using a single-stranded sequence-specific RNA probe
I.e. can answer these questions
- Is a specific mRNA present (expressed)?
- If it’s present, where is it present?
  - Anywhere we see precipitate present = is where mRNA is expressed (but remember, where mRNA is expressed =/= where protein is functional bc transcribed =/= translated) Steps

Design a single-stranded probe specific (and complementary) to mRNA of interest: identify mRNA sequence to visualize, then design a single-stranded RNA probe complementary to our target mRNA sequence
Make RNA probe via transcription: single-stranded RNA probe is made by transcribing our mRNA sequence of interest. During transcription, we incorporate UTPs (modified nucleotides), which are conjugated with haptens (molecules that can elicit an immune response and thus can be recognized by specific antibodies)
Hybridize probe to target mRNA endogenously: incubate embryos with mRNA probe; probe will anneal to target mRNA present in sample bc probe and target mRNA have complementary sequences
Wash away unbound probe
Detect probe with an enzyme-conjugated antibody: use a specific antibody that recognizes the hapten conjugated to the probe; the antibody is conjugated to an enzyme, allowing for visualization of probe-target-mRNA complex.
Wash away unbound antibody
Add enzyme substrate
Visualize mRNA expression in light microscope: enzyme attached to antibody catalyzes a reaction with a substrate. The enzyme degrades the tissue and forms a purple precipitate that coalesces around where the antibody is. Location and intensity of the precipitate indicate presence and quantity of target mRNA in sample.

Immunolocalization

Purpose: visualize where a protein is expressed, using an antibody specific to our protein/transcription factor of interest (similar to ChIP-Seq)
Answers these questions
- Is a specific protein/tf present? If so, where is it present? Steps

Express protein of interest in foreign organism: produce protein/tf of interest in an organism different from the one being studied to ensure protein is recognized as foreign
Inject protein of interest into host animal: host animal’s immune system will recognize protein as foreign and produce antibodies against it
Extract and purify antibodies produced against protein: extract blood from immunized animal and purify the antibodies that specifically recognize the protein of interest
Incubate embryos with antibody (antibody binds to protein): incubate purified antibodies with tissue/samples where we want to visualize protein expression. Anywhere protein is translated, the antibody will detect and bind to it.
Wash away unbound antibody
Add a secondary (used for detection and visualization) antibody: introduce a 2nd antibody that recognizes the primary antibody; 2nd antibody is conjugated to a detection marker (e.g. fluorescent dye or an enzyme catalyzing a colorimetric reaction)
Visualize protein expression: use visualization technique (e.g. fluorescence microscopy) to detect signal from secondary antibody. Location and intensity of signal indicate presence and distribution of target protein within the sample.

Necessary vs. sufficient

How do we know if X (a gene, for e.g.) is necessary or sufficient?

Necessary: without X, an effect does not occur
Sufficient: with X, an effect occurs even when it shouldn’t

GAL4/UAS for ectopic expression

GAL4/UAS is a tool to test sufficiency
- GAL4 is a transcription factor; GAL4 is introduced into an organism’s genome under the control of a tissue-specific promoter (so GAL4 will only be made in certain cells in certain tissues, depending on our chosen promoter)
- UAS is the enhancer that GAL4 binds onto; UAS is placed upstream (5’) of a gene of interest that we want to study/over-express, so the binding of GAL4 to UAS prompts the initiation of transcription of the downstream gene, leading to its expression
Ectopic expression = express in an abnormal position (e.g. make eyes grow on legs)

How it works

Design 2 separate genetic constructs
1. GAL4 gene controlled by specific promoter that drives expression in desired tissue/cell type
2. Gene of interest placed downstream (3’) of a UAS enhancer sequence (GAL4 binds to the UAS enhancer sequence) and a minimal promoter
Introduce the 2 genetic constructs into host organism
Tissue-specific GAL4 expression: the tissue-specific promoter drives the expression of GAL4 in desired cells/tissue
UAS activation: GAL4 binds to the UAS sequences upstream of gene of interest in cells/tissues where UAS is expressed
Gene of interest is expressed: the binding of GAL4 to UAS prompts the activation and expression of the downstream gene of interest (specifically in cells/tissues where GAL4 is expressed)
Phenotypic analysis: organism is observed for effects of gene of interest’s expression in targeted cells/tissues. e.g. observe developmental outcomes

Outline

Lec 2 - Annotating protein-coding genes in the genome

Purpose: bc genome sequence alone doesn’t tell you anything about the sequence’s function How to find protein-coding sequences in genomes?

Find long open-reading frames (ORFs)
Look for conserved sequences (comparative, evolutionary genomics)
1. Coding sequences are highly conserved, esp among primates
RNA-Seq

Review Session

DNA sequencing, assembly, and alignment

Building a genomic library

see above

Forward genetics and complementation test

Forward genetics: start with phenotype; interested in a phenotype, want to identify genes involved in that phenotype
- Mutagenic screen: mutagenize many random places in genome, screen for mutations that affect phenotypes of interest, generate homozygous flies for each mutant phenotype (to be able to do complementation tests)
- Complementation tests with mutants: we want to know if the mutations are in the same or separate genes
  - If mutations are on separate genes, they will complement each other so progeny will have wildtype phenotype
- Complementation tests - things to know
  - Are done with homozygous mutants
  - Can tell us ho w many genes are involved in our phenotype of interest but not where the genes are located
  - Mutations that complement each other = WT
  - Mutations that dont’ complement each other = mutant phenotype
  - Crossing mutants to WT flies show if a mutation is dominant or recessive over the WT allele
Reverse genetics: start with genotype

QTL mapping

More loci = more phenotypic classes
Molecular markers: pre-determined regions in genome where there is known variation
QTL alone won’t tell you which specific gene is associated with a phenotype; just will tell you the region that contains the gene

GWAS and linkage

SNPs of interest have been separated by many thousands of meiosis events (vs. QTL only 2 generations, F1 and F2)
- So we need more molecular markers to identify loci of interest since there has been more separation among them
LD vs. LE
- LE (linkage equilibrium): unlinked loci
- LD (linkage disequilibrium): linked loci; presence of an allele at one site can help predict what the allele at the second site will be
- LE vs. LD is on a spectrum, not black and white

Notebook

Explorer

Recent writing

writing in public vs. private

Now

MCB 104 Pt 2 - Genomics

High-level Concepts

Sequencing techniques

Genome sequencing

RNA-Seq

Steps

ATAC-Seq (Assay for Transposase-Accessible Chromatin)

Steps

ChIP-Seq (Chromatin Immunoprecipitation)

Steps

Genome arrangement

Gain of genetic material

Loss of genetic material

Relocation of genetic material

Transposons

1. Retrotransposons (copy and paste)

Steps

2. DNA Transposon (cut and paste)

Steps

Promoters

Enhancers

Transcription factors regulate enhancers: activators & repressors

Reporter assay

Steps

Genetic screens

Complementation Testing

Example problems

Forward Genetics

Reverse Genetics

Genome editing

Double-Strand Break Repair - 2 repair pathways

Programmable Nucleases Allow Genome Engineering

CRISPR-Cas9

In nature (bacteria)

Components

Steps

In biotech

Components

Steps (in biotech)

Applications of CRISPR

Linking genetic variation with phenotypic variation

Genome-wide linkage maps

Quantitative Trait Loci (QTL)

Definitions

Steps

Genome-wide Association Studies (GWAS)

Steps

QTL vs. GWAS

Developmental genomics

Visualizing gene and protein expression

In situ hybridization

Immunolocalization

Necessary vs. sufficient

GAL4/UAS for ectopic expression

How it works

Outline

Lec 2 - Annotating protein-coding genes in the genome

Review Session

DNA sequencing, assembly, and alignment

Building a genomic library

Forward genetics and complementation test

QTL mapping

GWAS and linkage

Graph View

Table of Contents

Backlinks