Question

What's next after GATK variant calling pipeline?

8 months ago

mgranada3 ▴ 60

I have 63 DNA-seq files which I put through the GATK variant calling pipeline (https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/).

This is my first time doing this, and I am unsure what my next steps are. How do I know which information I need to create figures like a Muller plot? Can anyone recommend a good guide (preferably using R) that would be compatible with my output?

My outputs were:

A .csv file with compiled statistics which included: # of Reads, # of Aligned Reads, % Aligned, # Aligned Bases, Read Length, % Paired, % Duplicate, Mean Insert Size, # SNPs, # Filtered SNPs, # SNPs after BQSR, # Filtered SNPs after BQSR, Average Coverage
Annotated SNPs and predicted effects in an .html and a .txt file. The text file contained: #GeneName, GeneId, TranscriptId, BioType, variants_impact_HIGH, variants_impact_LOW, variants_impact_MODERATE, variants_impact_MODIFIER, variants_effect_3_prime_UTR_variant, variants_effect_5_prime_UTR_premature_start_codon_gain_variant, variants_effect_5_prime_UTR_variant, variants_effect_downstream_gene_variant, variants_effect_intron_variant, variants_effect_missense_variant, variants_effect_non_coding_transcript_variant, variants_effect_stop_lost, variants_effect_synonymous_variant, variants_effect_upstream_gene_variant
The HTML file contained: Summary, Variant rate by chromosome, Variants by type, Number of variants by impact, Number of variants by functional class, Number of variants by annotation, Quality histogram, InDel length histogram, Base variant table, Transitions vs transversions (ts/tv), Allele frequency, Allele count, Codon change table, Amino acid change table, Chromosome variants plots, Details by gene

GATK pipeline figures DNA-seq • 654 views

ADD COMMENT • link updated 5 months ago by Michael 55k • written 8 months ago by mgranada3 ▴ 60

score 1 · Answer 1 · 2024-05-16

8 months ago

Michael 55k

After getting variants from HaplotypeCaller, there are many options for follow-up analyses, but which ones make sense depends heavily on the scientific question and the organism. Are you interested in single variants and their effects, or in genome-level analysis? I am only going to sketch some options because I know too little about your samples:

Run bcftools stats and generate the plots and report document (worth doing every time)
Filter further by MAF, Hardy-Weinberg equilibrium, etc.
Create summary statistics such as number and type of variants, heterozygosity, theta, pi, and detect LOH
Perform linkage analysis
Detect sites under selection and selective sweeps
Do population genomics: detect admixture, infer population history
Create phylogenetic trees
Look for known phenotype-associated SNPs in dbSNP, OMIM, etc. (human only)
Look at variants overlapping your genes of interest
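
As a rough illustration of the MAF filtering and heterozygosity items above, here is a minimal Python sketch using only the standard library. It assumes a biallelic, multi-sample VCF with a GT field; the toy VCF text, the function names (site_stats, filter_by_maf) and the 0.2 threshold are all made up for the example and are not part of the pipeline. In practice a dedicated tool such as bcftools would usually do this on the real files.

```python
import io

# Toy VCF text invented for this example; a real file would come from the pipeline.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/1
chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/1
chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t./.\t0/1\t0/1\t0/0
"""


def site_stats(vcf_text):
    """Return (chrom, pos, maf, observed_heterozygosity) per biallelic site."""
    results = []
    for line in io.StringIO(vcf_text):
        if line.startswith("#"):
            continue  # skip meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt, fmt = fields[0], int(fields[1]), fields[4], fields[8]
        if "," in alt:
            continue  # this sketch ignores multiallelic sites
        gt_index = fmt.split(":").index("GT")
        alt_count = called_alleles = het_samples = called_samples = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index].replace("|", "/")
            alleles = genotype.split("/")
            if "." in alleles:
                continue  # missing genotype
            called_samples += 1
            called_alleles += len(alleles)
            alt_count += sum(allele == "1" for allele in alleles)
            het_samples += len(set(alleles)) > 1
        if called_alleles == 0:
            continue  # no sample was genotyped at this site
        alt_freq = alt_count / called_alleles
        maf = min(alt_freq, 1 - alt_freq)
        results.append((chrom, pos, maf, het_samples / called_samples))
    return results


def filter_by_maf(stats, min_maf):
    """Keep sites whose minor allele frequency is at least min_maf."""
    return [site for site in stats if site[2] >= min_maf]


if __name__ == "__main__":
    for chrom, pos, maf, het in filter_by_maf(site_stats(VCF_TEXT), min_maf=0.2):
        print(chrom, pos, round(maf, 3), round(het, 3))
```

Whether a MAF or Hardy-Weinberg filter is appropriate at all depends on the study design, so treat the threshold as a placeholder.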

ADD COMMENT • link 6 months ago by Michael 55k

I am studying how populations evolve over 200 generations under 3 different conditions. My instructor really wants Muller plots showing gene frequencies over time.

ADD REPLY • link 6 months ago by mgranada3 ▴ 60

I haven't done that before, but the MullerPlot R package could possibly be used. You would have to figure out how to translate your genotype data into a population matrix of OTUs, maybe based on some strain-specific markers. This depends on your experimental setup. If you or your supervisor could show an example of a previous application, that would help in figuring out how to do this. Your use case seems interesting, but so far I have seen this type of plot only in metagenomics studies.
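
One possible first step toward such a population matrix is a table of variant frequencies per time point. The Python sketch below is only an illustration under strong assumptions: each VCF sample is a whole-population (pooled) sample, the AD (allelic depth) field is present, and sample names encode condition and generation in a hypothetical pattern like condA_g100. The toy VCF and all names here are invented for the example. Note that a Muller plot also needs the parent-child nesting of lineages, which frequencies alone do not provide; inferring that is a separate step not shown here.

```python
import re

# Toy VCF text invented for this example; sample names follow a hypothetical
# "<condition>_g<generation>" convention.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcondA_g000\tcondA_g100\tcondA_g200
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/0:40,0\t0/1:30,10\t1/1:5,35
chr2\t500\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/0:50,0\t0/1:45,5\t0/0:50,0
"""

SAMPLE_NAME = re.compile(r"(?P<condition>\w+?)_g(?P<generation>\d+)$")


def frequency_table(vcf_text):
    """Return rows of (condition, generation, variant, alt_read_fraction)."""
    rows = []
    samples = []
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names from the header line
            continue
        if "," in fields[4]:
            continue  # this sketch ignores multiallelic sites
        variant = f"{fields[0]}:{fields[1]}{fields[3]}>{fields[4]}"
        ad_index = fields[8].split(":").index("AD")
        for name, entry in zip(samples, fields[9:]):
            match = SAMPLE_NAME.match(name)
            depths = entry.split(":")[ad_index].split(",")
            if match is None or "." in depths:
                continue  # unparseable sample name or missing depths
            ref_depth, alt_depth = int(depths[0]), int(depths[1])
            total = ref_depth + alt_depth
            if total == 0:
                continue  # no reads covering this site
            rows.append(
                (match["condition"], int(match["generation"]), variant, alt_depth / total)
            )
    return rows


if __name__ == "__main__":
    for row in frequency_table(VCF_TEXT):
        print(*row, sep="\t")
```

A long-format table like this can be written to a .csv and read into R, where it would still need reshaping into whatever input format the chosen plotting package expects.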

ADD REPLY • link 5 months ago by Michael 55k