What's next after the GATK variant calling pipeline?
4 months ago
mgranada3 ▴ 50

I have 63 DNA-seq files that I put through the GATK variant calling pipeline (https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/).

This is my first time doing this and I am confused about what my next steps are. How do I know which information I need to create figures like a Muller plot? Can anyone recommend a good guide (preferably using R) that may be compatible with my output?

My outputs were:

  1. A .csv file with compiled statistics which included: # of Reads, # of Aligned Reads, % Aligned, # Aligned Bases, Read Length, % Paired, % Duplicate, Mean Insert Size, # SNPs, # Filtered SNPs, # SNPs after BQSR, # Filtered SNPs after BQSR, Average Coverage

  2. Annotated SNPs and predicted effects in a .html and a .txt file. The text file contained the columns: #GeneName, GeneId, TranscriptId, BioType, variants_impact_HIGH, variants_impact_LOW, variants_impact_MODERATE, variants_impact_MODIFIER, variants_effect_3_prime_UTR_variant, variants_effect_5_prime_UTR_premature_start_codon_gain_variant, variants_effect_5_prime_UTR_variant, variants_effect_downstream_gene_variant, variants_effect_intron_variant, variants_effect_missense_variant, variants_effect_non_coding_transcript_variant, variants_effect_stop_lost, variants_effect_synonymous_variant, variants_effect_upstream_gene_variant

  3. In the HTML file: Summary, Variant rate by chromosome, Variants by type, Number of variants by impact, Number of variants by functional class, Number of variants by annotation, Quality histogram, InDel length histogram, Base variant table, Transition vs transversions (ts/tv), Allele frequency, Allele Count, Codon change table, Amino acid change table, Chromosome variants plots, Details by gene

GATK pipeline figures DNA-seq • 419 views
4 months ago
Michael 55k

After getting variants from HaplotypeCaller, there are many options for follow-up analyses, but the choice depends heavily on the scientific question and the organism. Are you interested in single variants and their effects, or in genome-level analysis? I will only sketch some options because I know too little about your samples:

  • Run bcftools stats and generate the plots and report document (worth doing for every call set)
  • Filter further by MAF, Hardy-Weinberg equilibrium, etc.
  • Create summary statistics such as number and type of variants, heterozygosity, theta, pi; detect LOH
  • Perform linkage analysis
  • Detect sites under selection and selective sweeps
  • Population genomics: detect admixture, infer population history
  • Create phylogenetic trees
  • Look for known phenotype-associated SNPs in dbSNP, OMIM, etc. (human only)
  • Look at variants overlapping with your genes of interest
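
To make the "number and type of variants" item concrete, here is a minimal sketch in plain Python that tallies SNPs, transitions, transversions, and other records straight from VCF text lines. The function name `summarize_vcf` and the toy records are made up for illustration; for real data, bcftools stats reports these numbers and more, so treat this only as a way to see what the counts mean.

```python
from collections import Counter

BASES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def summarize_vcf(lines):
    """Tally variant classes among the records of a VCF given as text lines."""
    counts = Counter()
    for line in lines:
        # Skip meta lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        if "," in alt:
            # More than one ALT allele: counted separately, not classified.
            counts["multiallelic"] += 1
        elif ref in BASES and alt in BASES:
            counts["snp"] += 1
            kind = "transition" if (ref, alt) in TRANSITIONS else "transversion"
            counts[kind] += 1
        else:
            # Indels, spanning deletions ("*"), symbolic alleles, etc.
            counts["indel_or_other"] += 1
    return counts

# Toy records (hypothetical, not from the poster's data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tA\t60\tPASS\t.",
    "chr1\t300\t.\tAT\tA\t70\tPASS\t.",
    "chr2\t400\t.\tG\tA,T\t80\tPASS\t.",
]
summary = summarize_vcf(example)
print(dict(summary))
```

The same loop could be pointed at an open file handle for an uncompressed VCF, since it only needs an iterable of lines.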

I am studying how populations evolve over 200 generations under 3 different conditions. My instructor really wants Muller plots that show gene frequencies over time.


I haven't done that before, but the MullerPlot R package might work. You would have to figure out how to translate your genotype data into a population matrix of OTUs, maybe based on some strain-specific markers? This depends on your experimental setup. If you or your supervisor could show an example of a previous application, that would help to figure out how to do this. Your use case seems interesting, but so far I have seen this type of plot only in metagenomics studies.
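
One possible starting point for the "translate your genotype data" step is a table of alternate-allele frequencies per variant and time point. The sketch below is plain Python and rests on several assumptions that may not hold for your data: each VCF sample is a pooled population from one generation, the FORMAT column carries the AD (allele depth) field, and the sample-to-generation mapping (`sample_generation`, with made-up sample names) is something you supply. Read fractions from AD are only a rough frequency estimate. A Muller plot also needs to know which lineage arose within which, and allele frequencies alone do not give you that.

```python
def alt_frequency(format_keys, sample_field):
    """ALT read fraction from the AD entry of one sample, or None if unavailable."""
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    parts = values.get("AD", ".").split(",")
    if len(parts) < 2 or "." in parts:
        return None
    ref_depth, alt_depth = int(parts[0]), int(parts[1])
    total = ref_depth + alt_depth
    return alt_depth / total if total else None

def frequency_table(vcf_lines, sample_generation):
    """Map 'chrom:pos:ref>alt' to {generation: ALT fraction} for biallelic sites."""
    table = {}
    samples = []
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, ref, alt, fmt = fields[0], fields[1], fields[3], fields[4], fields[8]
        if "," in alt:
            continue  # skip multiallelic sites in this sketch
        row = {}
        for name, entry in zip(samples, fields[9:]):
            if name in sample_generation:
                row[sample_generation[name]] = alt_frequency(fmt, entry)
        table[f"{chrom}:{pos}:{ref}>{alt}"] = dict(sorted(row.items()))
    return table

# Toy input: sample names and generations are hypothetical.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tgen000\tgen100\tgen200",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/0:20,0\t0/1:15,5\t1/1:2,18",
]
sample_generation = {"gen000": 0, "gen100": 100, "gen200": 200}
table = frequency_table(vcf_lines, sample_generation)
print(table)
```

Running this once per condition would give one frequency-over-time table per condition, which can be written to CSV and read into R for plotting.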
