I ran 63 DNA-seq files through the GATK variant calling pipeline (https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/).
This is my first time doing this, and I am not sure what my next steps should be. How do I know which information I need to create figures like a Muller plot? Can anyone recommend a good guide (preferably using R) that would be compatible with my output?
My outputs were:
A .csv file with compiled statistics which included: # of Reads, # of Aligned Reads, % Aligned, # Aligned Bases, Read Length, % Paired, % Duplicate, Mean Insert Size, # SNPs, # Filtered SNPs, # SNPs after BQSR, # Filtered SNPs after BQSR, Average Coverage
Annotated SNPs and predicted effects in a .html and a .txt file. The text file has these columns: #GeneName, GeneId, TranscriptId, BioType, variants_impact_HIGH, variants_impact_LOW, variants_impact_MODERATE, variants_impact_MODIFIER, variants_effect_3_prime_UTR_variant, variants_effect_5_prime_UTR_premature_start_codon_gain_variant, variants_effect_5_prime_UTR_variant, variants_effect_downstream_gene_variant, variants_effect_intron_variant, variants_effect_missense_variant, variants_effect_non_coding_transcript_variant, variants_effect_stop_lost, variants_effect_synonymous_variant, variants_effect_upstream_gene_variant
In the HTML file: Summary, Variant rate by chromosome, Variants by type, Number of variants by impact, Number of variants by functional class, Number of variants by annotation, Quality histogram, InDel length histogram, Base variant table, Transition vs transversions (ts/tv), Allele frequency, Allele Count, Codon change table, Amino acid change table, Chromosome variants plots, Details by gene
I am tracking how populations evolve over 200 generations under 3 different conditions. My instructor really wants to see Muller plots showing gene frequencies over time.
I haven't done that before, but the MullerPlot R package could possibly be used. You would have to figure out how to translate your genotype data into a population matrix of OTUs, maybe based on some strain-specific markers? This depends on your experimental setup. If you or your supervisor could show an example of a previous application, that would help to figure out how to do this. Your application case seems interesting, but so far I have only seen this type of plot in metagenomics studies.
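As a rough starting point for the "translate your genotype data" step, here is a minimal sketch in Python (the same logic ports to R). It assumes the pipeline also left you with VCF files whose sample columns carry an AD field (reference and alternate read depths), and it uses made-up sample names such as condA_gen100 and made-up variant lines purely for illustration. It works with one or many sample columns. Two caveats: read-depth fractions from a diploid caller are only an approximation of population allele frequencies in pooled samples, and a Muller plot additionally needs to know which lineage descends from which, something this table does not provide.

```python
# Sketch: alternate-allele read fractions per sample from VCF text lines.
# Assumption: FORMAT contains AD written as "ref_depth,alt_depth".

def parse_allele_frequencies(vcf_lines):
    """Return {(chrom, pos, ref, alt): {sample: alt fraction or None}}."""
    samples = []
    table = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        if "," in alt:
            continue  # multi-allelic sites are skipped to keep the sketch simple
        format_keys = fields[8].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        row = {}
        for sample, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            fraction = None  # None marks a missing or zero-depth call
            if ad_index < len(parts):
                depths = parts[ad_index].split(",")
                if len(depths) == 2 and all(d.isdigit() for d in depths):
                    ref_depth, alt_depth = int(depths[0]), int(depths[1])
                    total = ref_depth + alt_depth
                    if total > 0:
                        fraction = alt_depth / total
            row[sample] = fraction
        table[(chrom, int(pos), ref, alt)] = row
    return table


# Hypothetical example data: one condition sampled at three generations.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tcondA_gen0\tcondA_gen100\tcondA_gen200",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP"
    "\t0/0:40,0:40\t0/1:30,10:40\t1/1:4,36:40",
    "chr2\t500\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP"
    "\t0/0:20,0:20\t0/1:15,5:20\t./.:.:.",
]

freqs = parse_allele_frequencies(example_vcf)
for variant, per_sample in freqs.items():
    print(variant, per_sample)
```

With a table like this, a variant that rises in frequency across generations within one condition is a candidate marker for a lineage. Deciding which variants travel together on the same lineage, and which lineage is nested inside which, is a separate step you would need to work out before any Muller plot package can draw the figure.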