I have UV-mutagenized microorganisms and screened for improved phenotypes. Now I want to identify the causative variants. I've sequenced (200x coverage) a cultivar (a mixed population with the improved phenotype) and have VCFs with thousands of variants. I wanted to do a GWAS and make some Manhattan plots (treating each sequencing read like an individual), but it doesn't seem like that will work here. Maybe I should just plot allele frequencies? Is that calculated as AF = AD (allele depth) / DP (read depth) from the VCFs? Or what is the best way to find the dominant variants in this mixed population? Any suggestions on how to move forward? Thanks!
If you convert the .vcf to .tsv with e.g. GATK VariantsToTable, you can add in the AF field if it's missing, then simply sort by that value in Excel and you will have the top variants.
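If you would rather script this than use Excel, here is a rough sketch of the same idea in Python. It assumes the table has CHROM and POS columns plus per-sample AD and DP columns; the names `SAMPLE.AD` and `SAMPLE.DP` are placeholders for whatever your table actually calls them, and the example rows are made up. It also assumes AD is a comma-separated list with the reference depth first, and it uses the AD / DP ratio from the question, so treat the numbers as a quick ranking rather than a definitive frequency estimate.

```python
import csv
import io


def allele_fractions(tsv_text, ad_col="SAMPLE.AD", dp_col="SAMPLE.DP"):
    """Return (chrom, pos, alt_index, fraction) tuples, highest fraction first.

    ad_col and dp_col are assumed column names; change them to match your table.
    """
    results = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            # AD is assumed to be "ref,alt1,alt2,..." read depths
            depths = [int(x) for x in row[ad_col].split(",")]
            dp = int(row[dp_col])
        except ValueError:
            continue  # skip missing values such as "." or "NA"
        if dp == 0 or len(depths) < 2:
            continue
        # One entry per alternate allele: alt depth divided by total depth
        for alt_index, alt_depth in enumerate(depths[1:], start=1):
            results.append((row["CHROM"], int(row["POS"]), alt_index, alt_depth / dp))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


# Made-up example table, only to show the expected layout
example = (
    "CHROM\tPOS\tREF\tALT\tSAMPLE.AD\tSAMPLE.DP\n"
    "chr1\t100\tA\tG\t150,50\t200\n"
    "chr1\t200\tC\tT\t20,180\t200\n"
    "chr2\t300\tG\tA\t.\t.\n"
)

for chrom, pos, alt_index, fraction in allele_fractions(example):
    print(chrom, pos, alt_index, round(fraction, 3))
```

To run it on a real file, read the file contents into a string and pass that in place of `example`.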
Usually when I plot allele frequency I do a histogram or box plot, but that will only show you the distribution of the allele frequencies, not the individual variants.
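For a quick look at that distribution without a plotting library, something like the sketch below bins the allele frequencies and prints a text histogram. The function names and the example values are made up for illustration, and values that fall exactly on a bin edge can land in either neighbouring bin because of floating-point rounding.

```python
def af_histogram(fractions, n_bins=10):
    """Count allele frequencies in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            continue  # ignore values outside the valid range
        # A frequency of exactly 1.0 goes into the last bin
        index = min(int(f * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def print_histogram(counts):
    """Print one line per bin with a bar of '#' characters."""
    n_bins = len(counts)
    for i, count in enumerate(counts):
        low = i / n_bins
        high = (i + 1) / n_bins
        print(f"{low:.2f}-{high:.2f} {'#' * count} ({count})")


# Made-up allele frequencies, only to show the output format
example_fractions = [0.05, 0.12, 0.15, 0.55, 0.9, 1.0]
print_histogram(af_histogram(example_fractions))
```

As noted above, this only summarizes the distribution; to see which variants sit in the high-frequency bins you still need the sorted per-variant table.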