Hi.
I have a "VCF" file that I created "de novo". That is, I did NOT map my reads to a reference genome; instead, I built this VCF from sequence similarity among the reads. The VCF contains roughly 9000 SNPs, and I've identified a small portion of them (ca. 100) as "outliers" using other tools. I want to investigate these SNPs further and see whether they are connected to some genetic function, but I don't know how to proceed. My understanding is that I need the DNA sequence in which each of these SNPs is found, so that I can BLAST it against sequences in some database.
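A minimal sketch of the first step, assuming the outliers are known as (CHROM, POS) pairs: scan the VCF and keep only those records, so you know which loci and positions to look up. The function name `select_outlier_records` and the toy records below are hypothetical, for illustration only; if your outlier tool reports SNPs by the VCF ID column instead, match on `fields[2]` rather than (CHROM, POS).

```python
from typing import Iterable, List, Set, Tuple


def select_outlier_records(
    vcf_lines: Iterable[str], outliers: Set[Tuple[str, int]]
) -> List[Tuple[str, int, str, str]]:
    """Return (CHROM, POS, REF, ALT) for every VCF record listed in `outliers`.

    Hypothetical helper: assumes outliers are identified by (CHROM, POS).
    """
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        # Fixed VCF columns: CHROM, POS, ID, REF, ALT, ...
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if (chrom, pos) in outliers:
            hits.append((chrom, pos, ref, alt))
    return hits


# Toy VCF text (made-up records, not real data).
vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t24\t.\tA\tG\t.\tPASS\t.
7\t51\t.\tC\tT\t.\tPASS\t.
"""
print(select_outlier_records(vcf_text.splitlines(), {("7", 51)}))
# [('7', 51, 'C', 'T')]
```

With a real file you would pass the open file handle (e.g. `open("outliers.vcf")`, a hypothetical name) instead of `vcf_text.splitlines()`.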
But how do I actually find the DNA sequences I need, if I can't map the reads to a reference genome? I still have my "FASTA" files, of course, which I'm guessing contain the DNA sequences.
Thank you!
Did you do alignments, if you did not do mapping? Otherwise, how did you do this?
I've used the "de novo" approach described here: https://academic.oup.com/g3journal/article/1/3/171/5986549 It checks for sequence similarity among reads, creates loci and genotypes SNPs that way.
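Since this approach builds loci from similar reads and calls SNPs within them, each SNP's CHROM value in a de novo VCF presumably names the locus it sits on, so the sequence to BLAST would be that locus's consensus sequence. A minimal sketch, assuming (not confirmed here) that the pipeline wrote a FASTA of locus sequences whose header's first word equals the VCF CHROM value, and that POS is a 1-based position within that locus; check both against your own output files. `read_fasta` and `outlier_queries` are hypothetical helpers, and the toy FASTA is made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {first word of header: sequence}."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # assumption: first word == VCF CHROM
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)  # sequences may wrap over several lines
    return {name: "".join(parts) for name, parts in seqs.items()}


def outlier_queries(
    loci: Dict[str, str], snps: Iterable[Tuple[str, int]], flank: int = 0
) -> str:
    """Return FASTA text with one query record per outlier SNP.

    flank=0 keeps the whole locus; flank>0 keeps only `flank` bases on each
    side of the SNP (clipped at the locus ends). POS is assumed 1-based.
    """
    out = []
    for chrom, pos in snps:
        seq = loci.get(chrom)
        if seq is None:
            continue  # locus not in the FASTA: check how IDs map between files
        if flank > 0:
            start = max(0, pos - 1 - flank)
            seq = seq[start:pos + flank]
        out.append(f">{chrom}_{pos}\n{seq}\n")
    return "".join(out)


# Toy loci FASTA (made-up sequences, not real data).
fasta_text = """>1 example
ACGTACGT
ACGT
>7 example
TTTTCCCCAGGGAAAA
"""
loci = read_fasta(fasta_text.splitlines())
print(outlier_queries(loci, [("7", 9)], flank=2), end="")
# >7_9
# CCAGG
```

The resulting text could be saved to a file (say `outliers.fa`, a hypothetical name) and searched with NCBI BLAST+, for example `blastn -query outliers.fa -db nt -remote -outfmt 6 -out hits.tsv`. Keeping whole loci (`flank=0`) gives BLAST as much sequence as the de novo locus provides.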