How to find the DNA sequence of a SNP without a reference genome? (I have a .vcf file)

18 months ago

Roland ▴ 20

Hi.

I have a "VCF" file that I've created "de novo". That is, I have NOT mapped my reads to a reference genome, but rather constructed this VCF by utilizing sequence similarity. Within this VCF file, I have roughly 9000 SNPs. A small portion of these, ca. 100, I've identified as "outliers" (using other tools). I want to investigate these SNPs further and see if they are connected to some genetic function, but I don't know how to proceed. My understanding is that I need the DNA sequence where these SNPs are found, so that I can BLAST them against other sequences in some database.

But how do I actually find the DNA sequences that I need if I don't have the ability to map the reads to a reference genome? I still have my "FASTA" files, of course, where I'm guessing the DNA sequences are found.

Thank you!

vcf SNP reference • 546 views

I have NOT mapped my reads to a reference genome, but rather constructed this VCF by utilizing sequence similarity.

Did you do alignments, if you did not do mapping? If not, how did you do this?

18 months ago by GenoMax 147k

I've used the "de novo" approach described here: https://academic.oup.com/g3journal/article/1/3/171/5986549 It checks for sequence similarity among reads, creates loci and genotypes SNPs that way.

18 months ago by Roland ▴ 20
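One possible way to get from the outlier SNPs to sequences that can be BLASTed is sketched below. It rests on assumptions the thread does not confirm: that the de novo pipeline can export a FASTA file with one consensus sequence per locus, that the CHROM column of the VCF holds the ID of the locus each SNP sits on, and that the outlier SNPs are listed by the value in the VCF ID column. The function names, the toy records, and the "locus:position" ID format are hypothetical and only for illustration.

```python
import io


def read_fasta(handle):
    """Return {record_id: sequence} from an open FASTA file handle."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the record ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def outlier_loci(vcf_handle, outlier_ids):
    """Yield (snp_id, locus_id, pos) for VCF records whose ID is an outlier."""
    for line in vcf_handle:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, snp_id = fields[0], int(fields[1]), fields[2]
        if snp_id in outlier_ids:
            yield snp_id, chrom, pos


def write_outlier_fasta(vcf_handle, fasta_handle, outlier_ids, out_handle):
    """Write the locus sequence of every outlier SNP once; return locus IDs."""
    loci = read_fasta(fasta_handle)
    written = set()
    for snp_id, locus, pos in outlier_loci(vcf_handle, outlier_ids):
        # Assumption: CHROM matches a FASTA record ID exactly.
        # A locus with several outlier SNPs is written only once,
        # labelled with the first outlier SNP encountered.
        if locus in loci and locus not in written:
            out_handle.write(f">{locus} snp={snp_id} pos={pos}\n{loci[locus]}\n")
            written.add(locus)
    return written


# Toy data standing in for the real VCF and per-locus FASTA (hypothetical).
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "12\t34\t12:34\tA\tG\t.\tPASS\t.\n"
    "57\t8\t57:8\tC\tT\t.\tPASS\t.\n"
)
fasta_text = ">12\nACGTACGTAC\n>57\nTTGACCA\n"

out = io.StringIO()
written = write_outlier_fasta(
    io.StringIO(vcf_text), io.StringIO(fasta_text), {"57:8"}, out
)
print(out.getvalue())
```

With real data, the `io.StringIO` objects would be replaced by opened files, and the resulting FASTA could then be used as the query for a BLAST search. If the FASTA headers do not match the CHROM values exactly, the lookup in `write_outlier_fasta` would need to be adapted.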
