Is it possible to get VCF from two fastq files without reference genome?
7.3 years ago
Charles Yin ▴ 180

I'd like to know whether I can call variants directly from two or more FASTQ read files without a reference genome. I need to compare SNPs between the two FASTQ files.

next-gen snp • 2.7k views

You can certainly de-novo assemble them both, map the reads of each to the other's assembly, and get variant calls. Without a reference for coordinates, I'm not sure how useful that would be.
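
For illustration, one direction of that cross-mapping could be scripted roughly as below. This is only a sketch under assumptions: the tools (SPAdes, BWA, samtools, bcftools) are one common choice and are not named anywhere in this thread, the function name and file paths are hypothetical, and it assumes one unpaired FASTQ file per sample and a haploid bacterial genome. It only builds the command strings; nothing is executed.

```python
import shlex


def cross_mapping_commands(reads_a, reads_b, workdir="work", threads=4):
    """Build shell commands that assemble sample A de novo and call
    sample B's variants against that draft assembly.

    Hypothetical helper: tool choice and paths are assumptions,
    not something prescribed in the thread.
    """
    q = shlex.quote
    asm_dir = f"{workdir}/asm_a"
    contigs = f"{asm_dir}/contigs.fasta"  # SPAdes writes its contigs to this file
    bam = f"{workdir}/b_vs_a.bam"
    vcf = f"{workdir}/b_vs_a.vcf.gz"
    return [
        # 1. De novo assembly of sample A (-s = a single unpaired read file)
        f"spades.py -s {q(reads_a)} -o {q(asm_dir)} -t {threads}",
        # 2. Index the draft assembly so it can serve as a stand-in reference
        f"bwa index {q(contigs)}",
        # 3. Map sample B's reads onto sample A's contigs and sort the alignments
        f"bwa mem -t {threads} {q(contigs)} {q(reads_b)} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # 4. Call variants; coordinates refer to sample A's contigs
        f"bcftools mpileup -f {q(contigs)} {q(bam)} | "
        f"bcftools call -mv --ploidy 1 -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in cross_mapping_commands("sample_a.fastq", "sample_b.fastq"):
        print(cmd)
```

Swapping the two arguments gives the reverse direction (B assembled, A mapped). Either way, the positions in the resulting VCF are contig coordinates from a draft assembly, which is exactly why the calls may be hard to interpret.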

Thanks for answering my question. I was just hoping that a VCF comparing the two test genomes could be obtained without a reference genome.

You can obtain VCF files using the method that I described. However, as Dan notes, they will not necessarily be useful.

I think it would be helpful if you explained what organisms you are working with, what kind of data you have (the complete experimental setup), and what you are trying to accomplish. Blinded questions rarely yield useful results.

Yes, thanks for the suggestion. The goal of the variant calling is to find recombinants between two bacterial genomes, which were sequenced and are available as raw FASTQ files. Since a large number of genomes will be compared for recombinants, it would be convenient to get the recombinant information directly from variant calling instead of assembling whole genomes.

So, I'm confused as to why you have only 2 fastq files if you have a large number of recombinants... is this one species? 2 species? Are you combining lots of samples in a single library?

Each of the two FASTQ files contains the sequencing reads for one sample, covering that sample's whole genome. Thanks!

7.3 years ago
Dan D 7.4k

No. Variant calls are based on a reference genome sequence. Technically you could assemble the raw reads into a draft reference and variant call from that, but you'll still have to obtain some reference before you can perform variant calling.

EDIT: I may be incorrect in the case that you're working with well-characterized bacterial species and are looking for specific features. Check out this paper describing the KVarQ program. I imagine the same would be true if you're working with something like mitochondrial data.
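
To make the idea of looking for specific, already-known features directly in raw reads more concrete, here is a toy sketch. It is not KVarQ's actual algorithm or interface; the function names are hypothetical and the marker sequences are made up. It simply counts how many reads contain each known allele sequence on either strand, by exact substring match.

```python
def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_marker_hits(fastq_lines, markers):
    """Count reads that contain each marker sequence on either strand.

    markers maps a label to a short sequence spanning a known variant
    site (hypothetical input; a real tool also handles sequencing
    errors and base qualities, which this sketch ignores).
    """
    probes = {
        name: (seq.upper(), revcomp(seq.upper()))
        for name, seq in markers.items()
    }
    counts = {name: 0 for name in markers}
    for read in read_fastq_sequences(fastq_lines):
        for name, (fwd, rev) in probes.items():
            if fwd in read or rev in read:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up alleles that differ at a single base
    markers = {"site1_ref": "GATTACAGGC", "site1_alt": "GATTATAGGC"}
    with open("sample_a.fastq") as handle:  # hypothetical file name
        print(count_marker_hits(handle, markers))
```

Running this on each FASTQ file and comparing the counts shows which allele each sample carries at the listed sites, with no assembly or alignment step. The catch is that it only works for variants you already know to look for.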

Thank you for your suggestions!
