Extracting FASTQ files from a VCF file
3.8 years ago
wrab425 ▴ 50

I have been given a set of VCF files and wish to re-extract the original FASTQ reads, both mapped and unmapped. Is this possible, and if so, what is the best way to do it?

William

Assembly genome alignment • 2.2k views

You can't 'extract' a FASTQ file from a VCF. The VCF records only the called variants, whereas the FASTQ holds the individual reads and their base quality scores, and that information is lost when you make genotype calls.

This seems like a potential XY problem: why do you need the FASTQ files from the VCF anyway?


Are you sure you mean FASTQ? And not FASTA?


How do you imagine unmapped reads could possibly be stored in a vcf?

3.8 years ago

You cannot extract the reads (FASTA or FASTQ) from the VCF. The reads and their alignment information, whether mapped or not, are contained in the BAM file used to generate the VCF (provided unmapped reads were not filtered out of it).

See if you can get your hands on the BAM file associated with each VCF.

Once you have that you can follow many threads on extracting mapped and unmapped reads (in fastq format) from bam files.

But the gist is to subset your BAM by flag:

samtools view -b -F 4 -o sample.mapped.bam sample.bam
samtools view -b -f 4 -o sample.unmapped.bam sample.bam
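
The 4 passed to -f and -F is the SAM FLAG bit 0x4 ("read unmapped"): -f keeps reads that have the bit set, -F keeps reads that do not. As a rough illustration of that bit test, here is a small shell sketch; is_unmapped is a hypothetical helper written for this example, not something samtools provides, and the flag values are just common examples.

```shell
# Hypothetical helper (not part of samtools): succeeds when the SAM FLAG
# value given as $1 has the "read unmapped" bit (0x4) set.
is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Classify a few example FLAG values the way -f 4 / -F 4 would.
for flag in 0 4 77 99 141; do
    if is_unmapped "$flag"; then
        echo "$flag: unmapped (kept by -f 4)"
    else
        echo "$flag: mapped (kept by -F 4)"
    fi
done
```

Note that in paired-end data a mapped read can have an unmapped mate, so splitting on this bit alone can separate the two reads of a pair.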

Then convert to fastq

samtools fastq sample.mapped.bam > sample.mapped.fq
samtools fastq sample.unmapped.bam > sample.unmapped.fq

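For paired-end data, use the -1 and -2 options to write the first and second mates to separate FASTQ files for each sample. The BAM should be grouped by read name first (for example with samtools collate) so that mates are adjacent.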
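
For paired-end data, the conversion might look something like the sketch below, which follows the collate-then-fastq pattern shown in the samtools fastq manual. The function name bam_to_paired_fastq, the output file names, and the SAMTOOLS override variable are all assumptions made for this example, and it assumes a reasonably recent samtools in which collate accepts -O without a temporary-file prefix.

```shell
# Hypothetical wrapper; names and output paths are placeholders.
# SAMTOOLS may be set to another executable (an assumption of this
# sketch, handy for dry runs); it defaults to "samtools".
bam_to_paired_fastq() {
    bam=$1      # input BAM (any sort order)
    prefix=$2   # prefix for the output FASTQ files

    # collate groups mates by read name; -u = uncompressed, -O = stdout.
    # fastq then writes read 1 and read 2 to separate files, reads flagged
    # as both or neither to -0, and reads whose mate is missing to -s.
    # -n leaves read names as they are (no /1 or /2 suffix).
    "${SAMTOOLS:-samtools}" collate -u -O "$bam" |
        "${SAMTOOLS:-samtools}" fastq \
            -1 "${prefix}_R1.fq" \
            -2 "${prefix}_R2.fq" \
            -0 "${prefix}_other.fq" \
            -s "${prefix}_singleton.fq" \
            -n
}

# Example call (not run here):
#   bam_to_paired_fastq sample.bam sample
```

Running this on the full BAM rather than on the mapped/unmapped subsets keeps both mates of a pair together even when only one of them aligned.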
