I have been given a set of vcf files and wish to re-extract the original fastq files, both mapped and unmapped reads. Is this possible, and if so, what is the best way to do this?
William
You cannot extract the reads (fasta or fastq) from the vcf. The reads, whether mapped or not, and their mapping information are contained within the bam file used to generate the vcf.
See if you can get your hands on the bam file associated with each vcf.
Once you have that, you can follow many threads on extracting mapped and unmapped reads (in fastq format) from bam files.
But the gist is to filter your bam on the "read unmapped" flag (4):
samtools view -b -F 4 -o sample.mapped.bam sample.bam
samtools view -b -f 4 -o sample.unmapped.bam sample.bam
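To make the two filters concrete: `-f 4` keeps records whose FLAG has bit 4 ("read unmapped") set, and `-F 4` keeps records where it is not set. The snippet below is only an illustration of that bit test in plain shell arithmetic; the function names `has_unmapped_bit` and `classify` are made up for this example and are not part of samtools.

```shell
# Bit 4 (0x4) of the SAM FLAG field means "read unmapped".
# has_unmapped_bit FLAG -> exit status 0 if the bit is set, 1 otherwise.
has_unmapped_bit() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# classify FLAG -> prints which of the two filters would keep the read.
classify() {
    if has_unmapped_bit "$1"; then
        echo "unmapped"   # kept by: samtools view -f 4
    else
        echo "mapped"     # kept by: samtools view -F 4
    fi
}

classify 77    # 77 = 1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair
classify 99    # 99 = 1 + 2 + 32 + 64: paired, proper pair, mate reverse, first in pair
```

Note that the test looks only at the read's own bit, so with paired-end data a mapped read whose mate is unmapped ends up in the mapped file while its mate goes to the unmapped file.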
Then convert each one to fastq:
samtools fastq sample.mapped.bam > sample.mapped.fq
samtools fastq sample.unmapped.bam > sample.unmapped.fq
Use the -1 and -2 options to write paired-end reads to separate fastq files for each sample.
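For paired-end data, a rough sketch of the whole conversion might look like the following. It assumes samtools 1.x is on your PATH and that the bam is coordinate-sorted, so mates are grouped with `samtools collate` first. The function name `bam_to_paired_fastq` and the output file names are placeholders for this example, not anything samtools defines.

```shell
# bam_to_paired_fastq IN.bam OUT_PREFIX
# Writes OUT_PREFIX_R1.fq, OUT_PREFIX_R2.fq and OUT_PREFIX_singletons.fq.
bam_to_paired_fastq() {
    if [ "$#" -ne 2 ]; then
        echo "usage: bam_to_paired_fastq IN.bam OUT_PREFIX" >&2
        return 2
    fi
    in_bam=$1
    prefix=$2
    # collate: put both mates of each pair next to each other
    #   -u  uncompressed output (it is only piped onward)
    #   -O  write to stdout
    # fastq:
    #   -1 / -2  first / second mates
    #   -0       reads flagged as both or neither READ1/READ2 (discarded here)
    #   -s       reads whose mate is missing from the input
    #   -n       leave read names as they are (no /1 or /2 suffix)
    samtools collate -u -O "$in_bam" |
        samtools fastq -1 "${prefix}_R1.fq" -2 "${prefix}_R2.fq" \
            -0 /dev/null -s "${prefix}_singletons.fq" -n -
}
```

Calling `bam_to_paired_fastq sample.bam sample` on the unfiltered bam gives you mapped and unmapped reads together, which is usually what "the original fastq" means. Run it on sample.mapped.bam or sample.unmapped.bam instead if you want them separated, keeping in mind that pairs split across the two files will land in the singletons output.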
You can't 'extract' a fastq file from a .vcf, since the fastq contains quality scores that are lost when you make genotype calls.
This seems like a potential XY problem: why do you need the fastq files from the vcf anyway?
Are you sure you mean FASTQ? And not FASTA?
How do you imagine unmapped reads could possibly be stored in a vcf?