5.1 years ago
a.james
Hi All,
Say I have paired-end fastq reads from 2 lanes and a vcf file with variant alleles. I generated these using the varsim tool.
My VCF file does not have the DP4 field, so I cannot read the coverage from it. I want to find the read coverage for each variant allele in the VCF file using the fastq reads, without using any secondary analysis tools (alignment, variant calling, or similar).
Can anyone suggest how to proceed in this case?
Thank you!
You need the intermediate files (BAM files); otherwise it is not feasible, or the results will be ambiguous and untrustworthy.
The BAM file can be generated by alignment, but I want to avoid the inherent bias of alignment tools. The generated vcf contains the ground-truth (simulated) variant alleles, and it is for these that I want to generate the read coverage.
Checking varsim, it looks like coverage is a parameter of your simulation. It's not clear to me how they generate the reads, but I guess that after populating the reference genome with the "variants", they fragment the genome and create the fastq files. Unless you can see in the simulation logs how these reads were generated, I don't see a way to get the real coverage other than aligning your reads to the genome.
Thanks for the tip, I can check the log files. In general, varsim generates the reads from a perturbed genome, aka the ground-truth sequence, which it builds by inserting variants into a user-provided reference genome. Reads are then simulated from this perturbed genome.
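To make the idea above concrete, here is a toy sketch of the "perturb the reference, then sample reads" process. It is not varsim's actual code: the function names (`apply_snvs`, `simulate_reads`, `coverage_at`) are hypothetical, reads are error-free and single-end, and only SNVs are handled so that coordinates in the perturbed genome match the reference. The point it illustrates is that, assuming the simulator records where each read came from, the true coverage at each variant can be counted without any alignment.

```python
import random

def apply_snvs(reference, snvs):
    """Return a perturbed copy of `reference` with SNVs applied.
    `snvs` maps a 0-based position to its alternate base."""
    seq = list(reference)
    for pos, alt in snvs.items():
        seq[pos] = alt
    return "".join(seq)

def simulate_reads(genome, read_len, n_reads, rng):
    """Sample error-free reads uniformly and keep each read's true start."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads

def coverage_at(reads, pos):
    """Count reads whose recorded origin interval spans `pos`."""
    return sum(1 for start, seq in reads if start <= pos < start + len(seq))

def other_base(base):
    """Pick an alternate base that differs from `base` (toy choice)."""
    return "C" if base == "A" else "A"

rng = random.Random(0)  # fixed seed so the toy run is repeatable
reference = "".join(rng.choice("ACGT") for _ in range(200))
snvs = {pos: other_base(reference[pos]) for pos in (50, 120)}

perturbed = apply_snvs(reference, snvs)
reads = simulate_reads(perturbed, read_len=30, n_reads=100, rng=rng)

# True per-variant coverage, taken from the recorded read origins.
true_coverage = {pos: coverage_at(reads, pos) for pos in snvs}
print(true_coverage)
```

With indels the coordinates of the perturbed genome would shift relative to the reference, so a real implementation would also need a mapping between the two coordinate systems. Whether your simulation run kept the read origins is something to check in its output and logs.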