Hi, suppose I have a BAM file and a VCF file containing variant-calling results. I want to extract only the reads, together with their mates, that support the variant allele in the VCF. It would be nice to get those reads in BAM format. I tried googling for tools that do this and found VariantBam, but it reports both the reads that support the variant and the reads that do not.
Thanks
I am going to tag Pierre Lindenbaum.
He may already have something written to do this.
Hello,
Have a look at this thread. It might be useful for you.
fin swimmer
Hello, yes, I found this thread; however, the solutions there do not give me the mate reads.
If it gives you the read names that support the variant, you can take this list as input for FilterSamReads or a simple grep. Depending on your aligner, the mate has the same read name and should be returned as well.
fin swimmer
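For illustration, the read-name filtering step could look roughly like the sketch below. It is not taken from any of the tools mentioned here: it assumes the alignments are available as SAM text (for example, a BAM converted to SAM with its header), and the function name `filter_sam_by_names` is made up for this example. It compares the whole QNAME column, so a name such as read1 does not also pull in read10, and because both mates normally share a QNAME they come out together.

```python
def filter_sam_by_names(sam_lines, read_names):
    """Return header lines plus every record whose QNAME is in read_names.

    sam_lines  -- iterable of SAM text lines (header and alignments)
    read_names -- iterable of read names that support the variant
    """
    wanted = set(read_names)
    kept = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are kept so the output is still valid SAM.
            kept.append(line)
            continue
        # QNAME is the first tab-separated column; match it exactly.
        qname = line.split("\t", 1)[0]
        if qname in wanted:
            kept.append(line)
    return kept
```

The returned lines can be written to a file and converted back to BAM with your usual tools. Secondary and supplementary alignments with the same name are kept as well, which may or may not be what you want.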
This would be a nice tool indeed. For visualization, the program I've written, ASCIIGenome, has the option filterVariantReads that gets close to what you need if combined with the print command. It could be scripted and automated but it's not quite what you ask for.
Hi Pierre, excellent! This is something I was trying to do a couple of weeks ago, and I settled for VariantBam. Now that I see this discussion, I would like to ask whether this is applicable to structural variants as well. As we all know, the VCF for structural variants is a bit different from the one for SNVs. Here is what the VCF records look like:
Getting the supporting reads for these variants is a little confusing. At the moment, I am taking the reads from both breakpoints separately and then combining them. This is a rather laborious way, but as I was on a tight deadline, I just did it manually. It would be nice to have a way to extract just the supporting reads for such structural variants (it helps visualization and validation).
Thanks in advance, Venkatesh Chellappa (Venki)
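The manual "take reads at each breakpoint, then combine" step described above could be scripted along these lines. This is only a sketch under stated assumptions: the input is SAM text, the breakpoints are given as (chromosome, 1-based position) pairs that you read from the VCF yourself, and the function names and the `window` parameter are invented for this example. It selects reads purely by position, so the result is a superset of the truly supporting reads; it does not check for split reads or discordant pairs.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """1-based inclusive end of an alignment that starts at pos."""
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos + ref_len - 1


def names_near(sam_lines, chrom, breakpoint, window=50):
    """Names of reads whose alignment overlaps breakpoint +/- window."""
    lo, hi = breakpoint - window, breakpoint + window
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":
            continue  # other chromosome, or unaligned
        if pos <= hi and reference_end(pos, cigar) >= lo:
            names.add(qname)
    return names


def names_at_both_breakpoints(sam_lines, left, right, window=50):
    """Union of the read names found near the left and right breakpoints."""
    sam_lines = list(sam_lines)  # allow two passes over the input
    left_names = names_near(sam_lines, left[0], left[1], window=window)
    right_names = names_near(sam_lines, right[0], right[1], window=window)
    return left_names | right_names
```

The resulting set of names can then be used to pull the reads and their mates out of the BAM by read name, giving one "evidence" file per variant to load in IGV.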
It's too complicated for now, but anyway, why would you need to extract the variant of an SV? Isn't visualization enough to validate a variant?
I need the "reads supporting the variant" to visualize in IGV. I am currently using an "evidence BAM" that contains the reads that support the breakpoints at the left and right ends of the structural variants.
So yes, manually extracting the reads based on the breakpoint loci is too laborious, and I want to know if anyone has a working solution for this.
I am thinking of opening a new discussion!