Extract reads matching variants from VCF file
0
0
Entering edit mode
5.5 years ago
sppearce ▴ 10

I have a vcf file describing a set of SNPs, insertions and deletions. I would like to extract those reads from the bam file which correspond to these variants for some quality control.

For SNPs, I can loop through using samjs and produce one new bam file per SNP, but I don't know how to generalise this for indels (and I want those specific ones that match the vcf file, not just any indel at that base).

Is there any existing tool that will let me do this? Or a way to get samjs to do this.

snp vcf • 2.1k views
ADD COMMENT
1
Entering edit mode

it's the same thing, you 'just' have to test for the cigar elements 'D/N' or 'I'.

ADD REPLY
0
Entering edit mode

That presumably would get me any indel that exists covering the start position, rather than the specific indel in the vcf (I realise that is probably pretty niche).

ADD REPLY
1
Entering edit mode

CIGAR string does not only contain the operations (D, I, etc) but also the length of the operation. By comparing the length with the indel size in VCF, maybe you would be able to count the reads that support the specific indel in question. I usually use pysam for this kind of CIGAR queries.

https://pysam.readthedocs.io/en/latest/

ADD REPLY

Login before adding your answer.

Traffic: 2727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6