Finding reads having more than one mutation in BAM file
2
1
Entering edit mode
8.7 years ago
marki ▴ 60

Hi,

I have a set of BAM files (pair-end reads) from which I want to identify reads having more than one mutation. I already have the mutation calls for the same data. I am thinking of the following steps to achieve this.

  1. Identify pairs of "nearby" mutations (based on a maximum distance, say 400-500).
  2. Construct a BED file from above containing chromosome, start and end information, where 'start' will be the first and 'end' will be the second of above pair.
  3. Find intersecting regions between above BED and BAM file to filter putative reads.
  4. Query each read for presence of mutation pair

I have two questions. Firstly, I am wondering if I have to somehow take the strand information into account while finding intersected regions (I don't have the strand information in the variant-calling file)? Secondly, is there an easier alternative that I may use for this analysis? Apologies if some of the info is not-correct/too-basic as I am newbie to this kind of analysis.

Thanks!

Ikram

next-gen R exome-sequencing BAM • 3.4k views
ADD COMMENT
2
Entering edit mode

If you know a bit of python, its fairly simple with pysam which supports reading bam and vcf file.

For each SNP ( as you already have the calls), find the overlapping reads ( with pileup function ) and check if the base on read matched with the ref/atl allele, and push the read names to a list/dictionary. Then check if you encounter the same read again for another SNP. There could be other optimised ways.

Note: Be careful that both the read and its mate has the same name.

ADD REPLY
2
Entering edit mode
8.7 years ago

A few alternate paths:

If you don't have to do this for a lot, you could just eyeball in IGV.

If that's not feasible, another way

1) pick a one allele at a variant site 2) identify the reads with the desired allele in one read 3) pick out the mates for each of those reads 4) realign 5) assess which other variant sites are now covered 6) asses the alleles present at those sites

Another way, if the variant frequency isn't too large:

1) make short reference sequences containing each possible permutation (i.e if there are 3 mutations together in your window, that's 8 possible sequences) 2) align to those short sequences 3) see which permutations have reads aligning to them.

ADD COMMENT
1
Entering edit mode
8.7 years ago

I would have a look at the CIGAR string of your mapped reads and look for > 1 mismatches/insertions/deletions

ADD COMMENT
0
Entering edit mode

In a CIGAR string, you can't see mismatches, only indels. Plus, there will be a lot of one-off errors to sift through.

ADD REPLY

Login before adding your answer.

Traffic: 2420 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6