Question

Finding the best alignment from a paternal and maternal BAM file

0

Entering edit mode

5.8 years ago

a.james ▴ 240

Hello All,

Say I have 2 BAM files from 2 diploid reference genome(simulated maternal and paternal one). Now I need to extract the best reads alignments from both the BAMs and merged into oneBAM file. I have sorted both Bam files. I would like to know the best ways to proceed, at the end I need to get the coverage of each variants from the BAM file.

Any suggestion would be greatly appreciated.

RNA-Seq • 1.4k views

ADD COMMENT • link updated 5.8 years ago by Carlo Yague 9.0k • written 5.8 years ago by a.james ▴ 240

score 2 · Accepted Answer · 2019-11-18

2

Entering edit mode

5.8 years ago

Carlo Yague 9.0k

May be it would be easier to map the reads on a reference made of both parental genome, so they compete as template for the alignment. From there, selecting the best alignment is straightforward.

Now if you really have to start from the two bam files, here is a possible strategy:

Remove all secondary/supplementary alignments but keep unaligned reads in each bam file in such a way that the number of reads in each bam file equal the number of reads sequenced.
Sort the bam files by read name.
Compare the mapping quality score (5th field of bam/sam entry) of each entry of one bam file with the corresponding entry on the other one (same line number, and same read name).
Save the entries with highest quality score into a new sam/bam file.

ADD COMMENT • link 5.8 years ago by Carlo Yague 9.0k

0

Entering edit mode

@Carlo Thanks for the detailed answer, I have doubt. For example, the 3rd point comparing based on the MAPQ value and in that case which threshold is good, (sy, MAPQ 30 or 20 etc)? For the 4th point is there a tool to save or extract the reads from the BAM file based on the given list of read_ids?

Thank you

ADD REPLY • link 5.8 years ago by a.james ▴ 240

0

Entering edit mode

which threshold is good for the MAPQ value ?

You don't really need a treshold, just that one score is bigger than the other, right ?

is there a tool to extract reads from the BAM file based on read_ids?

Yes, for instance picard-tools FilterSamReads with the option FILTER=includeReadList

ADD REPLY • link 5.8 years ago by Carlo Yague 9.0k