Question

Question about the best way to extract mates of reads

0

7.7 years ago

user191006 ▴ 10

When extracting reads from regions of a BAM file, I also need to take their mates into account. What is the most efficient way to collect the mates as well?

bam alignment next-gen-sequencing • 2.3k views

ADD COMMENT • link updated 24 months ago by Ram 44k • written 7.7 years ago by user191006 ▴ 10

0

Hi,

May I ask whether you have solved this problem? I have run into the same issue: I want to extract the mates of specific reads, even if the mate is unmapped. I have posted a question about it at "How to extract unmapped mate for specific reads using pysam or other module in python?".

Thank you!

Best regards,

Xi Zeng

ADD REPLY • link 2.2 years ago by zengxi.hada ▴ 90

score 0 · Answer 1 · 2017-05-26

0

7.7 years ago

Devon Ryan 105k

Just query-sort the files (samtools sort -n). You don't need to randomly access anything then.

ADD COMMENT • link 7.7 years ago by Devon Ryan 105k

0

This is not what I want; sorting is not the problem for me. I am focused on mates: my aim is to collect all of the mates of the reads. Thanks for the response.

ADD REPLY • link 7.7 years ago by user191006 ▴ 10

0

After sorting on read names, mates will follow each other directly in the SAM file.
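As a minimal sketch of why name-sorting helps, the snippet below groups adjacent SAM records that share a read name. It works on plain SAM text lines and makes no claim about how any particular tool does this; the function name `group_by_read_name` and the example records are hypothetical and made up for illustration.

```python
from itertools import groupby


def group_by_read_name(sam_lines):
    """Yield (read_name, [records]) from name-sorted SAM text lines."""
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")  # skip blanks and header lines
    )
    # groupby only merges adjacent items, which is why the input must be
    # name-sorted: both mates of a fragment then sit next to each other.
    for name, group in groupby(records, key=lambda fields: fields[0]):
        yield name, list(group)


# Hypothetical, hand-made records (not real data).
example = [
    "@HD\tVN:1.6\tSO:queryname",
    "readA\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "readA\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "readB\t73\tchr2\t500\t60\t50M\t=\t500\t0\t*\t*",
    "readB\t133\tchr2\t500\t0\t*\t=\t500\t0\t*\t*",
]

pairs = dict(group_by_read_name(example))
for name, recs in pairs.items():
    print(name, [r[1] for r in recs])  # read name and the FLAG of each record
```

On a real file, the lines could be streamed from `samtools view -h` run on the output of `samtools sort -n`, so nothing has to be held in memory beyond one read name at a time.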

ADD REPLY • link 7.7 years ago by WouterDeCoster 47k

0

If your definition of mates is not what we are expecting, then you should spell your definition out. Mates, to most, are the R1/R2 reads from a fragment.

ADD REPLY • link 7.7 years ago by GenoMax 148k

0

I am going over the regions in a BED file. I focus on the reads in one region at a time, and I try to collect their mates even if the mates are not in that region.

ADD REPLY • link 7.7 years ago by user191006 ▴ 10

0

If the mates aren't terribly close, then you'll end up doing a lot of linear searches, even if you get both mates into the same BGZF block. I suspect it'll be faster to query-sort and then search for overlaps with the BED file.
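A rough sketch of that idea, under stated assumptions: the input is name-sorted SAM text, the BED file is small enough for a naive interval scan, and every record of a read name is kept as soon as any one of them overlaps a region. All function names and example records below are hypothetical; a real pipeline would likely use an interval index rather than a linear scan over the regions.

```python
import re
from itertools import groupby


def load_bed(bed_lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        regions.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    return regions


def reference_span(cigar):
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def overlaps(fields, regions):
    """True if a mapped SAM record overlaps any BED region."""
    flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
    if flag & 0x4 or chrom == "*":  # unmapped reads never overlap on their own
        return False
    start = pos - 1  # SAM POS is 1-based, BED is 0-based
    end = start + max(reference_span(fields[5]), 1)
    return any(s < end and start < e for s, e in regions.get(chrom, []))


def reads_with_mates(sam_lines, regions):
    """Yield every record of a read name if any of its records hits a region."""
    records = (l.rstrip("\n").split("\t") for l in sam_lines
               if l.strip() and not l.startswith("@"))
    for _, group in groupby(records, key=lambda f: f[0]):
        group = list(group)
        if any(overlaps(f, regions) for f in group):
            yield from group


# Hypothetical, hand-made inputs (not real data).
bed = load_bed(["chr1\t90\t120"])
sam = [
    "readA\t65\tchr1\t100\t60\t50M\tchr5\t9000\t0\t*\t*",   # in region
    "readA\t129\tchr5\t9000\t60\t50M\tchr1\t100\t0\t*\t*",  # mate elsewhere
    "readB\t65\tchr2\t500\t60\t50M\t=\t700\t250\t*\t*",     # outside region
    "readB\t129\tchr2\t700\t60\t50M\t=\t500\t-250\t*\t*",
    "readC\t73\tchr1\t110\t60\t50M\t=\t110\t0\t*\t*",       # in region
    "readC\t133\tchr1\t110\t0\t*\t=\t110\t0\t*\t*",         # unmapped mate
]
kept = list(reads_with_mates(sam, bed))
for rec in kept:
    print(rec[0], rec[1], rec[2], rec[3])
```

This makes a single pass over the name-sorted reads, so mates on other chromosomes and unmapped mates are picked up without any random access into the BAM file.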

ADD REPLY • link 7.7 years ago by Devon Ryan 105k

score 0 · Answer 2 · 2017-05-26

I would suggest performing your alignment step using BWA within the Speed-Seq suite. This will automatically extract discordant mate pairs (pairs that map to different chromosomes, for example) into a separate BAM file for you, so you can look at them without having to go through a huge BAM file of "normal" reads. You can load this file into IGV or another browser and inspect your discordant reads without all of the "normal" reads bogging you down.

Speed-Seq can be downloaded here: https://github.com/hall-lab/speedseq.

Here is a simple example of how to run Speed-Seq alignment that will result in the files I just described: https://github.com/samuelwb/tumor-evolution/blob/master/Alignment/SpeedSeqAlign.sh.