Question: what is the best way to extract the mates of reads?
7.5 years ago
user191006 ▴ 10

When I extract reads from specific regions of a BAM file, I also need their mates. What is the most efficient way to collect the mates as well?

bam alignment next-gen-sequencing • 2.2k views

Hi,

May I ask whether you have solved the problem? I have run into the same issue. I want to extract the mates of specific reads even if the mate is unmapped. I have also posted a question about it: "How to extract unmapped mate for specific reads using pysam or other module in python?".

Thank you!

Best regards,

Xi Zeng

7.5 years ago

Just query-sort the files (samtools sort -n). You don't need to randomly access anything then.


This is not what I want; sorting is not my problem. I am interested in mates: my aim is to collect all of the mates of the reads. Thanks for the response.


If you sort by read name, mates will follow each other in the SAM file.
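To illustrate why name-sorting helps, here is a minimal sketch in plain Python that walks name-sorted SAM text (for example, the output of samtools sort -n converted to SAM) and groups consecutive records that share a read name. The function names iter_templates and primary_pair are made up for this example, and it assumes standard SAM flag bits (0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x800 supplementary). Treat it as a starting point rather than a finished tool.

```python
from itertools import groupby


def iter_templates(sam_lines):
    """Yield (read_name, [records]) from name-sorted SAM text lines.

    Each record is the list of tab-separated SAM fields. Header lines
    (starting with '@') and blank lines are skipped.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # groupby only merges *adjacent* equal keys, which is exactly what
    # name-sorted input guarantees for records of the same template.
    for name, group in groupby(records, key=lambda fields: fields[0]):
        yield name, list(group)


def primary_pair(records):
    """Return (read1, read2) primary records; a missing mate is None."""
    read1 = read2 = None
    for fields in records:
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        if flag & 0x40:   # first in pair
            read1 = fields
        elif flag & 0x80:  # second in pair
            read2 = fields
    return read1, read2
```

With real data you would pass an open file handle (or a pipe from samtools view) as sam_lines instead of a list.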


If your definition of mates is not what we are expecting, then you should spell it out. Mates, to most people, are the R1/R2 reads from a single fragment.


I am iterating over the regions in a BED file. I look at the reads in one region at a time and try to collect their mates, even if the mates are not in that region.
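One way to read this requirement is as a two-pass lookup by read name, which works regardless of sort order: first collect the names of reads that hit a region, then pull every record carrying one of those names. The sketch below does this on SAM text in plain Python. The names records, reads_with_mates and in_chr1_100_200 are hypothetical, and the example predicate only checks the leftmost mapped position, so it is a simplification and not a full overlap test.

```python
def records(sam_lines):
    """Yield SAM records as lists of fields, skipping headers and blanks."""
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            yield line.rstrip("\n").split("\t")


def reads_with_mates(first_pass, second_pass, hits_region):
    """Return every record whose read name hits a region at least once.

    first_pass and second_pass are two independent iterations over the
    same SAM text (e.g. the file opened twice). hits_region is a
    caller-supplied predicate taking one record's field list.
    """
    # Pass 1: remember the names of reads that fall in a region.
    wanted = {fields[0] for fields in records(first_pass) if hits_region(fields)}
    # Pass 2: keep all records with those names, wherever the mate maps
    # and whether or not it is mapped at all.
    return [fields for fields in records(second_pass) if fields[0] in wanted]


def in_chr1_100_200(fields):
    """Hypothetical predicate: mapped, leftmost base in chr1:100-200 (1-based)."""
    is_mapped = not (int(fields[1]) & 0x4)
    return is_mapped and fields[2] == "chr1" and 100 <= int(fields[3]) <= 200
```

The set of names has to fit in memory, which is usually fine for a handful of regions but may not be for a whole-genome BED file.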


If the mates aren't terribly close, then you'll end up doing a lot of linear searches, even if you get both mates into the same BGZF block. I suspect it'll be faster to query-sort and then search for overlaps with the BED file.
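A rough sketch of that single-pass idea, again in plain Python on SAM text: stream the name-sorted records, group them by read name, and keep the whole group whenever any mapped record overlaps a BED interval. All function names here are invented for the example. It assumes BED coordinates are 0-based half-open and SAM POS is 1-based, derives the aligned span from the CIGAR string, and scans the intervals of a chromosome linearly, which is only reasonable for a small BED file.

```python
import re
from itertools import groupby

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_bed(bed_lines):
    """Return {chrom: [(start, end), ...]} from BED text (0-based, half-open)."""
    regions = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def reference_span(fields):
    """Return the 0-based half-open reference interval of one SAM record."""
    start = int(fields[3]) - 1  # SAM POS is 1-based
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(fields[5]) if op in "MDN=X")
    return start, start + max(length, 1)


def overlaps_bed(fields, regions):
    """True if a mapped record overlaps any interval on its chromosome."""
    if int(fields[1]) & 0x4:  # unmapped records have no meaningful span
        return False
    start, end = reference_span(fields)
    return any(start < r_end and r_start < end
               for r_start, r_end in regions.get(fields[2], ()))


def templates_in_regions(sam_lines, regions):
    """Yield (name, records) for name-sorted templates touching a region."""
    recs = (line.rstrip("\n").split("\t") for line in sam_lines
            if line.strip() and not line.startswith("@"))
    for name, group in groupby(recs, key=lambda fields: fields[0]):
        group = list(group)
        if any(overlaps_bed(fields, regions) for fields in group):
            yield name, group  # mates come along, mapped or not
```

Because every template is seen exactly once, there is no random access into the BAM at all; the cost is the up-front name sort.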

7.5 years ago
Samuel Brady ▴ 330

I would suggest performing your alignment step using BWA within the SpeedSeq suite. This will automatically extract discordant mate pairs (those that map to different chromosomes, for example) into a separate BAM file for you, so you can look at them without having to go through a huge BAM file of "normal" reads. You can load that file into IGV or another browser and inspect your discordant reads without all of the "normal" reads bogging you down.

SpeedSeq can be downloaded here: https://github.com/hall-lab/speedseq.

Here is a simple example of how to run a SpeedSeq alignment that will produce the files I just described: https://github.com/samuelwb/tumor-evolution/blob/master/Alignment/SpeedSeqAlign.sh.
