When extracting reads that overlap given regions from a BAM file, I also want to include their mates. What is the most efficient way to collect them as well?
Just query-sort the files (samtools sort -n). You don't need to randomly access anything then.
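To illustrate why name sorting removes the need for random access, here is a minimal sketch in Python. It assumes the records arrive as SAM text lines in query-name order (for example, the text output of samtools view on a name-sorted file), so both mates of a pair are adjacent. The helper names (ref_span, overlaps, templates_in_region) and the demo records are made up for illustration; they are not part of samtools or pysam.

```python
import re
from itertools import groupby

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_span(fields):
    """Return (chrom, start, end), 0-based half-open, or None if the read is unmapped."""
    flag = int(fields[1])
    if flag & 0x4 or fields[2] == "*":
        return None
    start = int(fields[3]) - 1
    length = sum(int(n) for n, op in CIGAR_RE.findall(fields[5]) if op in REF_CONSUMING)
    return fields[2], start, start + max(length, 1)


def overlaps(fields, chrom, start, end):
    """True if the mapped read overlaps the 0-based half-open region."""
    span = ref_span(fields)
    return span is not None and span[0] == chrom and span[1] < end and span[2] > start


def templates_in_region(sam_lines, chrom, start, end):
    """Yield (qname, records) for every template with at least one read in the region.

    Assumes the input is sorted by query name, so all records of a
    template are consecutive and a single pass is enough.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    for qname, group in groupby(records, key=lambda f: f[0]):
        group = list(group)
        if any(overlaps(f, chrom, start, end) for f in group):
            yield qname, group  # includes mates that are unmapped or map elsewhere


# Hypothetical name-sorted records (sequence and quality replaced by "*").
demo = [
    "@HD\tVN:1.6\tSO:queryname",
    "r1\t73\tchr1\t101\t60\t50M\t=\t101\t0\t*\t*",     # mapped, mate unmapped
    "r1\t133\tchr1\t101\t0\t*\t=\t101\t0\t*\t*",       # the unmapped mate
    "r2\t99\tchr1\t5001\t60\t50M\t=\t5201\t250\t*\t*",  # pair outside the region
    "r2\t147\tchr1\t5201\t60\t50M\t=\t5001\t-250\t*\t*",
    "r3\t97\tchr1\t120\t60\t50M\tchr2\t900\t0\t*\t*",   # mate on another chromosome
    "r3\t145\tchr2\t900\t60\t50M\tchr1\t120\t0\t*\t*",
]

hits = dict(templates_in_region(demo, "chr1", 90, 200))
for qname, recs in hits.items():
    print(qname, len(recs))
```

Because each template is handled as one group, a mate that is unmapped or sits on another chromosome is returned together with the read that overlaps the region. The trade-off is a full pass over the name-sorted file for every query, rather than an indexed lookup.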
I would suggest performing your alignment step using BWA within the Speed-Seq suite. This will automatically extract discordant mate pairs (those that map to different chromosomes, for example) into a separate BAM file, so you can look at them without having to go through a huge BAM file of "normal" reads. You can load that file into IGV or another browser and inspect the discordant reads without all of the "normal" reads bogging you down.
Speed-Seq can be downloaded here: https://github.com/hall-lab/speedseq.
Here is a simple example of how to run Speed-Seq alignment that will result in the files I just described: https://github.com/samuelwb/tumor-evolution/blob/master/Alignment/SpeedSeqAlign.sh.
Hi,
May I ask whether you have solved the problem? I have run into the same issue: I want to extract the mates of specific reads, even if the mate is unmapped. I have also posted a question about it: "How to extract unmapped mate for specific reads using pysam or other module in python?".
Thank you!
Best regards,
Xi Zeng