3.9 years ago
ATRX
Hi,
I would like to know if there is a way to extract paired-end reads (and the corresponding genomic region) from a BAM file where part of R1 and part of R2 overlap the same genomic location. For example, if R1 and R2 are the two mates of a pair, I would like to extract that pair from the BAM file, together with the region that lies between the ** markers below.
R1 -------**----------**>
<----**----------**---- R2
Any advice or suggestions would be very helpful.
Thanks, -Ar
Not directly what you asked for, but you could first merge overlapping reads (e.g. with FLASH, BBMerge, or similar), then map the merged reads and see where they map.
Thanks! I will try it.
You can select overlapping mate pairs based on the template length (TLEN, the 9th field) of SAM/BAM records. See the thread "How to quantify the overlapping reads in paired-end DNA sequencing to check the sequencing efficiency".
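A minimal sketch of how the overlapping region itself could be recovered, written independently of the thread mentioned above. It does not use TLEN; instead it derives each mate's reference span from POS and CIGAR and intersects the two spans, which gives the coordinates directly. It assumes SAM text as input (for example the output of samtools view), both mates mapped to the same reference, and enough memory to hold one span per read. The function names `ref_length` and `mate_overlaps` are made up for this illustration.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases per the SAM spec: M, D, N, =, X
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_OPS = set("MDN=X")


def ref_length(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_OPS)


def mate_overlaps(sam_lines):
    """Yield (qname, rname, start, end) for pairs whose mates overlap.

    Coordinates are 1-based and inclusive, as in SAM.
    """
    spans = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        # Skip unpaired reads, and unmapped (0x4), secondary (0x100)
        # or supplementary (0x800) records.
        if not flag & 0x1 or flag & 0x904 or cigar == "*":
            continue
        spans[qname].append((rname, pos, pos + ref_length(cigar) - 1))

    for qname, recs in spans.items():
        # Need exactly two primary records on the same reference sequence.
        if len(recs) != 2 or recs[0][0] != recs[1][0]:
            continue
        start = max(recs[0][1], recs[1][1])
        end = min(recs[0][2], recs[1][2])
        if start <= end:
            yield qname, recs[0][0], start, end
```

One way to use it would be to pipe samtools view output into a small script that passes sys.stdin to `mate_overlaps` and prints the resulting intervals, for example as BED-like lines. For very large BAM files, a name-sorted input processed pair by pair would need less memory than the dictionary used here.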
Can you not use samtools view with a region on the aligned BAM file to get the reads? Are you looking for a consensus, or for the region from all read pairs?
Yes, I am not interested in a particular region, but in all regions across the genome where the paired-end reads R1 and R2 overlap.
In that case, following @lieven's suggestion will allow you to pre-select reads that overlap. You could then take the merged reads and align them (or use the headers of the reads that merged to identify the corresponding pairs in your already-aligned BAM file).
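The second option, matching merged read headers against the existing alignments, could look roughly like the sketch below. It assumes the merging tool writes a standard four-line FASTQ whose read names match the QNAME field in the BAM file (possibly with a trailing /1 or /2 and a comment after whitespace), and that the alignments are available as SAM text. `fastq_read_names` and `select_by_name` are hypothetical helper names, not part of any tool.

```python
def fastq_read_names(fastq_lines):
    """Collect read names from a four-line-per-record FASTQ."""
    names = set()
    for i, line in enumerate(fastq_lines):
        # Every fourth line, starting at the first, is a header: "@name comment"
        if i % 4 == 0 and line.strip():
            name = line[1:].split()[0]
            # Drop an old-style mate suffix so the name matches the SAM QNAME
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            names.add(name)
    return names


def select_by_name(sam_lines, names):
    """Yield SAM header lines plus every record whose QNAME is in names."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] in names:
            yield line
```

The selected lines could then be written to a file and converted back to BAM with samtools view. Whether the names match exactly depends on how the merging tool and the aligner treat read headers, so it is worth checking a few records by hand first.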