Extract the overlapping paired-end reads
3.9 years ago
ATRX ★ 1.1k

Hi,

I would like to know whether there is a way to extract paired-end reads (and the corresponding genomic region) from a BAM file when part of R1 and part of R2 cover the same genomic location. For example, if R1 and R2 below are a read pair, I would like to extract that pair from the BAM file and also extract the region between the ** markers, where the two reads overlap.

R1 -------**----------**>
     <----**----------**---- R2

Any advice or suggestions would be very helpful.

Thanks, -Ar

samtools paired-end reads • 1.9k views

Not directly what you asked for, but you could first merge the overlapping reads (e.g. with FLASH or BBMerge), then map the merged reads and see where they align.


Thanks! I will try it.


You can select overlapping mate pairs using the template length (TLEN, the 9th field) of SAM/BAM records. Quoting from "How to quantify the overlapping reads in paired-end DNA sequencing to check the sequencing efficiency":

If the fragment length is less than the sum of two reads, it means your paired sequences are overlapped.
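The criterion above can be sketched in a few lines of Python. This is only an illustration, not a tested pipeline: it assumes SAM text as input (for example from `samtools view in.bam`), assumes the SEQ field is present rather than `*`, and, unless told otherwise, assumes both mates have the same length. The function names `mates_overlap` and `overlapping_names` are made up for this example. Soft clipping and indels make the comparison approximate.

```python
def mates_overlap(tlen, read1_len, read2_len):
    """Return True if the template is shorter than the two reads combined."""
    # TLEN is signed (positive for the leftmost mate, negative for the
    # rightmost) and 0 when it is not available, so use the absolute value.
    if tlen == 0:
        return False
    return abs(tlen) < read1_len + read2_len


def overlapping_names(sam_lines, mate_len=None):
    """Collect names of read pairs whose TLEN suggests that the mates overlap.

    mate_len is the assumed length of the other mate; when it is None the
    length of the current read is used for both mates (an assumption).
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Keep paired reads with both mates mapped; skip secondary (0x100)
        # and supplementary (0x800) alignments.
        if not flag & 0x1 or flag & 0x4 or flag & 0x8 or flag & 0x900:
            continue
        if fields[6] != "=":  # mate mapped to a different reference
            continue
        tlen = int(fields[8])
        seq_len = len(fields[9])
        other_len = seq_len if mate_len is None else mate_len
        if mates_overlap(tlen, seq_len, other_len):
            names.add(fields[0])
    return names
```

The returned read names could then be used to pull the matching records out of the BAM file.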


Could you use samtools view with a region on the aligned BAM file to get the reads? Are you looking for a consensus, or for the region from every read pair?


I am not interested in one particular region, but in all regions across the genome where the paired-end reads R1 and R2 overlap.
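One possible way to get those regions directly from the alignments is to compute the reference interval covered by each mate from POS and CIGAR, then intersect the two intervals per read name. The sketch below is a rough illustration under several assumptions: input is SAM text (for example from `samtools view in.bam`), coordinates are 1-based and inclusive as in SAM, only primary alignments are used, and all records are held in memory, which will not scale to a large BAM without name-sorting first. The names `reference_span` and `pair_overlaps` are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return (start, end), 1-based inclusive, of an alignment on the reference."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def pair_overlaps(sam_lines):
    """Map read name -> (chrom, start, end) of the region covered by both mates."""
    spans = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag = int(f[1])
        # Keep paired, mapped, primary alignments only.
        if not flag & 0x1 or flag & 0x4 or flag & 0x900 or f[5] == "*":
            continue
        start, end = reference_span(int(f[3]), f[5])
        spans[f[0]].append((f[2], start, end))

    overlaps = {}
    for name, recs in spans.items():
        if len(recs) != 2:  # need exactly one alignment per mate
            continue
        (chrom1, s1, e1), (chrom2, s2, e2) = recs
        if chrom1 != chrom2:
            continue
        lo, hi = max(s1, s2), min(e1, e2)
        if lo <= hi:  # the two intervals intersect
            overlaps[name] = (chrom1, lo, hi)
    return overlaps
```

Each resulting interval could be written out as a region string such as chrom:start-end and passed to samtools view or samtools faidx to fetch the reads or the sequence.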


In that case, following @lieven's suggestion will let you pre-select the read pairs that overlap. You could then align the merged reads, or use the headers of the reads that merged to look up those pairs in the already aligned BAM file.
