Extracting paired-end reads on different contigs from bam file
2
0
Entering edit mode
10.1 years ago
Adrian Pelin ★ 2.6k

Hello,

I have 5 strains mixed in culture and propagated and I am trying to measure recombination rates by sequencing total DNA extracted after certain periods of time.

One way would probably be to extract paired end reads that map to different strains (forward reads maps to strain A reverse read maps to strain B). In this case, the recombination (cross over) event happend in the unsequenced part between the paired reads.

I am interested in finding all cases where reads of the same pair mapped to different references. I am sure there are other wise of spotting recombination and I will look into it, but this is a good way to start.

Adrian

recombination bam • 6.8k views
ADD COMMENT
0
Entering edit mode

'Tophat2' has a functionality --fusion-search to report a read pair that map to different chromosome. It will consider them as fusion genes. This mat help you if you treat your strains as different chromosomes. But I have never used 'fusion-search'. You may need to give a try.

ADD REPLY
0
Entering edit mode

Looks promising, I will give it a try.

ADD REPLY
2
Entering edit mode
10.1 years ago

This is a command that implements what Deedee recommended.

samtools view -h input.bam | awk '$7 != "=" || $1 ~ /^@/' | samtools view -bS - > output.bam
ADD COMMENT
1
Entering edit mode
10.1 years ago
Dan D 7.4k

What you're describing is called a "discordant alignment." The SAM specification makes it easy to find these.cases in your alignment data.

For each read in your SAM file, look at the seventh tab-delimited column (RNEXT). If there is an = sign, it means that the other read in the pair mapped to the same contig. Otherwise, the name of the contig to which the other read in the pair mapped will be shown.

Examples:

Here, both reads mapped to the same contig:

HWI-M00700R:112:000000000-AA21K:1:2114:6115:21870       147     chr1    2605319 0       76M     **=**       2605105 -290    CGAGGTGAGCATCTGTCCTCCCGGAGCAGGACCCATACCTCCAGGCGAGCATCTGAACCCATGGAGCAGCACACAC    GGGGGGGFGGGGGGGGGFCGGGGGGGGGGGGGGGGGFCFCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC

Here, the sequence mapped to chr2, but its mate mapped to chr9. This is a discordant alignment.

HWI-M00700R:112:000000000-AA21K:1:1101:26942:10190      163     chr2    3666911 60      76M     chr9       3667508 672     GATGATAAATATTTTTTATAAAGTCATATTGGAAAGGATGGGGTCATGAATTTCTGTGGGGAAAAGAAAGAGAGAT    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG8F@FGGGGGF    `
ADD COMMENT
0
Entering edit mode

In your second case, doesn't it look like both reads mapped to the same chromosome since there is still an = sign? and it shows chr1. It would be great to be able to extract these from a sam file. Any useful grep command?

ADD REPLY
1
Entering edit mode

Yes, I made a mistake. Been staring at a screen too long today, haha! Good catch! Ashutosh has your solution, but hopefully I provided some good background.

ADD REPLY
0
Entering edit mode

Hi Deedee, This may not be the right place to ask this. But do you work at Vanderbilt?

ADD REPLY

Login before adding your answer.

Traffic: 1862 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6