Fetch pairs of discordantly aligned reads
2
0
Entering edit mode
7.1 years ago
sure ▴ 110

I am trying to get discordant aligned reads from a bam file especially reads that are aligned to two different chromosomes/references. I tried samtools view to fetch reads aligned to one chromosome but this does not pull/include the second read which might have aligned to different chromosome. I am looking for a solution to achieve it. Thanks for your help and time in advance. Sorry if I missed any already posted solution.

alignment genome • 2.8k views
ADD COMMENT
1
Entering edit mode
7.1 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

$ java -jar dist/samjdk.jar -e 'return record.getReadPairedFlag() && !record.getReadUnmappedFlag() && !record.getMateUnmappedFlag() && !record.getReferenceName().equals(record.getMateReferenceName());'  in.bam
ADD COMMENT
0
Entering edit mode

Thanks Pierre, Can I specify reference/chromosome of my first read, such that I can fetch one read aligned to a specific chromosome of the discordant pair and the other anywhere else?

ADD REPLY
0
Entering edit mode

sure do something like (not checked):

$ java -jar dist/samjdk.jar -e 'if(!record.getReadPairedFlag() || record.getReadUnmappedFlag() || record.getMateUnmappedFlag()) return false; final String c1= record.getReferenceName(); final String c2=record.getMateReferenceName(); return (c1.equals("chr4") && c2.equals("chr3")) || (c1.equals("chr3") && c2.equals("chr4"));' in.bam

ADD REPLY
0
Entering edit mode

Thanks again, Will try it and leave a note on how did it go.

ADD REPLY
0
Entering edit mode
7.1 years ago
d-cameron ★ 2.9k

I am looking for a solution to achieve it.

What is the purpose of this analysis? Are you attempting to identify structural variations? If so, why not use an actual SV caller (such as manta or GRIDSS), instead of effectively creating a simple one yourself?

I am trying to get discordant aligned reads from a bam file especially reads that are aligned to two different chromosomes/references.

Some aligners consider reads concordantly aligned if they are on the same chromosome in the expected orentiation _regardless of the inferred fragment size_. That is, two reads aligning 100Mb apart can be considered concordant. Do you want to extract these reads?

My SV caller GRIDSS performs some pre-processing, which includes creating a BAM file that contains only reads indicating the presence of some sort of SV. The gridss.ExtractSVReads utility program has parameters to choose whether to extract discordant read pairs (based on the actual fragment size distribution), split reads, soft clipped reads, one end anchored reads (one read maps, the other doesn't), and unmapped read pairs. All of these can provide support for a putative structural variation.

For inter-chromsomal rearrangements, the gridss.ComputeSamTags utility program can be useful as it ensures the optional read mate tags are populated, something which many aligners don't bother doing. By default, it also populates the R2 and Q2 tags although, depending on what you're doing, these might not be useful to you.

If this is sounding like it could be too much effort, I recommend running GRIDSS - it also performs assembly of the aforementioned reads and can give you both variant calls, supporting assembly contigs, as well as a breakdown of the supporting reads (e.g. 4 discordant read pairs, 2 split reads, 1 assembly contig, 1 one end anchored read pair, 4 soft-clipped reads).

ADD COMMENT

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6