Detecting junctions from reads spanning large indels
1
0
Entering edit mode
7.1 years ago
Adrian Pelin ★ 2.6k

Hello,

I am mapping my reads with an aligner that allows for large indels (bbmap, other suggestions welcome).

I am than using SamJs to get reads that have large indels >1kb

java -jar ${HOME}/programs/jvarkit/dist/samjs.jar --file get_indel_sam_filter.javascript --samoutputformat SAM -o Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped_Indel-1kb_v2.sam Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped.sort.bam

Java script used, cat get_indel_sam_filter.javascript :

function accept(rec) {
if(rec.getReadUnmappedFlag()) return true; 
var c=rec.getCigar(); if(c==null) return true; 
for(var i=0;i< c.numCigarElements();++i) {
var ce=c.getCigarElement(i);
if(ce.getLength()<1000) continue;
var s=ce.getOperator().name(); 
if( s=="D") return true; 
} return false;
}accept(record)

Now I would like to get a coordinate file with the start and end location of indels. For instance this record:

HWI-D00104:276:C6U65ANXX:7:1207:20052:96123 1:N:0:TTAGGC    0   Copenhagen_M35027   113671  38  89=11532D36=    *   0   0   TTTTTAACAGTTTGAGGTTTTAGATTTTTAGTTACAGAAGTGATATCGAATATTTTATCCAAAAAGAATGAGTAATTAATTGTCTTAGAAACGTCAACAACGAGAAAAAAGATTTAATATTACAG   BBBBBFFFFBFFFFFFFFBFFFFFF/BFF<<FFFB/B/FFFFFF//BFBBBBFFFFFFFFFFBFFFFBB/FBFFFFFFB<FFBFBFF<FF/F/FBFF</B</BFFBFFF/FF/<F////FBB/B7   AM:i:38 NM:i:11532

Has an indel between 113759 and 125292.

I am not sure if there is already a tool that easily finds these. Ideally, the cigar string before and after the indel is an 100% match (like in the example record).

Thanks.

DNA-seq sam • 2.2k views
ADD COMMENT
2
Entering edit mode
7.1 years ago

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk -e  'stream().filter(R->!R.getReadUnmappedFlag()).forEach(R->{ int refpos = R.getStart(); for(CigarElement ce:R.getCigar()) { CigarOperator op = ce.getOperator(); int len = ce.getLength(); if(len>=1000 && (op.equals(CigarOperator.N) || op.equals(CigarOperator.D))) { println(R.getContig()+"\t"+(refpos-1)+"\t"+(refpos+len)); } if(op.consumesReferenceBases())  refpos+=len; } }); ' in.bam

later: fixed extra semi-colon after ' if(op.consumesReferenceBases()) ...'

ADD COMMENT
0
Entering edit mode

sweet, goes directly to coordinates. Is there any way to get an intermediary sam file to see which reads were counted for generating coordinates?

ADD REPLY
1
Entering edit mode

using samjdk (faster than samjs)

 java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag()) return false; int refpos = record.getStart(); for(CigarElement ce:record.getCigar()) { CigarOperator op = ce.getOperator(); int len = ce.getLength(); if(len>=1000 && (op.equals(CigarOperator.N) || op.equals(CigarOperator.D))) { return true; } if(op.consumesReferenceBases()) refpos+=len; } return false;' in.bam
ADD REPLY
0
Entering edit mode

sorry: that was a stupid copy+paste, here is a shorter syntax:

$ java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigar().getCigarElements().stream().anyMatch(C->C.getLength()>=1000 && (C.getOperator()==CigarOperator.N || C.getOperator()==CigarOperator.D));'  in.bam
ADD REPLY
0
Entering edit mode

oh I see. Both still do the same thing though, correct?

ADD REPLY

Login before adding your answer.

Traffic: 2588 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6