Hello,
I am mapping my reads with an aligner that allows for large indels (bbmap, other suggestions welcome).
I am than using SamJs to get reads that have large indels >1kb
java -jar ${HOME}/programs/jvarkit/dist/samjs.jar --file get_indel_sam_filter.javascript --samoutputformat SAM -o Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped_Indel-1kb_v2.sam Fasteris_Copenhagen_PlaquePurified_150903_SND104_B_L007_R1_HXE-3_mapped.sort.bam
Java script used, cat get_indel_sam_filter.javascript :
function accept(rec) {
if(rec.getReadUnmappedFlag()) return true;
var c=rec.getCigar(); if(c==null) return true;
for(var i=0;i< c.numCigarElements();++i) {
var ce=c.getCigarElement(i);
if(ce.getLength()<1000) continue;
var s=ce.getOperator().name();
if( s=="D") return true;
} return false;
}accept(record)
Now I would like to get a coordinate file with the start and end location of indels. For instance this record:
HWI-D00104:276:C6U65ANXX:7:1207:20052:96123 1:N:0:TTAGGC 0 Copenhagen_M35027 113671 38 89=11532D36= * 0 0 TTTTTAACAGTTTGAGGTTTTAGATTTTTAGTTACAGAAGTGATATCGAATATTTTATCCAAAAAGAATGAGTAATTAATTGTCTTAGAAACGTCAACAACGAGAAAAAAGATTTAATATTACAG BBBBBFFFFBFFFFFFFFBFFFFFF/BFF<<FFFB/B/FFFFFF//BFBBBBFFFFFFFFFFBFFFFBB/FBFFFFFFB<FFBFBFF<FF/F/FBFF</B</BFFBFFF/FF/<F////FBB/B7 AM:i:38 NM:i:11532
Has an indel between 113759 and 125292.
I am not sure if there is already a tool that easily finds these. Ideally, the cigar string before and after the indel is an 100% match (like in the example record).
Thanks.
sweet, goes directly to coordinates. Is there any way to get an intermediary sam file to see which reads were counted for generating coordinates?
using samjdk (faster than samjs)
sorry: that was a stupid copy+paste, here is a shorter syntax:
oh I see. Both still do the same thing though, correct?