Hello,
I'm looking for a way to count the number of soft clipped bases and reads in my bam file and calculate their fraction. What's the best way to do it?
Thanks a lot.
fin swimmer
Hello,
I'm looking for a way to count the number of soft clipped bases and reads in my bam file and calculate their fraction. What's the best way to do it?
Thanks a lot.
fin swimmer
using bioalcidaejdk : http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html
and the following code
final long counts[]=new long[4];
stream().forEach(R->{
counts[0]++;
if(R.getReadUnmappedFlag()) return;
Cigar cigar = R.getCigar();
if(cigar==null || cigar.isEmpty()) return;
counts[1]++;
counts[2] += cigar.getReadLength();
counts[3] += cigar.
getCigarElements().
stream().
filter(C->C.getOperator().isClipping()).
mapToInt(C->C.getLength()).
sum();
});
println("NUM-READS:"+counts[0]);
println("NUM-MAPPED-READS:"+counts[1]);
println("SUM-MAPPED-READS-LENGTH:"+counts[2]);
println("SUM-CLIPPING:"+counts[3]);
usage/example:
$ java -jar dist/bioalcidaejdk.jar --nocode -f biostar.code src/test/resources/S1.bam
NUM-READS:1998
NUM-MAPPED-READS:1998
SUM-MAPPED-READS-LENGTH:139860
SUM-CLIPPING:123
The GRIDSS package contains a picard-like tool gridss.CollectCigarMetrics which will calculate and summarise the the frequency of each CIGAR element length and operation. Your fraction can then be trivially calculated using read count, and number of reads with 0 soft clip CIGAR elements.
I've designed it so every step can be run as a stand-alone program. Just run java -cp gridss.jar gridss.analysis.CollectCigarMetrics
. example_pipeline.sh has an example of using CollectGridssMetrics
but CollectCigarMetrics
works the same way.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hello Pierre,
"SUM-CLIPPING" is the sum of clipped bases? If so, how can I modify the code that I get the sum of clipped reads in addition?
fin swimmer
Works very well.
Thanks a lot.