How to get ungapped single-end reads from the STAR output bam file?
1
0
Entering edit mode
6.4 years ago
caggtaagtat ★ 1.9k

Hello,

After mapping with STAR, I would like to remove all reads (single-end) from my Bam file, which show a gap in their alignment. Can this be done with samtools somehow?

RNA-Seq STAR ungapped single-end • 2.5k views
ADD COMMENT
0
Entering edit mode

Are deletions OK or do you want them filtered too?

ADD REPLY
0
Entering edit mode

I would like to remove deletions as well

ADD REPLY
0
Entering edit mode

GAP = single nucleotide variation + indels ?

ADD REPLY
0
Entering edit mode

I mean the remaining reads should be only continiously mapped to the reference genome, without a gap in their alignment, through splicing or deletions. The occurrence of SNPs was allowed during mapping and should be disregarded at this step.

Edit: Sry, with gap, I meant a splice junction in the reads alignment

ADD REPLY
2
Entering edit mode
6.4 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

$ java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getCigar()!=null && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).noneMatch(OP->OP.equals(CigarOperator.N) || OP.equals(CigarOperator.D));'  in.bam
  • !record.getReadUnmappedFlag() read must be mapped
  • record.getCigar()!=null read has cigar
  • record.getCigar().getCigarElements().stream(). get a stream of cigar componenets
  • map(C->C.getOperator()) map to the cigar operator
  • noneMatch(OP->OP.equals(CigarOperator.N) || OP.equals(CigarOperator.D)); no operator can be 'N' or 'D'
ADD COMMENT
1
Entering edit mode

Thank you very much! This is a very interesting tool and it seems to work.

Edit: It definitly worked right away

ADD REPLY

Login before adding your answer.

Traffic: 1770 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6