usage:
$ ls input.bed
$ java -jar dist/samjdk.jar --body -f biostar368754.code input.bam
Dear all,
I have a really basic question regarding how to filter out rRNA reads from my paired end RNA-seq data. Essentially I would like to filter out any read pair aligning within a bed file of rRNA coordinates, which I am doing using "bedtools intersect" with the -v flag. This works fine and I get rid of all reads aligning within this region, yet the issue lies in that if only one of the reads of a pair lies within this interval, then it only filters out the one that is within the interval and retains the other end.
Does anyone know of any tool that does either: a) removes both reads if at least one of them lies within an interval; or b) removes "orphan" reads for which its pair is not present in the bam file. Unfortunately, the FLAG field can't be used since the reads are actually properly paired at the time of aligning.
Thank you!
using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html
usage:
$ ls input.bed
$ java -jar dist/samjdk.jar --body -f biostar368754.code input.bam
private IntervalTreeMap<Interval> treemap = null; | |
@Override | |
public Object apply(final SAMRecord record) { | |
/** first time, fill the interval map */ | |
if(this.treemap==null) { | |
try { | |
/** first call , read the bed */ | |
this.treemap = new IntervalTreeMap<>(); | |
/** read all the lines */ | |
IOUtil.slurpLines(new java.io.File("input.bed")). | |
stream(). | |
filter(L->!L.isEmpty()). /* remove empty lines */ | |
map(L->L.split("[\t]")). /* split tab */ | |
map(T->new Interval(T[0],1+Integer.parseInt(T[1]),Integer.parseInt(T[2]))). /* convert to Interval */ | |
forEach(I->treemap.put(I,I));/* fill the treemap */ | |
} catch(java.io.IOException err) { | |
throw new RuntimeIOException(err); | |
} | |
} | |
if(!record.getReadUnmappedFlag()) { | |
/** the read interval */ | |
final Interval readInterval = new Interval(record.getContig(),record.getStart(),record.getEnd()); | |
if(treemap.containsOverlapping(readInterval) ) return false; | |
} | |
/* mate interval */ | |
if(record.getReadPairedFlag() && !record.getMateUnmappedFlag()) { | |
final int mateEnd= SAMUtils.getMateCigar(record)!=null? | |
SAMUtils.getMateAlignmentEnd(record): | |
record.getMateAlignmentStart() | |
; | |
final Interval mateInterval = new Interval(record.getMateReferenceName(),record.getMateAlignmentStart(),mateEnd); | |
if(treemap.containsOverlapping(mateInterval) ) return false; | |
} | |
return true; | |
} |
2nd answer : I forgot I wrote samviewwithmate http://lindenb.github.io/jvarkit/SamViewWithMate.html
see also Extract reads within given region, and their mates ; How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ? ;
you would have to use bedtools complement
to filter out the reads.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.