I am trying to use bedtools to pull out the reads falling directly within a set of BED coordinates. While this command does it successfully:
intersectBed -abam mybam.bam -b intervals.gff -wa -wb -f 1 | coverageBed -abam stdin -b intervals.gff
I find that it loses key information that I need. I'd like to get a listing of the BAM reads -- getting at least their ID -- split by exon. In other words, all the read IDs that fall into the first interval in intervals.gff, all the read IDs that fall into the second interval in intervals.gff... ideally, it would also report the CIGAR string for these reads, but I'd settle for just the ID.
Is there a way to report these reads, such that it's easy to tell from the output which set of reads landed in a given interval in the input BED file?
Thanks you.
Wow, groupBy really needs short-cuts like the -g 1-9.
Hi Aaron, I noticed that you have answered quite a few questions in this focus area. So, I thought may be I can post my question here. I am interested in extracting the number of paired-end reads aligned to each member of a large set of regions. I have used BEDTools/IntersectBED tool so far; it works well and returns reads overlapping my regions. But I am only interested in their count; and given the fact that I have quite a large number of regions, the output of intersectBED will be pretty large. Is there a way to prompt intersectBED to only return counts?
Have you tried the
-c
option? http://bedtools.readthedocs.org/en/latest/content/tools/intersect.html#c-reporting-the-number-of-overlapping-featuresThank you so much for the very quick reply! I browsed biostars and SeqAnswers since posting the initial question, and came across coverageBED. Do you know the pros/cons of using -c option vs coverageBED? will they return identical results? and is there a way to make them return the number of paired-end reads as opposed to counting each member of pair once? [ presumably, these counts should be ~2x what I am looking for; but just to be safe!]