Hi all, I am recently working with some genomic coordinate manipulation and RNA-Seq analysis using bedtools.
I am interested in recovering the reads that overlap a specific features with a given percentage. I know that in bedtools we can use the -F
flag to return (quoted from bedtools website: Minimum overlap required as a fraction of B. Default is 1E-9 (i.e., 1bp).).
However there seems to be a weird behaviour which I don't quite get.
In this example files we can make up reads
echo "@HD VN:1.0 SO:unsorted
@SQ SN:chr1 LN:1000
r1 0 chr1 150 200 50M * 0 0 ACGTAGCTAGCTAGCTAGCTAGCTGATCGTAGCTAGCTGATCGTACGATC *
r2 0 chr1 190 340 10M100N40M * 0 0 AGCTAGCTAGCTAGCTAGCTAGCTGATCGTAGCTAGCTGATCGTACGATC *
r3 0 chr1 350 400 50M * 0 0 TCGATCGTACGTAGCTAGCTGATCGTAGCTAGCTGATCGTACGATCGTAC *
r3bella 0 chr1 190 240 50M * 0 0 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA *" > reads.sam
and convert to bam
samtools view -bS reads.sam > reads.bam
and some features file in bed format
echo "chr1 100 200 exon1
chr1 198 201 exIntJunct
chr1 201 299 intron1
chr1 300 400 exon2" > exons.bed
Each read is 50bp long.
However when I do:
bedtools intersect -split -bed -wo -abam reads_sorted.bam -b exons.bed -F 0.5 > outputIntersect_split.bed
I got one single feature, which is not what I expected:
chr1 349 399 r3 255 + 349 399 0,0,0 1 50, 0, chr1 300 400 exon2 50
For example, the feature exIntJunct
is only 3nt long and should be completely overlapped (therefore the 1.0 fraction of it is covered) by r3bella
read. However, this does not appear to be the case..
I am quite confused about what this -F
parameter is actually doing, and how it is calculating this fraction. I tried all possible combinations but nothing seems to work so far.
Hope someone can help!
how could this
cat "@HD VN:1.0 SO:unsorted..." > reads.sam
could even work ?OP has
echo
andcat
confused, I think. They did not actually run the commands they posted here.Sorry you are right, I got confused! I updated the post. The issue however still remains! :)
I can't explain it either.
Strangely, if you remove the
-split
flag it seems to behave as expected (at least for me).It will also report the r3bella/exIntJunct overlap if you reduce the
-F
to a much lower fraction (maybe ~0.3).