Hi,
I have a BED file of genomic features, and I am trying to find how many of their bases overlap exons and how many don't. I used bedtools intersect to get a BED file of the intersection of my features with a GENCODE exon annotation, and bedtools subtract to get a BED file of the bases in my features that are not in exons. I then used
awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
to count the bases covered by each of these files. The strange thing is that the bases in the intersection plus the bases in the subtraction do not add up to the number of bases in the original file: the sum is slightly larger than the original.
Am I making a basic logic error, misunderstanding what intersect and subtract do, or is something very strange going on?
Thanks,
Adriana
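Figured it out. Some exons in the annotation overlap each other, so bedtools intersect reported bases lying in more than one exon once per overlapping exon, inflating the intersection total. bedtools subtract removes each base only once, so it was not affected. Merging the exon annotation first (e.g. with bedtools merge) fixed it, and the totals now add up.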