I'd like to count the number of unique genes in a gff file falling within a list of genomic regions. With bedtools I can count the number of regions within the gff which is almost what I want, but not quite.
bedtools intersect -a regions.bed -b my.gff -c
UPDATE:
I should have made my question a bit more specific. I have a modified ensemble style gtf file (not a gff) that has unique transcript IDs. This means that simply selecting unique fields in the 9th column of the gtf file actually counts transcript IDs.
To circumvent this problem I first truncated the gtf file:
cat my.gff | sed -e 's/;.*//' > delete.me.gtf
Then I ran the bedtools map command:
bedtools map -a regions.bed -b delete.me.gtf -c 9 -o count_distinct > counts.genes_in_windows.bed
I almost forgot to delete the intermediate file:
rm delete.me.gtf
There is probably a way to make this a oneliner, without the intermediate file, but I have a dissertation to write!
I don't think that quite works. I'd like to count the number of unique gene_ids that fall with each region. I'll update my question to better explain that.
I see - updated.