Dear all
I am trying to use bedtools to compare a BED file with a GFF file.
The BED file looks like this:
Chr1 3641 5640 . . +
The GFF file looks like this:
Chr1 TAIR10 gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1 TAIR10 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1 TAIR10 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
Chr1 TAIR10 exon 3631 3913 . + . Parent=AT1G01010.1
Chr1 TAIR10 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1
I used the command intersectBed -a A.bed -b B.gff -wa -wb to find overlaps.
However, my problem is: how do I define intergenic regions, intronic regions, regions overlapping an intron-exon boundary, and regions overlapping an intergenic-gene boundary?
Thank you for any of your suggestions!
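For reference, intersectBed -a A.bed -b B.gff -wa -wb prints each overlapping pair of A and B records side by side. The minimal Python sketch below reproduces that pairing for the excerpts above, assuming one chromosome per line and no spaces inside columns. It makes the coordinate conventions explicit: BED is 0-based and half-open, GFF is 1-based and closed, so the GFF start is shifted down by one before comparing (bedtools does this conversion itself). The function names are illustrative, not part of bedtools.

```python
from typing import List, Tuple


def split_fields(line: str) -> List[str]:
    # Real BED/GFF files are tab-separated; fall back to whitespace for pasted excerpts.
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split()


def parse_bed(line: str) -> Tuple[str, int, int]:
    """BED columns chrom, start, end are already 0-based, half-open."""
    f = split_fields(line)
    return f[0], int(f[1]), int(f[2])


def parse_gff(line: str) -> Tuple[str, str, int, int]:
    """GFF seqid, type, start, end (1-based, closed) converted to 0-based, half-open."""
    f = split_fields(line)
    return f[0], f[2], int(f[3]) - 1, int(f[4])


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Two half-open intervals overlap iff each one starts before the other ends.
    return a_start < b_end and b_start < a_end


def intersect(bed_lines: List[str], gff_lines: List[str]) -> List[Tuple[str, str]]:
    """Return every (BED line, GFF line) pair that overlaps, like -wa -wb."""
    hits = []
    for bed_line in bed_lines:
        chrom, start, end = parse_bed(bed_line)
        for gff_line in gff_lines:
            seqid, _ftype, g_start, g_end = parse_gff(gff_line)
            if seqid == chrom and overlaps(start, end, g_start, g_end):
                hits.append((bed_line, gff_line))
    return hits


bed = ["Chr1 3641 5640 . . +"]
gff = [
    "Chr1 TAIR10 gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010",
    "Chr1 TAIR10 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1",
    "Chr1 TAIR10 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1",
    "Chr1 TAIR10 exon 3631 3913 . + . Parent=AT1G01010.1",
    "Chr1 TAIR10 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1",
]

for bed_line, gff_line in intersect(bed, gff):
    print(split_fields(bed_line)[:3], split_fields(gff_line)[2])
```

With these lines, the single BED region overlaps all five GFF records (gene, mRNA, protein, exon, five_prime_UTR), but none of them is an intron, which is exactly the gap discussed in this thread.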
I don't understand exactly what your problem is. What are you trying to do? Do you want to extract the features of the GFF file that are included within your BED regions?
Yes, exactly as you said: I want to extract the GFF features that correspond to each BED region. My problem is that the GFF file has no intron or intergenic features. To complicate things further, a BED region may overlap both an exon and an intron. Thanks!
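Since the GFF has no intron lines, one common approach is to infer introns as the gaps between consecutive exons of the same transcript, grouping exons by the GFF3 Parent attribute. The Python sketch below does that; it is an illustration under that assumption, not a bedtools feature, and it keeps GFF's 1-based closed coordinates. The second exon in the example is hypothetical, since the excerpt above shows only one exon.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_fields(line: str) -> List[str]:
    # Real GFF is tab-separated; fall back to whitespace for pasted excerpts.
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split()


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' GFF3 attributes into a dict."""
    attrs = {}
    for item in column9.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def infer_introns(gff_lines: List[str]) -> List[Tuple[str, int, int, str, str]]:
    """Return (seqid, start, end, strand, transcript) per intron, 1-based closed."""
    exons = defaultdict(list)  # (seqid, strand, transcript) -> [(start, end), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = split_fields(line)
        if len(f) < 9 or f[2] != "exon":
            continue
        # An exon may list several parent transcripts separated by commas.
        parents = parse_attributes(f[8]).get("Parent", "")
        for transcript in filter(None, parents.split(",")):
            exons[(f[0], f[6], transcript)].append((int(f[3]), int(f[4])))

    introns = []
    for (seqid, strand, transcript), spans in exons.items():
        spans.sort()
        max_end = spans[0][1]  # furthest exon end seen so far
        for start, end in spans[1:]:
            # At least one uncovered base between exons counts as an intron.
            if start - max_end > 1:
                introns.append((seqid, max_end + 1, start - 1, strand, transcript))
            max_end = max(max_end, end)
    return introns


gff = [
    "Chr1 TAIR10 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1",
    "Chr1 TAIR10 exon 3631 3913 . + . Parent=AT1G01010.1",
    "Chr1 TAIR10 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1",
    "Chr1 TAIR10 exon 3996 4276 . + . Parent=AT1G01010.1",  # hypothetical second exon
]
print(infer_introns(gff))
```

The inferred introns can then be written out as extra GFF or BED records and passed to intersectBed together with the original features.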
You should really be able to figure this out yourself. For example, do you expect purely intergenic regions to overlap any genes? (Hint: no, you don't.)
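Following that hint, a region that overlaps no gene is intergenic. In bedtools, intersectBed -v reports only the A entries with no overlap in B, and bedtools complement can turn a gene BED file plus a genome file into explicit intergenic intervals. The Python sketch below shows the same two ideas for a single chromosome in 0-based half-open coordinates; the chromosome length in the example is a placeholder, not the real length of Chr1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, one chromosome


def merge(spans: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intergenic(genes: List[Interval], chrom_length: int) -> List[Interval]:
    """Gaps between merged genes, i.e. the complement of genes on the chromosome."""
    gaps: List[Interval] = []
    pos = 0
    for start, end in merge(genes):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_length:
        gaps.append((pos, chrom_length))
    return gaps


def is_intergenic(region: Interval, genes: List[Interval]) -> bool:
    """True if the region overlaps no gene at all."""
    s, e = region
    return all(not (s < g_end and g_start < e) for g_start, g_end in genes)


genes = [(3630, 5899)]  # AT1G01010, GFF 3631-5899 converted to half-open
CHROM_LENGTH = 10_000   # placeholder value for illustration only
print(intergenic(genes, CHROM_LENGTH))
print(is_intergenic((100, 200), genes), is_intergenic((3641, 5640), genes))
```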
Hi Devon
Sorry for the unclear description. As airan said, I want to classify every BED region. The difficulty is that the GFF file contains no intergenic or intron information. Thanks!
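One way to classify each BED region is by how many of its bases are covered by genes, exons and introns. The sketch below is one possible rule set; the category names and their priority (any partial gene overlap is reported as intergenic-gene before exon or intron status is checked) are assumptions. It works on one chromosome in 0-based half-open coordinates and expects gene, exon and intron intervals to be prepared beforehand (introns, for instance, as gaps between exons of the same transcript). The second exon and the intron in the example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, single chromosome


def covered(region: Interval, spans: List[Interval]) -> int:
    """Count bases of region covered by the union of spans."""
    start, end = region
    total = 0
    last = start  # bases before 'last' are already counted
    for s, e in sorted(spans):
        s, e = max(s, last), min(e, end)
        if s < e:
            total += e - s
            last = e
    return total


def classify(region: Interval, genes: List[Interval],
             exons: List[Interval], introns: List[Interval]) -> str:
    length = region[1] - region[0]
    in_gene = covered(region, genes)
    if in_gene == 0:
        return "intergenic"
    if in_gene < length:
        return "intergenic-gene"  # partly inside, partly outside a gene
    in_exon = covered(region, exons)
    in_intron = covered(region, introns)
    if in_exon and in_intron:
        return "exon-intron"
    if in_exon:
        return "exon"
    if in_intron:
        return "intron"
    return "genic-other"  # inside a gene but in no listed exon or intron


genes = [(3630, 5899)]                 # AT1G01010 from the excerpt
exons = [(3630, 3913), (3995, 4276)]   # first from the excerpt, second hypothetical
introns = [(3913, 3995)]               # hypothetical gap between those exons

for region in [(3641, 5640), (3700, 3800), (3920, 3990), (3600, 3700), (100, 200)]:
    print(region, classify(region, genes, exons, introns))
```

With alternative isoforms, a base can be exonic in one transcript and intronic in another; this rule set would then report exon-intron, so you may prefer to classify per transcript instead.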