intersection of two GFFs
4.5 years ago
BioDH ▴ 10

I have two GFF files.

A.gff defines 277 regions and B.gff defines 1628.

What I would like to know is how many regions are in the intersection of the two GFFs.

I used bedtools as follows.

bedtools intersect -a A.gff -b B.gff -s -wa | wc -l

124

bedtools intersect -a B.gff -b A.gff -s -wa | wc -l

124

I thought these 124 entries belonged to both files.

But how can the following be explained?

bedtools intersect -a A.gff -b B.gff -s -v  | wc -l

183

bedtools intersect -a B.gff -b A.gff -s -v | wc -l

1505

A.gff has 277 regions, so if A ∩ B = 124, A - B should be 153. However, bedtools intersect -a A.gff -b B.gff -s -v | wc -l reports 183.

How can I get the intersection correctly?
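One possible reading of the numbers above: without -u, bedtools intersect -wa writes the A entry once per overlapping B entry, so 124 would be the number of overlapping (A, B) pairs rather than the number of distinct regions. That would also explain why the count is the same in both directions. Under that reading, and assuming 277 and 1628 are counts of feature lines, the -v results imply 277 - 183 = 94 regions of A with at least one same-strand overlap, and 1628 - 1505 = 123 regions of B. The bedtools -u option reports each A entry only once if it has any overlap. The toy sketch below is not bedtools itself; the Feature tuple, the count_overlaps helper, and the coordinates are made up for illustration.

```python
from collections import namedtuple

# Hypothetical minimal feature record (not a full GFF parser).
Feature = namedtuple("Feature", "name chrom start end strand")


def overlaps(x, y):
    """Same chromosome, same strand, and closed intervals that share a base."""
    return (
        x.chrom == y.chrom
        and x.strand == y.strand
        and x.start <= y.end
        and y.start <= x.end
    )


def count_overlaps(a, b):
    """Return (overlapping pairs, features of a with a hit, features of a without)."""
    pairs = [(x.name, y.name) for x in a for y in b if overlaps(x, y)]
    hit = {x_name for x_name, _ in pairs}
    return len(pairs), len(hit), len(a) - len(hit)


# Made-up toy coordinates: a1 overlaps two B features, a3 overlaps none.
A = [
    Feature("a1", "chr1", 100, 200, "+"),
    Feature("a2", "chr1", 500, 600, "+"),
    Feature("a3", "chr1", 900, 950, "+"),
]
B = [
    Feature("b1", "chr1", 150, 160, "+"),
    Feature("b2", "chr1", 180, 250, "+"),
    Feature("b3", "chr1", 550, 560, "+"),
    Feature("b4", "chr1", 2000, 2100, "+"),
]

pairs_ab, hit_a, miss_a = count_overlaps(A, B)
pairs_ba, hit_b, miss_b = count_overlaps(B, A)

print("pairs A vs B:", pairs_ab, "| pairs B vs A:", pairs_ba)
print("A with overlap:", hit_a, "| A without:", miss_a)
print("B with overlap:", hit_b, "| B without:", miss_b)
```

In this toy case the pair count is identical in both directions, while the number of distinct overlapping features differs between A and B, which mirrors the pattern in the question.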

bedtools GFF intersect • 1.7k views
Take a look at AGAT which is a proper GFF toolkit. You should be able to find overlapping features using one of the programs here.

You can indeed use AGAT for that, but I'm not sure it is what @BioDH is looking for. agat_sp_complement_annotations.pl will tell you how many features (e.g. genes) you have in each annotation, and then add to annotation A all genes from annotation B that do not overlap annotation A. You then get the total for the new annotation A, from which you can deduce how many genes from B were overlapping A (but not the opposite). So you should then run it the other way round too, using B as reference and A as target.

Or maybe agat_sp_merge_annotations.pl will tell you how many features were overlapping (I don't remember what information the tool provides).

What exactly do you mean by regions? Could you show a sample of such a region?
