Extract gene IDs from GFF3 based on coordinates from another file
4.3 years ago
JR • 0

I have identified some regions of interest in my genome, and I want to extract the gene annotations that fall into those regions.

First, I converted my GFF3 file into a BED file like this:

Scaffold_1      1451750 1458451 ID=ANN00021
Scaffold_1      3553514 3558618 ID=ANN00054
Scaffold_2      4024794 4032517 ID=ANN00058
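
If the BED file is produced with a script, the detail that is easiest to get wrong is the coordinate system: GFF3 is 1-based with inclusive ends, while BED is 0-based and half-open, so only the start shifts by one. A minimal Python sketch, assuming a tab-separated GFF3 file and that only `gene` features are wanted. The function name `gff3_to_bed` is hypothetical; it keeps the `ID=...` attribute as the name column, matching the example above.

```python
def gff3_to_bed(gff_lines, feature_type="gene"):
    """Convert GFF3 records of one feature type into BED-like tuples.

    GFF3 columns 4/5 are 1-based and inclusive; BED is 0-based and
    half-open, so the start is shifted down by one and the end is kept.
    """
    bed = []
    for line in gff_lines:
        # Skip blank lines, comments and directives such as '##gff-version 3'.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end, attributes = cols[0], int(cols[3]), int(cols[4]), cols[8]
        # Column 9 holds ';'-separated key=value pairs; keep the ID=... entry.
        fields = [f.strip() for f in attributes.split(";")]
        gene_id = next((f for f in fields if f.startswith("ID=")), "ID=.")
        bed.append((seqid, start - 1, end, gene_id))
    return bed
```

Each tuple can then be written out with `"\t".join(map(str, record))` to get one BED line per gene.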

I also have another BED file with the genomic regions:

Scaffold_1      133745072       133845072
Scaffold_1      133854352       133954352
Scaffold_1      133806326       133906326
Scaffold_1      133912327       134012327
Scaffold_2      64167277        64267277

I tried bedtools, but I don't think it can handle this question.
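
For reference, this is the kind of query `bedtools intersect` is built for: `bedtools intersect -a genes.bed -b regions.bed -wa -u` reports each gene in `genes.bed` that overlaps at least one region, once, and adding `-f 1.0` keeps only genes that lie entirely within a single region (the file names here are hypothetical). Without external tools, a minimal Python sketch of the same any-overlap query could look like this, assuming both files are whitespace-separated BED with 0-based half-open coordinates as shown above. The function names are hypothetical.

```python
def read_bed(lines):
    """Parse whitespace-separated BED lines into (chrom, start, end, rest) tuples."""
    records = []
    for line in lines:
        cols = line.split()
        # Skip empty lines and header lines.
        if not cols or cols[0].startswith(("#", "track", "browser")):
            continue
        records.append((cols[0], int(cols[1]), int(cols[2]), cols[3:]))
    return records


def genes_in_regions(gene_lines, region_lines):
    """Return gene names overlapping at least one region, each reported once.

    Intervals are 0-based half-open, so [s1, e1) and [s2, e2) overlap
    when s1 < e2 and s2 < e1 on the same scaffold.
    """
    regions_by_chrom = {}
    for chrom, start, end, _ in read_bed(region_lines):
        regions_by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, g_start, g_end, rest in read_bed(gene_lines):
        for r_start, r_end in regions_by_chrom.get(chrom, []):
            if g_start < r_end and r_start < g_end:
                # Use the name column if present, otherwise chrom:start-end.
                hits.append(rest[0] if rest else f"{chrom}:{g_start}-{g_end}")
                break  # one overlapping region is enough
    return hits
```

Usage, with hypothetical file names:

```python
with open("genes.bed") as genes, open("regions.bed") as regions:
    for name in genes_in_regions(genes, regions):
        print(name)
```

This compares every gene against every region on its scaffold, which is fine for modest file sizes.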

I would appreciate any suggestions. Thanks!

genome gene GFF3

I have tried with bedtools

Can you also tell us exactly what you tried and how it failed? I'd recommend taking a look at BEDOPS; its manual helps you visualize what to expect from each operation: https://bedops.readthedocs.io/en/latest/content/overview.html#about-bedops
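
BEDOPS works on coordinate-sorted input (it ships `sort-bed` for that), which is what lets it handle large files efficiently. The same idea in plain Python, as a sketch under the assumption that all regions fit in memory: merge each scaffold's regions once (the example regions above overlap one another), then locate each gene with a binary search instead of scanning every region. This is a swapped-in technique for illustration, not how BEDOPS is implemented; the function names are hypothetical.

```python
import bisect
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping or abutting (chrom, start, end) regions per scaffold.

    Returns {chrom: (starts, ends)} with sorted, non-overlapping intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        blocks = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= blocks[-1][1]:  # overlaps or touches the previous block
                blocks[-1][1] = max(blocks[-1][1], end)
            else:
                blocks.append([start, end])
        merged[chrom] = ([s for s, _ in blocks], [e for _, e in blocks])
    return merged


def overlaps_any(merged, chrom, start, end):
    """True if the half-open gene [start, end) overlaps a merged region."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # Index of the rightmost merged block that starts before the gene ends.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start
```

Merging discards which original region a gene fell into, so this suits the "which genes are in any region" question rather than a per-region report.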

Thank you for the BEDOPS suggestion, it sounds good. I will try to solve it with that.

You just need a little Python, i.e., check whether range start < annotation start < range end, and do the same for the annotation end. Or just make a range and ask whether the annotation is in that range. If you are going to have to do this sort of thing regularly, you need to learn Python.
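
The first suggestion above is a containment test: a gene is kept only when both its start and its end fall inside the region. A minimal Python sketch, assuming 0-based half-open BED coordinates; both function names are hypothetical. With half-open intervals the comparisons need `<=` rather than a strict `<`, otherwise a gene starting exactly at the region start would be missed. The second function follows the "make a range" idea; in Python 3, `in` on a `range` of integers is computed arithmetically, so no list of positions is built.

```python
def contained(gene_start, gene_end, region_start, region_end):
    """True if the half-open gene [gene_start, gene_end) lies inside the region."""
    return region_start <= gene_start and gene_end <= region_end


def contained_with_range(gene_start, gene_end, region_start, region_end):
    """Same test using a range object.

    The last base of a half-open interval is gene_end - 1, so both the
    first and the last base of the gene must be members of the region.
    """
    region = range(region_start, region_end)
    return gene_start in region and (gene_end - 1) in region
```

Replacing the test with `gene_start < region_end and region_start < gene_end` would also report genes that only partly overlap a region.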
