Extracting features from .gff3 file based on position
4.5 years ago
adamerum • 0

Hi everyone,

Can anyone point me in the direction of how I can extract all features within a given genomic range from a .gff3 file? For example, if I have a list of genomic regions I am interested in, with scaffold ID and start to end position - how can I extract the gene IDs from the gff?

I am extremely new to bioinformatics so please excuse me if this is super straightforward or has been answered elsewhere...

I have found posts where you can use Bioconductor packages to extract features based on ID (i.e. searching for a specific gene) but can't see how to extract all genes in a given range.

The number of tools out there is overwhelming, and I am pretty sure this is the kind of straightforward thing someone could easily answer!

Thanks in advance.

gene gff3 • 2.7k views

You can filter your gff to keep only the region of interest as indicated here.

Then there are two possibilities using AGAT:

  1. You could use something like agat_convert_sp_gff2tsv.pl

    And then extract what you need from the tsv:

    awk -F'\t' '$3 == "gene" { print $10 }' your_file.tsv
    

    where $10 is the 10th column (this assumes the ID is in the 10th column; check the first line of the file to see which column holds the ID)

  2. Use agat_sp_extract_attributes.pl:

    agat_sp_extract_attributes.pl --gff input.gff -t gene --attribute ID
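
If AGAT is not available, the same kind of ID extraction can be roughly sketched with plain awk on the GFF3 itself. This is only an illustration: `gff_gene_ids` is a hypothetical helper name, not part of AGAT, and it assumes a standard tab-separated GFF3 where column 3 is the feature type and column 9 holds `key=value` attributes separated by semicolons.

```shell
# Hypothetical helper (not part of AGAT): print the ID attribute of every
# gene feature in a GFF3 file given as the first argument.
gff_gene_ids() {
    awk -F'\t' '
        /^#/ { next }                       # skip header and comment lines
        $3 == "gene" {
            n = split($9, attrs, ";")       # column 9: key=value pairs
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) {    # keep only the ID attribute
                    print substr(attrs[i], 4)
                    break
                }
        }
    ' "$1"
}
```

Calling `gff_gene_ids input.gff` prints one gene ID per line; run it on the region-filtered GFF to get only the genes of interest.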
    

Thank you both for your replies - great to be made aware of bedtools and AGAT which both look very helpful!


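You should be able to use bedtools (for example, bedtools intersect), which works directly on GFF files. The "list of genomic regions" should be supplied as a BED file; note that BED start coordinates are 0-based, whereas GFF coordinates are 1-based.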
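
As a rough sketch of how that could look, assuming the region list is a tab-separated file of scaffold, start, and end in 1-based inclusive coordinates, and that bedtools is installed. The function names `regions_to_bed` and `genes_in_regions` are made up for illustration.

```shell
# Hypothetical helper: convert 1-based, inclusive regions (scaffold, start,
# end) to BED, which is 0-based and half-open, so only the start shifts by 1.
regions_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' "$1"
}

# Hypothetical helper, assumes bedtools is on the PATH.
# usage: genes_in_regions annotation.gff3 regions.bed
genes_in_regions() {
    # -u reports each GFF feature once if it overlaps any region in -b
    bedtools intersect -u -a "$1" -b "$2" |
        awk -F'\t' '
            $3 == "gene" {
                n = split($9, attrs, ";")   # column 9: key=value pairs
                for (i = 1; i <= n; i++)
                    if (attrs[i] ~ /^ID=/) { print substr(attrs[i], 4); break }
            }
        '
}
```

Typical use would be `regions_to_bed regions.tsv > regions.bed` followed by `genes_in_regions annotation.gff3 regions.bed`. If the regions are already in BED format, skip the first step. By default any overlap counts; adding `-f 1.0` to the bedtools call should restrict the output to features that lie entirely inside a region.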
