How to calculate the number of functional proteins in a GFF3 file
5.1 years ago
jjeangoh • 0

Hi! I have a GFF3 file, but I don't know how to turn its data into summary results like the ones below: Number of gene models = 29,552; BLAST annotation (nr) = 20,274; InterProScan annotation = 22,218; GO annotation = 12,287; 143 genes are responsible for extracellular functions, 6743 are responsible for transcription, etc.

May I know which software can be used for this? I'm still new to bioinformatics and still figuring things out. Thanks!

next-gen annotation gff3 protein • 1.4k views

Hi, could you show us the first couple of lines from your file? Most of the information you are looking for is optional in GFF3 files.


tig00001845 maker gene 6983 7357 . + . ID=LCD1_G00009980;Name=LCD1_G00009980;Alias=augustus_masked-tig00001845-processed-gene-0.17;Note=Protein of unknown function;
tig00001845 maker mRNA 6983 7357 . + . ID=LCD1_T00009980-RA;Parent=LCD1_G00009980;Name=LCD1_T00009980-RA;Alias=augustus_masked-tig00001845-processed-gene-0.17-mRNA-1;_AED=0.51;_QI=0|-1|0|1|-1|1|1|0|124;_eAED=0.48;Note=Protein of unknown function;
tig00001845 maker exon 6983 7357 . + . ID=LCD1_T00009980-RA:exon:6201;Parent=LCD1_T00009980-RA;
tig00001845 maker CDS 6983 7357 . + 0 ID=LCD1_T00009980-RA:cds;Parent=LCD1_T00009980-RA;
tig00001845 maker gene 23959 25066 . + . ID=LCD1_G00009986;Name=LCD1_G00009986;Alias=snap_masked-tig00001845-processed-gene-0.30;Note=Protein of unknown function;
tig00001845 maker mRNA 23959 25066 . + . ID=LCD1_T00009986-RA;Parent=LCD1_G00009986;Name=LCD1_T00009986-RA;Alias=snap_masked-tig00001845-processed-gene-0.30-mRNA-1;_AED=0.45;_QI=0|0|0|1|1|1|2|0|348;_eAED=0.51;Note=Protein of unknown function;
tig00001845 maker exon 23959 24075 . + . ID=LCD1_T00009986-RA:exon:6202;Parent=LCD1_T00009986-RA;


If you have the time, you can write your own parser in any language and analyse the key=value fields in the attributes column (the ninth column).
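As a rough illustration of the do-it-yourself route, here is a minimal Python sketch that splits the attributes column into key=value pairs and tallies, for gene features only, how many carry each attribute. The function names and the counter labels are made up for this example, the sample lines are abridged from the ones posted above, and the sketch does not decode percent-escaped characters, which real GFF3 files may contain.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attributes column ("key=value;key=value;") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def iter_features(lines):
    """Yield (feature_type, attributes) for every 9-column feature line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            # Assumption: text pasted from a forum may have lost its tabs,
            # so fall back to whitespace and keep column 9 in one piece.
            cols = line.split(None, 8)
        if len(cols) != 9:
            continue
        yield cols[2], parse_attributes(cols[8])


def summarize_genes(lines):
    """Count gene features and how many of them carry each attribute key."""
    counts = Counter()
    for feature_type, attrs in iter_features(lines):
        if feature_type != "gene":
            continue
        counts["gene models"] += 1
        for key in attrs:
            counts["genes with " + key] += 1
        if attrs.get("Note") == "Protein of unknown function":
            counts["genes of unknown function"] += 1
    return counts


# Sample lines abridged from the thread (Alias attributes dropped for brevity).
SAMPLE = [
    "tig00001845\tmaker\tgene\t6983\t7357\t.\t+\t.\tID=LCD1_G00009980;Name=LCD1_G00009980;Note=Protein of unknown function;",
    "tig00001845\tmaker\tCDS\t6983\t7357\t.\t+\t0\tID=LCD1_T00009980-RA:cds;Parent=LCD1_T00009980-RA;",
    "tig00001845\tmaker\tgene\t23959\t25066\t.\t+\t.\tID=LCD1_G00009986;Name=LCD1_G00009986;Note=Protein of unknown function;",
]

for label, n in sorted(summarize_genes(SAMPLE).items()):
    print(label, n)
```

On a real file you would pass an open file handle instead of SAMPLE. If your annotation stores GO or InterPro hits in attributes such as Ontology_term or Dbxref, they would show up as "genes with Ontology_term" and so on, but that depends entirely on how the file was annotated.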

Alternatively, you can use Juke-34's suggestion or gff3ToGenePred from UCSC's admin tools.

The tool scans the GFF3 and converts it into genePred format. If you also provide the -attrsOut parameter, you'll get a file with all attributes per mRNA.

Note that you'll only get what is annotated. Your two example genes are both annotated as

Note=Protein of unknown function;

If you don't have any GO annotation, you'll need to take each ID and map it to GO terms using the appropriate database.
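The mapping step itself is just a lookup once you have the two tables. A minimal sketch, assuming you already hold (a) a gene ID to database accession table, for instance from your best BLAST hits, and (b) an accession to GO ID table exported from whichever database you use. Both tables, the function name, and the gene and accession names below are hypothetical placeholders, not real annotation.

```python
def go_terms_for_genes(gene_to_accession, accession_to_go):
    """Return {gene_id: sorted list of GO IDs} via the gene's accession.

    Genes whose accession has no GO entry get an empty list, so they can
    still be counted as "no GO annotation".
    """
    result = {}
    for gene_id, accession in gene_to_accession.items():
        result[gene_id] = sorted(accession_to_go.get(accession, ()))
    return result


# Toy placeholder tables (hypothetical IDs, for illustration only).
gene_to_accession = {"gene_1": "ACC_A", "gene_2": "ACC_B"}
accession_to_go = {"ACC_A": {"GO:0006351", "GO:0005576"}}

annotated = go_terms_for_genes(gene_to_accession, accession_to_go)
with_go = sum(1 for terms in annotated.values() if terms)
print(annotated)
print("genes with at least one GO term:", with_go)
```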


You can try the script agat_sp_functional_statistics.pl from AGAT.

agat_sp_functional_statistics.pl --gff input.gff -o output

You will get functional annotation statistics, but not breakdowns such as "143 genes are responsible for extracellular functions, 6743 are responsible for transcription", etc.
You will get a txt file for each type of dbxref attached, e.g. one file with all GO terms. From that file you can retrieve the GO term linked to transcription, for example, and count how many genes carry it.
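For the last counting step, a small Python sketch may help. It assumes you have reshaped the GO information into a plain two-column, tab-separated table of gene ID and GO ID. That layout is an assumption made for this example and is not necessarily what AGAT writes, so check the actual file first. The function names and the toy rows are hypothetical; GO:0006351 (DNA-templated transcription) and GO:0005576 (extracellular region) are real GO identifiers.

```python
import io
from collections import defaultdict


def read_pairs(handle):
    """Yield (gene_id, go_id) from a tab-separated table, skipping other rows."""
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[1].startswith("GO:"):
            yield parts[0], parts[1]


def genes_per_go_term(pairs):
    """Group gene IDs by GO ID; sets make sure each gene is counted once."""
    genes = defaultdict(set)
    for gene_id, go_id in pairs:
        genes[go_id].add(gene_id)
    return genes


# Toy table with hypothetical gene IDs (assumed layout: gene_id <tab> GO ID).
table = io.StringIO(
    "gene_1\tGO:0006351\n"
    "gene_2\tGO:0006351\n"
    "gene_2\tGO:0005576\n"
    "gene_2\tGO:0006351\n"  # a duplicate row is only counted once
)

TERMS_OF_INTEREST = {
    "GO:0006351": "transcription (DNA-templated)",
    "GO:0005576": "extracellular region",
}

by_term = genes_per_go_term(read_pairs(table))
for go_id, label in TERMS_OF_INTEREST.items():
    print(label, len(by_term.get(go_id, ())))
```

This only counts exact GO IDs. Genes annotated with a more specific child term would need the ontology itself to be rolled up to the parent, which this sketch does not attempt.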
