5.2 years ago
jjeangoh
Hi! I have a GFF3 file, but I don't know how to use the data to produce results like these: Number of gene models = 29,552; BLAST annotation (nr) = 20,274; InterProScan annotation = 22,218; GO annotation = 12,287; 143 genes are responsible for extracellular functions, 6,743 are responsible for transcription, etc.
May I know which software can be used for this? I'm still new to bioinformatics and still figuring things out. Thanks!
Hi, could you show us the first few lines of your file? Most of the information you are looking for is optional in GFF3 files.
tig00001845 maker gene 6983 7357 . + . ID=LCD1_G00009980;Name=LCD1_G00009980;Alias=augustus_masked-tig00001845-processed-gene-0.17;Note=Protein of unknown function;
tig00001845 maker mRNA 6983 7357 . + . ID=LCD1_T00009980-RA;Parent=LCD1_G00009980;Name=LCD1_T00009980-RA;Alias=augustus_masked-tig00001845-processed-gene-0.17-mRNA-1;_AED=0.51;_QI=0|-1|0|1|-1|1|1|0|124;_eAED=0.48;Note=Protein of unknown function;
tig00001845 maker exon 6983 7357 . + . ID=LCD1_T00009980-RA:exon:6201;Parent=LCD1_T00009980-RA;
tig00001845 maker CDS 6983 7357 . + 0 ID=LCD1_T00009980-RA:cds;Parent=LCD1_T00009980-RA;
tig00001845 maker gene 23959 25066 . + . ID=LCD1_G00009986;Name=LCD1_G00009986;Alias=snap_masked-tig00001845-processed-gene-0.30;Note=Protein of unknown function;
tig00001845 maker mRNA 23959 25066 . + . ID=LCD1_T00009986-RA;Parent=LCD1_G00009986;Name=LCD1_T00009986-RA;Alias=snap_masked-tig00001845-processed-gene-0.30-mRNA-1;_AED=0.45;_QI=0|0|0|1|1|1|2|0|348;_eAED=0.51;Note=Protein of unknown function;
tig00001845 maker exon 23959 24075 . + . ID=LCD1_T00009986-RA:exon:6202;Parent=LCD1_T00009986-RA;
If you have the time, you can write your own parser in any language and analyse the fields containing an '=' (the key=value attributes in the ninth column).
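To give an idea of what such a parser could look like, here is a minimal Python sketch. It assumes a tab-delimited GFF3 with nine columns and treats a gene as "functionally annotated" when its Note attribute is anything other than "Protein of unknown function"; that rule is only an assumption based on the lines posted above, so adjust it to whatever your annotation pipeline actually writes. The function names are made up for this example.

```python
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9):
    """Split the ninth GFF3 column into a dict of key -> value."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values
            attributes[key.strip()] = unquote(value.strip())
    return attributes


def parse_gff3(lines):
    """Yield (feature_type, attributes) for every nine-column feature line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # skip lines that are not valid feature lines
        yield columns[2], parse_attributes(columns[8])


def summarise(lines):
    """Count features per type and genes with an informative Note."""
    type_counts = Counter()
    genes_with_function = 0
    for feature_type, attributes in parse_gff3(lines):
        type_counts[feature_type] += 1
        note = attributes.get("Note", "")
        # Assumption: this exact placeholder marks an unannotated gene
        if feature_type == "gene" and note and note != "Protein of unknown function":
            genes_with_function += 1
    return type_counts, genes_with_function


# Two shortened feature lines taken from the example in this thread
example = [
    "##gff-version 3\n",
    "tig00001845\tmaker\tgene\t6983\t7357\t.\t+\t.\t"
    "ID=LCD1_G00009980;Name=LCD1_G00009980;Note=Protein of unknown function;\n",
    "tig00001845\tmaker\tmRNA\t6983\t7357\t.\t+\t.\t"
    "ID=LCD1_T00009980-RA;Parent=LCD1_G00009980;\n",
]
counts, annotated = summarise(example)
print(counts["gene"], counts["mRNA"], annotated)
```

For a real file you would pass an open file handle instead of the list, e.g. summarise(open("annotation.gff3")), where annotation.gff3 is just a placeholder name. The gene count from type_counts corresponds to the "number of gene models" figure.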
Alternatively, you can use Juke-34's suggestion, or gff3ToGenePred from UCSC's admin tools. The tool scans the GFF3 and converts it to genePred format. If you also provide the -attrsOut parameter, you'll get a file with all attributes per mRNA. Note that you'll only get what is annotated. Your two example genes are coding for
If you don't have any GO annotation, you'll need to take the ID and map it to the GO term using the appropriate database.
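The mapping step itself is just a lookup once you have a table from annotation IDs to GO terms. A rough Python sketch, assuming you have already loaded two dictionaries: one from gene to its annotation IDs and one from annotation ID to GO terms. The gene names and annotation IDs below are placeholders invented for the example, and how you build the second dictionary depends on the database you use.

```python
def map_genes_to_go(gene_to_ids, id_to_go):
    """Return gene -> set of GO terms reachable through its annotation IDs."""
    gene_to_go = {}
    for gene, annotation_ids in gene_to_ids.items():
        terms = set()
        for annotation_id in annotation_ids:
            # IDs missing from the mapping simply contribute no GO terms
            terms.update(id_to_go.get(annotation_id, ()))
        gene_to_go[gene] = terms
    return gene_to_go


# Placeholder data (hypothetical gene names and annotation IDs)
gene_to_ids = {"geneA": ["ID_X"], "geneB": ["ID_Y", "ID_Z"], "geneC": []}
id_to_go = {"ID_X": ["GO:0003677"], "ID_Y": ["GO:0005576"]}

gene_to_go = map_genes_to_go(gene_to_ids, id_to_go)
with_go = sum(1 for terms in gene_to_go.values() if terms)
print(with_go, "of", len(gene_to_go), "genes have at least one GO term")
```

The count of genes with a non-empty set is what a "GO annotation = N" figure would correspond to.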
You can try the script agat_sp_functional_statistics.pl from AGAT. You will get functional annotation statistics, but it will not give you information such as: 143 genes are responsible for extracellular functions, 6,743 are responsible for transcription, etc. You will get a txt file for each type of dbxref attached, e.g. one file with all GO terms. Based on that, you can retrieve the GO term linked to transcription, for example, and count how many you have.
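The counting step could be sketched in Python like this. I am assuming a simple two-column, tab-separated table (gene ID, then GO ID); that layout is hypothetical and may not match what AGAT actually writes, so adapt the parsing to the real file. The gene names are invented for the example.

```python
import io
from collections import defaultdict


def genes_per_go_term(handle):
    """Count distinct genes per GO ID from 'gene<TAB>GO ID' lines."""
    genes = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # ignore lines without both columns
        gene_id, go_id = parts[0], parts[1]
        genes[go_id].add(gene_id)  # a set avoids counting a gene twice
    return {go_id: len(members) for go_id, members in genes.items()}


# Made-up table standing in for the real file (assumed layout)
table = io.StringIO(
    "gene1\tGO:0006351\n"
    "gene2\tGO:0006351\n"
    "gene2\tGO:0005576\n"
    "gene1\tGO:0006351\n"
)
go_counts = genes_per_go_term(table)
print(go_counts.get("GO:0006351", 0))  # genes linked to this GO ID
```

Keep in mind that GO is hierarchical: a gene annotated only with a more specific child term will not be counted under the parent term unless you also walk the ontology.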