remove UTR annotations from gff
I have UTR annotations that I need to remove for a set of genes. I could simply delete the UTR lines, but that would not update the start and stop positions of the gene and mRNA features or correct the exon features. Is there a tool that removes UTR features from a GFF file and makes the necessary corrections to the remaining gene, mRNA, and exon features? I can't seem to find one, but there must be something.
Hi rob234king,
I think you can try gff3_file_UTR_trimmer.pl from PASApipeline:
(Edited: changed the example to the GFF3 file in the PASA directory. I am still thinking about what the shift in phase means; it is not a copy-paste problem.)
$ cat test.gff3
gi| 68711 TIGR gene 12923 14228 . + . ID= 68711.t00017; Name= my_protein_name
gi| 68711 TIGR mRNA 12923 14228 . + . ID= model.68711.m00017; Parent= 68711.t00017
gi| 68711 TIGR five_prime_utr 12923 13029 . + . ID= utr5p_of_68711.m00017; Parent= 68711.m00017
gi| 68711 TIGR exon 12923 13060 . + . ID= 68711.e00069; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13030 13060 . + 0 ID= 13030_13060cds_of_68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13411 13550 . + . ID= 68711.e00070; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13411 13550 . + 1 ID= 13411_13550cds_of_68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13677 13802 . + . ID= 68711.e00071; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13677 13802 . + 0 ID= 13677_13802cds_of_68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13876 14228 . + . ID= 68711.e00072; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13876 14016 . + 0 ID= 13876_14016cds_of_68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR three_prime_utr 14017 14228 . + . ID= utr3p_of_68711.m00017; Parent= 68711.m00017
$ perl ~/src/PASApipeline-v2.3.3/misc_utilities/gff3_file_UTR_trimmer.pl test.gff3
gi| 68711 TIGR gene 13030 14016 . + . ID= 68711.t00017.1; Name= my_protein_name
gi| 68711 TIGR mRNA 13030 14016 . + . ID= model.68711.m00017; Parent= 68711.t00017.1; Name= my_protein_name
gi| 68711 TIGR exon 13030 13060 . + . ID= model.68711.m00017.exon1; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13030 13060 . + 0 ID= cds.model.68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13411 13550 . + . ID= model.68711.m00017.exon2; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13411 13550 . + 2 ID= cds.model.68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13677 13802 . + . ID= model.68711.m00017.exon3; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13677 13802 . + 0 ID= cds.model.68711.m00017; Parent= model.68711.m00017
gi| 68711 TIGR exon 13876 14016 . + . ID= model.68711.m00017.exon4; Parent= model.68711.m00017
gi| 68711 TIGR CDS 13876 14016 . + 0 ID= cds.model.68711.m00017; Parent= model.68711.m00017
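The output above shows what the trimming amounts to: the UTR rows are gone, the gene and mRNA now span only the CDS, and the first and last exons are clipped to the CDS boundaries. A minimal Python sketch of that logic follows. It is not PASA's implementation; the names `trim_utrs`, `parse_attrs`, and `UTR_TYPES` are made up for illustration, and it assumes tab-separated 9-column GFF3, a gene → mRNA → exon/CDS hierarchy, and a single `Parent` on each mRNA and exon. It leaves IDs and phases untouched and drops comment lines.

```python
# Hypothetical helper names; a sketch, not the PASA script's actual code.
UTR_TYPES = {"five_prime_utr", "three_prime_utr", "utr"}


def parse_attrs(field):
    """Parse a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def trim_utrs(lines):
    """Drop UTR rows and shrink gene/mRNA/exon coordinates to the CDS span."""
    records = [line.rstrip("\n").split("\t") for line in lines
               if line.strip() and not line.startswith("#")]

    # Pass 1: CDS span per transcript, and transcript -> gene links.
    cds_span, gene_of = {}, {}
    for cols in records:
        attrs = parse_attrs(cols[8])
        start, end = int(cols[3]), int(cols[4])
        if cols[2] == "CDS":
            for parent in attrs.get("Parent", "").split(","):
                lo, hi = cds_span.get(parent, (start, end))
                cds_span[parent] = (min(lo, start), max(hi, end))
        elif cols[2] == "mRNA":
            gene_of[attrs.get("ID")] = attrs.get("Parent")

    # A gene spans the union of its transcripts' CDS spans.
    gene_span = {}
    for mrna, gene in gene_of.items():
        if mrna in cds_span and gene is not None:
            lo, hi = gene_span.get(gene, cds_span[mrna])
            gene_span[gene] = (min(lo, cds_span[mrna][0]), max(hi, cds_span[mrna][1]))

    # Pass 2: rewrite coordinates and skip UTR rows.
    out = []
    for cols in records:
        attrs = parse_attrs(cols[8])
        ftype = cols[2]
        start, end = int(cols[3]), int(cols[4])
        if ftype.lower() in UTR_TYPES:
            continue
        if ftype == "gene" and attrs.get("ID") in gene_span:
            start, end = gene_span[attrs["ID"]]
        elif ftype == "mRNA" and attrs.get("ID") in cds_span:
            start, end = cds_span[attrs["ID"]]
        elif ftype == "exon" and attrs.get("Parent") in cds_span:
            lo, hi = cds_span[attrs["Parent"]]
            start, end = max(start, lo), min(end, hi)
            if start > end:  # exon lies entirely inside a UTR
                continue
        out.append("\t".join(cols[:3] + [str(start), str(end)] + cols[5:]))
    return out
```

Something like `print("\n".join(trim_utrs(open("test.gff3"))))` would then print the trimmed records. Transcripts without any CDS are passed through unchanged.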
5.6 years ago by AK
It seems simplest to just use grep for this.
grep -v "five_prime_UTR" "$gff" | grep -v "three_prime_UTR" > "$new_gff"
You could probably even just use
grep -v "_UTR" "$gff"
and testing is as easy as
grep -v "_UTR" "$gff" | cut -f 3 | grep -v "#" | sort -u
grep "_UTR" "$gff" | cut -f 3 | grep -v "#" | sort -u
where the first diagnostic should list no UTR categories and the second should list only UTR categories.
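One caveat with a plain grep is that it matches anywhere on the line and is case-sensitive, so it would miss lowercase types such as `five_prime_utr` and could drop a non-UTR row that happens to mention `_UTR` in its attributes. A small Python sketch that checks only the type column (column 3) is below; `drop_utr_features` and `feature_types` are illustrative names, and tab-separated input is assumed. Like the grep version, it only removes rows and does not adjust gene, mRNA, or exon coordinates.

```python
def drop_utr_features(lines):
    """Yield GFF lines whose type column (column 3) is not a UTR feature."""
    for line in lines:
        cols = line.split("\t")
        is_feature = not line.startswith("#") and len(cols) >= 3
        # Case-insensitive, so five_prime_UTR and five_prime_utr both match.
        if is_feature and cols[2].lower().endswith("utr"):
            continue
        yield line


def feature_types(lines):
    """Return the sorted set of feature types, like `cut -f 3 | sort -u`."""
    return sorted({line.split("\t")[2] for line in lines
                   if not line.startswith("#") and line.count("\t") >= 2})
```

Running `feature_types(drop_utr_features(open(path)))` plays the role of the first diagnostic: the returned list should contain no UTR types.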
Great, thanks, I'll give it a go.
It worked perfectly, once I had the right Perl version running for it.
Weird that the phase of the second CDS has changed from 1 to 2 during the process. A bug or a copy-paste problem?
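On the phase question: in GFF3, the phase of a CDS segment is the number of bases to skip at its start to reach the first base of the next complete codon. The first CDS in the example (13030 to 13060) is 31 bases long with phase 0, which is 10 codons plus 1 leftover base, so the second segment needs 2 bases to finish that codon. That suggests the 2 in the trimmer output is consistent with the spec and the 1 in the input is the odd one out (possibly written as leftover bases rather than phase, though that is a guess). A quick sketch to check this, with `cds_phases` as an illustrative name and assuming a complete CDS that begins at the start codon:

```python
def cds_phases(segments, strand="+"):
    """Return the GFF3 phase of each CDS segment, in the order given.

    segments: (start, end) pairs, 1-based inclusive, all from one transcript.
    Assumes the CDS is complete, so the 5'-most segment has phase 0.
    """
    # Walk segments 5' to 3': ascending on '+', descending on '-'.
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left after skipping `phase` and reading whole codons
        # determine how many bases the next segment must skip.
        phase = (3 - (length - phase) % 3) % 3
    return [phases[seg] for seg in segments]


cds = [(13030, 13060), (13411, 13550), (13677, 13802), (13876, 14016)]
print(cds_phases(cds))  # [0, 2, 0, 0]
```

The result matches the phase column in the trimmed output (0, 2, 0, 0).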