I'm trying to use the AGAT script that adds new attributes from a TSV file to a GTF file.
My input files look like this.
Input TSV, which is my reference file:
gene_id Entrez_ID
ENSCAFG00845006432 399518
ENSCAFG00845002136 399530
ENSCAFG00845029798 399544
ENSCAFG00845011460 399545
ENSCAFG00845001610 399653
ENSCAFG00845013158 403157
ENSCAFG00845014982 403168
ENSCAFG00845021967 403170
ENSCAFG00845019241 40340
Next is my GTF file:
#!genome-build ROS_Cfam_1.0
#!genome-version ROS_Cfam_1.0
#!genome-date 2020-09
#!genome-build-accession GCA_014441545.1
#!genebuild-last-updated 2020-10
X ensembl gene 24550462 24552226 . - . gene_id "ENSCAFG00845015183"; gene_version "1"; gene_source "ensembl"; gene_biotype "protein_coding";
X ensembl transcript 24550462 24552226 . - . gene_id "ENSCAFG00845015183"; gene_version "1"; transcript_id "ENSCAFT00845027108"; transcript_version "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "Ensembl_canonical";
X ensembl exon 24552206 24552226 . - . gene_id "ENSCAFG00845015183"; gene_version "1"; transcript_id "ENSCAFT00845027108"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; exon_id "ENSCAFE00845128634"; exon_version "1"; tag "Ensembl_canonical";
X ensembl CDS 24552206 24552226 . - 0 gene_id "ENSCAFG00845015183"; gene_version "1"; transcript_id "ENSCAFT00845027108"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSCAFP00845021332"; protein_version "1"; tag "Ensembl_canonical";
X ensembl start_codon 24552224 24552226 . - 0 gene_id "ENSCAFG00845015183"; gene_version "1"; transcript_id "ENSCAFT00845027108"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "Ensembl_canonical";
This is my command and its arguments:
agat_sq_add_attributes_from_tsv.pl --gff Canis_lupus_familiaris.ROS_Cfam_1.0.108.gtf --tsv entrez_id_filtered.tsv -o test_v1.gtf
The head of the new GTF file:
cat test_v1.gtf | head
##gff-version 3
X ensembl gene 24550462 24552226 . - .
X ensembl transcript 24550462 24552226 . - .
X ensembl exon 24552206 24552226 . - .
X ensembl CDS 24552206 24552226 . - 0
X ensembl start_codon 24552224 24552226 . - 0
X ensembl exon 24550462 24551997 . - .
X ensembl CDS 24550462 24551997 . - 0
X ensembl gene 24606240 24606309 . - .
X ensembl transcript 24606240 24606309 . - .
Tail
cat test_v1.gtf | tail
JAAUVH010000016.1 ensembl exon 4087 4232 . - .
JAAUVH010000221.1 ensembl gene 649 789 . + .
JAAUVH010000221.1 ensembl transcript 649 789 . + .
JAAUVH010000221.1 ensembl exon 649 789 . + .
JAAUVH010000128.1 ensembl gene 2862 3007 . + .
JAAUVH010000128.1 ensembl transcript 2862 3007 . + .
JAAUVH010000128.1 ensembl exon 2862 3007 . + .
JAAUVH010000325.1 ensembl gene 7802 7946 . + .
JAAUVH010000325.1 ensembl transcript 7802 7946 . + .
JAAUVH010000325.1 ensembl exon 7802 7946 . + .
I'm not sure this is how the output should look: the attribute column is empty and I don't see the Entrez_ID tag anywhere in the new GTF file. Any suggestion would be really helpful.
It is now fixed (master branch). The fix will be available through conda in the next release.
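Until that release is on conda, a stopgap outside AGAT is possible. The following is only a minimal Python sketch, not AGAT's implementation. It assumes the TSV is tab-separated with the `gene_id` / `Entrez_ID` header shown above, that the GTF columns are tab-separated, and that the new attribute should be appended to every feature line whose `gene_id` appears in the TSV. The function names (`load_mapping`, `annotate_line`, `annotate_file`) are made up for this illustration.

```python
import csv
import re

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ENSCAFG00845015183";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def load_mapping(tsv_handle, key_col="gene_id", value_col="Entrez_ID"):
    """Read a tab-separated file with a header row into {gene_id: Entrez_ID}."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row[key_col]: row[value_col] for row in reader if row.get(value_col)}


def annotate_line(line, mapping, attr_name="Entrez_ID"):
    """Append attr_name "value"; to a GTF feature line if its gene_id is in mapping."""
    # Header/comment lines and blank lines pass through untouched.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    # Anything that is not a 9-column GTF record is left as-is.
    if len(fields) != 9:
        return line
    match = GENE_ID_RE.search(fields[8])
    if match and match.group(1) in mapping:
        attrs = fields[8].rstrip()
        if not attrs.endswith(";"):
            attrs += ";"
        fields[8] = f'{attrs} {attr_name} "{mapping[match.group(1)]}";'
    return "\t".join(fields) + "\n"


def annotate_file(gtf_in, tsv_in, gtf_out):
    """Stream gtf_in to gtf_out, adding the attribute where a mapping exists."""
    with open(tsv_in, newline="") as tsv_handle:
        mapping = load_mapping(tsv_handle)
    with open(gtf_in) as src, open(gtf_out, "w") as dst:
        for line in src:
            dst.write(annotate_line(line, mapping))
```

With the file names from the command above, this would be called as `annotate_file("Canis_lupus_familiaris.ROS_Cfam_1.0.108.gtf", "entrez_id_filtered.tsv", "test_v1.gtf")`. Unlike the output shown earlier, it keeps the existing attributes and the original `#!` header lines, and genes without an entry in the TSV are written unchanged.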
Thank you so very much. I had tried all sorts of combinations based on the example and still got no usable output, so I thought I was making a mistake.