Hello everyone. I have a GFF file downloaded from NCBI in the following format:
NC_056054.1 RefSeq region 1 278617202 . + . ID=NC_056054.1:1..278617202;Dbxref=taxon:9940;Name=1;breed=Rambouillet;chromos
NC_056054.1 Gnomon pseudogene 42249 46639 . - . ID=gene-LOC114110831;Dbxref=GeneID:114110831;Name=LOC114110831;gbkey=Gene;gene
NC_056054.1 Gnomon exon 42249 43660 . - . ID=id-LOC114110831;Parent=gene-LOC114110831;Dbxref=GeneID:114110831;gbkey=exon;gene=LO
NC_056054.1 Gnomon exon 43959 44085 . - . ID=id-LOC114110831-2;Parent=gene-LOC114110831;Dbxref=GeneID:114110831;gbkey=exon;gene=
NC_056054.1 Gnomon exon 46503 46639 . - . ID=id-LOC114110831-3;Parent=gene-LOC114110831;Dbxref=GeneID:114110831;gbkey=exon;gene=
NC_056054.1 Gnomon gene 46755 48356 . - . ID=gene-LOC114112203;Dbxref=GeneID:114112203;Name=LOC114112203;gbkey=Gene;gene=LOC1141
NC_056054.1 Gnomon mRNA 46755 48356 . - . ID=rna-XM_027963747.2;Parent=gene-LOC114112203;Dbxref=GeneID:114112203,Genbank:XM_0279
NC_056054.1 Gnomon exon 46755 48356 . - . ID=exon-XM_027963747.2-1;Parent=rna-XM_027963747.2;Dbxref=GeneID:114112203,Genbank:XM_
NC_056054.1 Gnomon CDS 46755 48356 . - 0 ID=cds-XP_027819548.2;Parent=rna-XM_027963747.2;Dbxref=GeneID:114112203,Genbank:XP_027
I want to convert this file to Ensembl format. Here is another GFF file for comparison, downloaded from Ensembl, in the following format:
1 ensembl gene 87434 89380 . + . ID=gene:ENSOARG00020000042;Name=FAM240C;biotype=protein_coding;description=family with sequenc
1 ensembl mRNA 87434 89380 . + . ID=transcript:ENSOART00020000042;Parent=gene:ENSOARG00020000042;Name=FAM240C-201;biotype=prote
1 ensembl exon 87434 87579 . + . Parent=transcript:ENSOART00020000042;Name=ENSOARE00020000042;constitutive=1;ensembl_end_phase=
1 ensembl CDS 87434 87579 . + 0 ID=CDS:ENSOARP00020000015;Parent=transcript:ENSOART00020000042;protein_id=ENSOARP00020000015
1 ensembl exon 89251 89305 . + . Parent=transcript:ENSOART00020000042;Name=ENSOARE00020000043;constitutive=1;ensembl_end_phase=
1 ensembl CDS 89251 89305 . + 1 ID=CDS:ENSOARP00020000015;Parent=transcript:ENSOART00020000042;protein_id=ENSOARP00020000015
1 ensembl exon 89307 89326 . + . Parent=transcript:ENSOART00020000042;Name=ENSOARE00020000044;constitutive=1;ensembl_end_phase=
1 ensembl CDS 89307 89326 . + 0 ID=CDS:ENSOARP00020000015;Parent=transcript:ENSOART00020000042;protein_id=ENSOARP00020000015
1 ensembl exon 89329 89380 . + . Parent=transcript:ENSOART00020000042;Name=ENSOARE00020000045;constitutive=1;ensembl_end_phase=
1 ensembl CDS 89329 89380 . + 1 ID=CDS:ENSOARP00020000015;Parent=transcript:ENSOART00020000042;protein_id=ENSOARP00020000015
The Parent= attribute in particular is so different between the two files that my other analysis software can't recognize it.
Does anyone know of any software or code that can do the conversion described above? I would greatly appreciate it.
They are not conceptually different. Both are GFF3 files, and in both a feature of type gene is the parent of an mRNA feature, which in turn is the parent of the exon and CDS features. Before we invest a lot of time, we should look into the specifics of the format understood by the analysis software and the exact error message. Which software are you using? It might just be a small hiccup caused by a single attribute or character (e.g. : vs. -) in the identifiers.
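If it does turn out to be only the identifier prefixes and the sequence names, a rough sketch like the one below might be enough. It is not a tested converter: the prefix mapping (gene- to gene:, rna- to transcript:, cds- to CDS:) and the accession-to-chromosome table are my assumptions based on the two excerpts above, the names SEQID_MAP, PREFIX_MAP and convert_line are made up for illustration, and I assume the real file is tab-separated as GFF3 requires. Whether the downstream tool accepts the result is something you would have to check.

```python
import re

# Hypothetical lookup from RefSeq accession to Ensembl-style sequence name.
# Only the one accession visible in the excerpt is listed; extend as needed.
SEQID_MAP = {"NC_056054.1": "1"}

# Assumed prefix translations (NCBI style -> Ensembl style), guessed from the excerpts.
PREFIX_MAP = {"gene-": "gene:", "rna-": "transcript:", "cds-": "CDS:"}

# Match ID= or Parent= only at the start of an attribute, so that keys such as
# Dbxref=GeneID:... or protein_id=... are left alone.
_ID_ATTR = re.compile(r"(^|;)(ID|Parent)=([^;]+)")


def _swap_prefix(identifier):
    """Replace a known NCBI prefix with its assumed Ensembl counterpart."""
    for old, new in PREFIX_MAP.items():
        if identifier.startswith(old):
            return new + identifier[len(old):]
    return identifier  # unknown prefixes (e.g. exon-, id-) stay unchanged


def _rewrite_attr(match):
    sep, key, value = match.group(1), match.group(2), match.group(3)
    # Parent may carry several comma-separated identifiers.
    ids = ",".join(_swap_prefix(v) for v in value.split(","))
    return f"{sep}{key}={ids}"


def convert_line(line):
    """Rewrite one GFF3 line; comments, blank and malformed lines pass through."""
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    cols[0] = SEQID_MAP.get(cols[0], cols[0])
    cols[8] = _ID_ATTR.sub(_rewrite_attr, cols[8])
    return "\t".join(cols) + newline
```

To use it, you would loop over the input file and write convert_line(line) for each line to a new file. Note that the Ensembl exon lines above carry no ID at all, while the NCBI ones do; the sketch leaves exon IDs untouched rather than guessing what the software expects.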
Hello Michael, I am using
, the code is as follows:
The error I'm getting is: