Roary not working with PGAP output GFF files with FASTA sequence, but works with Prokka's GFF outputs
10 days ago
pramach1 ▴ 40

I have several genomes annotated with both Prokka and PGAP. PGAP writes a GFF file with the FASTA sequence appended at the end (annot_with_genomic_fasta.gff). When I run Roary (3.12.0) on the PGAP-annotated *_fasta.gff file, I get the error "Use of uninitialized value $cells[8] in split at /nfs/software/apps/roary/3.12.0/lib/Bio/Roary/ReformatInputGFFs.pm line 147, <$input_gff_fh> line 75541". The same message is reported for all the lines of that GFF file. The same version of Roary runs perfectly fine on the Prokka GFFs. How can I reformat the PGAP GFF file to match Prokka's so that Roary produces output for the PGAP-annotated genomes? Thank you for the help.
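The warning names `$cells[8]`, which is the ninth element of a Perl array, and a GFF3 feature line has nine tab-separated columns, so one thing worth checking is whether every feature line in the PGAP file really has nine tab-separated fields. Below is a minimal diagnostic sketch in Python, assuming (not confirmed from the Roary source) that `@cells` comes from splitting each line on tabs. The function name `find_short_feature_lines` is hypothetical and only for illustration.

```python
import io


def find_short_feature_lines(handle, min_columns=9):
    """Report feature lines that have fewer than `min_columns` tab-separated fields.

    Returns a list of (line_number, column_count, first 60 characters of the line).
    """
    problems = []
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not annotation
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives and comments are not feature rows
        columns = line.split("\t")
        if len(columns) < min_columns:
            problems.append((line_number, len(columns), line[:60]))
    return problems


if __name__ == "__main__":
    # Tiny made-up example: the second feature line uses spaces instead of tabs.
    demo = io.StringIO(
        "##gff-version 3\n"
        "seq1\t.\tgene\t1\t9\t.\t+\t.\tID=gene-1\n"
        "seq1 . gene 10 20 . + . ID=gene-2\n"
        "##FASTA\n"
        ">seq1\nACGT\n"
    )
    for item in find_short_feature_lines(demo):
        print(item)
```

To run it on a real file, pass an open file handle, for example `find_short_feature_lines(open("annot_with_genomic_fasta.gff"))`. If it reports lines, those are the rows in which a ninth column would be undefined after a tab split.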

Sample (a few lines from the header, middle, and end) of the PGAP *_fasta.gff:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region 1 4650742
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=28901

GCA_007218495.1 Local region 1 4650742 . + . ID=GCA_007218495.1:1..4650742;Dbxref=taxon:28901;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA
GCA_007218495.1 . gene 124 1170 . - . ID=gene-pgaptmp_000001;Name=pyrC;gbkey=Gene;gene=pyrC;gene_biotype=protein_coding;locus_tag=pgaptmp_000001
GCA_007218495.1 Protein Homology CDS 124 1170 . - 0 ID=cds-pgaptmp_000001;Parent=gene-pgaptmp_000001;Name=extdb:pgaptmp_000001;Ontology_term=GO:0009220,GO:0004151;gbkey=CDS;gene=pyrC;go_function=dihydroorotase activity|0004151||IEA;go_process=pyrimidine ribonucleotide biosynthetic process|0009220||IEA;inference=COORDINATES: similar to AA sequence:RefSeq:NP_460134.1;locus_tag=pgaptmp_000001;product=dihydroorotase;protein_id=extdb:pgaptmp_000001;transl_table=11
GCA_007218495.1 . gene 1274 1834 . - . ID=gene-pgaptmp_000002;Name=pgaptmp_000002;gbkey=Gene;gene_biotype=protein_coding;locus_tag=pgaptmp_000002
GCA_007218495.1 Protein Homology CDS 1274 1834 . - 0 ID=cds-pgaptmp_000002;Parent=gene-pgaptmp_000002;Name=extdb:pgaptmp_000002;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:NP_460135.1;locus_tag=pgaptmp_000002;product=lipoprotein;protein_id=extdb:pgaptmp_000002;transl_table=11
GCA_007218495.1 . gene 1959 2606 . - . ID=gene-pgaptmp_000003;Name=grxB;gbkey=Gene;gene=grxB;gene_biotype=protein_coding;locus_tag=pgaptmp_000003
GCA_007218495.1 Protein Homology CDS 1959 2606 . - 0 ID=cds-pgaptmp_000003;Parent=gene-pgaptmp_000003;Name=extdb:pgaptmp_000003;Ontology_term=GO:0006749,GO:0004362,GO:0005515;gbkey=CDS;gene=grxB;go_function=glutathione-disulfide reductase (NADPH) activity|0004362||IEA,protein binding|0005515||IEA;go_process=glutathione metabolic process|0006749||IEA;inference=COORDINATES: similar to AA sequence:RefSeq:NP_460136.1;locus_tag=pgaptmp_000003;product=glutaredoxin 2;protein_id=extdb:pgaptmp_000003;transl_table=11
GCA_007218495.1 . gene 2670 3878 . - . ID=gene-pgaptmp_000004;Name=mdtH;gbkey=Gene;gene=mdtH;gene_biotype=protein_coding;locus_tag=pgaptmp_000004
GCA_007218495.1 Protein Homology CDS 2670 3878 . - 0 ID=cds-pgaptmp_000004;Parent=gene-pgaptmp_000004;Name=extdb:pgaptmp_000004;Ontology_term=GO:0055085,GO:0022857;gbkey=CDS;gene=mdtH;go_function=transmembrane transporter activity|0022857||IEA;go_process=transmembrane transport|0055085||IEA;inference=COORDINATES: similar to AA sequence:RefSeq:NP_460137.1;locus_tag=pgaptmp_000004;product=multidrug efflux MFS transporter MdtH;protein_id=extdb:pgaptmp_000004;transl_table=11
GCA_007218495.1 . gene 4115 4699 . + . ID=gene-pgaptmp_000005;Name=rimJ;gbkey=Gene;gene=rimJ;gene_biotype=protein_coding;locus_tag=pgaptmp_000005
GCA_007218495.1 Protein Homology CDS 4115 4699 . + 0 ID=cds-pgaptmp_000005;Parent=gene-pgaptmp_000005;Name=extdb:pgaptmp_000005;Ontology_term=GO:0008080;gbkey=CDS;gene=rimJ;go_function=N-acetyltransferase activity|0008080||IEA;inference=COORDINATES: similar to AA sequence:RefSeq:NP_460138.1;locus_tag=pgaptmp_000005;product=ribosomal protein S5-alanine N-acetyltransferase;protein_id=extdb:pgaptmp_000005;transl_table=11
GCA_007218495.1 . gene 4735 5382 . + . ID=gene-pgaptmp_000006;Name=pgaptmp_000006;gbkey=Gene;gene_biotype=protein_coding;locus_tag=pgaptmp_000006

GCA_007218495.1 tRNAscan-SE tRNA 4650567 4650642 . + . ID=rna-pgaptmp_004502;Parent=gene-pgaptmp_004502;anticodon=(pos:4650601..4650603);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:2.0.12;locus_tag=pgaptmp_004502;product=tRNA-Glu
GCA_007218495.1 tRNAscan-SE exon 4650567 4650642 . + . ID=exon-pgaptmp_004502-1;Parent=rna-pgaptmp_004502;anticodon=(pos:4650601..4650603);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:2.0.12;locus_tag=pgaptmp_004502;product=tRNA-Glu

##FASTA
>lcl|GCA_007218495.1 Salmonella enterica chromosome, whole genome shotgun sequence
GCACCGGCAGGCAGAGGTGACGTTTTGGCTATAGTGACTTCAATACGCATAATGGCCCCCTGTTGAATAT
ACTGGATATATATACAGTTAAATCCAATATATAGCAACAGGTAAGCGCATTTTTTATTTTTTTACTGACC
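One difference visible in the sample above is that the FASTA header carries an `lcl|` prefix and a free-text description, while the feature lines use the bare sequence ID `GCA_007218495.1`. Whether this mismatch contributes to the Roary failure is an assumption and has not been verified here. The sketch below only shows how the headers in the FASTA section could be reduced to the bare ID; the function name `normalise_fasta_headers` and its `prefix` parameter are hypothetical.

```python
import io


def normalise_fasta_headers(lines, prefix="lcl|"):
    """Yield GFF lines unchanged, except FASTA headers after ##FASTA.

    Each header is reduced to '>' plus its first whitespace-delimited token,
    with `prefix` removed if present. Sequence and feature lines are untouched.
    """
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            if seq_id.startswith(prefix):
                seq_id = seq_id[len(prefix):]
            line = ">" + seq_id
        yield line + "\n"


if __name__ == "__main__":
    # Made-up miniature input that mimics the header style in the sample.
    demo = io.StringIO(
        "##gff-version 3\n"
        "##FASTA\n"
        ">lcl|GCA_007218495.1 Salmonella enterica chromosome\n"
        "GCACCGGCAGG\n"
    )
    print("".join(normalise_fasta_headers(demo)), end="")
```

For a real file, the output lines could be written to a new GFF and that copy passed to Roary, keeping the original PGAP file unchanged for comparison.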

pangenome PGAP NCBI roary Prokka • 312 views