6.5 years ago
Chris
Hi all, I have a large GFF3 file that I downloaded from Batch Entrez. It contains annotations for 54 genomes, and I want to split it into one file per genome. Any help would be appreciated.
Thanks
This is an example of the file:
##sequence-region Z18946.1 1 52297
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31757
Z18946.1 EMBL region 1 52297 . + . ID=id-1;Dbxref=taxon:31757;gbkey=Src;mol_type=genomic DNA
Z18946.1 EMBL gene 411 1100 . + . ID=gene-PBI_L5_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=PBI_L5_1
Z18946.1 EMBL CDS 411 1100 . + 0 ID=cds-CAA79380.1;Parent=gene-PBI_L5_1;Dbxref=InterPro:IPR025530,UniProtKB/Swiss-Prot:Q05218,NCBI_GP:CAA79380.1;Name=CAA79380.1;gbkey=CDS;gene=1;product=Hypothetical Protein;protein_id=CAA79380.1;transl_table=11
Z18946.1 EMBL gene 1305 2084 . + . ID=gene-PBI_L5_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=PBI_L5_2
Z18946.1 EMBL CDS 1305 2084 . + 0 ID=cds-CAA79381.1;Parent=gene-PBI_L5_2;Dbxref=UniProtKB/Swiss-Prot:Q05230,NCBI_GP:CAA79381.1;Name=CAA79381.1;gbkey=CDS;gene=2;product=Hypothetical Protein;protein_id=CAA79381.1;transl_table=11
Z18946.1 EMBL gene 2084 2335 . + . ID=gene-PBI_L5_3;Name=3;gbkey=Gene;gene=3;gene_biotype=protein_coding;locus_tag=PBI_L5_3
Z18946.1 EMBL CDS 2084 2335 . + 0 ID=cds-CAA79382.1;Parent=gene-PBI_L5_3;Dbxref=UniProtKB/Swiss-Prot:Q05242,NCBI_GP:CAA79382.1;Name=CAA79382.1;gbkey=CDS;gene=3;product=Hypothetical Protein;protein_id=CAA79382.1;transl_table=11
##sequence-region AF022214.1 1 49136
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=28369
AF022214.1 Genbank region 1 49136 . + . ID=id-1;Dbxref=taxon:28369;gbkey=Src;mol_type=genomic DNA;old-name=Mycobacteriophage D29
AF022214.1 Genbank gene 401 1213 . + . ID=gene-PBI_D29_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=PBI_D29_1
AF022214.1 Genbank CDS 401 1213 . + 0 ID=cds-AAC18444.1;Parent=gene-PBI_D29_1;Dbxref=NCBI_GP:AAC18444.1;Name=AAC18444.1;Note=gp1%3B putative 30.3 kD protein;gbkey=CDS;gene=1;product=hypothetical protein;protein_id=AAC18444.1;transl_table=11
AF022214.1 Genbank gene 1327 2106 . + . ID=gene-PBI_D29_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=PBI_D29_2
AF022214.1 Genbank CDS 1327 2106 . + 0 ID=cds-AAC18445.1;Parent=gene-PBI_D29_2;Dbxref=NCBI_GP:AAC18445.1;Name=AAC18445.1;Note=gp2%3B putative 28.8 kD protein;gbkey=CDS;gene=2;product=hypothetical protein;protein_id=AAC18445.1;transl_table=11
AF022214.1 Genbank gene 2106 2357 . + . ID=gene-PBI_D29_3;Name=3;gbkey=Gene;gene=3;gene_biotype=protein_coding;locus_tag=PBI_D29_3
AF022214.1 Genbank CDS 2106 2357 . + 0 ID=cds-AAC18446.1;Parent=gene-PBI_D29_3;Dbxref=NCBI_GP:AAC18446.1;Name=AAC18446.1;Note=gp3%3B putative 9.0 kD protein;gbkey=CDS;gene=3;product=hypothetical protein;protein_id=AAC18446.1;transl_table=11
##sequence-region AF068845.1 1 52797
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=88870
AF068845.1 Genbank region 1 52797 . + . ID=id-1;Dbxref=taxon:88870;gbkey=Src;mol_type=genomic DNA;old-name=Mycobacteriophage TM4
AF068845.1 Genbank gene 100 234 . + . ID=gene-TM4_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=TM4_1
AF068845.1 Genbank CDS 100 234 . + 0 ID=cds-AAD17569.1;Parent=gene-TM4_1;Dbxref=NCBI_GP:AAD17569.1;Name=AAD17569.1;Note=gp1;gbkey=CDS;gene=1;product=hypothetical protein;protein_id=AAD17569.1;transl_table=11
AF068845.1 Genbank gene 236 448 . + . ID=gene-TM4_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=TM4_2
AF068845.1 Genbank CDS 236 448 . + 0 ID=cds-AAD17570.1;Parent=gene-TM4_2;Dbxref=NCBI_GP:AAD17570.1;Name=AAD17570.1;Note=gp2;gbkey=CDS;gene=2;product=hypothetical protein;protein_id=AAD17570.1;transl_table=11
##sequence-region AF271693.1 1 50550
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=148603
AF271693.1 Genbank region 1 50550 . + . ID=id-1;Dbxref=taxon:148603;gbkey=Src;mol_type=genomic DNA;old-name=Mycobacteriophage Bxb1
AF271693.1 Genbank gene 599 895 . + . ID=gene-PBI_BXB1_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=PBI_BXB1_1
AF271693.1 Genbank CDS 599 895 . + 0 ID=cds-AAG59706.1;Parent=gene-PBI_BXB1_1;Dbxref=NCBI_GP:AAG59706.1;Name=AAG59706.1;Note=related to L5 gp4%3B 11.4 kD hypothetical;gbkey=CDS;gene=1;product=hypothetical protein;protein_id=AAG59706.1;transl_table=11
AF271693.1 Genbank gene 930 1370 . + . ID=gene-PBI_BXB1_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=PBI_BXB1_2
AF271693.1 Genbank CDS 930 1370 . + 0 ID=cds-AAG59707.1;Parent=gene-PBI_BXB1_2;Dbxref=NCBI_GP:AAG59707.1;Name=AAG59707.1;Note=related to L5 gp5%3B 16.3 kD hypothetical;gbkey=CDS;gene=2;product=hypothetical protein;protein_id=AAG59707.1;transl_table=11
AF271693.1 Genbank gene 1716 2027 . + . ID=gene-PBI_BXB1_3;Name=3;gbkey=Gene;gene=3;gene_biotype=protein_coding;locus_tag=PBI_BXB1_3
AF271693.1 Genbank CDS 1716 2027 . + 0 ID=cds-AAG59708.1;Parent=gene-PBI_BXB1_3;Dbxref=NCBI_GP:AAG59708.1;Name=AAG59708.1;Note=11.4 kD hypothetical protein;gbkey=CDS;gene=3;product=hypothetical protein;protein_id=AAG59708.1;transl_table=11
##sequence-region AY129330.1 1 59471
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=205868
AY129330.1 Genbank region 1 59471 . + . ID=id-1;Dbxref=taxon:205868;gbkey=Src;mol_type=genomic DNA;old-name=Mycobacteriophage Che8
AY129330.1 Genbank gene 108 563 . + . ID=gene-PBI_CHE8_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=PBI_CHE8_1
AY129330.1 Genbank CDS 108 563 . + 0 ID=cds-AAN12399.1;Parent=gene-PBI_CHE8_1;Dbxref=NCBI_GP:AAN12399.1;Name=AAN12399.1;gbkey=CDS;gene=1;product=hypothetical protein;protein_id=AAN12399.1;transl_table=11
AY129330.1 Genbank gene 571 2208 . + . ID=gene-PBI_CHE8_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=PBI_CHE8_2
AY129330.1 Genbank CDS 571 2208 . + 0 ID=cds-AAN12400.1;Parent=gene-PBI_CHE8_2;Dbxref=NCBI_GP:AAN12400.1;Name=AAN12400.1;gbkey=CDS;gene=2;product=hypothetical protein;protein_id=AAN12400.1;transl_table=11
AY129330.1 Genbank gene 2239 3609 . + . ID=gene-PBI_CHE8_3;Name=3;gbkey=Gene;gene=3;gene_biotype=protein_coding;locus_tag=PBI_CHE8_3
AY129330.1 Genbank CDS 2239 3609 . + 0 ID=cds-AAN12401.1;Parent=gene-PBI_CHE8_3;Dbxref=NCBI_GP:AAN12401.1;Name=AAN12401.1;gbkey=CDS;gene=3;product=hypothetical protein;protein_id=AAN12401.1;transl_table=11
##sequence-region AY129331.1 1 75931
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=205869
AY129331.1 Genbank region 1 75931 . + . ID=id-1;Dbxref=taxon:205869;gbkey=Src;mol_type=genomic DNA;old-name=Mycobacteriophage CJW1
AY129331.1 Genbank gene 276 569 . + . ID=gene-PBI_CJW1_1;Name=1;gbkey=Gene;gene=1;gene_biotype=protein_coding;locus_tag=PBI_CJW1_1
AY129331.1 Genbank CDS 276 569 . + 0 ID=cds-AAN01616.1;Parent=gene-PBI_CJW1_1;Dbxref=NCBI_GP:AAN01616.1;Name=AAN01616.1;gbkey=CDS;gene=1;product=hypothetical protein;protein_id=AAN01616.1;transl_table=11
AY129331.1 Genbank gene 566 751 . + . ID=gene-PBI_CJW1_2;Name=2;gbkey=Gene;gene=2;gene_biotype=protein_coding;locus_tag=PBI_CJW1_2
AY129331.1 Genbank CDS 566 751 . + 0 ID=cds-AAN01617.1;Parent=gene-PBI_CJW1_2;Dbxref=NCBI_GP:AAN01617.1;Name=AAN01617.1;gbkey=CDS;gene=2;product=hypothetical protein;protein_id=AAN01617.1;transl_table=11
AY129331.1 Genbank gene 748 1038 . + . ID=gene-PBI_CJW1_3;Name=3;gbkey=Gene;gene=3;gene_biotype=protein_coding;locus_tag=PBI_CJW1_3
AY129331.1 Genbank CDS 748 1038 . + 0 ID=cds-AAN01618.1;Parent=gene-PBI_CJW1_3;Dbxref=NCBI_GP:AAN01618.1;Name=AAN01618.1;gbkey=CDS;gene=3;product=hypothetical protein;protein_id=AAN01618.1;transl_table=11
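For what it's worth, here is one way the split could be sketched in Python, assuming (as in the excerpt above) that every genome starts with its own `##sequence-region` directive and that all of its feature lines follow before the next one. The function names `split_gff3_by_region` and `write_blocks` are made up for illustration, and the `##gff-version 3` header written to each output file is an assumption about what downstream tools expect. Anything before the first `##sequence-region` line is dropped.

```python
from pathlib import Path


def split_gff3_by_region(lines):
    """Group GFF3 lines into one block per '##sequence-region' directive.

    Returns a dict mapping the sequence ID (second field of the directive)
    to the list of lines that belong to it, in input order.
    """
    blocks = {}
    current = None
    for line in lines:
        if line.startswith("##sequence-region"):
            # Directive looks like: ##sequence-region Z18946.1 1 52297
            seqid = line.split()[1]
            current = blocks.setdefault(seqid, [])
        if current is not None:
            # Lines seen before the first directive are skipped.
            current.append(line)
    return blocks


def write_blocks(blocks, out_dir):
    """Write each block to <out_dir>/<seqid>.gff3 with a GFF3 version header."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for seqid, block in blocks.items():
        with open(out / f"{seqid}.gff3", "w") as handle:
            handle.write("##gff-version 3\n")  # assumed header, see note above
            handle.writelines(block)
```

To use it, open the combined file, pass the file handle to `split_gff3_by_region`, and hand the result to `write_blocks` together with an output directory. Each output file is named after the accession, for example `Z18946.1.gff3`.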