virus genome annotation
6 months ago
G.S ▴ 60

Hi

I need help editing the NCBI GFF3 file so that I can annotate the proteins in my consensus sequence. Compared to the NCBI sequence, my sequence has stretches of Ns (12 Ns in total) at the beginning and end. Any idea how I can modify the GFF3 file? Please see the attached picture. I also want to annotate the coding and non-coding regions. I have written the code below, but the values are not correct for my sequence. Any idea how I can edit these values to match my sequence?

coding = case_when(Position >= 45     & Position <= 98 ~ "Non-coding",
                   Position >= 596  & Position <= 627 ~ "Non-coding",
                   Position >= 1126 & Position <= 1140 ~ "Non-coding",
                   Position >= 2230 & Position <= 2346 ~ "Non-coding",
                   Position >= 3253 & Position <= 3261 ~ "Non-coding",
                   Position >= 4220 & Position <= 4303 ~ "Non-coding",
                   Position >= 4674 & Position <= 4687 ~ "Non-coding",
                   Position >= 5649 & Position <= 5661 ~ "Non-coding",
                   Position >= 7598 & Position <= 7606 ~ "Non-coding",
                   Position >= 99     & Position <= 504 ~ "Coding",
                   Position >= 507  & Position <= 988 ~ "Coding",
                   Position >= 991  & Position <= 2302 ~ "Coding",
                   Position >= 2305 & Position <= 3058 ~ "Coding",
                   Position >= 3061 & Position <= 4018 ~ "Coding",
                   Position >= 4021 & Position <= 4484 ~ "Coding",
                   Position >= 4487 & Position <= 5571 ~ "Coding",
                   Position >= 5574 & Position <= 7372 ~ "Coding",
                   Position >= 7375 & Position <= 8171 ~ "Coding",
                   Position >= 8180 & Position <= 8418 ~ "Coding",
                   Position >= 8421 & Position <= 14982 ~ "Coding")
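
One possible way to avoid hard-coding every range is to take the CDS intervals from the GFF3 records quoted in this post and test each position against them. The sketch below uses base R only; `label_coding` is a hypothetical helper name, and the gene labels `M2-1` and `M2-2` are shorthand for the two M2 CDS rows. It treats everything outside a CDS (UTRs and intergenic regions alike) as "Non-coding", which is an assumption about what is wanted here.

```r
# CDS intervals copied from the KT992094.1 GFF3 CDS rows quoted in this post
cds <- data.frame(
  gene  = c("NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2-1", "M2-2", "L"),
  start = c(99, 628, 1141, 2347, 3262, 4304, 4689, 5662, 7607, 8160, 8499),
  end   = c(518, 1002, 2316, 3072, 4032, 4498, 5585, 7386, 8191, 8432, 14996)
)

# Hypothetical helper: a position is "Coding" if it falls inside any CDS interval
label_coding <- function(position, cds) {
  in_cds <- vapply(
    position,
    function(p) any(p >= cds$start & p <= cds$end),
    logical(1)
  )
  ifelse(in_cds, "Coding", "Non-coding")
}

# Positions 99, 518 and 8170 lie inside a CDS; 45, 519 and 15000 do not
label_coding(c(45, 99, 518, 519, 8170, 15000), cds)
```

If the data frame already has a `Position` column, the same helper could be used as `coding = label_coding(Position, cds)` inside `mutate()`. These intervals are in reference coordinates, so they would still need adjusting if the consensus sequence is offset from the reference.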

Here is the link to the NCBI reference sequence: https://www.ncbi.nlm.nih.gov/nuccore/KT992094

##sequence-region KT992094.1 1 15223
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=11250
KT992094.1  Genbank region  1   15223   .   +   .   ID=KT992094.1:1..15223;Dbxref=taxon:11250;gb-acronym=HRSV;gbkey=Src;genome=genomic;mol_type=viral cRNA;note=recombinant D46/D53 strain;strain=A2
KT992094.1  Genbank gene    45  576 .   +   .   ID=gene-NS1;Name=NS1;gbkey=Gene;gene=NS1;gene_biotype=protein_coding
KT992094.1  Genbank CDS 99  518 .   +   0   ID=cds-ALS35583.1;Parent=gene-NS1;Dbxref=NCBI_GP:ALS35583.1;Name=ALS35583.1;gbkey=CDS;gene=NS1;product=nonstructural protein 1;protein_id=ALS35583.1
KT992094.1  Genbank gene    596 1098    .   +   .   ID=gene-NS2;Name=NS2;gbkey=Gene;gene=NS2;gene_biotype=protein_coding
KT992094.1  Genbank CDS 628 1002    .   +   0   ID=cds-ALS35584.1;Parent=gene-NS2;Dbxref=NCBI_GP:ALS35584.1;Name=ALS35584.1;gbkey=CDS;gene=NS2;product=nonstructural protein 2;protein_id=ALS35584.1
KT992094.1  Genbank gene    1126    2328    .   +   .   ID=gene-N;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding
KT992094.1  Genbank CDS 1141    2316    .   +   0   ID=cds-ALS35585.1;Parent=gene-N;Dbxref=NCBI_GP:ALS35585.1;Name=ALS35585.1;gbkey=CDS;gene=N;product=nucleoprotein;protein_id=ALS35585.1
KT992094.1  Genbank gene    2330    3243    .   +   .   ID=gene-P;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding
KT992094.1  Genbank CDS 2347    3072    .   +   0   ID=cds-ALS35586.1;Parent=gene-P;Dbxref=NCBI_GP:ALS35586.1;Name=ALS35586.1;gbkey=CDS;gene=P;product=phosphoprotein;protein_id=ALS35586.1
KT992094.1  Genbank gene    3253    4210    .   +   .   ID=gene-M;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding
KT992094.1  Genbank CDS 3262    4032    .   +   0   ID=cds-ALS35587.1;Parent=gene-M;Dbxref=NCBI_GP:ALS35587.1;Name=ALS35587.1;gbkey=CDS;gene=M;product=matrix protein;protein_id=ALS35587.1
KT992094.1  Genbank gene    4220    4629    .   +   .   ID=gene-SH;Name=SH;gbkey=Gene;gene=SH;gene_biotype=protein_coding
KT992094.1  Genbank CDS 4304    4498    .   +   0   ID=cds-ALS35582.1;Parent=gene-SH;Dbxref=NCBI_GP:ALS35582.1;Name=ALS35582.1;gbkey=CDS;gene=SH;product=small hydrophobic protein;protein_id=ALS35582.1
KT992094.1  Genbank gene    4674    5596    .   +   .   ID=gene-G;Name=G;gbkey=Gene;gene=G;gene_biotype=protein_coding
KT992094.1  Genbank CDS 4689    5585    .   +   0   ID=cds-ALS35588.1;Parent=gene-G;Dbxref=NCBI_GP:ALS35588.1;Name=ALS35588.1;gbkey=CDS;gene=G;product=attachment glycoprotein G;protein_id=ALS35588.1
KT992094.1  Genbank gene    5649    7551    .   +   .   ID=gene-F;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding
KT992094.1  Genbank CDS 5662    7386    .   +   0   ID=cds-ALS35589.1;Parent=gene-F;Dbxref=NCBI_GP:ALS35589.1;Name=ALS35589.1;gbkey=CDS;gene=F;product=fusion protein;protein_id=ALS35589.1
KT992094.1  Genbank gene    7598    8558    .   +   .   ID=gene-M2;Name=M2;gbkey=Gene;gene=M2;gene_biotype=protein_coding
KT992094.1  Genbank CDS 7607    8191    .   +   0   ID=cds-ALS35591.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35591.1;Name=ALS35591.1;gbkey=CDS;gene=M2;product=m2-1;protein_id=ALS35591.1
KT992094.1  Genbank CDS 8160    8432    .   +   0   ID=cds-ALS35592.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35592.1;Name=ALS35592.1;gbkey=CDS;gene=M2;product=m2-2 protein;protein_id=ALS35592.1
KT992094.1  Genbank gene    8491    15068   .   +   .   ID=gene-L;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding
KT992094.1  Genbank CDS 8499    14996   .   +   0   ID=cds-ALS35590.1;Parent=gene-L;Dbxref=NCBI_GP:ALS35590.1;Name=ALS35590.1;gbkey=CDS;gene=L;product=L polymerase protein;protein_id=ALS35590.1
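
If the consensus turns out to be shifted relative to the reference (for example, because bases are missing at the start), the start and end columns of the GFF3 could be moved by a fixed offset. This is only a sketch under assumptions: `shift_gff3` is a hypothetical helper, the offset of -6 and the length of 15211 in the example are made-up values, and the input is assumed to be a real tab-separated GFF3 file. If the Ns simply mask reference positions one-for-one, the coordinates may not need shifting at all.

```r
# Hypothetical helper: shift GFF3 start/end (columns 4 and 5) by `offset` bases.
# Comment and pragma lines (starting with "#") are returned unchanged, so a
# "##sequence-region" line would still need updating by hand.
shift_gff3 <- function(gff_lines, offset, seq_len) {
  offset  <- as.integer(offset)
  seq_len <- as.integer(seq_len)
  is_feature <- !grepl("^#", gff_lines) & nzchar(gff_lines)
  fields <- strsplit(gff_lines[is_feature], "\t", fixed = TRUE)
  shifted <- vapply(fields, function(f) {
    start <- as.integer(f[4]) + offset
    end   <- as.integer(f[5]) + offset
    # Clamp to the sequence bounds; features that fall entirely outside
    # the new sequence are not handled in this sketch
    f[4] <- as.character(max(start, 1L))
    f[5] <- as.character(min(end, seq_len))
    paste(f, collapse = "\t")
  }, character(1))
  gff_lines[is_feature] <- shifted
  gff_lines
}

# Example with one CDS row from this post and an assumed offset of -6
gff <- c(
  "##gff-version 3",
  "KT992094.1\tGenbank\tCDS\t99\t518\t.\t+\t0\tID=cds-ALS35583.1;Parent=gene-NS1"
)
shifted <- shift_gff3(gff, offset = -6, seq_len = 15211)
cat(shifted, sep = "\n")
```

In practice the lines would come from `readLines()` on the downloaded GFF3 file and be written back with `writeLines()`. A single fixed offset only holds if the consensus has no internal insertions or deletions relative to KT992094.1; otherwise the coordinates would have to be lifted over from a pairwise alignment.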

Thanks in advance,

