6 months ago
G.S
Hi
I need help editing the NCBI GFF3 file so that I can annotate the proteins in my consensus sequence. Compared to the NCBI sequence, my sequence has stretches of Ns (12 Ns in total) at the beginning and end. Any idea how I can modify the GFF3 file? See the attached picture. After that, I want to annotate the coding and non-coding regions. I have written the code below, but these values are not correct for my sequence. How can I adjust these values to match my sequence?
```r
library(dplyr)

# `df` stands for the per-position data frame (column `Position`); the name is assumed.
# case_when() returns the label of the FIRST condition that is TRUE, so the
# "Non-coding" ranges listed first win wherever ranges overlap.
# Positions that match no range get NA.
df <- df %>%
  mutate(coding = case_when(
    Position >= 45 & Position <= 98 ~ "Non-coding",
    Position >= 596 & Position <= 627 ~ "Non-coding",
    Position >= 1126 & Position <= 1140 ~ "Non-coding",
    Position >= 2230 & Position <= 2346 ~ "Non-coding",
    Position >= 3253 & Position <= 3261 ~ "Non-coding",
    Position >= 4220 & Position <= 4303 ~ "Non-coding",
    Position >= 4674 & Position <= 4687 ~ "Non-coding",
    Position >= 5649 & Position <= 5661 ~ "Non-coding",
    Position >= 7598 & Position <= 7606 ~ "Non-coding",
    Position >= 99 & Position <= 504 ~ "Coding",
    Position >= 507 & Position <= 988 ~ "Coding",
    Position >= 991 & Position <= 2302 ~ "Coding",
    Position >= 2305 & Position <= 3058 ~ "Coding",
    Position >= 3061 & Position <= 4018 ~ "Coding",
    Position >= 4021 & Position <= 4484 ~ "Coding",
    Position >= 4487 & Position <= 5571 ~ "Coding",
    Position >= 5574 & Position <= 7372 ~ "Coding",
    Position >= 7375 & Position <= 8171 ~ "Coding",
    Position >= 8180 & Position <= 8418 ~ "Coding",
    Position >= 8421 & Position <= 14982 ~ "Coding"
  ))
```
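One way to avoid hard-coding every range is to keep the intervals in a small table and label positions from it. This is a minimal base-R sketch, assuming the CDS coordinates are copied from the CDS lines of the KT992094.1 GFF3 in this post, and assuming that any position inside a CDS counts as "Coding" while everything else (UTRs and intergenic sites alike) counts as "Non-coding". The helper name `label_positions` is hypothetical.

```r
# CDS coordinates (1-based, inclusive) copied from the KT992094.1 GFF3 CDS lines
cds <- data.frame(
  gene  = c("NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2-1", "M2-2", "L"),
  start = c(99, 628, 1141, 2347, 3262, 4304, 4689, 5662, 7607, 8160, 8499),
  end   = c(518, 1002, 2316, 3072, 4032, 4498, 5585, 7386, 8191, 8432, 14996)
)

# Hypothetical helper: "Coding" if a position falls inside ANY CDS interval
# (so the overlapping M2-1 / M2-2 frames are handled), otherwise "Non-coding".
label_positions <- function(pos, intervals) {
  in_cds <- vapply(
    pos,
    function(p) any(p >= intervals$start & p <= intervals$end),
    logical(1)
  )
  ifelse(in_cds, "Coding", "Non-coding")
}

label_positions(c(50, 99, 518, 519, 8170, 15000), cds)
```

With a per-position data frame (name assumed), this would be `df$coding <- label_positions(df$Position, cds)`. If the coordinates need to move to match the consensus, only the `start`/`end` columns of the table have to change, not every condition.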
Here is the link to the NCBI reference sequence: https://www.ncbi.nlm.nih.gov/nuccore/KT992094
```
##sequence-region KT992094.1 1 15223
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=11250
KT992094.1 Genbank region 1 15223 . + . ID=KT992094.1:1..15223;Dbxref=taxon:11250;gb-acronym=HRSV;gbkey=Src;genome=genomic;mol_type=viral cRNA;note=recombinant D46/D53 strain;strain=A2
KT992094.1 Genbank gene 45 576 . + . ID=gene-NS1;Name=NS1;gbkey=Gene;gene=NS1;gene_biotype=protein_coding
KT992094.1 Genbank CDS 99 518 . + 0 ID=cds-ALS35583.1;Parent=gene-NS1;Dbxref=NCBI_GP:ALS35583.1;Name=ALS35583.1;gbkey=CDS;gene=NS1;product=nonstructural protein 1;protein_id=ALS35583.1
KT992094.1 Genbank gene 596 1098 . + . ID=gene-NS2;Name=NS2;gbkey=Gene;gene=NS2;gene_biotype=protein_coding
KT992094.1 Genbank CDS 628 1002 . + 0 ID=cds-ALS35584.1;Parent=gene-NS2;Dbxref=NCBI_GP:ALS35584.1;Name=ALS35584.1;gbkey=CDS;gene=NS2;product=nonstructural protein 2;protein_id=ALS35584.1
KT992094.1 Genbank gene 1126 2328 . + . ID=gene-N;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding
KT992094.1 Genbank CDS 1141 2316 . + 0 ID=cds-ALS35585.1;Parent=gene-N;Dbxref=NCBI_GP:ALS35585.1;Name=ALS35585.1;gbkey=CDS;gene=N;product=nucleoprotein;protein_id=ALS35585.1
KT992094.1 Genbank gene 2330 3243 . + . ID=gene-P;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding
KT992094.1 Genbank CDS 2347 3072 . + 0 ID=cds-ALS35586.1;Parent=gene-P;Dbxref=NCBI_GP:ALS35586.1;Name=ALS35586.1;gbkey=CDS;gene=P;product=phosphoprotein;protein_id=ALS35586.1
KT992094.1 Genbank gene 3253 4210 . + . ID=gene-M;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding
KT992094.1 Genbank CDS 3262 4032 . + 0 ID=cds-ALS35587.1;Parent=gene-M;Dbxref=NCBI_GP:ALS35587.1;Name=ALS35587.1;gbkey=CDS;gene=M;product=matrix protein;protein_id=ALS35587.1
KT992094.1 Genbank gene 4220 4629 . + . ID=gene-SH;Name=SH;gbkey=Gene;gene=SH;gene_biotype=protein_coding
KT992094.1 Genbank CDS 4304 4498 . + 0 ID=cds-ALS35582.1;Parent=gene-SH;Dbxref=NCBI_GP:ALS35582.1;Name=ALS35582.1;gbkey=CDS;gene=SH;product=small hydrophobic protein;protein_id=ALS35582.1
KT992094.1 Genbank gene 4674 5596 . + . ID=gene-G;Name=G;gbkey=Gene;gene=G;gene_biotype=protein_coding
KT992094.1 Genbank CDS 4689 5585 . + 0 ID=cds-ALS35588.1;Parent=gene-G;Dbxref=NCBI_GP:ALS35588.1;Name=ALS35588.1;gbkey=CDS;gene=G;product=attachment glycoprotein G;protein_id=ALS35588.1
KT992094.1 Genbank gene 5649 7551 . + . ID=gene-F;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding
KT992094.1 Genbank CDS 5662 7386 . + 0 ID=cds-ALS35589.1;Parent=gene-F;Dbxref=NCBI_GP:ALS35589.1;Name=ALS35589.1;gbkey=CDS;gene=F;product=fusion protein;protein_id=ALS35589.1
KT992094.1 Genbank gene 7598 8558 . + . ID=gene-M2;Name=M2;gbkey=Gene;gene=M2;gene_biotype=protein_coding
KT992094.1 Genbank CDS 7607 8191 . + 0 ID=cds-ALS35591.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35591.1;Name=ALS35591.1;gbkey=CDS;gene=M2;product=m2-1;protein_id=ALS35591.1
KT992094.1 Genbank CDS 8160 8432 . + 0 ID=cds-ALS35592.1;Parent=gene-M2;Dbxref=NCBI_GP:ALS35592.1;Name=ALS35592.1;gbkey=CDS;gene=M2;product=m2-2 protein;protein_id=ALS35592.1
KT992094.1 Genbank gene 8491 15068 . + . ID=gene-L;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding
KT992094.1 Genbank CDS 8499 14996 . + 0 ID=cds-ALS35590.1;Parent=gene-L;Dbxref=NCBI_GP:ALS35590.1;Name=ALS35590.1;gbkey=CDS;gene=L;product=L polymerase protein;protein_id=ALS35590.1
```
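Whether the GFF3 needs editing at all depends on how the Ns relate to the reference. If the Ns replace reference bases one-for-one (so the consensus is still 15223 nt, like KT992094.1), the coordinates stay valid as they are. A constant shift only applies if the consensus has gained or lost bases at the 5' end, and internal insertions or deletions would need an alignment-based liftover instead. Under the constant-shift assumption, a minimal base-R sketch that adds an offset to columns 4 and 5 of every feature line could look like this; the helper name `shift_gff3`, the offset of -6, and the file names are all hypothetical.

```r
# Hypothetical helper: shift the start and end (columns 4 and 5) of every
# tab-separated GFF3 feature line by `offset` bases. Comment and directive
# lines (starting with "#") are returned unchanged, so the
# ##sequence-region line still has to be updated by hand.
shift_gff3 <- function(lines, offset) {
  offset <- as.integer(offset)  # integer arithmetic avoids "1e+05"-style output
  out <- lines
  is_feature <- nzchar(lines) & !startsWith(lines, "#")
  for (i in which(is_feature)) {
    # split on tabs only: the attributes column itself contains spaces
    f <- strsplit(lines[i], "\t", fixed = TRUE)[[1]]
    start <- as.integer(f[4]) + offset
    end   <- as.integer(f[5]) + offset
    if (start < 1L) {
      warning("line ", i, ": feature now starts before position 1")
    }
    f[4] <- as.character(start)
    f[5] <- as.character(end)
    out[i] <- paste(f, collapse = "\t")
  }
  out
}

# Example with one CDS record from KT992094.1 and an assumed offset of -6,
# i.e. the first 6 reference bases are absent from the consensus.
gff <- c(
  "##sequence-region KT992094.1 1 15223",
  paste("KT992094.1", "Genbank", "CDS", "99", "518", ".", "+", "0",
        "ID=cds-ALS35583.1;gene=NS1;product=nonstructural protein 1",
        sep = "\t")
)
shift_gff3(gff, offset = -6)

# For a real file (file names are placeholders):
# writeLines(shift_gff3(readLines("KT992094.1.gff3"), -6), "consensus.gff3")
```

The example CDS moves from 99..518 to 93..512. Splitting on tabs rather than on whitespace matters here, because attribute values such as `product=nonstructural protein 1` contain spaces.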
Thanks in advance,