How to relate a protein ID (protein sequence) to a GenBank file sequence
8.9 years ago
kws15 ▴ 40

Hi everyone,

I am completely new to bioinformatics and I'm working on a project on tomato. I used a package to identify the S. pennellii orthologs of S. lycopersicum transcription factors. I did that by aligning the S. lycopersicum transcription factor protein sequences against all the S. pennellii protein sequences (FASTA file from NCBI).

Now I basically have something like this

Solyc07g053610.2.1 100%,Sopen07g027560.100%

I want to relate these protein IDs to a GenBank file (nucleotide sequences). Does anyone have an idea how I can do this? Could it be that these protein IDs are not compatible with the GenBank files because they use a different naming system? Thank you very much.

genbank protein id • 2.2k views
8.9 years ago
piet ★ 1.9k

It seems that these protein identifiers have only been used internally by ITAG (International Tomato Annotation Group) and were never submitted to GenBank.

There is currently only one full genome of tomato in GenBank. It has seen some upgrades in recent years, but with every upgrade the chromosomal coordinates shift. The latest assembly from ITAG is available as an NCBI RefSeq sequence. This RefSeq sequence has been automatically reannotated by NCBI, but the original ITAG annotation can be downloaded from www.solgenomics.net.

wget ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG2.4_release/ITAG2.4_gene_models.gff3

The GFF file can be grepped for the position of protein Solyc07g053610 in the chromosomal DNA sequence:

awk '$3~/gene/ && $9~/Solyc07g053610/' ITAG2.4_gene_models.gff3 | sed 's/SL2.50ch07/NC_015444.2/'

NC_015444.2     ITAG_eugene     gene    62033451        62049779        .       +       .       ID=gene:Solyc07g053610.2;Name=Solyc07g053610.2;Alias=Solyc07g053610;from_BOGAS=1;length=16329
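
The one-liner above can be wrapped into a small reusable function if you have several genes or chromosomes to look up. This is only a sketch: the function name `gene_to_refseq` and its argument order are made up for illustration, and it assumes the GFF3 file is tab-delimited with the gene identifier appearing in the attribute column, as in the output shown above.

```shell
# Hypothetical helper (not part of the original answer): print the RefSeq
# accession, start, end and strand of a gene from an ITAG GFF3 file.
# Usage: gene_to_refseq GFF3_FILE GENE_ID ITAG_SEQID REFSEQ_ACCESSION
gene_to_refseq() {
    awk -F '\t' -v gene="$2" -v from="$3" -v to="$4" '
        BEGIN { OFS = "\t" }
        # keep only "gene" features on the requested chromosome whose
        # attribute column (9) contains the gene identifier
        $3 == "gene" && $1 == from && index($9, gene) {
            print to, $4, $5, $7
        }
    ' "$1"
}

# Example call, using the names from the answer above:
# gene_to_refseq ITAG2.4_gene_models.gff3 Solyc07g053610 SL2.50ch07 NC_015444.2
```

Unlike the `$3~/gene/` pattern, the exact comparison `$3 == "gene"` cannot accidentally match other feature types whose name merely contains "gene".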

Table on mapping between chromosome numbers and NCBI refsequence accessions here
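
Once the gene's coordinates on a RefSeq accession are known, the corresponding nucleotide region can in principle be pulled from NCBI with the E-utilities `efetch` service. The sketch below only builds the request URL. The function name `efetch_region_url` is hypothetical, and the coordinates are only meaningful if the ITAG annotation and the RefSeq record refer to the same assembly version (here SL2.50 and NC_015444.2, as used above).

```shell
# Hypothetical helper: build an E-utilities efetch URL for a region of a
# nucleotide record. Usage: efetch_region_url ACCESSION START END STRAND
# where STRAND is "+" or "-" as given in column 7 of the GFF3 file.
efetch_region_url() {
    # efetch encodes the plus strand as 1 and the minus strand as 2
    if [ "$4" = "-" ]; then strand=2; else strand=1; fi
    printf '%s?db=nuccore&id=%s&seq_start=%s&seq_stop=%s&strand=%s&rettype=fasta&retmode=text\n' \
        'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi' \
        "$1" "$2" "$3" "$strand"
}

# Example (needs network access; the output file name is arbitrary):
# curl -s "$(efetch_region_url NC_015444.2 62033451 62049779 +)" > Solyc07g053610_gene.fasta
```

This returns the genomic region of the gene (including introns), not the coding sequence; for the CDS you would need the exon or CDS features from the same GFF3 file.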
