5 weeks ago
reza
I have a list of gene names and a file in GPFF format containing protein sequences. I want to extract the protein sequence for each gene in my list from the GPFF file. How can I do this?
Part of the GPFF file:
LOCUS XP_031247110 372 aa linear PLN 22-OCT-2019
DEFINITION GDSL esterase/lipase At4g16230-like [Pistacia vera].
ACCESSION XP_031247110
VERSION XP_031247110.1
DBLINK BioProject: PRJNA578116
DBSOURCE REFSEQ: accession XM_031391250.1
KEYWORDS RefSeq; includes ab initio.
SOURCE Pistacia vera
ORGANISM Pistacia vera
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
Pentapetalae; rosids; malvids; Sapindales; Anacardiaceae; Pistacia.
COMMENT MODEL REFSEQ: This record is predicted by automated computational
analysis. This record is derived from a genomic sequence
(NW_022196320.1) annotated using gene prediction method: Gnomon.
Also see:
Documentation of NCBI's Annotation Process
##Genome-Annotation-Data-START##
Annotation Provider :: NCBI
Annotation Status :: Full annotation
Annotation Name :: Pistacia vera Annotation Release 100
Annotation Version :: 100
Annotation Pipeline :: NCBI eukaryotic genome annotation
pipeline
Annotation Software Version :: 8.2
Annotation Method :: Best-placed RefSeq; Gnomon
Features Annotated :: Gene; mRNA; CDS; ncRNA
##Genome-Annotation-Data-END##
##RefSeq-Attributes-START##
ab initio :: 8% of CDS bases
##RefSeq-Attributes-END##
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..372
/organism="Pistacia vera"
/cultivar="Batoury"
/db_xref="taxon:55513"
/chromosome="Unknown"
/tissue_type="leaf"
/country="China"
Protein 1..372
/product="GDSL esterase/lipase At4g16230-like"
/calculated_mol_wt=40507
CDS 1..372
/gene="LOC116104818"
/coded_by="XM_031391250.1:1..1119"
/db_xref="GeneID:116104818"
ORIGIN
1 mtekiptkfl llcfpllaif fpcnvycwst ygsqikgmfv fgsslvdngn nnflltlaka
61 nyspygvdfp ggpsgrftng mnvidllgee lqlpslipvf ydpstkggrt ivhgvnyasg
121 gsgilndtgs iagnvvslne qirnfdevtl pelkthvdcr stdllhnylf vvgsggndys
181 fnyfltqana nvsveaftdn linslsqqlk klyslggrkf vlmsvnplgc npvarasqpt
241 gqdgciqvln qaahlfnsrl rltvdfirpq mpgstlvfvn sykiitdiig dpvsngfndt
301 rkaccqvlsv neggngilck rggrvcaern ihvffdglhp teavniqiak kafgsynrde
361 vypinvrqla kl
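
One possible approach, as a minimal sketch rather than a definitive solution: walk through the GPFF (GenPept flat file) text record by record, take the gene name from the first `/gene="..."` qualifier, collect the residues that follow `ORIGIN`, and keep only the records whose gene is in the list. The function names `parse_gpff` and `extract_for_genes` are hypothetical, and the sketch assumes each record ends with a `//` line (the excerpt above is cut off before that terminator) and that the gene names in the list match the `/gene` values exactly, e.g. `LOC116104818`.

```python
import re


def parse_gpff(text):
    """Yield (gene, accession, sequence) for each record in GenPept text.

    Lines are stripped before matching, so the parser tolerates both the
    normal indented layout and a whitespace-flattened copy of the file.
    """
    gene = accession = None
    seq_parts = []
    in_origin = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("LOCUS "):
            # Start of a new record: the second field is the protein accession.
            accession = line.split()[1]
            gene, seq_parts, in_origin = None, [], False
        elif line == "ORIGIN":
            in_origin = True
        elif line == "//":
            # Record terminator: emit what was collected.
            if accession:
                yield gene, accession, "".join(seq_parts).upper()
            accession = None
            in_origin = False
        elif in_origin:
            # Sequence lines look like "61 nyspygvdfp ggpsgrftng ...";
            # drop the position numbers and the spaces.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif gene is None:
            m = re.match(r'/gene="([^"]+)"', line)
            if m:
                gene = m.group(1)
    # Tolerate a final record that lacks the "//" terminator.
    if accession:
        yield gene, accession, "".join(seq_parts).upper()


def extract_for_genes(text, wanted):
    """Return FASTA-formatted entries for records whose gene is in `wanted`."""
    wanted = set(wanted)
    return [
        f">{gene} {acc}\n{seq}"
        for gene, acc, seq in parse_gpff(text)
        if gene in wanted
    ]
```

To use it, read the gene list (one name per line) and the GPFF file into memory, then write out the result, for example `open("out.fasta", "w").write("\n".join(extract_for_genes(gpff_text, genes)) + "\n")`, where `out.fasta`, `gpff_text`, and `genes` are placeholder names. A gene with several protein isoforms will produce one entry per protein record. If Biopython is available, `Bio.SeqIO.parse(handle, "genbank")` should also read GenPept records and expose the CDS feature qualifiers, which is likely more robust than hand-parsing.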