How to retrieve protein sequence from gene ID and output a fasta file
I want to retrieve the protein sequences for the gene IDs returned by the search below and write them to a FASTA file, each sequence with its identifier.
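from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address
handle = Entrez.esearch(db="gene",
                        term="primate[Orgn] AND TNF[Gene Name]",
                        idtype="acc",
                        retmax='50',
                        )
record = Entrez.read(handle)
handle.close()
idlist = record['IdList']  # Gene UIDs matching the query
print(idlist)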
But I am not sure where to go from here. Any help would be appreciated.
ncbi
gene
protein
biopython
entrez
Using the command-line Entrez Direct (EDirect) tools (output truncated for space):
$ esearch -db gene -query "primate [orgn] AND TNF [gene]" | elink -target protein | efetch -format fasta > tnf.fa
$ more tnf.fa
>sp|Q19LH4.1|TNFA_CALJA RecName: Full=Tumor necrosis factor; AltName: Full=Cachectin; AltName: Full=TNF-alpha; AltName: Full=Tumor necrosis factor ligand superfamily member 2; Short=TNF-a; Contains: RecName: Full=Tumor necrosis factor, membrane form; AltName: Full=N-terminal fragment; Short=NTF; Contains: RecName: Full=Intracellular domain 1; Short=ICD1; Contains: RecName: Full=Intracellular domain 2; Short=ICD2; Contains: RecName: Full=C-domain 1; Contains: RecName: Full=C-domain 2; Contains: RecName: Full=Tumor necrosis factor, soluble form; Flags: Precursor
MSTETMIQDVELAEEALPKTRGPQGSKRRLFLSLFSFLLVAGATALFCLLHFGVIGPQKDELSKDFSLIS
PLALAVRSSSRIPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLVYSQVLFK
GQGCPSNFMLLTHSISRIAVSYQAKVNLLSAIKSPCQRETPQGAKTNPWYEPIYLGGVFQLEKGDRLSAE
INLPDYLDLAESGQVYFGIIGL
>sp|P48094.1|TNFA_MACMU RecName: Full=Tumor necrosis factor; AltName: Full=Cachectin; AltName: Full=TNF-alpha; AltName: Full=Tumor necrosis factor ligand superfamily member 2; Short=TNF-a; Contains: RecName: Full=Tumor necrosis factor, membrane form; AltName: Full=N-terminal fragment; Short=NTF; Contains: RecName: Full=Intracellular domain 1; Short=ICD1; Contains: RecName: Full=Intracellular domain 2; Short=ICD2; Contains: RecName: Full=C-domain 1; Contains: RecName: Full=C-domain 2; Contains: RecName: Full=Tumor necrosis factor, soluble form; Flags: Precursor
MSTESMIRDVELAEEALPRKTAGPQGSRRCWFLSLFSFLLVAGATTLFCLLHFGVIGPQREEFPKDPSLI
SPLAQAVRSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELTDNQLVVPSEGLYLIYSQVLF
KGQGCPSNHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSA
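Since the question started from Biopython, here is a rough Python equivalent of the pipeline above: esearch in Gene, elink to Protein, then efetch as FASTA. It is only a sketch. The function names `protein_ids_from_links` and `fetch_protein_fasta` are made up for illustration, the email address is a placeholder you must supply, and it collects every protein linked to the genes without filtering by link name, which may not match the EDirect output exactly.

```python
def protein_ids_from_links(link_record):
    """Collect unique protein UIDs from a parsed elink result, keeping order.

    link_record is what Entrez.read() returns for an elink call: a list of
    link sets, each with a "LinkSetDb" list whose entries hold "Link" items.
    """
    seen = {}
    for linkset in link_record:
        for linksetdb in linkset.get("LinkSetDb", []):
            for link in linksetdb.get("Link", []):
                seen.setdefault(str(link["Id"]), None)
    return list(seen)


def fetch_protein_fasta(term, email, out_path, retmax=50):
    """Search Gene, follow links to Protein, write FASTA text to out_path.

    Returns the number of protein UIDs requested (0 if nothing was found).
    """
    from Bio import Entrez  # Biopython; only needed for the network calls

    Entrez.email = email  # NCBI asks for a contact address

    # Step 1: Gene UIDs for the query (same search as in the question).
    handle = Entrez.esearch(db="gene", term=term, retmax=retmax)
    gene_ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not gene_ids:
        return 0

    # Step 2: follow Gene -> Protein links (the "elink -target protein" step).
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_ids)
    protein_ids = protein_ids_from_links(Entrez.read(handle))
    handle.close()
    if not protein_ids:
        return 0

    # Step 3: download the linked proteins as plain-text FASTA.
    handle = Entrez.efetch(db="protein", id=protein_ids,
                           rettype="fasta", retmode="text")
    with open(out_path, "w") as out:
        out.write(handle.read())
    handle.close()
    return len(protein_ids)
```

Calling `fetch_protein_fasta("primate[Orgn] AND TNF[Gene Name]", "you@example.org", "tnf.fa")` should then write a multi-record FASTA file comparable to `tnf.fa` above, assuming network access and a working Biopython install.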
If you want to save each sequence in its own file, use:
$ esearch -db gene -query "primate [orgn] AND TNF [gene]" | elink -target protein | efetch -format acc | xargs -n 1 sh -c 'efetch -db protein -id "$0" -format fasta > "$0".fa'
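If you already have the combined FASTA file and would rather not query NCBI again, the same per-sequence split can be done locally. This is a minimal standard-library sketch. `accession_from_header` and `split_fasta` are hypothetical helper names, and the file-naming rule (second `|`-separated field if present, otherwise the first word of the header) is an assumption based on the headers shown above.

```python
from pathlib import Path


def accession_from_header(header):
    """Guess an accession from a FASTA header line.

    "sp|Q19LH4.1|TNFA_CALJA ..." gives "Q19LH4.1";
    a plain header such as "NP_000585.2 ..." gives "NP_000585.2".
    """
    fields = header[1:].split()  # text after ">" split on whitespace
    token = fields[0] if fields else "unnamed"
    parts = token.split("|")
    return parts[1] if len(parts) > 1 and parts[1] else token


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <accession>.fa in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as fasta:
        try:
            for line in fasta:
                if line.startswith(">"):
                    # A new record starts: close the previous file, open the next.
                    if out is not None:
                        out.close()
                    path = out_dir / (accession_from_header(line) + ".fa")
                    out = open(path, "w")
                    written.append(path)
                if out is not None:  # skip any text before the first header
                    out.write(line)
        finally:
            if out is not None:
                out.close()
    return written
```

For example, `split_fasta("tnf.fa", "per_protein")` would create one `.fa` file per record in a `per_protein` directory. Records that share an accession would overwrite each other, so check the returned list if that matters.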
This works, thanks so much!