5.1 years ago
Chvatil
From a protein FASTA file, I would like to get the corresponding NCBI nucleotide sequences and write them to a new FASTA file.
I know that esearch can do this in Python, but I do not know how to handle it with FASTA files, etc.
I guess it should be something like:
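from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # placeholder address; NCBI asks clients to identify themselves

with open("Nucleotide_fasta_file", "w") as nuc_file:
    for record in SeqIO.parse("Protein_fasta_file", "fasta"):
        # efetch (rather than esearch) returns the sequence itself; the
        # fasta_cds_na return type asks for the coding sequence of the protein
        handle = Entrez.efetch(db="protein", id=record.id,
                               rettype="fasta_cds_na", retmode="text")
        nuc_file.write(handle.read())
        handle.close()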
Here is an example. If I have a protein FASTA file such as:
>YP_001883411.1
MDEMPQHSHNLPSPPTDTLRPSSSHNGPRKENGDKDLQPLQPPSSTMGNQHNISVKIGMI
MDNFNDVIERTLNRMHIYVMTEKINFICMAQTQYHNLVFERELFDILCKNKYAIEHEAAD
NNPSDDSLHLANNDYPFSLIICSKEAKRSSILLSNVGLVLVKFILYLRMLYRNVARPDSP
NCFLAINTRILPFGRVCIDIDYKIPLLGEGDGDYDEFVRKSFELVAQYTTMGNIIMTRNC
YTPNTRSFHLITEQQFDATTRHIIFMRIAEGIRALNDNVKIDQVHVWMLPFGRGHVPVRK
YDRRLNEFMDLVYPYTEVDFELMMPFDVSQGMDNLYTLYSLNTDEVAVANGDDFNYDVLH
ECLGRDIMDSYYIGDTSDIAKNLQLLSRVLSSRYEFAFSSQFRQRYDANYLSRVMGNKFN
NAFIIMSNKKLQLETSWNIPKPKAKPRPSERYEIYRFIEHRLLGEMRTLEDLFKVMPSDL
VKRNVNIGNVDELQNEELAKRKLASEINVDSVYTGPVHNLALDIAENQCIDFVWDNIVED
TSSHPWPYVYIGRSQIVQTAMSHMRNVHEYYTRTIVDNVYTYEKHDGIMRCFENIQSMFD
GQLLADTMNALCQNIFCSDFQRVQSYLFTMYEFFCRVHYIDACVTRSEFEEVVEAYLLET
VRPMFDEADLKTYRERHVQMTPGPKAYAKFPSPCSPLARVWDSIRPANKILLHIIYLMIV
EHNYSSVFFHLHTITRNKDNSQILASLFLHIIDNSYMTTPEGDEDSGDDTMNVVRSYASK
EFLNFIYMMFINAGVNYECDLRGKNVVFSSVDSKDYINDMKELIVTSPLWFFLCNYQYVD
EQMSYSNRFDLFAAIFRQDNSAGGRQTSSPPSADQRTSNGPSPPKQRRGVSGGGSGGSGG
NNGAANFNTERVLNANNISDRYHADLLRIFFRYILAYMKTETGVYIYDGVRMMSLPFKEP
SNIPQLEVKDPVSFLGMYRHQYGIYNTWTMQMERNINVLNGQINISNDELGNYPHLFNPY
NDDIYRLLVNRFLKSITFTRVINYQKNLALFLAPIYDPNVENNLKVLNYNIDSIQINIHD
LASSEFNIPQEMFVDILDVGKKPKNKLYEMFKWLYCIVCHYSENYSCVITTPSTFIPKCM
LPECGEDKENIFSMVKGNGAMDDDDNNNNGYGGGDNFLQRLHSVLESKNREQKEIITDEL
QKLSQFELSTLVNLFKNAYMFEEDGEDAENGILEGIEGMDNNMEVDHDISAQSQTPGSPS
PQSSSANATTEEIFKFNLSVHRNMAGGGGGNGTDIMDFFEGDEFVENSKIKLLLNLFNRK
LSAEIKQMSPEAFKQIIEDNHSHHITRFVLLTLSWLIRTLHTHIFADTRFFRELQQYRQL
LYDDLSDLVFRHNGYFMYNNRITDVAQIFSHYCRHVELVVDPVFEMSMSVDRDLYLQHDA
ELEKRVSPEIVRDIEDACVSAIYQGQFIEDTNVDLSRLWARVTVPRNKHRISPLFTLHTA
TGKSEYLTERCRRHFNNKYFNNVLDPSSLAQTDHRGTDMARELNTNLIVCIEEFNSLTAK
FKQVCGYTSVAYKPLFADTKVSFQNNSTVILSTNNDPKCNEEAIVARLHVYPRRIQYANV
NKYLKFQRSSMLASTSLLKINNIMSVQMIMEKMPRVLAENYRGNFMMTWLLKRFFLFNII
DHVTVHTSETLQNHINNFYTMINAQEFVLQRLDMTTTSTMTLVQFRRLVNRICEENRSLF
NTKIDTYNVYKILCDRLKALINNDQQTIRISEKNDNALRQ
>YP_009345696.1
MDRYFINEMQLFERKLTFDSNNDYHHLEIISSHENQHIKKTKITLFTYNLLLDCLHYFYM
KCVDSNLFYDSGLTLVLHKEKKIFLNQLIFDVDFKSANISAIISKEKINDYNDFLNERNN
IIKNMIYIIFKYLKIDFTVENIQKYCSVTSRPLKLSFHLHIFYHVDYFTERILRFKMHNN
WKTFVTSMDTNFILDEPILHSLPFSRQHRPNKIHCQQTESEDVNAICINVDLLFLKDSLD
ISISNETWKLFSIFNIEKVLLKNISLLTKKKSDFTILIKEDYGNEINIFNGKKIGFFSFE
DINNFISTSMEIENVKNKEIGIIDEDEYNIHIPSVYINHIKRQHKIILYDIKDSIVHESI
DSVLFFDLLKYASFLSRYNKTDNFSINLSTKNYNENENIDEFYETVNLFSLKYKKDFFSI
FDDASNDEKEQNNIENSCEFDENDVYRRGEKNKEEINIICSDEKMKKFQHFSDSSFFPFT
EKQIHFLESIINKYNEDILSETDNAVCSKVLEYLESLNSLYPPLLFLLGSSFFNYSNDED
LQLISMYINNVDVNLPKIQTNKKRKLTTSKVDKNDIQRILKMDYKWIKEILNYIINCGCV
TSSVYLLSRIGVFQSLFDNIYYILSSMSLTLHAQYILSKWIQIVPEPITFFSSSFYNSDI
AYMCLILFDFNEKSREHKEKKNFLSGDNLNIKFIYSLFIKILSKFEIEYNSELEVFFITQ
ILMCRSLGYRAIYFNGHGHVISQDTSFLKDFCEKKSTLMLPKYKATDDVLNSFVYIENVG
IFNTLFNVYEFPSPSLNSLVTHKLPSVLTYCDNNINYFSHTTYPLLQNFIFELYGKMFHF
SKKCKENVVSLILFSPNVRISKKCPEILMELDILNFYDSSFIFLLEDYAKILNNHSNIEE
YWNDDFISIITTTNDKYSFFLKRLLFIFYIIFQENKILNFSNIILFIQTLFGQSKYEQLG
LRKQTFFKNINANNNKNNNNNCTTTDDASNGENKDIEMIEDDKLYHKTISYSAKHFTDKM
EDNTCFFNDIRKRINSVLYEQHELKKKIINETILTTKIIKPSLLSKCEEQNNYMNFCLSD
YYENVDDIILPFDKNDKSLFNTKIKNHFLNVLLNIDNSENNDYIELISSIDLTKQFKKNL
YFLVLLLNWFLKMGNIHAYSNTSFFKDIQKYRQEIYNLMTNHVLKSFGPLLKFSSSFHLA
NIIQKFNENTELDFDFVLEKFSHLKHPFYEKNNNKKLLNNENIKKKYFSISDRDIIADNE
MLRSKNHIYHAMAVLLMFSEFNFDTLTDVIKFMIYILYKGNMLRICLYLFGVTESLKSRF
TEILANLTQTDQTQSFNNANISRSAVQDFDSIVISGASNTFIFFDEVEKVCITRFKTIVN
SIAMSSRDIKNSEAINLKLACTPLMSSNSPFVVDNASCARLRPIRKKMQFCETFSGETFD
RFSIDSLKEYISTPNIGGIFLINRIPMWHDNSASIIGFLLIQRYLYPYFFKSYTSPISKK
MSKTMKNELRIYMSNNHPVAFFLNTVKISVDHNFISEQDFYKLIDKWWLAYKDRFKNSDF
DSKSLVKEISQYLTQYKSTLNNVKGYNIKIVLE
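For a file like the one above, one possible approach is sketched below using only the Python standard library. It collects the accessions from the header lines, groups them into batches, and asks NCBI's efetch endpoint for the coding sequences via the `fasta_cds_na` return type. The function names (`fasta_ids`, `batches`, `cds_request_url`, `write_cds_fasta`) and the batch size of 50 are illustrative assumptions, not part of any library, and a protein without an annotated CDS would probably return nothing.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_ids(lines):
    """Return the first word of every FASTA header line (hypothetical helper)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def batches(items, size):
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cds_request_url(protein_ids):
    """Build an efetch URL asking for the CDS nucleotide FASTA of the proteins."""
    params = {
        "db": "protein",
        "id": ",".join(protein_ids),
        "rettype": "fasta_cds_na",  # coding sequence as nucleotide FASTA
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def write_cds_fasta(protein_path, nucleotide_path, batch_size=50):
    """Read protein accessions and write the fetched CDS records to a new file."""
    with open(protein_path) as prot_file:
        ids = fasta_ids(prot_file)
    with open(nucleotide_path, "w") as nuc_file:
        for batch in batches(ids, batch_size):
            with urlopen(cds_request_url(batch)) as response:
                nuc_file.write(response.read().decode())
            # Pause so that clients without an API key stay under
            # NCBI's limit of 3 requests per second
            time.sleep(0.4)
```

Calling `write_cds_fasta("Protein_fasta_file", "Nucleotide_fasta_file")` would then perform the download. Batching keeps the number of requests small when the input holds many proteins.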
OK, thank you. I tried it in Python:
So you were able to make it work? I can move my comment to an answer in that case.
Yes, it is done, thank you :)