Updating my previous post: I rebuilt my gene list with GeneIDs (no longer with locus tags), so I now have a list like this:
36344776
36346103
36336986
36341279
36339830
36346795
36343976
36345820
36338828
36337519
36344362
36344959
36346561
36342521
36346551
36340258
36341314
36342966
36337111
36336623
Then I ran this command:
epost -db gene -input "file2.txt" | elink -target protein -name gene_protein_refseq | efetch -format fasta_cds_na
and I finally got what I need. Here is (part of) the output:
>lcl|XM_024500329.1_cds_XP_024345260.1_1 [locus_tag=EGR_11080] [db_xref=GeneID:36346795] [protein=Non-capsid protein NS-1] [protein_id=XP_024345260.1] [location=1..588] [gbkey=CDS]
ATGTTTAACCTACGCGCGGTTCCGACGGCCGGCCGTCGCGCAGGCAATCCGGATTTGCGAACCAATCAGC
GTAGAGCACACGCCCGCACCACGGAGACGCACGCATCCCACCCCCCCGGAAGCGGAAGTGAGTCCATTCA
ATGGCTGGAAGACATGTTTTTAGCCAATGAAATCGCCCTTGTCGATTTTGCAATTACGCTTCGCATTATA
ATGAATTGCGAAGATGAGAAAATCAACACTCTTGTGTTGTACGGCCCGACCAATACGGGCAAATCGCTTA
TTTGTAAGCTGACAACGACCTTCCTTGAGCATGGCAGTGTCATGCGCAGGCAGGGGGCATCAGCCTTCGC
TTACGAGAACCTTCTTAATAGGAAGGTTGCGTTAATGGAGGAGCCTGGGATCTGCGCTGCTAACCAGCAG
GATCTGAAGCAGATCCTAGGAGGCGAGACATTTAAGGGCCCCAAAGACATGCAGACGACCCAACAGGCTC
CACAGCCTGTTCAGAGCACTGCCCAACACCCGCCTGCACCTGCGACTCCTACACCACATACATGGCTTAA
AGGTGACACCACCACCTTGTCGGCTTGA
>lcl|XM_024500095.1_cds_XP_024345491.1_1 [locus_tag=EGR_10846] [db_xref=GeneID:36346561] [protein=hypothetical protein] [protein_id=XP_024345491.1] [location=1..693] [gbkey=CDS]
ATGTCGCGACACTTCATGCACTCCGCCAAGGGTACGGATTCCCTACAGCAATCACTGGAAGTGAGGTCGG
TTCGAAGAAGACTAGATGGTCGCAATGTTGCCGTGGACTTGCGTCGTCGTCGGTGCTCCCCTCGGTCAAC
CTCCTGTATCGGCCATCACAACGTAGTCGGTGTTGTAAGAAAGGCACCCAAGAACCAACCGTGCCAATTA
TGGAAATGGCTTGTAACTGAGAAGATATTTGACCACAACGTCAGTGACGAGTTGCTGCAGATCAATGGTT
TCATGACGGCGGGAATGTCGTATCGTCGTGCGGTGGAGATAATCCGGGCGGGTGGCAACTTGGTCCGCCC
CGCTGCGCTGTCGCCGCTGCCTCTACCTGCGCTCTCCTTCGCACGCTTCCATCCACTGCCGCCTATACCG
TCTTCAGCCCCCGTGGTGCCGCAGCCACCACCGGCGCCTCCACTACCACCGCCATATCTCCGCTCACTGC
AGTTCTTTGCCCCTTCATCCGTGCACAAATCATCCATTGGATCCCATCCCCTACTCTCCTCGGCCTCCGC
CGCAGCCGTTGCCTCCACCGCACTACCAGAATGGGGGAATGAGGATGCCTCTGGGTTTCCCCTTCTCTCC
ACCACATTTCCGCCACCCGTTTCTCTCCTCCTCTCTCCACCTCCCACCATCACTGAAATCTAA
So I think this is the way to do it; please correct me if I'm wrong.
My last question: is there any way to write the output to a file instead of printing it to the terminal?
Thank you again, hopefully for the last time!
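On the last question, ordinary shell redirection should be enough: ending the pipeline with > and a file name sends the FASTA output to that file instead of the terminal. Below is a minimal sketch that wraps the same pipeline in a function. The function name fetch_cds_to_file and the output name cds_sequences.fasta are made up for illustration; the epost/elink/efetch options are exactly the ones from the command above.

```shell
# Sketch only: the function name and file names are illustrative assumptions.
# The pipeline itself is the one from the post, with "> file" appended.
fetch_cds_to_file() {
    ids_file="$1"   # text file with one GeneID per line
    out_file="$2"   # FASTA file to create (overwritten if it already exists)

    epost -db gene -input "$ids_file" \
        | elink -target protein -name gene_protein_refseq \
        | efetch -format fasta_cds_na > "$out_file"
}

# Usage (needs Entrez Direct installed and network access):
# fetch_cds_to_file file2.txt cds_sequences.fasta
```

Using >> instead of > would append to an existing file rather than overwrite it.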
You can use Entrez Direct. An epost solution can be applied to many entries in one file to get them all.
To get the nucleotide sequence:
The problem is that locus_tag returns 163 entries for this genome, so I am not sure whether you are interested in one of these sequences or in all of them.
OR
Thank you for your answer; that is very close to what I'm looking for. As you say, for this example, "EGR_04594", the command returns 163 entries, but if you look at each entry, only one is actually called "EGR_04594". Here are some examples (only the first lines are shown):
The code:
Some examples of the entries:
This one returns the locus tag EGR_04541
This one returns the locus tag EGR_04543
And this entry is for EGR_04494 (this is the entry that I'm looking for)
I don't know why it returns so many entries when I only asked for "EGR_04594". I really appreciate your help; I have been looking for something like this for a long time. I hope you can still help me with this last question. Thank you!
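One possible workaround, sketched here under assumptions: if the results are fetched in fasta_cds_na format, each header carries a [locus_tag=...] field (as in the output shown elsewhere in this thread), so the records can be filtered locally to keep only the exact tag. The function name filter_by_locus_tag and the file name all_entries.fasta are hypothetical.

```shell
# Sketch only: keeps the FASTA records whose header contains the exact
# field [locus_tag=TAG]. The function and file names are illustrative.
filter_by_locus_tag() {
    tag="$1"     # e.g. EGR_04594
    fasta="$2"   # FASTA file with [locus_tag=...] in each header

    awk -v field="[locus_tag=$tag]" '
        /^>/ { keep = (index($0, field) > 0) }  # decide once per record header
        keep                                    # print header and sequence lines
    ' "$fasta"
}

# Usage:
# filter_by_locus_tag EGR_04594 all_entries.fasta > EGR_04594.fasta
```

Because the closing bracket is part of the matched text, a tag such as EGR_0459 would not accidentally match EGR_04594.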
See the new answer below.
Thank you for your kind help! Now that I have that working, I'm trying to run the same command with many genes from a list. I have a file called "file.txt" containing my genes, which looks like this (only a few genes are shown, to save space):
Then I tried a command, but it did not work. Here is what I ran:
INPUT
OUTPUT
I hope you can help me again; I'm really new to these things. Thank you again.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.