Hi all,
I'm wondering if there is a way to get the sequence of a gene just from the scientific name of the species. More precisely, I have a list of more than 2000 plants and I need the trnL sequence for all of them in order to build a database. Is there a way to do it, for example by writing a query? I've seen that Entrez might be able to do it, but I'm really far from being an expert in bioinformatics and I have no idea how to write the code! I would appreciate it if someone could help me.
Thanks a lot!
Yes, it is definitely possible, but the query is a bit more complex than one might think. trnL is a tRNA gene and thus has multiple copies with the same name. Try something like this:
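A minimal sketch of such a query, assuming NCBI Entrez Direct (the esearch/efetch command-line tools) is installed and on PATH. The function names trnl_query and fetch_trnl are hypothetical; [Organism] and [Gene] are standard Entrez search field tags. Note that because trnL has several copies, one species can return several records.

```shell
#!/usr/bin/env bash
# Sketch: fetch trnL records for a single species with NCBI Entrez Direct.
# Assumes esearch/efetch (EDirect) are installed; helper names are hypothetical.

# Build the Entrez query: the quoted species name restricted to the Organism
# field, combined with the gene name restricted to the Gene field.
trnl_query() {
  printf '"%s"[Organism] AND trnL[Gene]' "$1"
}

# Search the nucleotide database (nuccore) and download all hits as FASTA.
fetch_trnl() {
  esearch -db nuccore -query "$(trnl_query "$1")" | efetch -format fasta
}
```

Usage, with an example species: fetch_trnl "Arabidopsis thaliana" > Arabidopsis_thaliana_trnL.fasta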
Wow, thanks a lot for your reply! But how do I do this with multiple names? I have a list of 2000; is it possible to provide a file with all the names instead of a single name?
Yes, that is possible too. Put all the names in a text file, one name per line, and then use a small bash script. I'll post code as soon as I figure out how to get the FASTA sequence from a gene entry. It's not as easy as it first looks :)
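A minimal sketch of that loop, assuming EDirect is installed and the list is a plain-text file with one scientific name per line. fetch_one and fetch_all are hypothetical helper names. The < /dev/null on esearch is there on the assumption that esearch would otherwise read the rest of the species list from the loop's standard input.

```shell
#!/usr/bin/env bash
# Sketch: one Entrez search per species listed in a text file.
# Helper names are hypothetical; requires EDirect (esearch/efetch) on PATH.

# Fetch trnL FASTA for one species. "< /dev/null" keeps esearch from
# consuming the species list that the while loop reads on stdin.
fetch_one() {
  esearch -db nuccore -query "\"$1\"[Organism] AND trnL[Gene]" < /dev/null \
    | efetch -format fasta
}

# Read the list line by line and append every result to a single FASTA file.
fetch_all() {
  local list="$1" out="$2" name
  : > "$out"                              # start with an empty output file
  while IFS= read -r name || [ -n "$name" ]; do
    name="${name%$'\r'}"                  # tolerate Windows line endings
    [ -z "$name" ] && continue            # skip blank lines
    fetch_one "$name" >> "$out"
  done < "$list"
}
```

For example, fetch_all species.txt trnL.fasta (both file names are placeholders). Reading with while/read instead of a for loop keeps names containing spaces, such as "Abies alba", on one line.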
Seemingly, the following query works for a single species:
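One possible variant of a single-species query, sketched under an assumption: whole chloroplast genomes (roughly 150 kb) also carry a trnL annotation, so a length cap on the record keeps only short marker sequences. The 50 to 5000 bp default range is arbitrary, and the helper names are hypothetical; [SLEN] is the Entrez sequence-length field.

```shell
#!/usr/bin/env bash
# Sketch: trnL query for one species that skips very long records
# (e.g. complete plastid genomes). Length bounds are arbitrary defaults.

trnl_short_query() {
  local species="$1" min="${2:-50}" max="${3:-5000}"
  # min:max[SLEN] restricts hits to records within that length range
  printf '"%s"[Organism] AND trnL[Gene] AND %s:%s[SLEN]' "$species" "$min" "$max"
}

# Requires EDirect on PATH; extra arguments are passed on as min and max.
fetch_trnl_short() {
  esearch -db nuccore -query "$(trnl_short_query "$@")" | efetch -format fasta
}
```

For example, fetch_trnl_short "Abies alba" 100 2000 would use a narrower range. This still returns whole records, not just the trnL feature within them.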
That would be amazing! Thanks a lot, you are very kind. I'll wait for your reply :)
Could it be:
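# one species name per line; $(cat ...) reads the file, and the name goes into the Organism field
IFS=$'\n'; for name in $(cat scientificname_list.txt); do esearch -db nuccore -query "\"$name\"[Organism] AND trnL[Gene]" | efetch -format fasta; done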
?
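With 2000 species, some names will likely return nothing. A minimal sketch that writes one FASTA file per species and records names with no hits, assuming EDirect is installed; the output directory, the missing.txt file and the helper names are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: one FASTA file per species plus a list of species with no trnL hit.
# Requires EDirect on PATH; file and function names are hypothetical.

fetch_one() {
  # "< /dev/null" keeps esearch from reading the loop's stdin (the species list)
  esearch -db nuccore -query "\"$1\"[Organism] AND trnL[Gene]" < /dev/null \
    | efetch -format fasta
}

fetch_each() {
  local list="$1" outdir="$2" name file
  mkdir -p "$outdir"
  : > "$outdir/missing.txt"               # reset the list of species without hits
  while IFS= read -r name || [ -n "$name" ]; do
    name="${name%$'\r'}"                  # tolerate Windows line endings
    [ -z "$name" ] && continue            # skip blank lines
    file="$outdir/${name// /_}.fasta"     # spaces -> underscores in file names
    fetch_one "$name" > "$file"
    if [ ! -s "$file" ]; then             # empty file means no record was found
      echo "$name" >> "$outdir/missing.txt"
      rm -f "$file"
    fi
  done < "$list"
}
```

For example, fetch_each scientificname_list.txt trnL_out; the names in trnL_out/missing.txt can then be checked by hand (synonyms, spelling).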