Hi everyone!
This Entrez Direct command gives me the gene sequences of a given gene/enzyme for various organisms:
esearch -db gene -query "glutaminase-asparaginase [Gene/Protein Name] AND (bacteria [orgn] OR fungi [orgn] OR archaea [orgn]) AND alive [prop]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer -element ChrStart -element ChrStop |xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'
The output is similar to:
>NC_030957.1:c4121890-4120582 Colletotrichum higginsianum
TGAGAGCTTCTTACTTGTCGACGCTGTTGTTGCCAGCTCTGGTAGCCCATGGTTTCGCCTCCCCAGTCGG
>NC_016603.1:c898826-897759 Acinetobacter pittii
TGTTGACTAAAACTGTTAAATCTTTAGGTTTAGCGATGGGCTTATTAG
>NC_002947.4:c2800289-2799201 Pseudomonas putida
TGAATGCCGCACTGAAAACCTTCGCCCCAAGCGCACTCGCCCTGCTGCTGATCCTGCCATCCAGCGCCTC
But I need to do this for several genes, which I have in the first column of a table, like this:
GeneNameA OtherColumn OtherColumn
GeneNameB OtherColumn OtherColumn
I am looking for a Perl script that reads the first column, passes each gene name into this slot of the Entrez query: "X" [Gene/Protein Name], and creates a separate multi-FASTA file for each gene, containing the sequences found for that gene.
My programming skills are still limited and I am stuck on this part. I would be grateful for your help!
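For reference, the loop described above can be sketched in plain shell rather than Perl, reusing the pipeline from the post unchanged. This is only a sketch under assumptions: the function names build_query, fetch_gene and first_column are made up for illustration, the table is assumed to be whitespace-separated with the gene name in the first field, and each output file is simply named after the gene. The `< /dev/null` on esearch is a precaution so that it does not consume the gene list the loop is reading from standard input.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of EDirect.

# build_query GENE: print the Entrez query from the post with GENE substituted in.
build_query() {
    printf '%s [Gene/Protein Name] AND (bacteria [orgn] OR fungi [orgn] OR archaea [orgn]) AND alive [prop]' "$1"
}

# first_column FILE: print the first whitespace-separated field of every non-empty line.
first_column() {
    awk 'NF { print $1 }' "$1"
}

# fetch_gene GENE: run the original pipeline for one gene and write GENE.fasta.
# Assumes the gene name is safe to use as a file name (no slashes or spaces).
fetch_gene() {
    esearch -db gene -query "$(build_query "$1")" < /dev/null |
        efetch -format docsum |
        xtract -pattern GenomicInfoType -element ChrAccVer -element ChrStart -element ChrStop |
        xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta' > "$1.fasta"
}

# Run the loop only when a table file is given as the first argument.
if [ -n "${1:-}" ]; then
    first_column "$1" | while read -r gene; do
        fetch_gene "$gene"
    done
fi
```

Saved as, say, fetch_genes.sh (a hypothetical name), it would be run as `sh fetch_genes.sh table.txt` and should leave one GeneName.fasta file per row of the table in the current directory. A gene with no hits would produce an empty file.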