Download nucleotide sequence with locus_tag
I have a list of locus_tags. My idea was to download them using esearch, but the downloaded file is not the desired gene; instead, the nucleotide sequence of the entire contig is downloaded.
In this example, my gene of interest is 830 nt long.
esearch -db nucleotide -query "JG64_RS07240" | efetch -format fasta > gen.fasta
Any idea how to get only my sequence of interest with esearch, and not the whole contig?
I know I can do it manually, but I have more than 400 locus_tags that do not have a GI number.
Thanks for reading; I'll watch for any replies.
You can do this:
$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta | awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' | grep JG64_RS07240 | tr "\t" "\n"
>lcl|NZ_JQQM01000039.1_gene_17 [locus_tag=JG64_RS07240] [location=19567..20385] [gbkey=Gene]
ATGAAAAAACTTTCGATTTTGGCTATCTCCGTTGCACTCTTTGCAAGCATTACCGCTTGTGGTGCTTTCGGTGGTCTGCCAAGCCTAAAAAGCTCTTTTGTTCTGAGCGAGGACACAATCCCAGGGACAAACGAAACCGTAAAAACGTTACTTCCCTACGGATCTGTGATCAACTATTACGGATACGTAAAGCCAGGACAAGCGCCGGACGGTTTAGTCGATGGAAACAAAAAAGCATACTATCTCTATGTTTGGATTCCTGCCGTAATCGCTGAAATGGGAGTTCGTATGATTTCCCCAACAGGCGAAATCGGTGAGCCAGGCGACGGAGACTTAGTAAGCGACGCTTTCAAAGCGGCTACCCCAGAAGAAAAATCAATGCCACATTGGTTTGATACTTGGATCCGTGTAGAAAGAATGTCGGCGATTATGCCTGACCAAATCGCCAAAGCTGCGAAAGCAAAACCAGTTCAAAAATTGGACGATGATGATGATGGTGACGATACTTATAAAGAAGAGAGACACAACAAGTACAACTCTCTTACTAGAATCAAGATCCCTAATCCTCCAAAATCTTTTGACGATCTGAAAAACATCGACACTAAAAAACTTTTAGTAAGAGGTCTTTACAGAATTTCTTTCACTACCTATAAACCAGGTGAAGTGAAAGGATCTTTCGTTGCATCTGTTGGTCTGCTTTTCCCACCAGGTATTCCAGGTGTGAGCCCGCTGATCCACTCAAATCCTGAAGAATTGCAAAAACAAGCTATCGCTGCTGAAGAGTCTTTGAAAAAAGCTGCTTCTGACGCGACTAAGTAA
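The awk step in that pipeline is what makes the grep safe: it joins each FASTA record onto a single line (header, a tab, then the whole sequence), so grep returns the header together with its sequence, and tr then turns the tab back into a newline. As a minimal sketch, here is that step on its own, wrapped in a function whose name (`linearize_fasta`) is only an illustration and not part of any tool:

```shell
# Hypothetical helper: read FASTA on stdin and print one record per line
# as "header<TAB>sequence". The awk program is the one used in the answer.
linearize_fasta() {
    awk '
        # Header line: end the previous record (if any), print the header and a tab.
        /^>/ { printf("%s%s\t", (N > 0 ? "\n" : ""), $0); N++; next }
        # Sequence line: append it to the current record without a newline.
        { printf("%s", $0) }
        # Finish the last record.
        END { printf("\n") }
    '
}

# Usage, assuming all_genes.fa was produced by efetch -format gene_fasta:
# linearize_fasta < all_genes.fa | grep JG64_RS07240 | tr "\t" "\n"
```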
If you have a list of those IDs, use a for loop.
Simply fetch all the gene sequences once, then filter them by ID:
$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta > all_genes.fa
$ while read -r i; do awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < all_genes.fa | grep -F "${i}" | tr "\t" "\n" >> needed.fa; done < ids.txt
needed.fa will contain the sequences you want.
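With 400+ IDs, the loop re-reads the gene file once per ID, and grep matches substrings, so a shorter tag could also pull in longer tags that contain it. A possible alternative, sketched here under the assumption that every header carries a `[locus_tag=...]` field as in the output above, is a single awk pass that compares the tag exactly. The function name `extract_by_locus_tag` is hypothetical, and the ID file is assumed to be non-empty with one locus_tag per line:

```shell
# Hypothetical helper: print only the FASTA records whose [locus_tag=...]
# value is listed in the ID file (one locus_tag per line).
extract_by_locus_tag() {
    ids_file=$1
    fasta_file=$2
    awk '
        # First file (the ID list): remember every wanted locus_tag.
        NR == FNR { if ($0 != "") wanted[$0] = 1; next }
        # Header line: pull out the locus_tag value and decide whether to keep the record.
        /^>/ {
            keep = 0
            if (match($0, /\[locus_tag=[^]]+\]/)) {
                # Skip the 11 characters of "[locus_tag=" and drop the closing "]".
                tag = substr($0, RSTART + 11, RLENGTH - 12)
                keep = (tag in wanted)
            }
        }
        # Print the header and every sequence line of a kept record.
        keep { print }
    ' "$ids_file" "$fasta_file"
}

# Usage, with the same file names as above:
# extract_by_locus_tag ids.txt all_genes.fa > needed.fa
```

Either way, all_genes.fa only holds the genes of the record returned by the esearch query, so locus_tags located on other contigs would need their own fetch first.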
It worked, thank you very much. Greetings from Colombia.