Batch download of protein FASTAs from GenBank using a list (>100) of protein accessions
2
1
8.3 years ago
al-ash ▴ 210

Hi!

(I'm completely new to Unix. After being amazed over the past couple of days by what tools such as sed, awk and grep can do, I'm now slowly trying to do something useful for my work.)

I'm trying to retrieve multiple protein FASTA records from GenBank using a list of protein accessions and save them all into one file. The accessions (such as "XP_015438716.1") are stored in a file, one per line, with anywhere from a few dozen to about two hundred per file. I would like to do this not through the web interface (e.g. http://www.ncbi.nlm.nih.gov/sites/batchentrez) but from the command line (using bash commands or Unix utilities), because I want to build this step into a pipeline I'm putting together.

I played with E-utils and particularly with efetch, which works fine for downloading a single protein fasta using e.g.:

efetch -db protein -format=fasta -id XP_015438716.1 > testEFETCH.fa

but I have not managed to use a file as input for efetch (I'm wondering whether that is possible). I would appreciate any hints or help!

genbank FASTA E-utils • 5.2k views
3
8.3 years ago
indexofire ▴ 40

$ for i in $(cat file); do efetch -db protein -format fasta -id "$i" >> fetch.fa; done
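
Because the loop appends with >>, rerunning it adds duplicate records, and a request that fails leaves a silent gap in fetch.fa. A rough sanity check, sketched below, compares the number of FASTA headers with the number of accessions. The function names (count_fasta_records, count_accessions, check_download) are hypothetical helpers made up for this illustration and are not part of EDirect. The check assumes each accession yields exactly one record.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of EDirect.

# Count FASTA records by counting header lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count non-blank lines in the accession list.
count_accessions() {
    grep -cv '^[[:space:]]*$' "$1"
}

# Compare the two counts; return non-zero on a mismatch.
check_download() {
    local list_file=$1 fasta_file=$2
    local expected got
    expected=$(count_accessions "$list_file")
    got=$(count_fasta_records "$fasta_file")
    if [ "$expected" -eq "$got" ]; then
        echo "OK: $got of $expected records retrieved"
    else
        echo "MISMATCH: $got of $expected records retrieved" >&2
        return 1
    fi
}

# Usage, with the file names from the loop above:
# check_download file fetch.fa
```

A mismatch only says that something is missing or duplicated; comparing the header IDs against the list would be needed to find out which accession.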

0

Thanks; this helped me a lot!

0
8.3 years ago
Sej Modha 5.3k

If you save the list of accession numbers in a file and pass that file as the first argument to the following script, you should be able to download the sequences in FASTA format into a single file.

while read -r line
do
    # </dev/null keeps efetch from reading the rest of the list on stdin
    efetch -db protein -format fasta -id "$line" < /dev/null >> output.fa
done < "$1"
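
With up to a couple of hundred accessions, one efetch call per line means a couple of hundred separate requests. As a possible alternative, the sketch below groups the accessions into comma-separated batches, relying on efetch accepting several IDs in one -id argument. The function name fetch_fasta_batches and the default batch size of 50 are assumptions chosen for illustration, not values from EDirect or from the answers above.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_fasta_batches and the batch size are illustrative choices.
# Assumes EDirect's efetch is on PATH.

fetch_fasta_batches() {
    local list_file=$1
    local batch_size=${2:-50}
    # Strip Windows carriage returns and blank lines, then let xargs
    # print the accessions in space-separated groups of batch_size.
    tr -d '\r' < "$list_file" | grep -v '^[[:space:]]*$' |
        xargs -n "$batch_size" |
        while read -r batch; do
            # Turn the spaces into commas for a single -id argument.
            # </dev/null keeps efetch from reading the remaining batches on stdin.
            efetch -db protein -format fasta -id "${batch// /,}" < /dev/null
        done
}

# Usage (file names are placeholders):
# fetch_fasta_batches accessions.txt 50 > proteins.fa
```

Writing to standard output and redirecting once with > avoids the duplicate records that a rerun with >> would append.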