Question

Batch download of protein FASTAs from GenBank using a list (>100) of protein accessions

1

8.2 years ago

al-ash ▴ 210

Hi!

(I'm completely new to unix, and after being amazed over the past couple of days by the capabilities of tools such as sed, awk and grep, I'm now slowly trying to do something useful for my work.)

I'm trying to retrieve multiple protein FASTAs from GenBank using a list of protein accessions such as "XP_015438716.1". I have the accessions in a file, one per line, with several dozen to about two hundred accessions per file, and I would like to save all the downloaded FASTAs into one file. I would like to do this not via the web (e.g. http://www.ncbi.nlm.nih.gov/sites/batchentrez) but from the command line (using bash commands or unix utilities), because I'd like to build this step into a pipeline that I'm constructing.

I played with E-utils and particularly with efetch, which works fine for downloading a single protein fasta using e.g.:

efetch -db protein -format=fasta -id XP_015438716.1 > testEFETCH.fa

but I did not manage to use a file as input for efetch (I'm wondering whether that is possible). I would appreciate any hints or help!

genbank FASTA E-utils • 5.2k views

score 3 · Answer 1 · 2016-08-31

3

8.2 years ago

indexofire ▴ 40

$ for i in $(cat file); do efetch -db protein -format fasta -id "$i" >> fetch.fa; done
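
The loop above sends one request per accession. Since efetch's -id argument also accepts a comma-separated list, the same idea can be sketched with batched requests, which means fewer round trips for a list of a couple of hundred IDs. This is only a sketch under stated assumptions: batch_accessions, fetch_fasta and the PAUSE variable are hypothetical names made up for this example (they are not part of EDirect), the batch size of 50 is arbitrary, and the one-second pause is simply a conservative guess at staying within NCBI's request-rate limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDirect): print the accessions from a
# one-per-line file as comma-separated batches, one batch per output line.
batch_accessions() {
    local file=$1 size=${2:-50}
    awk -v n="$size" '
        { sub(/\r$/, "") }          # tolerate Windows line endings
        NF {                        # skip blank lines
            printf "%s%s", (c % n ? "," : (c ? "\n" : "")), $1
            c++
        }
        END { if (c) print "" }
    ' "$file"
}

# Hypothetical helper: fetch every batch and collect the FASTA records in
# one output file. Assumes efetch (EDirect) is installed and on the PATH.
fetch_fasta() {
    local file=$1 out=$2 size=${3:-50}
    : > "$out"                      # start from an empty output file
    batch_accessions "$file" "$size" | while IFS= read -r ids; do
        # stdin comes from /dev/null so efetch cannot consume the batch list
        efetch -db protein -format fasta -id "$ids" < /dev/null >> "$out"
        sleep "${PAUSE:-1}"         # assumed-safe pause between requests
    done
}
```

With these definitions loaded, a call such as fetch_fasta accessions.txt proteins.fa (both file names are placeholders) would write all records to proteins.fa. Note that, unlike the one-liner above, this sketch overwrites the output file instead of appending to it.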

0

Thanks; this helped me a lot!

ADD REPLY • link 8.2 years ago by al-ash ▴ 210

score 0 · Answer 2 · 2016-08-31

0

8.2 years ago

Sej Modha 5.3k

If you save the list of accession numbers in a file and pass that file as the first argument to the following script, you should be able to download the sequences in FASTA format into a single file (output.fa).

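#!/bin/bash
# Usage: bash fetch.sh accessions.txt   (the script name is arbitrary)
while IFS= read -r line
do
    # Skip blank lines in the accession list
    [ -z "$line" ] && continue
    # stdin is redirected in case efetch reads it and swallows the rest of the list
    efetch -db protein -format fasta -id "$line" < /dev/null >> output.fa
done < "$1"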
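
A loop like this appends whatever efetch returns, so an accession that fails (a typo, a withdrawn record, a network hiccup) is simply absent from the output without any warning. In a pipeline it may be worth checking the result afterwards. The following is a minimal sketch of such a check, not something from the answers here: missing_accessions is a hypothetical helper name, and it assumes each FASTA header starts with the accession.version right after the ">" (e.g. ">XP_015438716.1 ..."), which may not hold for older header styles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every accession from the list (arg 1) that has
# no matching header in the FASTA file (arg 2).
# Assumption: headers look like ">ACCESSION.VERSION description".
missing_accessions() {
    local list=$1 fasta=$2
    awk '
        FILENAME == ARGV[1] {            # first file: the FASTA output
            if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }
            next
        }
        { sub(/\r$/, "") }               # tolerate Windows line endings
        NF && !($1 in seen) { print $1 } # second file: the accession list
    ' "$fasta" "$list"
}
```

For example, missing_accessions accessions.txt output.fa > missing.txt (file names are placeholders) would leave the accessions that still need to be fetched in missing.txt, which could then be fed back into the download loop.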
