How to download multiple fasta files from NCBI in linux command line?
3.8 years ago

Hi, I have a text file that contains a list of accession numbers for multiple nucleotide sequences like below:

NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

I want to use this text file as input and download all of the FASTA files at once from the Linux command line. The downloaded sequences need to be saved as separate files (not in a single multi-FASTA file).

How can I accomplish this? Thanks in advance.

fasta linux command line NCBI • 5.4k views
3.8 years ago
GenoMax 147k

Using Entrez Direct (EDirect):

$ more id
NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

Option 1:

$ epost -db nuccore -input id -format acc | efetch -format fasta > seq.fa

NOTE: You can split the multi-fasta output file (seq.fa) into individual files using the faSplit utility from Jim Kent, following the directions here: C: How to split fasta by '>' into a file each containing one sequence, and have the
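If faSplit is not at hand, plain awk can do a similar split. This is only a minimal sketch, not the faSplit method from the linked post: the function name split_fasta is made up for illustration, and it assumes the first whitespace-delimited token of each header (for example NM_007288.3) is unique and safe to use as a file name. Output files are written to the current directory.

```shell
# Hypothetical helper: split a multi-FASTA file into one file per record.
# Assumption: the first token of each header is a unique, filename-safe ID.
split_fasta() {
    awk '
        /^>/ {
            if (out) close(out)          # close the previous record file
            out = substr($1, 2) ".fa"    # ">NM_007288.3 Homo ..." -> "NM_007288.3.fa"
        }
        out { print > out }              # write header and sequence lines
    ' "$1"
}
```

Usage would be along the lines of `split_fasta seq.fa`, which should leave one .fa file per record next to seq.fa.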

Option 2:

If you don't want to split the large file, you can download the sequences as individual files using the following method:

$ for i in $(cat id); do efetch -db nuccore -id "${i}" -format fasta > "${i}.fa"; done

Just to show the fasta headers of the files recovered:

$ epost -db nuccore -input id -format acc | efetch -format fasta | grep ">"
>NM_001300741.2 Homo sapiens nudix hydrolase 12 (NUDT12), transcript variant 2, mRNA
>NM_001354644.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 5, mRNA
>NM_001354643.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 4, mRNA
>NM_007288.3 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 2a, mRNA
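Note that the headers above do not come back in the same order as the ID list, so it can be worth checking that nothing was dropped. A rough sketch of such a check follows; the function name missing_accessions is made up for illustration, and it assumes the ID file has one accession per line with no blank lines, and that each header starts with the same accession.version string used in the list.

```shell
# Hypothetical helper: print accessions from the ID file ($1) that have
# no matching header in the FASTA file ($2).
missing_accessions() {
    found=$(mktemp)
    # Collect the first token of every header, without the leading ">".
    awk '/^>/ { print substr($1, 2) }' "$2" > "$found"
    # -F fixed strings, -x whole-line match, -v keep only IDs not found.
    # grep exits non-zero when nothing is missing, so ignore its status.
    grep -vxFf "$found" "$1" || true
    rm -f "$found"
}
```

For example, `missing_accessions id seq.fa` should print nothing when every accession in id was retrieved.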

Thank you for the detailed explanation.

3.8 years ago
Mensur Dlakic ★ 28k

Did it ever occur to you to do the same thing as already suggested in one of your previously answered posts? This is essentially the same problem, except that you want the files saved individually. It always heartens me to see when posters learn something from previous posts and try to apply it to new problems.

Absent that, this will do the trick (assuming your ID numbers are saved in a file named ids):

cat ids | xargs -I {} sh -c "esearch -db nuccore -query {} | efetch -format fasta > {}.fna"

Thank you very much. It was really helpful.
