What do I need to change in my looped Unix code to run the Perl batch reference-proteome download script from https://www.uniprot.org/help/api_downloading? I am getting "zsh: parse error near `done'".
I am also not sure whether anything needs to change in the Perl code, or whether saving it as a .pl file with the TextEdit application was the right approach.
I will be downloading from a list with thousands of taxids in the future. This is a small example I have tried with no success:
cat > taxids.txt
226186
345219
FILE=taxids.txt
while read line: do
perl apidownload.pl $line
done <$FILE
Not quite sure what this response means. Were there 500 results? There should only have been 1 reference. When I search Proteomes on UniProt and type 226186 or 345219, it finds the species.
I used the "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format" Perl example.
My output after running the code above:
Failed, got 500 Can't verify SSL peers without knowing which Certificate Authorities to trust for https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:taxonomy:226186&format=list
Failed, got 500 Can't verify SSL peers without knowing which Certificate Authorities to trust for https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:taxonomy:345219&format=list
It looks like I need to install the Mozilla::CA certificates on the command line to fix "500 Can't verify SSL peers without knowing which Certificate Authorities to trust". Are these trusted certificates?
Does the UniProt Perl example use EDirect? If so, it looks like I will need an API key, and in the future I will need to request permission for very high volumes, close to 5,000 downloads rather than 10. This is where an API key is set up to gain access: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
Sorry, yes: you need to provide the full query URL. Try:
perl apidownload.pl "https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:226186"
and install whatever Perl modules are required.
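As for the zsh parse error: in `while read line: do`, the colon after `read line` should be a semicolon. A minimal corrected loop (using the two taxids from your example; this creates the list file itself so it can be run as-is):

```shell
FILE=taxids.txt
printf '226186\n345219\n' > "$FILE"   # same two taxids as in the example above

while read line; do                   # ';' before 'do' -- the ':' is what zsh rejects
  perl apidownload.pl "$line"         # quote the variable in case of stray whitespace
done < "$FILE"
```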
After installing certificates with sudo cpan Mozilla::CA, the following script was successful:
FILE=taxids.txt
while read line; do
mkdir ./${line}
perl apidownload.pl $line > ./${line}
done <$FILE
However, the output from running the .pl script does not end up in the ./${line} folder. What would I need to change? Modifying the Perl script like so did not do the trick:
my $OutputDir = './ARGV[1]';
for my $proteome (split(/\n/, $response_list->content)) {
  my $file = ./$ARGV[1] $proteome . '.fasta.gz';
How about changing the shell script like this:
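A sketch of such a change, assuming apidownload.pl writes its .fasta.gz files into the current working directory: cd into each taxid's folder in a subshell before running the script (note the script is then referenced from the parent directory):

```shell
FILE=taxids.txt
printf '226186\n345219\n' > "$FILE"   # example list from the thread

while read line; do
  mkdir -p "./${line}"                                  # -p: no error if the folder already exists
  ( cd "./${line}" && perl ../apidownload.pl "$line" )  # subshell, so we return to the parent dir
done < "$FILE"
```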
Thank you! This helped me arrive at a solution.
The full query, "https://www.uniprot.org/proteomes/?query=reference:yes+taxonomy:${line}", did not work for the search and returned "Failed, got 400 Bad Request".
However, using $line as the argument was sufficient to obtain the reference proteomes, and only produced a "Redundant argument in sprintf" warning from the Perl code.
For future reference, the following bash code can be used to create individual folders named by the corresponding taxonomy name (which may contain spaces) or taxid, and download the reference proteome(s) from UniProt directly into those folders.
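A sketch of that pattern (not the exact script; it assumes, as above, that apidownload.pl sits in the parent directory and saves into the current one; the quotes and `IFS= read -r` keep folder names with spaces intact, and the list entries here are only placeholders):

```shell
#!/bin/bash
# taxids.txt: one taxid or taxonomy name (names may contain spaces) per line
FILE=taxids.txt
printf '226186\nEscherichia coli\n' > "$FILE"   # hypothetical example entries

while IFS= read -r line; do                     # IFS=/-r: read each line verbatim
  mkdir -p "./${line}"                          # quotes preserve names containing spaces
  ( cd "./${line}" && perl ../apidownload.pl "$line" )
done < "$FILE"
```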
For a full tutorial on how to perform this task, please visit https://github.com/kostrouc/Bioinformatics_Tutorials/