I downloaded reads from the SRA with the script below in order to process them with the metabarcoding analysis pipeline DADA2. Unfortunately, I get the following error when running them through the pipeline:
Error: BiocParallel errors 0 remote errors, element index: 156 unevaluated and other errors first remote error: Execution halted
I know that not all of the reads are problematic, as I can process at least three of them without error. A larger number fail, however, and there is no apparent way to distinguish the reads that work from the ones that do not. The fastq files all have the expected size and do not seem to be corrupted.
Did this problem happen to anyone? Is there anything I should know about SRA data?
# Perform the search and retrieve metadata
esearch -db sra -query "PRJNA997374[All Fields] AND rbcl[Title]" | efetch -format docsum > sra_results.xml
# Extract SRA accession numbers from the XML output
grep '<Sample acc=' sra_results.xml | sed 's/.*acc="\([^"]*\)".*/\1/' > list_sra.txt
# Get the data in .sra format:
prefetch *.sra
# Specify the file containing SRA accession numbers
input_file="list_sra.txt"
# Loop through each accession number in the input file
while IFS= read -r accession_number
do
# Run prefetch to download the SRA data
prefetch "$accession_number"
done < "$input_file"
# Import files in working directory
ls -d SRR* > directories.txt
while read f;
do
cp "$f"/*.sra ./ ;
done < directories.txt
# convert all the files in frw and rev fasta formats:
fastq-dump --split-files *.sra
One cannot help you with this. The error is from R, yet you show not a single line of R code.
Thank you ATpoint. I did already investigate this error on the DADA2 GitHub. It seems that the "BiocParallel" error can have multiple causes, and as I have not modified the code of the pipeline, it may not be relevant to show it here. However, as explained in my post, I am confident that some of the reads are the cause of the issue. That is why I show the script used to import the reads: I thought the error was most likely there.
This
fastq-dump --split-files *.sra
gives you fastq files. I am not sure how, with the given information, one might debug your problem. As a low-level validation, you can run FastQC on the data and see whether it throws any errors. If not, then the files are probably not corrupted.
Use vdb-validate, included in sratoolkit, to check your *.sra files for integrity.
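Both suggestions can be scripted so that every file is checked the same way. The following is only a minimal sketch, not a diagnosis of the BiocParallel error. The function names (check_fastq, check_pair, validate_sra) are hypothetical helpers, not part of sratoolkit or DADA2. It assumes 4-line FASTQ records, that vdb-validate exits with a non-zero status when a file fails validation, and it treats empty reads as a failure purely as an illustrative choice.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of sratoolkit or DADA2.

# Structural check of one FASTQ file (assumes 4 lines per record):
# '@' header, non-empty sequence, '+' separator, quality as long as the sequence.
# Prints the number of reads on success; returns non-zero on the first problem.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print FILENAME ": bad header at line " NR > "/dev/stderr"; bad = 1; exit
        }
        NR % 4 == 2 {
            seqlen = length($0)
            if (seqlen == 0) {
                print FILENAME ": empty read at line " NR > "/dev/stderr"; bad = 1; exit
            }
        }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print FILENAME ": bad separator at line " NR > "/dev/stderr"; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            print FILENAME ": sequence/quality length mismatch at line " NR > "/dev/stderr"; bad = 1; exit
        }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) {
                print FILENAME ": truncated record at end of file" > "/dev/stderr"; exit 1
            }
            print NR / 4
        }
    ' "$1"
}

# Check that the forward and reverse files of one run hold the same number of reads.
check_pair() {
    local fwd="$1" rev="$2" n_fwd n_rev
    n_fwd=$(check_fastq "$fwd") || return 1
    n_rev=$(check_fastq "$rev") || return 1
    if [ "$n_fwd" -ne "$n_rev" ]; then
        echo "$fwd / $rev: read counts differ ($n_fwd vs $n_rev)" >&2
        return 1
    fi
    echo "$fwd / $rev: OK ($n_fwd read pairs)"
}

# Run vdb-validate on each .sra file given as an argument and report failures.
validate_sra() {
    command -v vdb-validate >/dev/null 2>&1 || {
        echo "vdb-validate not found in PATH" >&2
        return 127
    }
    local f status=0
    for f in "$@"; do
        vdb-validate "$f" || { echo "FAILED: $f" >&2; status=1; }
    done
    return "$status"
}
```

After sourcing the functions, validate_sra *.sra would check the downloaded archives, and check_pair could be called once per run on its _1.fastq and _2.fastq files. Any run that fails one of these checks would be a natural first candidate to compare against the samples that DADA2 processes without error.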