Problem with data downloaded from the Sequence Read Archive (SRA)
6 months ago

I downloaded reads from the SRA with the script below in order to process them with the metabarcoding analysis pipeline DADA2. Unfortunately, the pipeline fails with the following error:

Error: BiocParallel errors
  0 remote errors, element index:
  156 unevaluated and other errors
  first remote error:
Execution halted

I know that some of the read files are not problematic, as I can process at least three of them without error. A larger number do not work, however, and there is apparently no way to distinguish the files that work from the others. The fastq files all have the expected size and do not seem to be corrupted.

Has anyone encountered this problem? Is there anything I should know about SRA data?

# Perform the search and retrieve metadata
esearch -db sra -query "PRJNA997374[All Fields] AND rbcl[Title]" | efetch -format docsum > sra_results.xml

# Extract SRA accession numbers from the XML output
grep '<Sample acc=' sra_results.xml | sed 's/.*acc="\([^"]*\)".*/\1/' > list_sra.txt

# Get the data in .sra format (done by the prefetch loop below;
# "prefetch *.sra" cannot work at this point because no .sra files exist yet)

# Specify the file containing SRA accession numbers
input_file="list_sra.txt"

# Loop through each accession number in the input file
while IFS= read -r accession_number
do
    # Run prefetch to download the SRA data
    prefetch "$accession_number"
done < "$input_file"

# Import files in working directory
ls -d SRR* > directories.txt

while IFS= read -r f
do
    # Copy the .sra file out of each accession directory
    cp "$f"/*.sra ./
done < directories.txt

# Convert all the files to forward and reverse FASTQ files:
fastq-dump --split-files *.sra
SRA DADA2 metabarcoding • 625 views

One cannot help you with this. The error is from R, yet you show not a single line of R code.


Thank you ATpoint. I have already investigated this error on the DADA2 GitHub. It seems that the "BiocParallel" error can have multiple causes, and as I have not modified the code of the pipeline, it may not be relevant to show it here. However, as explained in my post, I am confident that some of the reads are the cause of the issue. That is why I show the script used to import the reads: I thought the error most likely originates there.


The command fastq-dump --split-files *.sra gives you FASTQ files. With the information given, I am not sure how one might debug your problem. As a low-level validation, you can run FastQC on the data and see whether it throws any errors. If it does not, the files are probably not corrupted.
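In the same spirit of low-level validation, a quick complementary check is to confirm that each forward/reverse pair contains the same number of reads. The sketch below is only an illustration: the function names `count_reads` and `check_pair` are made up for this example, and it assumes the `_1.fastq` / `_2.fastq` naming produced by `--split-files` and unwrapped FASTQ records of exactly four lines each.

```shell
#!/usr/bin/env bash

# Count reads in a FASTQ file (assumes exactly 4 lines per record).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Compare the read counts of a forward/reverse pair (hypothetical helper).
# Prints OK or a problem report; returns non-zero on any problem.
check_pair() {
    local fwd="$1" rev="$2"
    if [ ! -f "$fwd" ] || [ ! -f "$rev" ]; then
        echo "MISSING $fwd or $rev"
        return 1
    fi
    local n_fwd n_rev
    n_fwd=$(count_reads "$fwd")
    n_rev=$(count_reads "$rev")
    if [ "$n_fwd" -eq "$n_rev" ] && [ "$n_fwd" -gt 0 ]; then
        echo "OK $fwd ($n_fwd reads)"
    else
        echo "MISMATCH $fwd ($n_fwd reads) vs $rev ($n_rev reads)"
        return 1
    fi
}

# Check every pair in the current directory; keep going after a failure.
for fwd in *_1.fastq; do
    [ -e "$fwd" ] || continue
    check_pair "$fwd" "${fwd%_1.fastq}_2.fastq" || true
done
```

Any file reported as MISSING or MISMATCH would be a candidate to re-download before running the pipeline again.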


Use vdb-validate included in sratoolkit to check your *.sra files for integrity.
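A minimal sketch of how this check could be run over all downloaded files is shown here. The function name `validate_sra_files` and the `VALIDATOR` variable are hypothetical and exist only for this example (the variable lets the loop be tried without the toolkit installed). The sketch also assumes that vdb-validate exits with a non-zero status when a file fails validation, so the per-file logs are worth reading as well.

```shell
#!/usr/bin/env bash

# Run a validator on each file given as an argument.
# VALIDATOR defaults to vdb-validate (hypothetical override, for illustration).
validate_sra_files() {
    local validator="${VALIDATOR:-vdb-validate}"
    local failed=0
    local f
    for f in "$@"; do
        # Keep the full validator output in a log next to the file.
        if "$validator" "$f" > "${f}.validate.log" 2>&1; then
            echo "PASS $f"
        else
            echo "FAIL $f"
            failed=$(( failed + 1 ))
        fi
    done
    # Succeed only if every file passed.
    [ "$failed" -eq 0 ]
}
```

It could then be called as `validate_sra_files ./*.sra`, and any file listed as FAIL downloaded again with prefetch.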

6 months ago

Download the files with the sratoolkit prefetch command and then convert them with fasterq-dump, which reports the number of reads it processed. For validation, vdb-validate is the only tool available.
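The suggested workflow might look roughly like the sketch below, which reads accessions from a list such as the `list_sra.txt` file in the question. The names `run`, `fetch_and_convert`, and `DRY_RUN` are hypothetical helpers introduced for this example, and the default output directory `fastq` is an assumption. `--split-files` and `--outdir` are fasterq-dump options.

```shell
#!/usr/bin/env bash

# Execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Download each accession listed in a file, then convert it to FASTQ.
# $1: file with one accession per line; $2: output directory (default: fastq)
fetch_and_convert() {
    local list="$1" outdir="${2:-fastq}"
    local acc
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue    # skip blank lines
        if ! run prefetch "$acc"; then
            echo "prefetch failed: $acc" >&2
            continue
        fi
        run fasterq-dump --split-files --outdir "$outdir" "$acc" \
            || echo "fasterq-dump failed: $acc" >&2
    done < "$list"
}
```

Running `DRY_RUN=1 fetch_and_convert list_sra.txt` first prints the commands without executing them, which makes it easy to confirm that the list contains the expected accessions.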
