FASTQ-dump: --split-files: Rejected 5 READS because READLEN < 1
2.1 years ago

Hi everyone,

I have to perform bulk RNA sequencing (I am new to this). I want to run this: fastq-dump --split-files -X 5 SRR14933197 -Z (which is supposed to give me the first 5 spots/reads). The library layout is single-end. I do get output, but I also get "Rejected 5 READS because READLEN < 1", and I don't really know how to interpret this. I heard from colleagues that --split-files is only supposed to be used with paired-end reads; however, my professor used the same line of code on his SRR run, which was also single-end. So I don't understand why I get this rejection. Can anyone help me?

Thanks in advance!

reads single-end • 2.3k views

Ditch the terrible SRA-toolkit: enter that SRR ID over at https://sra-explorer.info/ and get a direct download link for the FASTQ file.


But it's actually for an assignment, and we are supposed to run it in Jupyter notebooks using that line of code: fastq-dump --split-files -X 5 SRR14933197 -Z. :/ Does someone know the answer to my problem? If you need more details, please ask.

2.1 years ago
GenoMax 147k

In this case the dataset is single-end, so dumping it as single-end does not generate any errors.

$ fastq-dump -X 5  SRR14933197
Read 5 spots for SRR14933197
Written 5 spots for SRR14933197
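
Since the question mentions running this from a Jupyter notebook, here is a minimal sketch of how the command variants discussed in this thread could be assembled in Python. The helper name fastq_dump_command and its parameters are hypothetical (they are not part of SRA-toolkit); only the flags themselves (--split-files, -X, -M, -Z) come from the thread.

```python
import shlex


def fastq_dump_command(accession, max_spots=None, split_files=False,
                       min_read_len=None, to_stdout=False):
    """Build a fastq-dump argument list.

    Hypothetical convenience helper, not part of SRA-toolkit.
    """
    cmd = ["fastq-dump"]
    if split_files:
        cmd.append("--split-files")          # one output file per read in a spot
    if max_spots is not None:
        cmd += ["-X", str(max_spots)]        # stop after this many spots
    if min_read_len is not None:
        cmd += ["-M", str(min_read_len)]     # minimum read length to keep
    if to_stdout:
        cmd.append("-Z")                     # write to stdout instead of files
    cmd.append(accession)
    return cmd


# The single-end dump and the --split-files dump that keeps empty reads.
single_end = fastq_dump_command("SRR14933197", max_spots=5)
split_keep_empty = fastq_dump_command(
    "SRR14933197", max_spots=5, split_files=True, min_read_len=0
)
print(shlex.join(single_end))
print(shlex.join(split_keep_empty))
```

Assuming SRA-toolkit is installed and on the PATH, either list could then be passed to subprocess.run(cmd, check=True) from a notebook cell.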

If you add the --split-files option, however, you get that spurious error. You can confirm this by also passing -M 0 (which should keep all reads irrespective of length): the message disappears, but an additional sequence file is written (_2, containing no sequence). It is this second set of zero-length reads that triggers the rejection message you see.

$ fastq-dump --split-files -X 5  -M 0 SRR14933197
Read 5 spots for SRR14933197
Written 5 spots for SRR14933197

$ more SRR14933197_2.fastq
@SRR14933197.1 1 length=0

+SRR14933197.1 1 length=0

@SRR14933197.2 2 length=0

+SRR14933197.2 2 length=0

The same 5 reads appear to be written to _1.fastq regardless of the method, so you should be fine using that file.
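
To check this without paging through the files by hand, a small sketch like the following tallies read lengths in a FASTQ file. The function name read_lengths is hypothetical, and the code assumes the plain four-line record layout shown above (header, sequence, "+" line, qualities) with no line wrapping.

```python
from collections import Counter
from pathlib import Path


def read_lengths(fastq_path):
    """Return a Counter mapping read length -> number of records.

    Assumes unwrapped four-line FASTQ records; a zero-length read is a
    record whose sequence line is empty.
    """
    lines = Path(fastq_path).read_text().splitlines()
    counts = Counter()
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        if not record[0].startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {record[0]!r}")
        # The sequence is the second line of the record (may be empty).
        sequence = record[1] if len(record) > 1 else ""
        counts[len(sequence)] += 1
    return counts
```

If the _2 file really holds only empty reads, read_lengths("SRR14933197_2.fastq") should contain a single key, 0, while the _1 file should show only non-zero lengths.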

This behavior may be because this run appears to have a 0 bp second read recorded in SRA, which may be a submission error.



Thank you, GenoMax, for the detailed answer!


You can accept the answer (green check mark) to provide closure to this thread.
