Question

Downloading PE reads from SRA but reads are not split

3.3 years ago

Michael 55k

I am trying to download some Placozoan RNA-seq reads from SRA (SRR8193747, SRR8193748, SRR8193749) with fastq-dump so that I can assemble them:

 nohup fastq-dump --split-3 SRR8193748 SRR8193747 SRR8193749 &

According to SRA, the reads are paired. I therefore expected to get *_1.fastq and *_2.fastq files (and maybe some unpaired reads as well), as usual. In this case, however, I am getting only one file per run: SRR8193747.fastq, SRR8193748.fastq and SRR8193749.fastq.
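A quick way to confirm which layout each run actually produced is to look at the file names. This is only a minimal sketch: `fastq_layout` is a hypothetical helper (not part of sra-tools), and it assumes the default fastq-dump output names in the chosen directory.

```shell
# Hypothetical helper: report which layout fastq-dump --split-3 produced for a run,
# judging only by the file names present in a directory (default: current directory).
fastq_layout() {
  acc=$1
  dir=${2:-.}
  if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
    echo "$acc paired"     # both mate files exist
  elif [ -f "$dir/${acc}.fastq" ]; then
    echo "$acc single"     # only the unsplit/unpaired file exists
  else
    echo "$acc missing"    # no FASTQ output found for this accession
  fi
}
```

It could be called as `for acc in SRR8193747 SRR8193748 SRR8193749; do fastq_layout "$acc"; done`.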

The output from fastq-dump says:

Rejected 5037041 READS because of filtering out non-biological READS
Read 5037041 spots for SRR8193748
Written 5037041 spots for SRR8193748
Rejected 5157059 READS because of filtering out non-biological READS
Read 5157059 spots for SRR8193747
Written 5157059 spots for SRR8193747
Read 8889366 spots for SRR8193749
Written 8889366 spots for SRR8193749
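To compare the numbers in this log run by run, something like the following could be used. It is a rough sketch: `summarize_fastq_dump_log` is a hypothetical helper, and it assumes that a "Rejected" line, when present, comes directly before the "Read ... spots" line of the same run, as in the output above.

```shell
# Hypothetical helper: summarise a fastq-dump log read from stdin.
# Prints one line per run: accession, spots read, reads rejected (0 if none reported).
summarize_fastq_dump_log() {
  awk '
    # Remember the rejected count; assumed to belong to the next "Read ... spots" line.
    /^Rejected [0-9]+ READS/ { rejected = $2 }
    # On "Read N spots for ACC", print the summary and reset the rejected count.
    /^Read [0-9]+ spots for/ { print $5, $2, rejected + 0; rejected = 0 }
  '
}
```

If the job was started with nohup from a terminal, the log usually ends up in nohup.out, so `summarize_fastq_dump_log < nohup.out` would be one way to run it. For SRR8193747 and SRR8193748 the rejected count equals the spot count, i.e. one read per spot was filtered out.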

Is there something wrong with the SRA files or with my setup?

sra fastq-dump rna-seq • 1.4k views


score 0 · Answer 1 · 2021-09-27

3.3 years ago

lieven.sterck 15k

Weird indeed. The metadata for one of the runs shows that the length of read 2 is 0, which would explain the behavior you're seeing, but not why the run looks like that: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8193748

One possibility is that they actually had single-end data but submitted it as paired-end.
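One way to test this idea without downloading everything again would be to tabulate read lengths for a small sample of spots. The snippet below is only a sketch: `fastq_read_lengths` is a hypothetical helper, and it assumes standard four-line FASTQ records on stdin.

```shell
# Hypothetical helper: tabulate sequence lengths in a FASTQ stream on stdin.
# Assumes four-line records, so the sequence is line 2 of every group of 4.
# Prints "length count" pairs sorted by length.
fastq_read_lengths() {
  awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' | sort -n
}
```

For example, `fastq-dump -X 1000 -Z SRR8193748 | fastq_read_lengths` dumps the first 1000 spots to stdout without splitting. Since an unsplit dump concatenates the reads of each spot, lengths that match a single read rather than two mates joined together would be consistent with single-end data.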


Yes, that may be it. I found the publication, and the methods section says: "... for the haplotype H4 RNA libraries 32 – 37 million single 150 bp reads were obtained." I guess that means single-end.