Hi,
I have converted a paired-end 454 SRA file (SRR1171018.sra, Argopecten irradians) to FASTQ using fastq-dump.2.3.2:
</path/to/fastq-dump/> -F --split-files </path/to/SRR1171018.sra>
This yielded SRR1171018_1.fastq and SRR1171018_2.fastq. Despite _2 being absolutely normal, the entirety of the _1 file looks like this:
@IE4R6ZA01CKY6V
TCAG
+IE4R6ZA01CKY6V
IIII
@IE4R6ZA01EDSKW
TCAG
+IE4R6ZA01EDSKW
IIII
@IE4R6ZA01DTY42
TCAG
I initially thought that this might be single-end data incorrectly labelled as paired-end within NCBI, but converting to a single FASTQ file resulted in every read beginning with TCAG.
I have converted at least 100 SRA files in this way in the last 2 months and have never seen this before.
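For anyone wanting to reproduce the check, one way to confirm that every record in _1 really is the same four bases is to tally the distinct sequences and read lengths. This is only a minimal Python sketch: the function name summarize_fastq is made up for illustration, and it assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
from collections import Counter
from itertools import islice


def summarize_fastq(lines):
    """Tally distinct sequences and read lengths from 4-line FASTQ records.

    `lines` is any iterable of text lines (an open file handle works).
    Assumes each record is exactly 4 lines: header, sequence, '+', quality.
    """
    seqs = Counter()
    lengths = Counter()
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            # End of input, or a truncated trailing record that is skipped.
            break
        seq = record[1].strip()
        seqs[seq] += 1
        lengths[len(seq)] += 1
    return seqs, lengths


# The first two complete records from the _1 output shown above.
sample = """@IE4R6ZA01CKY6V
TCAG
+IE4R6ZA01CKY6V
IIII
@IE4R6ZA01EDSKW
TCAG
+IE4R6ZA01EDSKW
IIII
""".splitlines()

seqs, lengths = summarize_fastq(sample)
print(seqs.most_common(3))      # most frequent sequences with counts
print(sorted(lengths.items()))  # (read length, number of reads)
```

On the real file, the same function can be given an open handle, e.g. `with open("SRR1171018_1.fastq") as fh: seqs, lengths = summarize_fastq(fh)`. A single distinct sequence and a single read length of 4 would confirm that _1 contains nothing else.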
- Is this just bad data?
- Could I assemble _2 as if it were single-end to avoid losing the data?
Many thanks,
Lewis
Many thanks for this, very well explained.