Question

Paired-end 454 data - forward run consisting only of TCAG

1

10.3 years ago

lewis.stevens07 ▴ 80

Hi,

I have converted a paired-end 454 SRA file (SRR1171018.sra, Argopecten irradians) to FASTQ using fastq-dump.2.3.2:

</path/to/fastq-dump/> -F --split-files </path/to/SRR1171018.sra>

This yielded SRR1171018_1.fastq and SRR1171018_2.fastq. Despite _2 being absolutely normal, the entirety of the _1 file looks like this:

@IE4R6ZA01CKY6V
TCAG
+IE4R6ZA01CKY6V
IIII
@IE4R6ZA01EDSKW
TCAG
+IE4R6ZA01EDSKW
IIII
@IE4R6ZA01DTY42
TCAG

I initially thought that this might be single-end data incorrectly labelled as paired-end within NCBI, but converting to a single FASTQ file resulted in every read beginning with TCAG.

I have converted at least 100 sra files in this way in the last 2 months and have never seen this.

Is this just bad data?
Could I assemble _2 as if it were single-end to avoid losing the data?

Many thanks,

Lewis

sequence software-error RNA-Seq • 3.8k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by lewis.stevens07 ▴ 80

Ram · Answer 1 · 2014-08-20

2

10.3 years ago

kmcarr00 ▴ 290

TCAG is the "key" sequence at the beginning of every 454 library molecule (they did change the sequence to distinguish FLX from FLX Titanium). The sequencer uses this key to (1) distinguish library beads from control beads, which carry a different key sequence, and (2) calibrate the signal intensity for a single-base incorporation. When the original 454 SFF file was uploaded to SRA, the submitter properly identified the first 4 bases as a "technical" read and the remaining bases as a sequence read. fastq-dump with the --split-files option is correctly separating the key (technical) read from the sequence read. Discard the first (_1) file, which is meaningless, and proceed with just the sequence (_2) file.
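If you want to confirm that a _1 file really holds nothing but the key before discarding it, a quick check along these lines might help. This is only a minimal sketch: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the helper names `read_fastq` and `is_key_only` are made up for illustration, not part of any SRA tool.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ with 4 lines per record."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        if not header.strip():    # tolerate stray blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # the '+' separator line, ignored here
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def is_key_only(handle, key="TCAG"):
    """Return True if the file has reads and every read equals the key."""
    counts = Counter(seq for _, seq, _ in read_fastq(handle))
    return bool(counts) and set(counts) == {key}


# Records copied from the question; a real run would open SRR1171018_1.fastq.
sample = (
    "@IE4R6ZA01CKY6V\nTCAG\n+IE4R6ZA01CKY6V\nIIII\n"
    "@IE4R6ZA01EDSKW\nTCAG\n+IE4R6ZA01EDSKW\nIIII\n"
)
print(is_key_only(io.StringIO(sample)))
```

If the check passes for the whole file, the _1 reads carry no biological sequence and can be dropped as described above.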

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by kmcarr00 ▴ 290

0

Many thanks for this, very well explained.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by lewis.stevens07 ▴ 80

Ram · Answer 2 · 2014-08-06

0

10.3 years ago

Istvan Albert 102k

My best guess is that your file contains the barcodes for the run (although these seem shorter than usual).

Often these are included so that you can identify which multiplexed sample a read came from in a multisample run.

But if that is true, then the description of the run as paired-end is incorrect (does 454 even offer paired-end sequencing? I have never heard of that before).

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Istvan Albert 102k

0

I have had a sequencing rep offer 'paired-end' 454 before. It's not true paired end where the two directions can be exactly connected, but it's the simultaneous sequencing of both strands of a double stranded sequence so from 1000 reads you get 500 forwards and 500 reverses from 500 DNA sequences.

ADD REPLY • link 10.3 years ago by Daniel ★ 4.0k

0

Thanks for the reply, that would make a lot more sense. Yeah, this is admittedly the only 454 data I have used that is labelled as 'paired-end'.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by lewis.stevens07 ▴ 80

Ram · Answer 3 · 2014-08-20

0

10.3 years ago

lexnederbragt ★ 1.3k

You need to dump the sra file to a single fastq (without trying to split it) and then split each read into its two mates at the 454 paired-end linker. See this thread for background and pointers.
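To illustrate the splitting step, here is a rough sketch of cutting one read at a linker. It makes several assumptions: the linker shown is a placeholder, not the real 454 linker (take the actual sequence for your chemistry from the Roche documentation); only exact matches are found, whereas dedicated tools tolerate sequencing errors in the linker; and the function name `split_on_linker` is hypothetical.

```python
def split_on_linker(seq, qual, linker):
    """Split one read at the first exact occurrence of the linker.

    Returns ((left_seq, left_qual), (right_seq, right_qual)), or None when
    the linker is absent and the read should be treated as single-end.
    """
    pos = seq.find(linker)
    if pos == -1:
        return None
    end = pos + len(linker)
    # The linker itself is dropped; bases and qualities stay aligned.
    return (seq[:pos], qual[:pos]), (seq[end:], qual[end:])


# Placeholder linker for illustration only, NOT the real 454 linker.
PLACEHOLDER_LINKER = "TTTTTTTT"

read_seq = "ACACAC" + PLACEHOLDER_LINKER + "GAGAGA"
read_qual = "IIIIII" + "########" + "HHHHHH"
print(split_on_linker(read_seq, read_qual, PLACEHOLDER_LINKER))
```

Reads without a linker come back as None, so they can be written to a separate single-end file rather than lost. The sketch does not reorient either half; check what orientation your assembler expects for the two mates.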

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by lexnederbragt ★ 1.3k

0

The fact that one needs to scour websites (then getting conflicting information) when trying to figure out something as simple as how to get raw data from a supposedly public data repository is nothing short of mindboggling ...

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by Istvan Albert 102k

0

It doesn't help that 454/Roche prefer to write as little documentation as possible while avoiding trying to fit into "standard" approaches.

ADD REPLY • link 10.3 years ago by pld 5.1k