Paired-end FASTQ data: separate files for forward and reverse reads?
8.4 years ago

I am designing some bioinformatics software, but have little working experience with FASTQ data.

The data I wish to process is paired-end data which, as I understand it, consists of "mate sequences", namely left and right mates, obtained by sequencing the same region of the genome in the forward and reverse orientations.

My question is about how this data is returned to the user after sequencing. Is the researcher given separate files, each containing only forward or only reverse orientation sequences? Or is the data mixed together in one file?

This basically comes down to how I process the data in the software. If each orientation is given its own file, then I can allow the user to specify the orientation at the command line; otherwise, I will have to read every sequence ID to determine the orientation.

Kind regards, Izaak

fastq paired-end • 7.3k views

Before you start reading too much into "reverse" and "forward", note that the pairs are just sequencing different ends of the same original fragment. Which of the two will end up being "forward" after alignment is essentially random and can't be determined from read IDs.


Yeah, I've read around that there is no real concept of which is forward or reverse, it was just easier to express ;) Also, out of interest, I've seen that, often, one of the mate pairs is sequenced, then the next. However, are they also sometimes sequenced in parallel, but with multiplex capable primers? Or is multiplexing mainly used to differentiate samples?


In Illumina technology only one read happens at a time. Order is generally [Read 1 --> Index 1 (if present) --> Index 2 (if present) --> Read 2]. Multiplex is only used to differentiate samples.


Keep in mind that the sequence is always reported in 5'-->3' orientation, whether it comes from the forward or the reverse read. For Illumina data there is a convention that indicates whether a read is the forward or the reverse read (first and second read are probably better terms). That information is encoded in the FASTQ header.
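To illustrate the header convention, here is a minimal Python sketch that pulls the read number out of a header line. It assumes the two common Illumina layouts: the newer one, where the read number follows the first space (e.g. `@SIM:1:FCX:1:15:6329:1045 1:N:0:2`), and the older one, where the name ends in `/1` or `/2`. The function name `read_number` is made up for this example, and headers rewritten by other tools may not match either pattern.

```python
import re

# Newer Illumina layout (assumed): "@<name> <read>:<filtered Y/N>:<control>:<index>"
_NEW_STYLE = re.compile(r"^@?\S+\s+([12]):[YN]:\d+")
# Older layout (assumed): "@<name>/1" or "@<name>/2"
_OLD_STYLE = re.compile(r"^@?\S+/([12])$")


def read_number(header):
    """Return 1 or 2 for a FASTQ header line, or None if it cannot be determined."""
    header = header.strip()
    if not header:
        return None
    # Try the newer layout on the whole line, then the older one on the first token.
    match = _NEW_STYLE.match(header) or _OLD_STYLE.match(header.split()[0])
    return int(match.group(1)) if match else None
```

Note that this only says whether a read is the first or second of its pair; as pointed out above, it does not say which strand the read will align to.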

8.4 years ago
Medhat 9.8k

They can come in separate files or in the same file.

If they are in two files, then, for example, the first file will contain the forward reads and the second will contain the reverse reads.

If they are in the same file, the two reads of each pair can be told apart by their headers.

general info here
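A rough Python sketch of how software could handle both layouts is below. It assumes plain-text FASTQ with exactly four lines per record and no blank lines, and, for the single-file case, that the two reads of a pair sit next to each other (interleaved). All function names are hypothetical.

```python
from itertools import islice, zip_longest


def fastq_records(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    lines = iter(handle)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, _separator, quality = chunk
        yield header, sequence, quality


def pairs_from_two_files(handle1, handle2):
    """Pair the i-th record of the first file with the i-th record of the second."""
    for rec1, rec2 in zip_longest(fastq_records(handle1), fastq_records(handle2)):
        if rec1 is None or rec2 is None:
            raise ValueError("the two files contain different numbers of reads")
        yield rec1, rec2


def pairs_from_interleaved(handle):
    """Pair consecutive records: 1st with 2nd, 3rd with 4th, and so on."""
    records = fastq_records(handle)
    for rec1 in records:
        rec2 = next(records, None)
        if rec2 is None:
            raise ValueError("odd number of records in interleaved file")
        yield rec1, rec2
```

Either generator yields `(first_read, second_read)` tuples, so the rest of the program does not need to care which layout the user supplied.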


Right, so it is not safe to give the user the option of saying reverse or forward, I have to check each sequence. And therefore, the Seq_id can be used to determine the orientation?


I think it should be a parameter, for example 1 for the forward file and 2 for the reverse file. You can then check the number of reads and/or the read names in each file, and raise an error if they differ. For example:

SomeSoftware -option_do_something 1 forward.fastq 2 reverse.fastq

A real example, for alignment with bowtie:

bowtie -S -t hg18_combined.fa.bowtie -1 Pair1.fastq -2 Pair2.fastq bowpeout.sam
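The check described above could look roughly like this in Python. It is only a sketch: it assumes uncompressed FASTQ with four lines per record, and that mates share the same name once the comment after the first space and any trailing `/1` or `/2` are stripped. The `-1`/`-2` options simply mirror the bowtie style, and every name here is made up for illustration.

```python
import argparse
from itertools import islice


def fragment_name(header):
    """Drop '@', anything after the first whitespace, and a trailing /1 or /2."""
    fields = header.split()
    name = fields[0].lstrip("@") if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pairing(handle1, handle2):
    """Return the number of pairs, or raise ValueError if the files disagree."""
    # Every 4th line, starting at the first, is a header (4-line records assumed).
    headers1 = islice(handle1, 0, None, 4)
    headers2 = islice(handle2, 0, None, 4)
    count = 0
    while True:
        h1, h2 = next(headers1, None), next(headers2, None)
        if h1 is None and h2 is None:
            return count
        if h1 is None or h2 is None:
            raise ValueError("files contain different numbers of reads")
        if fragment_name(h1) != fragment_name(h2):
            raise ValueError(f"pair {count + 1}: {h1.strip()!r} vs {h2.strip()!r}")
        count += 1


def build_parser():
    """Command line with bowtie-like -1/-2 options (hypothetical tool)."""
    parser = argparse.ArgumentParser(description="Check that two FASTQ files pair up")
    parser.add_argument("-1", dest="mate1", required=True, help="first-read FASTQ")
    parser.add_argument("-2", dest="mate2", required=True, help="second-read FASTQ")
    return parser
```

A script would call `build_parser().parse_args()`, open `args.mate1` and `args.mate2`, and pass the two file handles to `check_pairing` before doing any real work.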

