I would argue that this is an important bioinformatics question, since you need to understand the source of the data in a FASTQ or BAM file before you can effectively analyze it.
I never fully understood it until I sat down to work it all out.
An ASCII example . . .
Unstranded protocol
Here's the original fragment (say, a 200bp long piece of mRNA):
# 5' ----------- 3'
Adapters are the same on both 5' and 3' sides for the unstranded protocol. Here's my notation for adapters:
# 5' adapter: ====
# 5' adapter, rev. comp: oooo
So the fragment with adapters ligated looks like this:
# adapter adapter
# 5' ====-----------oooo 3'
And here's the double-stranded cDNA; you might imagine the strands anchored to a flow cell on the left-hand side:
# 5' ====-----------oooo 3'
# 3' oooo-----------==== 5'
Sequencing primer (SP->) is also the 3' end of the 5' adapter. So it will sit down on the revcomped 5' adapter, "oooo":
# 3' ----------- 5' <- sequenced read; reported in FASTQ as 5' to 3';
# reverse complement of the original
# <-SP
# 5' ====-----------oooo 3' <- was RNA from the (+) strand
# 3' oooo-----------==== 5' <- complement
# SP->
# 5'----------- 3' <- sequenced read; reported in FASTQ as 5' to 3';
# original sequence
Since the sequencing primer can anneal to either strand, you can sequence the original sequence or its reverse complement -- no way to tell which is which.
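The ambiguity can be sketched in a few lines of Python. This is only a toy model of the diagram above, not a simulation of a real library prep: the fragment sequence, the `unstranded_reads` helper, and the read length are all made up for illustration.

```python
# Toy model of the unstranded case: the primer can anneal to either cDNA
# strand, so a read is either the start of the original sequence or the
# start of its reverse complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def unstranded_reads(fragment, read_length):
    """Return the two reads that are possible for one fragment.

    Hypothetical helper: the first read is synthesized off the complement
    strand (same as the original), the second off the original strand
    (reverse complement of the original).
    """
    same_as_original = fragment[:read_length]
    revcomp_of_original = reverse_complement(fragment)[:read_length]
    return same_as_original, revcomp_of_original


fragment = "ATGGCCAAGTTT"  # made-up fragment, written in the cDNA alphabet
reads = unstranded_reads(fragment, read_length=6)
print(reads)
```

Both reads are equally likely in this model, and nothing in the FASTQ record says which of the two you got.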
Stranded protocol
A different adapter is used for each end in the stranded (directional) protocol:
# 5' Adapter: ====
# Complement to 5' adapter: oooo
# 3' Adapter: ++++
# Complement to 3' adapter: ::::
The key here is that you're putting a different adapter on each side:
# 5' adapter 3' adapter
# 5' ====-----------++++ 3'
cDNA:
# 5'====-----------++++3' <- was RNA from the (+) strand
# 3'oooo-----------::::5' <- complement
Sequencing primer is still the 3' end of the 5' adapter, so the only place it will sit down is on the "oooo". And the only place this occurs is on the 3' end of the complementary strand:
# 5'====-----------++++3' <- was RNA from the (+) strand
# 3'oooo-----------::::5' <- complement
# SP->
# 5' ----------- 3' <- sequenced read, reported in FASTQ as 5' to 3'.
# This sequence is the same as the original RNA
# sequence.
Since there's only one place for the sequencing primer to start, you know what strand the final read came from; the 5'-to-3' sequence reported in the FASTQ file matches the 5'-to-3' sequence of the original fragment.
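Here is a minimal Python sketch of what that buys you downstream, following only the simplified diagram above. The reference sequence and the `stranded_read` and `strand_of_origin` helpers are hypothetical, and some real kits report reads in the opposite orientation, so check your kit's documentation before relying on this convention.

```python
# Toy model of the stranded case: the primer has exactly one place to anneal,
# so the read always matches the original RNA fragment, 5' to 3'.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def stranded_read(fragment, read_length):
    """Hypothetical helper: the read is the 5' start of the original fragment."""
    return fragment[:read_length]


def strand_of_origin(read, plus_reference):
    """Classify a read against a (+)-strand reference under this toy model.

    Returns "+" if the read matches the reference as written, "-" if only its
    reverse complement matches, and None if neither matches.
    """
    if read in plus_reference:
        return "+"
    if reverse_complement(read) in plus_reference:
        return "-"
    return None


plus_reference = "GGATGGCCAAGTTTCC"          # made-up (+) strand sequence
plus_fragment = plus_reference[2:14]         # transcript from the (+) strand
minus_fragment = reverse_complement(plus_fragment)  # transcript from the (-) strand

plus_call = strand_of_origin(stranded_read(plus_fragment, 6), plus_reference)
minus_call = strand_of_origin(stranded_read(minus_fragment, 6), plus_reference)
print(plus_call, minus_call)
```

With the unstranded protocol the same lookup would be meaningless, because either orientation could come from either strand.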
I think it's one of those questions that falls in a grey area. Bioinformaticians would benefit from knowing the answer (I would certainly like to know!)
I think this is better asked on Seqanswers. It also seems off-topic to me, as it is not about bioinformatics.
I wasn't sure this would be the right forum for that kind of question. I guess from the lack of answers that either there is no answer or it really is the wrong forum. But as a bioinformatician who must now confront this question, I think it would be interesting to know.
BTW, I asked a similar question on www.seqanswers.com. There were no answers there either. Is it such a complicated question?
molbiolab@molbiolab:~/Documents/Juli/Practice/cufflinks-2.2.1.Linux_x86_64$ ./cuffmerge
  File "./cuffmerge", line 95
    except getopt.error, msg:
    ^
SyntaxError: invalid syntax
julianasrinjuli: Please don't post randomly in pre-existing, unrelated threads.
If you have a question, first search Biostars using Google. If you can't find a pre-existing thread that answers your question, then create a new thread/post. Provide details about what you are trying to do, along with command lines, error messages and software version information. Just posting a random error message is not going to get you anywhere.