Say we run a paired-end RNA-Seq where mates are at least 200 bases apart.
Does this mean that a 50-bases-long transcript will never appear in the result (ignoring sequencing errors)?
The fundamental problem is upstream of the data analysis. During the standard RNA-Seq library preparation protocol, small RNAs are filtered out, so they are not physically present among the fragments sent for sequencing. With the small RNA-Seq library preparation protocol it is the opposite: only the small RNAs are present.
I do know of at least one paper claiming to have done an exhaustive analysis of mature miRNAs based on a standard RNA-Seq protocol. Unfortunately, this conclusion was based on blindly trusting the output of a software program used to process the data without taking into account the physical realities of the libraries. Don't make the same mistake.
I would caution against oversimplifying the process.
A sequencer is not like a sausage maker, where whatever you toss in at one end has to come out at the other. It is a much more complicated process: there are physical processes in play that require the DNA fragments to be within a certain size range, and that range is not related to the number of cycles that the instrument measures (the read length). If your sequences are too short, the sequencing will not work properly in the first place.
To get back to your question: when the sequences are short, the mate is the reverse complement of the first read. At the same time, both the read and its mate may also end up measuring the artificial sequences (adapters) ligated to their ends, so only the overlapping region is complementary.
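To make the point about short inserts concrete, here is a minimal Python sketch of an idealized, error-free read pair. The function names, the insert and the adapter sequences are made up for illustration (they are not real kit adapters), and real instruments behave less cleanly than this.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def simulate_pair(insert, read_len, adapter_r1, adapter_r2):
    """Idealized paired-end reads from one insert, with no sequencing errors.

    adapter_r1 / adapter_r2 are hypothetical placeholders for whatever
    adapter sequence each read runs into after the insert ends.
    """
    # Read 1 starts at one end of the insert and continues into the adapter.
    read1 = (insert + adapter_r1)[:read_len]
    # Read 2 starts at the other end, on the opposite strand.
    read2 = (reverse_complement(insert) + adapter_r2)[:read_len]
    return read1, read2


# Hypothetical 20-base insert sequenced with 30 cycles per read.
insert = "ACGTTGCAAGGCTTAACCGG"
read1, read2 = simulate_pair(insert, 30, "CCCCCCCCCCCC", "GGGGGGGGGGGG")
print(read1)
print(read2)
```

In this toy case the first 20 bases of read 2 are the reverse complement of the first 20 bases of read 1, and the last 10 bases of each read are adapter, which is why adapter trimming matters for short inserts.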
This depends on how you do your size-selection, but probably not "never".
If I ever see it, what should I expect to see as its mate?
If you happen to have a 50 bp insert and you are doing 50 cycle sequencing then those two reads would overlap over a large extent.
You appreciate that there is a distribution of insert sizes, so not everything is 200 bp.
Well, I said "at least 200 bases apart", not "exactly 200 bases apart".
I think you are confused about paired end sequencing. In paired-end sequencing, you just sequence the fragments from both ends. The gap between the read mates depends on the read length relative to the fragment size.
For example, if the fragment size is 150 bases, and the read length is 50 bases, the gap between the read mates will be 50 bases. If the fragment is only 25 bases long, and the read length is 50 bases, the paired reads will overlap and sequence into the adapter on the other side.
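The arithmetic in these two examples can be written down as a small Python sketch. The function name `mate_layout` and the dictionary keys are my own choices for illustration, and the model assumes both reads have the same length and ignores sequencing errors.

```python
def mate_layout(fragment_len, read_len):
    """Describe how two equal-length mates sit on a fragment.

    Returns the unsequenced gap between the mates, the number of fragment
    bases covered by both mates, and how many bases of each read run past
    the fragment into the adapter.
    """
    # Read 1 covers [0, r1_end); read 2 covers [r2_start, fragment_len).
    r1_end = min(read_len, fragment_len)
    r2_start = max(fragment_len - read_len, 0)
    return {
        "gap": max(fragment_len - 2 * read_len, 0),
        "overlap": max(r1_end - r2_start, 0),
        "adapter_bases_per_read": max(read_len - fragment_len, 0),
    }


print(mate_layout(150, 50))  # gap 50, overlap 0, adapter 0
print(mate_layout(25, 50))   # gap 0, overlap 25, adapter 25
print(mate_layout(50, 50))   # gap 0, overlap 50, adapter 0
```

So a 50-base fragment sequenced with 50 cycles would give two mates covering exactly the same bases, on opposite strands.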
The question about having a mixture of very short and very long fragments within the same library is moot anyway, since the standard library preparation protocol will not give you such a mixture. During the standard RNA-Seq library preparation protocol, short fragments are filtered out. During the small RNA-Seq library preparation protocol, only the small transcripts are kept.