How are short RNAs handled by paired-end RNA-Seq?
8.4 years ago
roma ▴ 120

Say we run a paired-end RNA-Seq where mates are at least 200 bases apart.

Does this mean that a 50-bases-long transcript will never appear in the result (ignoring sequencing errors)?

RNA-Seq • 2.1k views

This depends on how you do your size-selection, but probably not "never".


If I ever see it, what should I expect to see as its mate?


Keep in mind that there is a distribution of insert sizes, so not every insert is 200 bp. If you happen to have a 50 bp insert and you are doing 50-cycle sequencing, the two reads would cover the same 50 bases and overlap almost entirely.


Well, I said "at least 200 bases apart", not "exactly 200 bases apart".


I think you are confused about paired-end sequencing. In paired-end sequencing, you simply sequence each fragment from both ends. The gap between the read mates depends on the read length relative to the fragment size.

For example, if the fragment size is 150 bases, and the read length is 50 bases, the gap between the read mates will be 50 bases. If the fragment is only 25 bases long, and the read length is 50 bases, the paired reads will overlap and sequence into the adapter on the other side.
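The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `mate_layout` and its return keys are made up here, and it assumes both mates have the same read length and that each read starts exactly at its end of the fragment.

```python
def mate_layout(fragment_len, read_len):
    """Describe how two equal-length paired-end reads sit on one fragment.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Inner distance: positive means a gap between mates, negative means overlap.
    inner = fragment_len - 2 * read_len
    gap = max(inner, 0)
    # Overlapping bases cannot exceed the fragment itself.
    overlap = min(max(-inner, 0), fragment_len)
    # Bases each read spends sequencing into the adapter past the fragment end.
    adapter_bases = max(read_len - fragment_len, 0)
    return {
        "gap": gap,
        "overlap": overlap,
        "adapter_bases_per_read": adapter_bases,
    }


# The two cases from the post: 150-base and 25-base fragments, 50-base reads.
print(mate_layout(150, 50))
print(mate_layout(25, 50))
```

For the 150-base fragment this reports a 50-base gap and no overlap; for the 25-base fragment it reports that the mates overlap over all 25 bases and each read runs 25 bases into the adapter.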

The question of having a mixture of very short and very long fragments within the same library is largely moot anyway, since the standard library preparation protocol does not produce such a mixture. The standard RNA-Seq library preparation protocol filters out short fragments, whereas the small RNA-Seq library preparation protocol keeps only the small transcripts.

8.4 years ago
ablanchetcohen ★ 1.2k

The fundamental problem is upstream of the data analysis. The standard RNA-Seq library preparation protocol filters out small RNAs, so they are not physically present among the fragments sent for sequencing. The small RNA-Seq library preparation protocol does the opposite: only the small RNAs are present.

I do know of at least one paper claiming to have done an exhaustive analysis of mature miRNAs based on a standard RNA-Seq protocol. Unfortunately, this conclusion was based on blindly trusting the output of a software program used to process the data, without taking into account the physical realities of the libraries. Don't make the same mistake.


It's worth noting that these filtering processes are not 100% effective; the "unwanted" fragments are not completely eliminated, only reduced.

8.4 years ago

I would caution against oversimplifying the process.

A sequencer is unlike a sausage maker, where whatever you toss in at one end has to come out at the other. It is a much more complicated process: the physical processes in play require the DNA to be within a certain size range, and that range is not related to the number of cycles that the instrument measures (the read length). If your sequences are too short, the sequencing will not work properly in the first place.

To get back to your question: when the sequences are short, the mate is the reverse complement of the first read. At the same time, both the read and its mate may also end up measuring the artificial sequences (adapters) ligated to the fragment ends, so only the overlapping region is complementary.
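A toy simulation may make this concrete. It is a minimal sketch, not a model of a real instrument: the function `paired_reads` is hypothetical, the 25-base insert is invented, and the adapters are placeholder runs of `x` and `y` rather than real adapter sequences.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT characters pass through unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_reads(insert, read_len, adapter_after_r1, adapter_after_r2):
    """Return (read1, read2) for one fragment (hypothetical helper).

    Each read starts at its own end of the insert and, if the insert is
    shorter than the read, continues into the adapter on the far side.
    """
    read1 = (insert + adapter_after_r1)[:read_len]
    read2 = (reverse_complement(insert) + adapter_after_r2)[:read_len]
    return read1, read2


insert = "ACGTTGCAAGGCTTACCGATTGACA"  # made-up 25-base fragment
adapter_1 = "x" * 30                   # placeholder, not a real adapter
adapter_2 = "y" * 30                   # placeholder, not a real adapter

read1, read2 = paired_reads(insert, 50, adapter_1, adapter_2)
print(read1)
print(read2)
# Only the first 25 bases of each read come from the fragment, and only
# that part of read 2 is the reverse complement of read 1.
print(reverse_complement(read2[:len(insert)]) == read1[:len(insert)])
```

The trailing `x` and `y` bases show where each read has run past the fragment into adapter, which is why adapter trimming is usually applied before aligning such reads.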


"A sequencer is unlike a sausage maker, where whatever you toss in at one end has to come out at the other."

Nice analogy! Now I'm hungry. Headed for the HiSeq room...
