Question

The definition of insert size in RNA-seq: does it include the read length or not?

1

9.4 years ago

zju.whw ▴ 70

In paired-end RNA-seq analysis, some tools require the insert size (also called inner distance) as a parameter (for example, MATS's -r option). I also know that some tools can estimate the insert size, such as CollectInsertSizeMetrics in Picard or inner_distance.py in RSeQC.

I ran both tools on my 2×100 bp RNA-seq data, but the results differ, as shown in Figure 1 below. The distributions look the same, but the mean insert sizes are different. This seems to be because CollectInsertSizeMetrics calculates the length of the template (the RNA fragment), whereas inner_distance.py calculates the length of the template minus the lengths of the two reads (as shown in Figure 2 below, taken from the RSeQC website).

Does anyone know the definition of insert size in RNA-seq? Should it include the read length (as in CollectInsertSizeMetrics) or not (as in inner_distance.py)?

The mean value from CollectInsertSizeMetrics is 175.825835 bp.

The mean value from inner_distance.py is -38.1662853494061 bp.

(I don't know how to upload a figure to Biostars.)

Figure 1: the mean value and distribution of insert size for my paired-end RNA-seq analysis.

Figure 2: the insert size that inner_distance.py calculates (source: RSeQC website).

RNA-Seq insert-size inner-distance • 5.6k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by zju.whw ▴ 70

Ram · Accepted Answer · 2015-06-29

4

9.4 years ago

thackl ★ 3.0k

Insert size includes read length

---------->           <----------
|_______   insert size __________|

"Remember that "insert" refers to the DNA fragment between the adaptors, and not the gap between R1 and R2." (http://thegenomefactory.blogspot.de/2013/08/paired-end-read-confusion-library.html)

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by thackl ★ 3.0k

0

So is inner_distance.py in RSeQC wrong?

ADD REPLY • link 9.4 years ago by zju.whw ▴ 70

1

It's a matter of terminology. Inner distance refers to the gap between the two reads, not to the insert. The label in your figure is bad.

---------->         <----------
|_______   insert size ________|
           |__________|
           inner distance
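
To make the relationship in the diagram concrete, here is a minimal Python sketch. It assumes both mates have the same length and that both quantities are measured on the same coordinates; the function names are hypothetical and are not part of Picard or RSeQC.

```python
def inner_distance(insert_size: float, read_length: int) -> float:
    """Gap between the two mates: the insert minus both reads.

    Assumption: both mates have the same length (e.g. 2 x 100 bp).
    A negative result means the two mates overlap.
    """
    return insert_size - 2 * read_length


def insert_size(inner_dist: float, read_length: int) -> float:
    """Inverse conversion: add both read lengths back to the gap."""
    return inner_dist + 2 * read_length


# Mean insert size reported in the question, with 2 x 100 bp reads.
gap = inner_distance(175.825835, 100)
print(f"inner distance: {gap:.2f} bp")
```

For the numbers in the question this gives an inner distance of roughly -24 bp, so a negative mean from inner_distance.py is expected whenever the fragment is shorter than the two reads combined. It will not necessarily match the reported -38.17 exactly, since the two tools may differ in which read pairs they use and in how they treat spliced alignments.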

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by thackl ★ 3.0k

0

I think you are right. And the link you gave is very useful. Thank you very much.

ADD REPLY • link 9.4 years ago by zju.whw ▴ 70