In paired-end RNA-seq analysis, some tools require the insert size (also called inner distance) as a parameter (for example, MATS's -r
option). I also know that some tools can estimate the insert size, such as CollectInsertSizeMetrics in Picard or inner_distance.py in RSeQC.
I ran both tools on my 2*100bp RNA-seq data, but the results differ, as shown in Figure1 below. The distributions look the same, but the average insert sizes are different. This seems to be because CollectInsertSizeMetrics calculates the length of the template (the RNA fragment), whereas inner_distance.py calculates the length of the template minus the length of the two reads (as shown in Figure2 below, taken from the RSeQC website).
Does anyone know the definition of insert size in RNA-seq? Should it include the read length (as CollectInsertSizeMetrics does) or not (as inner_distance.py does)?
The mean value from CollectInsertSizeMetrics is 175.825835 bp
The mean value from inner_distance.py is -38.1662853494061 bp
(I don't know how to upload a figure to Biostars.)
Figure1: the mean value and distribution of insert size for my paired-end RNA-seq analysis
Figure2: the insert size that inner_distance.py calculates (source: RSeQC website)
So is inner_distance.py in RSeQC wrong?
It's a matter of terminology. Inner distance refers to the gap between the two reads, so it does not include the read lengths. The label in your figure is misleading.
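To make the relationship concrete, here is a minimal sketch in Python. It assumes both mates are the full 100 bp and that the template length is the full fragment including both reads; the function name `inner_distance` is just illustrative and is not part of Picard or RSeQC.

```python
def inner_distance(template_length: float, read_length: int) -> float:
    """Gap between the two mates: template length minus both read lengths.

    A negative value means the two reads overlap.
    """
    return template_length - 2 * read_length


READ_LENGTH = 100  # 2*100bp library, as in the question

# Mean template length reported by CollectInsertSizeMetrics in the question.
picard_mean_template = 175.825835

expected_inner = inner_distance(picard_mean_template, READ_LENGTH)
print(round(expected_inner, 6))  # -24.174165
```

So a mean template length of about 176 bp with 100 bp reads implies a negative inner distance, i.e. the mates overlap on average. This is the same sign and order of magnitude as the -38.17 from inner_distance.py. The two numbers will not match exactly, presumably because the tools do not use the same set of read pairs or the same way of measuring distance, but I have not checked that for this dataset.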
I think you are right, and the link you gave is very useful. Thank you very much.