Hi, we have RNA-Seq data from an Illumina HiSeq 2000. According to the Illumina RNA-seq library protocol that we follow, the insert length ranges from 120 to 210 bp, with a median of 155 bp.
After aligning with TopHat, I collected insert size statistics from the resulting BAM file with Picard. The median insert size appears to be larger than 155 bp: 188 bp for the tumor sample and 280 bp for the control sample. The detailed results are as follows.
For the tumor sample:
Metric                 Value
MEDIAN_INSERT_SIZE     188
MIN_INSERT_SIZE        75
MAX_INSERT_SIZE        227724412
MEAN_INSERT_SIZE       721.727402
STANDARD_DEVIATION     1451.554057
READ_PAIRS             21582855
PAIR_ORIENTATION       FR
WIDTH_OF_10_PERCENT    25
WIDTH_OF_20_PERCENT    49
WIDTH_OF_30_PERCENT    71
WIDTH_OF_40_PERCENT    91
WIDTH_OF_50_PERCENT    113
WIDTH_OF_60_PERCENT    143
WIDTH_OF_70_PERCENT    253
WIDTH_OF_80_PERCENT    1269
WIDTH_OF_90_PERCENT    4501
WIDTH_OF_99_PERCENT    41471
For the control sample (one column per pair orientation):
Metric                 FR             RF
MEDIAN_INSERT_SIZE     280            3762
MIN_INSERT_SIZE        74             74
MAX_INSERT_SIZE        242555584      242542706
MEAN_INSERT_SIZE       394.905075     3676.159079
STANDARD_DEVIATION     468.383792     492.235063
READ_PAIRS             50410660       1560925
PAIR_ORIENTATION       FR             RF
WIDTH_OF_10_PERCENT    21             11
WIDTH_OF_20_PERCENT    45             19
WIDTH_OF_30_PERCENT    79             27
WIDTH_OF_40_PERCENT    131            41
WIDTH_OF_50_PERCENT    193            65
WIDTH_OF_60_PERCENT    245            79
WIDTH_OF_70_PERCENT    287            101
WIDTH_OF_80_PERCENT    339            6921
WIDTH_OF_90_PERCENT    1491           7299
WIDTH_OF_99_PERCENT    14367          20051069
So which insert size value should I use? And do I need to run TopHat again with the insert size reported by Picard? Thank you in advance.
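One point that may help when comparing these numbers: TopHat's `-r/--mate-inner-dist` option expects the inner distance between the two mates, not the full fragment length. Picard reports the outer distance as measured on the genome, so read pairs that span introns can inflate it. Below is a minimal Python sketch of the conversion. The read length of 75 bp is an assumption (the post does not state it), and `mate_inner_distance` is a hypothetical helper, not part of TopHat or Picard.

```python
def mate_inner_distance(fragment_length: int, read_length: int) -> int:
    """Inner distance between mates: fragment length minus both read lengths."""
    return fragment_length - 2 * read_length


# ASSUMPTION: 75 bp reads. The post does not give the read length,
# so replace this with the real value for your run.
READ_LENGTH = 75

# Median fragment lengths quoted in the post.
medians = {
    "protocol": 155,
    "tumor (Picard)": 188,
    "control (Picard)": 280,
}

for label, fragment in medians.items():
    inner = mate_inner_distance(fragment, READ_LENGTH)
    print(f"{label}: fragment {fragment} bp -> inner distance {inner} bp")
```

With a different read length the inner distances shift accordingly, so it is worth checking the actual read length before passing any value to `-r`.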
Hi Arun, do you have any reference or paper for the above method? Thanks.
Sorry, I just noticed the comment. Yes, it's used in BWA.
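For context, BWA estimates the paired-end insert size from the central part of the distribution and ignores pairs that fall far outside the interquartile range. Here is a rough Python sketch of that general idea, assuming this is the kind of approach meant above. The function name, the factor of 2.0, and the toy data are illustrative and are not copied from BWA's source.

```python
import statistics


def trimmed_insert_stats(sizes, outlier_bound=2.0):
    """Mean and standard deviation of insert sizes after dropping outliers.

    Values outside [Q1 - k*IQR, Q3 + k*IQR] are discarded, where k is
    `outlier_bound`. Returns (mean, stdev, number_of_values_kept).
    """
    q1, _median, q3 = statistics.quantiles(sizes, n=4)
    iqr = q3 - q1
    low = q1 - outlier_bound * iqr
    high = q3 + outlier_bound * iqr
    kept = [s for s in sizes if low <= s <= high]
    return statistics.fmean(kept), statistics.pstdev(kept), len(kept)


# Toy data (not from the post): a tight cluster of fragment sizes plus two
# very large values of the kind intron-spanning pairs produce.
toy_sizes = list(range(140, 201, 5)) + [5000, 40000]

mean, stdev, kept = trimmed_insert_stats(toy_sizes)
print(f"naive mean:   {statistics.fmean(toy_sizes):.1f}")
print(f"trimmed mean: {mean:.1f} (sd {stdev:.1f}, {kept} of {len(toy_sizes)} kept)")
```

This is also why the median is more informative than the mean for the Picard numbers above: a small fraction of pairs with huge apparent insert sizes dominates the mean and the standard deviation.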