I downloaded ~70 SRA files for RNA-seq data and am attempting to align them. I used Picard's CollectInsertSizeMetrics to calculate insert size for two random files in this set (following this protocol: http://vinaykmittal.blogspot.com/2012/02/how-to-estimate-insert-size-for-paired.html).
However, the insert sizes from the two random files are vastly different: 360 vs. 180 (both with a standard deviation of around 60). The libraries were supposedly all prepared in the same way. Should I calculate the insert size for each file individually to set the inner distance parameter for TopHat? Or does TopHat have some flexibility built into this parameter?
Any help would be appreciated! Thanks!
I am sure there is some flexibility in this parameter, but it is probably best to work it out yourself for every data set. You can derive the inner distance computationally from the data.
Automatically feeding a good mate inner distance to TopHat2
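As a rough sketch of that derivation: if the insert size reported by Picard is taken to be the full fragment length (outer end to outer end, which is an assumption worth checking against your own histogram), then the inner distance is the fragment length minus both read lengths. The helper names below are hypothetical, and the read length of 100 is only a placeholder since the original post does not state it. `--mate-inner-dist` and `--mate-std-dev` are the TopHat options this would feed.

```python
def mate_inner_distance(mean_insert_size, read_length):
    """Inner distance = fragment length minus the two mate reads.

    Assumes mean_insert_size is the outer (full fragment) size and that
    both mates have the same length. The result can be negative when
    the mates overlap.
    """
    return int(round(mean_insert_size - 2 * read_length))


def tophat_mate_args(mean_insert_size, std_dev, read_length):
    """Build the TopHat mate options as a list of command-line tokens."""
    inner = mate_inner_distance(mean_insert_size, read_length)
    return [
        "--mate-inner-dist", str(inner),
        "--mate-std-dev", str(int(round(std_dev))),
    ]


if __name__ == "__main__":
    READ_LENGTH = 100  # placeholder; use the real read length of your run
    # The two insert sizes mentioned in the question, both with std dev ~60.
    for insert_size in (360, 180):
        print(insert_size, " ".join(tophat_mate_args(insert_size, 60, READ_LENGTH)))
```

With a per-file estimate like this, each sample gets its own value rather than one number shared across all ~70 files. A negative result simply means the two mates overlap under the assumed read length.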