Hi all,
I have a dataset of 30 samples whose read lengths vary slightly (for example, some samples are 75 bp while others are 76 bp). The ideal approach would be to generate two separate indices, with --sjdbOverhang set to 74 and 75 respectively. I decided to take a more convenient approach and generate a single STAR index, on the assumption that a one-base-pair difference between my samples is negligible.
But silly me, I made a mistake and specified --sjdbOverhang as 77, which is not ideal for any of my samples, although it is not far off. I would like to ask whether mapping my dataset with this "suboptimal" STAR index would significantly affect subsequent gene-level quantification, and whether it is really worth regenerating the index and remapping everything. I see from the STAR documentation that --sjdbOverhang 100 works as well as the ideal value in most cases, so my guess is that 77 may be fine?
I would think so, as long as all samples are aligned using the same index.
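For reference, the STAR manual gives the ideal --sjdbOverhang as ReadLength - 1, or max(ReadLength) - 1 when read lengths vary. Here is a minimal Python sketch of that rule. The function name, the cohort labels, and the 15/15 split are made up for illustration and are not part of STAR.

```python
def ideal_sjdb_overhang(read_lengths):
    """Return max(read length) - 1, the value the STAR manual
    describes as ideal for --sjdbOverhang."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("need at least one read length")
    return max(lengths) - 1


# Hypothetical cohorts: 15 samples with 75 bp reads, 15 with 76 bp reads.
cohorts = {"75bp": [75] * 15, "76bp": [76] * 15}

# Ideal value if each cohort gets its own index.
per_cohort = {name: ideal_sjdb_overhang(lens) for name, lens in cohorts.items()}

# Ideal value if all samples share a single index.
combined = ideal_sjdb_overhang(
    length for lens in cohorts.values() for length in lens
)

print(per_cohort)
print(combined)
```

By that rule, a single shared index for mixed 75/76 bp reads would use 75, so 77 is only two bases above the ideal value.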
Thanks for the reply. But let's say I map 15 samples with a STAR index built with an overhang of 74 and the other 15 with an index built with an overhang of 75. Would my samples still be comparable with each other, given that batch effects are controlled for?
For those interested, I found a related GitHub issue on this: https://github.com/alexdobin/STAR/issues/931
I guess that, strictly speaking, using the ideal --sjdbOverhang value for each cohort is best, but sticking to one common index should not change the results by much.
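If you do decide to rebuild the index, a rough sketch of assembling the genomeGenerate command from Python could look like the following. The directory and file names (star_index_oh75, genome.fa, annotation.gtf), the helper name, and the thread count are placeholders I made up; the STAR options themselves (--runMode genomeGenerate, --genomeDir, --genomeFastaFiles, --sjdbGTFfile, --sjdbOverhang, --runThreadN) are the standard ones from the manual.

```python
import shlex


def star_genome_generate_cmd(genome_dir, fasta, gtf, overhang, threads=8):
    """Build the argument list for a STAR genomeGenerate run.

    All paths are supplied by the caller; nothing is executed here.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(overhang),
    ]


# Placeholder paths; 75 is max(ReadLength) - 1 for mixed 75/76 bp reads.
cmd = star_genome_generate_cmd(
    "star_index_oh75", "genome.fa", "annotation.gtf", overhang=75
)

# Print the command for inspection; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to check the overhang value before committing to a long index build.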