9.6 years ago
The Last Word
I have both EST and EST cluster data for predicting miRNAs in a particular genome. The EST sequences total roughly 170 MB, while the EST cluster file is only 6 MB. I would normally use the cluster file, but such a large difference in size honestly surprises me. I have a methodology planned out, so could you please suggest which file I should use for the miRNA analysis?
Thanks in advance.
Not sure how we can answer your question since we have no real way of evaluating the quality of your data or the impact it will have on your downstream analysis. Practically, why not try things with each individually and compare results?
@Sean Davis Does such a big difference in sequence size between the ESTs and the EST clusters make sense? Btw, this is the link to the data I would be using
It might make sense to have such a big difference if your EST library is pretty redundant. Have you examined the differences between the EST and EST clusters? That would be the place to start. I wouldn't spend too much time thinking about file sizes.
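One quick way to start examining the differences is to summarize each FASTA file: number of records, total bases, and how many sequences are exact duplicates. The following is only a rough sketch, assuming both files are plain FASTA; the function names and the file names in the comments are hypothetical. Exact-duplicate counting will underestimate the real redundancy, since clustering also merges overlapping, non-identical ESTs.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines):
    """Return record count, total bases, unique sequences and mean length."""
    # Keys are upper-cased sequences, values are how often each one occurs.
    # All distinct sequences are held in memory, so this needs enough RAM
    # for the larger file.
    counts = Counter(seq.upper() for _, seq in read_fasta(lines))
    records = sum(counts.values())
    total_bases = sum(len(seq) * n for seq, n in counts.items())
    return {
        "records": records,
        "total_bases": total_bases,
        "unique_sequences": len(counts),
        "mean_length": total_bases / records if records else 0.0,
    }


# Hypothetical usage (file names are placeholders, not from the thread):
#   with open("ests.fasta") as fh:
#       print(summarize(fh))
#   with open("est_clusters.fasta") as fh:
#       print(summarize(fh))
```

If the EST file has many more records than the cluster file but a similar or shorter mean length, a highly redundant library would be a plausible explanation for the size gap.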