Hello everyone,
My input was 633,679 transcripts (from Trinity.fasta) made from 10 samples, with no groups defined. I ran Corset with default parameters and got 51,556 clusters. What bothers me is that multiple transcripts were assigned to the same cluster ID, so in the end I have 46,698 unique cluster IDs. Is that normal? I need to use this for DGE analysis.
I read about superTranscripts and that running Corset with the -D parameter set high might resolve this issue. In the example, it is set to 99999999999: https://github.com/Oshlack/Corset/wiki/Example
I tried that, but it has been running for almost two weeks now. How do I decide on a value of the -D parameter for my data? I know the number of samples determines the default value, but the user manual gives no further information.
Tnx,
Lada
I think you should first read this post to better interpret the Corset results.
Thanks for your reply. Yes, I am aware of the high transcript number. I am currently testing my data with different Trinity parameters, so this is not my only assembly for this dataset. I am new to transcriptomics and still learning, but I guess clustering could be a good step after de novo assembly. My question was more about the -D parameter in Corset, for which I can't find much information, so I was wondering whether anyone has used it and how to decide on its value.
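For anyone who wants to check how transcripts are distributed across clusters in an output like the one described above, here is a minimal Python sketch. It assumes the Corset clusters file is a two-column, tab-separated table (transcript ID, then cluster ID); the function names and the toy IDs are hypothetical, so adjust them to your own output.

```python
from collections import Counter


def cluster_sizes(lines):
    """Count transcripts per cluster.

    Assumes each non-empty line is 'transcript_id<TAB>cluster_id'.
    """
    sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _transcript_id, cluster_id = line.split("\t")[:2]
        sizes[cluster_id] += 1
    return sizes


def summarize(sizes):
    """Return (total transcripts, unique clusters, clusters with >1 transcript)."""
    n_transcripts = sum(sizes.values())
    n_clusters = len(sizes)
    n_multi = sum(1 for n in sizes.values() if n > 1)
    return n_transcripts, n_clusters, n_multi


if __name__ == "__main__":
    # Toy input with made-up IDs; for real data, pass an open file handle instead,
    # e.g. `with open("clusters.txt") as fh: sizes = cluster_sizes(fh)`.
    toy = [
        "TRINITY_DN0_c0_g1_i1\tCluster-0.0",
        "TRINITY_DN0_c0_g1_i2\tCluster-0.0",
        "TRINITY_DN1_c0_g1_i1\tCluster-1.0",
    ]
    total, clusters, multi = summarize(cluster_sizes(toy))
    print(f"{total} transcripts in {clusters} clusters ({multi} with more than one transcript)")
```

If the number of lines in the file is larger than the number of unique cluster IDs, that only means some clusters contain several transcripts, which is what a clustering step is meant to produce.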