6.7 years ago
bioinfo456
Hi all,
While running DESeq2 I encountered the following note:
-- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time.
Can somebody please explain this note? Thanks.
You may have low sample numbers or it could be that many of your transcripts have 0 counts. Can you confirm? Did you do any pre-filtering?
I have 25 diseased-normal sample pairs and, yes, approximately 1k of the 20k genes have 0 counts. I did not do any pre-filtering. How serious is the impact on the results?
'Quite' serious. You should definitely remove genes that only have zeros (i.e. 100% zeros), and then remove other genes that have a high proportion of zeros (i.e. >50% zeros).
Alternatively, you can apply the logic another way by removing all genes whose mean raw count across all samples is <10.
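As a rough illustration of the two filters described above, here is a minimal Python sketch on a toy table of counts. The function names (`keep_gene`, `prefilter`), the parameter names and the toy genes are all made up for the example; the thresholds (more than 50% zeros, mean count below 10) are the ones suggested in this reply and are not DESeq2 defaults.

```python
from statistics import mean


def keep_gene(counts, max_zero_fraction=0.5, min_mean_count=10):
    """Return True if a gene passes both pre-filters (hypothetical helper)."""
    zero_fraction = sum(1 for c in counts if c == 0) / len(counts)
    # Drops all-zero genes as well as genes with a high proportion of zeros.
    if zero_fraction > max_zero_fraction:
        return False
    # Drops genes whose mean count across all samples is too low.
    return mean(counts) >= min_mean_count


def prefilter(count_table, **thresholds):
    """Keep only the genes (dict keys) whose counts pass keep_gene."""
    return {
        gene: counts
        for gene, counts in count_table.items()
        if keep_gene(counts, **thresholds)
    }


# Toy data: gene name -> raw counts, one value per sample.
toy = {
    "geneA": [0, 0, 0, 0],        # 100% zeros
    "geneB": [0, 0, 0, 40],       # 75% zeros
    "geneC": [3, 5, 2, 4],        # no zeros, but mean count is 3.5
    "geneD": [120, 95, 0, 210],   # 25% zeros, mean count well above 10
}

filtered = prefilter(toy)
print(sorted(filtered))
```

Only `geneD` survives with these thresholds. Passing, say, `max_zero_fraction=0.8` or `min_mean_count=5` to `prefilter` gives a less strict filter.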
Thanks for your insight. I noticed that the results section said "19375 out of 20530 non zero genes/ variables" (something like this). As I understand it, the package has automatically eliminated genes that only have zeros. Please correct me if I'm wrong. Regarding the high proportion of zeros: my experiment is designed so that each sample has a normal and a cancerous reading adjacent to each other, and I've used 25 such samples. Now, what if, for a particular gene, the normal count is 0 and the corresponding cancerous count is some positive number in all samples? Won't it be eliminated in spite of being quite significant?
Yes, that is a 'flaw' in the current way that we conduct differential expression analysis. Microarrays overcome this, to some extent. You may want to consider less harsh thresholds than those that I mentioned.
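To make the effect of the threshold concrete, here is a small Python sketch of the scenario in the question: a gene that is 0 in all 25 normal samples and positive in all 25 cancerous samples. The counts (30 and 12) are hypothetical values chosen only for illustration, and the cut-offs are the ones discussed in this thread, not anything built into DESeq2.

```python
def zero_fraction(counts):
    """Fraction of samples in which the gene has a count of 0."""
    return sum(1 for c in counts if c == 0) / len(counts)


n_pairs = 25
normal = [0] * n_pairs       # gene is absent in every normal sample
tumour = [30] * n_pairs      # hypothetical positive count in every tumour
gene = normal + tumour

frac = zero_fraction(gene)
mean_count = sum(gene) / len(gene)

# Exactly half of the 50 samples are zero, so the outcome depends on
# whether the zero-proportion cut-off is strict (>) or inclusive (>=).
removed_if_strict = frac > 0.5
removed_if_inclusive = frac >= 0.5

# With a lower tumour count the mean-count filter can remove the gene instead.
low_tumour = [0] * n_pairs + [12] * n_pairs
low_mean = sum(low_tumour) / len(low_tumour)
removed_by_mean_filter = low_mean < 10

print(frac, mean_count, removed_if_strict, removed_if_inclusive)
print(low_mean, removed_by_mean_filter)
```

In this toy case the gene survives a strict ">50% zeros" rule but is dropped by a ">=50%" rule, and it is dropped by the "mean < 10" rule whenever its tumour counts average below 20.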
I presume that you are interested in some antisense transcript or non-coding RNA?
mRNA. I have eliminated genes whose count is less than 50. It seems to have only a very minor impact on the resulting number of genes (i.e. plus or minus 10 genes).
That is raw counts, right? And, after that, you re-normalise.
The algorithms can handle zeros. In your case, though, you also have a low sample number.
The counts are RSEM-normalised. I rounded them off before inputting them to DESeq2. What is the minimum number of samples one should have?
No set number, but the groups that you're comparing should also be balanced. For example, 20 Vs. 20 is better than 20 Vs. 5.
Oh, alright. Mine is 25 vs 25.