Suppose I have a normalized level-3 expression matrix from TCGA. How can I filter out absent genes before further processing such as gene coexpression analysis? Currently I do the following:
- Set a lower threshold, such as the 5% quantile of all expression levels.
- Test whether a large enough fraction of samples passes this threshold for each gene. For example, if at least 20% of samples have an expression level for the gene above the threshold, the gene is considered present.
I think my current method works to some extent, but is it robust enough?
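For concreteness, the two steps above can be sketched roughly as follows. This is only a minimal illustration in plain Python: the function names, the dict-of-lists input layout, and the toy gene names and values are all assumptions made up for the example, and the 5% and 20% defaults are just the values mentioned above, not recommended cutoffs.

```python
def quantile(values, q):
    """Quantile of a list of numbers using linear interpolation (0 <= q <= 1)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a quantile of an empty list")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_present_genes(expr, quantile_cutoff=0.05, min_sample_fraction=0.20):
    """Keep genes expressed above a global threshold in enough samples.

    expr is a hypothetical layout: {gene_name: [value_per_sample, ...]}.
    """
    # Step 1: global threshold = chosen quantile over all expression values.
    all_values = [v for values in expr.values() for v in values]
    threshold = quantile(all_values, quantile_cutoff)

    # Step 2: a gene is "present" if enough samples exceed the threshold.
    present = {}
    for gene, values in expr.items():
        n_above = sum(1 for v in values if v > threshold)
        if n_above / len(values) >= min_sample_fraction:
            present[gene] = values
    return present, threshold


# Toy data, invented for illustration only.
toy = {
    "GENE_A": [8.1, 7.9, 8.4, 8.0, 7.7],  # expressed in every sample
    "GENE_B": [0.0, 0.0, 0.0, 0.0, 0.0],  # never expressed
    "GENE_C": [0.0, 0.0, 0.0, 5.2, 0.0],  # expressed in 1 of 5 samples
}
present, threshold = filter_present_genes(toy)
print(sorted(present), threshold)
```

With this toy input, GENE_B is dropped while GENE_C is kept because exactly 20% of its samples exceed the threshold, which shows how sensitive the result is to the two chosen parameters.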
Hi Cyriac,
When you say 3 reads at the end, do you mean 'paired-end reads' as you defined at the beginning? Wouldn't that be 3 fragments if so?
Thanks. What if I use microarray data?
I only have limited experience with gene expression arrays, so I can't recommend any cutoffs. But hybridization bias and the noisiness of probe intensities make it hard to reach the kind of determinism we get from RNA-seq about whether a fragment of a specific mRNA was transcribed or not.
Hi,
Your topic is related to my project. I am working with TCGA colorectal gene expression data.
Could you help me create a tissue model? I need to convert the Affymetrix data to a format that can be used by the "createTissueSpecific()" Cobra Toolbox function. We use biocLite("affy"), but my data is unc_agilentg4502a_07 and it is in ADF format. How can I convert it to the present/absent AP.txt and EID.txt files from the gene expression values? Which cutoff is best for it?
Your question is not related to this thread. Please open a new one... and use tags to notify appropriate watchers.