Hi
While I am analyzing GEO datasets for differentially expressed genes with the limma package, I am getting duplicate genes with different probe IDs and p-values. Which one should I consider for further analysis?
Thanks in advance.
Which one to consider to go for further analysis?
Answer: the important ones. If you have different probe IDs for the same gene, then you have different probe sequences, and the only way to figure out what's happening is to look at the data more carefully. You'll have to figure out why you get different signals at each probe. The best way to do this is to understand how well the probe represents the gene. If you are using Affymetrix array data, you can examine the probe set names (the affy IDs) to see what kinds of probes they are. There is a hierarchy of things to pay attention to, which may provide an easy filtering step. The probe suffix tells you what class of probe it is. Most probe IDs end in plain _at (e.g. 1769336_at), which means they detect the antisense sequence of the transcript. These probes detect their design sequence (aka the exemplar sequence) uniquely and do not cross-hybridize to other design sequences. However, there are also probe sets with a penultimate letter code: _a_at, _s_at, _x_at (e.g. 1769349_s_at). The letter denotes that the probe set exhibits a particular kind of cross-hybridization. The "a" designation indicates that the probe set detects a gene family, the "s" designation denotes cross-hybridization to different design sequences that are not part of the same gene family, and the "x" designation is the messiest, with various probe sequences cross-hybridizing to many different design sequences (remember that an affy probe set is a "set" of at least 11 small probes, and this "set" has a single ID that appears in your results). Affymetrix has a diagram of these probe set classes that might help you.
So if you want to know why different probe IDs give different p-values, you need to understand how well the underlying sequence represents the gene you think it does. The unique probe sets (plain _at) are the easiest to pay attention to.
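If it helps, here is a rough sketch of that suffix-based filtering step. It is written in Python only for illustration; the function names and the class labels are my own shorthand, not anything from Affymetrix or limma, and the two probe IDs are the examples mentioned above.

```python
# Suffixes to test, longest first: every letter-coded suffix also ends in "_at",
# so the plain "_at" check has to come last.
SUFFIX_CLASSES = [
    ("_a_at", "gene family"),
    ("_s_at", "cross-hybridizes to other design sequences"),
    ("_x_at", "mixed cross-hybridization"),
    ("_at", "unique"),
]


def classify_probe(probe_id):
    """Return the probe set class implied by the ID suffix, or 'unknown'."""
    for suffix, label in SUFFIX_CLASSES:
        if probe_id.endswith(suffix):
            return label
    return "unknown"


def unique_probes(probe_ids):
    """Keep only the probe sets whose suffix marks them as unique."""
    return [p for p in probe_ids if classify_probe(p) == "unique"]


probe_ids = ["1769336_at", "1769349_s_at"]
kept = unique_probes(probe_ids)
```

With these two IDs, only 1769336_at survives the filter. If a gene has no unique probe set at all, you would still need to look at the remaining probe sets by hand.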
Hi,
Use the p-values as the standard criterion for finding significant genes. When a gene appears more than once, you can choose between its probes by their p-values.
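One possible way to apply this is to keep, for each gene, the probe with the smallest p-value. The sketch below is in Python and is only an illustration: the "P.Value" key mirrors the column name in limma's topTable output, while the "symbol" key, the function name, and the rows themselves are made up for the example.

```python
def best_probe_per_gene(rows):
    """For each gene symbol, keep the row with the smallest p-value."""
    best = {}
    for row in rows:
        gene = row["symbol"]
        # Replace the stored row only if this probe has a smaller p-value.
        if gene not in best or row["P.Value"] < best[gene]["P.Value"]:
            best[gene] = row
    return best


# Made-up rows for illustration, not real results.
rows = [
    {"probe": "probe_1_at", "symbol": "GENE_A", "P.Value": 0.03},
    {"probe": "probe_2_s_at", "symbol": "GENE_A", "P.Value": 0.001},
    {"probe": "probe_3_at", "symbol": "GENE_B", "P.Value": 0.2},
]
best = best_probe_per_gene(rows)
```

Here GENE_A is represented by probe_2_s_at and GENE_B by probe_3_at. This rule looks only at the p-value, so it can pick a cross-hybridizing probe set over a unique one, as it does for GENE_A in this toy example.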