23 months ago
1769mkc
For downstream analysis I am trying to use methylation 450k data (M values). The data look something like this:
dput(methyl[1:10,1:3])
structure(list(Symbol = c("A4GALT", "A4GALT", "A4GALT", "A4GALT",
"A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT"),
`TCGA-AB-2856` = c(-0.69571999396859, 6.59452651543373, -3.31241269267196,
-2.27831006586008, -6.67087214612625, -4.07354075597074,
-6.72587345772808, -3.99270962745257, 6.30759056557904, 4.35275426216806
), `TCGA-AB-2849` = c(-2.1258506029936, 6.29288805154616,
-0.989789351415863, -1.87695599373517, -6.29710435957612,
-0.552953206195101, -6.39859496846795, 1.81668629311401,
6.2635495345495, 3.91415116195022)), row.names = c(NA, -10L
), class = c("data.table", "data.frame"))
As you can see, a single gene has multiple probes. How do I filter or merge them?
Is it correct to merge them or to average them?
Each probe corresponds to one CpG, so I do not think it would be a good idea to merge them as they stand, because they are located in different genomic regions. If you find an appropriate annotation, you could average them over the same CpG island or regulatory region, but the gold standard in methylation array analysis at the gene/region scale remains finding differentially methylated regions (DMRs) between your conditions.
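A minimal base-R sketch of region-level averaging, assuming you have a probe-to-region annotation. The `Region` labels below are hypothetical placeholders (the posted data has no probe IDs or region annotation); in practice they would come from the array annotation. It averages M values over probes that share the same gene and region, keeping one column per sample. This is only a summarization step, not a DMR analysis.

```r
# First 5 rows of the posted data (probe-level M values, one row per probe).
methyl <- data.frame(
  Symbol = rep("A4GALT", 5),
  `TCGA-AB-2856` = c(-0.69571999396859, 6.59452651543373, -3.31241269267196,
                     -2.27831006586008, -6.67087214612625),
  `TCGA-AB-2849` = c(-2.1258506029936, 6.29288805154616, -0.989789351415863,
                     -1.87695599373517, -6.29710435957612),
  check.names = FALSE  # keep the "TCGA-AB-..." column names unchanged
)

# HYPOTHETICAL annotation: the CpG island / regulatory region of each probe.
methyl$Region <- c("island_1", "island_1", "shore_1", "shore_1", "open_sea")

# Sample columns are the ones named after TCGA barcodes.
sample_cols <- grep("^TCGA-", names(methyl), value = TRUE)

# Mean M value per (gene, region) group, computed separately for each sample.
region_means <- aggregate(methyl[sample_cols],
                          by = list(Symbol = methyl$Symbol,
                                    Region = methyl$Region),
                          FUN = mean)
print(region_means)
```

Probes that fall in different regions stay separate rows, which keeps the distinction the answer warns about; collapsing everything to one value per gene would ignore it.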
Thank you, I did that actually, but many of them do not come up in, or overlap with, my RNA-seq data. I want to use the Multi-Omics Factor Analysis (MOFA) framework, which I thought requires both datasets to have the same dimensions. Can you suggest a way to do that without merging or filtering?
I'm also curious how they obtained unique gene names in the TCGA LAML methylation data: it has no probe IDs, only gene symbols, which I guess were already mapped.
I have never used MOFA, so unfortunately I cannot help you on this point, but from what I see it seems possible to use datasets with different feature dimensions, as long as the number of samples is the same. It is indeed really interesting how they summarized the methylation per gene. Maybe they took the average (I cannot find any information on that), but that does not seem a good approach to me, since genes are not equally covered on the microarray. It is even more surprising that, for other datasets from TCGA, LinkedOmics seems to provide methylation per CpG.
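A minimal sketch of lining up two views that have different numbers of features but the same samples. The matrices are random placeholders and the `probe_*`/`gene_*` names are hypothetical; the point is only that rows (features) may differ while columns (samples) are matched by name and order. As far as I know, MOFA2's `create_mofa()` accepts a list of matrices with features in rows and samples in columns, but check the MOFA2 documentation before relying on that.

```r
set.seed(1)
samples <- c("TCGA-AB-2856", "TCGA-AB-2849")

# HYPOTHETICAL methylation view: 8 probes x 2 samples (random values).
meth <- matrix(rnorm(8 * 2), nrow = 8,
               dimnames = list(paste0("probe_", 1:8), samples))

# HYPOTHETICAL RNA-seq view: 5 genes x 2 samples, columns stored in a
# different order on purpose.
rna <- matrix(rnorm(5 * 2), nrow = 5,
              dimnames = list(paste0("gene_", 1:5), rev(samples)))

# Keep only samples present in both views, in the same order in each.
common <- intersect(colnames(meth), colnames(rna))
views <- list(methylation = meth[, common, drop = FALSE],
              rna         = rna[, common, drop = FALSE])

# Row counts differ between views; sample columns now match exactly.
stopifnot(identical(colnames(views$methylation), colnames(views$rna)))
sapply(views, dim)
```

No genes have to be shared between the two views here: the methylation view keeps its own features, and only the sample axis is aligned.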
"as long as the number of samples is the same" Thank you, now I can give it a try. Yes, my samples are the same in both datasets.
"It is even more surprising that for other datasets from TCGA, LinkedOmics seems to provide methylation per CpG" Yes, I'm not sure how they did it. And thank you for clearing up my confusion about MOFA.