Question

Filtering methylation probe ID from multiple probes

0

23 months ago

1769mkc ★ 1.2k

For downstream analysis I am trying to use methylation 450k data (M-values). The data look like this:

    dput(methyl[1:10,1:3])
    structure(list(Symbol = c("A4GALT", "A4GALT", "A4GALT", "A4GALT", 
    "A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT"), 
        `TCGA-AB-2856` = c(-0.69571999396859, 6.59452651543373, -3.31241269267196, 
        -2.27831006586008, -6.67087214612625, -4.07354075597074, 
        -6.72587345772808, -3.99270962745257, 6.30759056557904, 4.35275426216806
        ), `TCGA-AB-2849` = c(-2.1258506029936, 6.29288805154616, 
        -0.989789351415863, -1.87695599373517, -6.29710435957612, 
        -0.552953206195101, -6.39859496846795, 1.81668629311401, 
        6.2635495345495, 3.91415116195022)), row.names = c(NA, -10L
    ), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002708c1367e0>)

Here we can see that a single gene has multiple probes. How do I filter or merge them?

Is it correct to merge or average them?

methylation • 891 views

ADD COMMENT • link 23 months ago by 1769mkc ★ 1.2k

1

Each probe should correspond to one CpG. I do not think it would be a good idea to merge them as it stands, because they are located in different genomic regions. If you find an appropriate annotation, you could average them over the same CpG island or regulatory region, but the gold standard in methylation array analysis at the gene/region scale remains finding differentially methylated regions (DMRs) between your conditions.
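As a rough illustration of averaging over an annotated region rather than over a whole gene, here is a minimal base R sketch. The probe names and the `region` vector (for example, CpG island IDs) are hypothetical placeholders, and the M-values are rounded from the question's data; a real analysis would take the probe-to-region mapping from an array annotation and would still be better served by a proper DMR method.

```r
# Toy M-value matrix: probes in rows, samples in columns.
# Probe names are hypothetical placeholders.
m_values <- matrix(
  c(-0.70, 6.59, -3.31, -2.28, -6.67,
    -2.13, 6.29, -0.99, -1.88, -6.30),
  ncol = 2,
  dimnames = list(paste0("probe_", 1:5), c("TCGA-AB-2856", "TCGA-AB-2849"))
)

# Hypothetical probe-to-region annotation, one entry per probe (row).
region <- c("island_1", "island_1", "island_2", "island_2", "island_2")

# Sum the M-values per region, then divide by the number of probes
# in each region to get the per-region mean for every sample.
region_sums <- rowsum(m_values, group = region)
n_probes <- as.vector(table(region)[rownames(region_sums)])
region_means <- region_sums / n_probes

print(region_means)
```

The result has one row per region and one column per sample, so probes from different regions of the same gene are never pooled together.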

ADD REPLY • link 23 months ago by Basti ★ 2.0k

0

Thank you. I did that, actually, but many of them do not come up or are not in common with my RNA-seq data. I wanted to use the Multi-Omics Factor Analysis (MOFA) framework, which perhaps requires both datasets to have the same dimensions. Can you suggest a way to do that without merging or filtering?

ADD REPLY • link 23 months ago by 1769mkc ★ 1.2k

0

I'm just curious how they ended up with unique gene names in the TCGA LAML methylation data: it has no probe IDs, only genes, which I guess are already mapped.

ADD REPLY • link 23 months ago by 1769mkc ★ 1.2k

2

I have never used MOFA, so unfortunately I cannot help you on this point, but from what I see it seems possible to use datasets with different feature dimensions as long as the number of samples is the same. Indeed, it is really interesting how they summed up the methylation per gene. Maybe they took the average (I cannot find any information on that), but it is not a good approach in my view, since genes are not equally covered on the microarray. It is even more surprising that for other TCGA datasets, LinkedOmics seems to provide methylation per CpG.
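To make the "different features, same samples" point concrete, here is a small base R sketch that aligns two views on their shared samples while leaving the feature counts different. The feature names and random values are made up for illustration, and the named-list-of-matrices layout is an assumption about what a multi-view tool expects, so check the MOFA documentation for the actual input format.

```r
# Toy views: features in rows, samples in columns.
# Feature names and values are invented for illustration only.
set.seed(1)
samples <- c("TCGA-AB-2856", "TCGA-AB-2849")

methylation <- matrix(rnorm(10), nrow = 5,
                      dimnames = list(paste0("cg_", 1:5), samples))

# Same samples, but deliberately stored in a different column order.
rna <- matrix(rnorm(6), nrow = 3,
              dimnames = list(paste0("gene_", 1:3), rev(samples)))

# Keep only the samples present in both views, in one common order.
shared <- intersect(colnames(methylation), colnames(rna))
views <- list(
  methylation = methylation[, shared, drop = FALSE],
  rna         = rna[, shared, drop = FALSE]
)

# Rows (features) differ between views; columns (samples) now match.
print(sapply(views, dim))
stopifnot(identical(colnames(views$methylation), colnames(views$rna)))
```

Subsetting both matrices by the same `shared` vector fixes the column order as well as the sample set, which matters when samples are matched by position rather than by name.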

ADD REPLY • link 23 months ago by Basti ★ 2.0k

0

"dimension as long as the number of samples are the same" Thank you, now I can give it a try. Yes, my samples are the same in both datasets.

"It is even more surprising that for other datasets from TCGA, LinkedOmics seems to provide methylation per CpG" Yes, I am not sure how they did it. Thank you for clearing up my confusion about the MOFA part.

ADD REPLY • link 23 months ago by 1769mkc ★ 1.2k