Hello, I am going to analyze some data from Illumina 450k Methylation array, which are in .idat format. My supervisor wants me to work with Python but it's a new language for me, therefore I'm really lost. I am trying to use rpy2 tool to combine Python and R functions, but it seems to be out of my range. Currently I am working with minfi package and trying to remove some probes, eg.probes for sex chromosomes. In R it is easy job - I am finding indexes of probes which are NOT related to sex chromosomes using command:
keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in%
c("chrX","chrY")])
and then I am replacing my old MethylSet with the new one, not containing unwanted probes:
mSet <- mSet[keep,]
But in Python I don't know how to extract only the data of interest from MethylSet. For now I have indexes of probes which I want to keep:
ann450k = minfi.getAnnotation('IlluminaHumanMethylation450kanno.ilmn12.hg19')
tmp = biobase.featureNames(mSet)
fnames = []
for i in range(len(matches)):
fnames.append(tmp[i])
names = dollar(ann450k, "Name") #dollar is in dictionary so it's basically chrs=ann450k$Name
chrs = dollar(ann450k, "chr")
matches = [i for i in range(0,len(names)) if chrs[i]!="chrX" and chrs[i]!="chrY"]
namesNew = []
for i in range(len(matches)):
namesNew.append(names[matches[i]])
idxs = [i for i, item in enumerate(fnames) if item in namesNew]
If someone could give me a piece of advice on how to process my data, I would be really grateful. Maybe there are some Python tools that would allow me to read and process .idat files? I haven't found any so far. My goal for now is to preprocess them with subset quantile normalization and SWAN algorithms to reconstruct pipeline used by the authors of experiment.
Kind regards, Agata
Can't you explain to your supervisor that you have to use the right tool for the right job, and that in this case R is far more convenient? Rpy2 can be a pain and generally not worth the trouble.