Hello Biostars Community,
If the probe sequences on a mouse methylation BeadChip array are based on the GRCm38 (mm10) genome build, would it be a bad idea to annotate the probes with the latest genome build, GRCm39 (mm39)?
I think it would be a bad idea, but I would like some confirmation from peers, or any other helpful insight.
Thank you in advance!
What do you hope to gain from using the latest genome build? Usually, the biggest advantage of a new genome build is a better representation of the actual sequence; particularly difficult regions may be better resolved in newer builds. That can be beneficial if you're dealing with genome-wide data, and it may help reduce alignment artifacts. I don't really see how that would come into play with an array, though, since the array is presumably based on high-confidence loci to begin with.
But maybe I'm misunderstanding what you're actually trying to do on a technical level, i.e. are you mostly referring to gene annotation? (Again, I would make the decision based on what you hope to gain from it.)
Thank you for responding, Friederike.
Basically, I just wanted more accurate and higher-quality gene and promoter annotations. I have (almost) come to the conclusion that I should annotate with the newer genome build. Long story short, a gene of interest had promoter annotations in mm39 that were better than those in mm10, and they generally made more sense: promoters were annotated near every TSS (there are probably exceptions to this). Now I just have to see how the rest of the gene/promoter annotations turn out. Overall, in my case, it's probably best to re-annotate everything in mm39. I am also convinced because the sesame package author recommends the same: https://github.com/zwdzwd/sesame/issues/47#issuecomment-915715414
Thank you again : )
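For readers following along, here is a minimal sketch of the promoter annotation step discussed above. It is not how sesame or any particular package does it. It assumes the probe coordinates have already been converted to mm39, since mm10 positions cannot be compared directly against mm39 annotations. The probe IDs, gene names, coordinates, and the promoter window (1500 bp upstream, 500 bp downstream of the TSS) are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical toy data. All coordinates must be in the SAME build (e.g. mm39).
# TSS records: (chromosome, TSS position, strand, gene name)
TSS = [
    ("chr1", 10000, "+", "GeneA"),
    ("chr1", 50000, "-", "GeneB"),
]
# Probe records: (probe ID, chromosome, CpG position)
PROBES = [
    ("cg_example_1", "chr1", 9000),
    ("cg_example_2", "chr1", 51000),
    ("cg_example_3", "chr1", 30000),
    ("cg_example_4", "chr2", 9000),
]


def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return (start, end) of a strand-aware window around a TSS.

    The window sizes are an assumption; pick values that suit your analysis.
    """
    if strand == "+":
        return max(tss - upstream, 0), tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return max(tss - downstream, 0), tss + upstream


def annotate_probes(probes, tss_records, upstream=1500, downstream=500):
    """Map each probe ID to a sorted list of genes whose promoter window contains it."""
    windows = defaultdict(list)
    for chrom, tss, strand, gene in tss_records:
        start, end = promoter_window(tss, strand, upstream, downstream)
        windows[chrom].append((start, end, gene))

    annotation = {}
    for probe_id, chrom, pos in probes:
        # Simple linear scan per chromosome; fine for a sketch, slow for a full array.
        hits = {gene for start, end, gene in windows.get(chrom, []) if start <= pos <= end}
        annotation[probe_id] = sorted(hits)
    return annotation


if __name__ == "__main__":
    for probe_id, genes in annotate_probes(PROBES, TSS).items():
        print(probe_id, genes or "no promoter overlap")
```

For a real array, an interval-overlap library would be a better fit than the linear scan used here, but the logic stays the same: put probes and annotations on one build, then test for overlap.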