Question

Good or bad idea to annotate with a newer genome build?


4.2 years ago

Pratik ★ 1.1k

Hello Biostars Community,

If the probe sequences on a mouse methylation BeadChip array are based on the GRCm38 (mm10) genome build, would it be a bad idea to annotate the probes with the latest genome build, GRCm39 (mm39)?

I am thinking yes, it would be a bad idea. I just need some confirmation from peers, or any other helpful insight.

Thank you in advance!

Tags: array, methylation, beadchip, build, annotation


What do you hope to gain from using the latest genome build? Usually, the biggest advantage of a new genome build is a better representation of the actual sequence, and particularly difficult regions may be better resolved in newer builds. That can be beneficial if you're dealing with genome-wide data, and it may help reduce alignment artifacts. I don't really see how that would come into play with an array, though, since the array is presumably based on high-confidence loci to begin with.

But maybe I'm misunderstanding what you're actually trying to do on a technical level, i.e., are you mostly referring to gene annotation? (But again, I would make the decision based on what you hope to gain from it.)

Comment by Friederike 9.0k · 4.2 years ago


Thank you for responding, Friederike.

Basically, I just wanted more accurate, higher-quality gene and promoter annotations. I have (almost) concluded that I should annotate with the newer genome build. Long story short, one gene of interest had better promoter annotations in mm39 than in mm10; overall they made more sense, with promoters annotated near every TSS (there are probably exceptions to this). Now I just have to see how the rest of the gene and promoter annotations turn out. Overall, in my case, it's probably best to re-annotate everything in mm39. I am also convinced because the sesame package author recommends the same: https://github.com/zwdzwd/sesame/issues/47#issuecomment-915715414
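For readers wondering what "promoters annotated near every TSS" means mechanically, here is a minimal sketch, assuming probe and transcript coordinates are already in the same build (e.g. mm39) and assuming a promoter window of 1500 bp upstream and 500 bp downstream of the TSS. The window sizes, the `Transcript` record, and the gene and probe names are all hypothetical illustrations, not what sesame or any particular annotation uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record in the target build (e.g. mm39)."""
    gene: str
    chrom: str
    tss: int     # TSS coordinate, same build as the probe coordinates
    strand: str  # "+" or "-"


def promoter_window(tx: Transcript, upstream: int = 1500,
                    downstream: int = 500) -> Tuple[int, int]:
    """Strand-aware (start, end) of an assumed promoter window around the TSS."""
    if tx.strand == "+":
        return tx.tss - upstream, tx.tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates.
    return tx.tss - downstream, tx.tss + upstream


def annotate_probes(probes: Dict[str, Tuple[str, int]],
                    transcripts: List[Transcript],
                    upstream: int = 1500,
                    downstream: int = 500) -> Dict[str, List[str]]:
    """Map each probe ID to the sorted genes whose promoter window contains it."""
    result = {}
    for probe_id, (chrom, pos) in probes.items():
        genes = set()
        for tx in transcripts:
            if tx.chrom != chrom:
                continue
            start, end = promoter_window(tx, upstream, downstream)
            if start <= pos <= end:
                genes.add(tx.gene)
        result[probe_id] = sorted(genes)
    return result


# Hypothetical example data (not real mm39 coordinates).
transcripts = [
    Transcript("GeneA", "chr1", 10_000, "+"),
    Transcript("GeneB", "chr1", 20_000, "-"),
]
probes = {
    "probe_1": ("chr1", 9_000),
    "probe_2": ("chr1", 20_800),
    "probe_3": ("chr2", 9_000),
}
print(annotate_probes(probes, transcripts))
# {'probe_1': ['GeneA'], 'probe_2': ['GeneB'], 'probe_3': []}
```

The nested loop keeps the idea visible but is quadratic; for a full array you would want an interval index instead. In R/Bioconductor, `GenomicRanges::promoters()` builds comparable strand-aware windows from transcript ranges.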

Thank you again : )

Reply by Pratik ★ 1.1k · 4.2 years ago

score 2 · Accepted Answer · 2021-09-06


4.2 years ago

benformatics 4.2k

It's always a good idea whenever possible, right? But if it's a huge barrier to downstream analysis, then it's probably not necessary. People still regularly publish data aligned to dm3 (2006) and hg19 (2009) (e.g. https://pubmed.ncbi.nlm.nih.gov/34004147/).

The only time it would probably be a bad idea is if you are focused on investigating regions that were improved in the recent genome build (e.g. telomeres, centromeres, repeat regions). The main thing to watch is that probes targeting mm10 sites that were split, or that no longer exist, in mm39 will change or lose their annotation.
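To see which probes are affected, one option is to lift the mm10 probe positions over to mm39 and flag the ones that do not map cleanly. A minimal sketch, assuming the liftover itself has already been run with some chain-based tool (for example UCSC liftOver with an mm10-to-mm39 chain file) and its output parsed into a dict of hits per probe. The parsing step, the status labels, and all data below are hypothetical, and treating "more than one hit" as a stand-in for a split site is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, position)


def classify_lifted(original: Dict[str, Position],
                    lifted: Dict[str, List[Position]]) -> Dict[str, str]:
    """Label each probe by how its source-build site carried over to the target build."""
    status = {}
    for probe_id, (chrom, _pos) in original.items():
        hits = lifted.get(probe_id, [])
        if not hits:
            status[probe_id] = "unmapped"            # site absent from the new build
        elif len(hits) > 1:
            status[probe_id] = "multi_mapped"        # stand-in for a "split" site
        elif hits[0][0] != chrom:
            status[probe_id] = "changed_chromosome"  # moved to another sequence
        else:
            status[probe_id] = "mapped"              # same chromosome; a coordinate shift is expected
    return status


# Hypothetical mm10 sites and hypothetical liftover results.
mm10_sites = {
    "probe_1": ("chr1", 3_000_100),
    "probe_2": ("chr2", 5_000_200),
    "probe_3": ("chr3", 7_000_300),
    "probe_4": ("chr4", 9_000_400),
}
lifted_sites = {
    "probe_1": [("chr1", 3_050_100)],
    "probe_2": [],
    "probe_3": [("chr3", 7_100_300), ("chr3", 7_400_300)],
    "probe_4": [("chrX", 1_234_567)],
}

status = classify_lifted(mm10_sites, lifted_sites)
print(Counter(status.values()))
```

Probes labeled anything other than `mapped` are the ones whose mm39 gene or promoter annotation deserves a closer look before you compare it with the mm10 annotation.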


Thank you. After input from you, Friederike, and the sesame package author, I definitely feel more confident about making the change.

Reply by Pratik ★ 1.1k · 4.2 years ago