Can anyone shed some light on the relative merits of the different human iGenomes data sets (Ensembl, NCBI, and UCSC), and on which is best suited to basic gene expression analysis with the Tuxedo suite?
Huw
hg19 is UCSC's name for GRCh37 (http://www.ncbi.nlm.nih.gov/assembly/2758/). Since the release of GRCh37, the GRC (http://genomereference.org) has been releasing genome patches (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/info/patches.shtml). Ensembl and NCBI annotate patch releases, but not always the same ones. For example, NCBI is showing GRCh37.p5 and Ensembl is showing GRCh37.p7. In all of these cases, the chromosome coordinates are identical; the only difference between GRCh37 and any patch release is the patches themselves.
As I understand it, the difference between the Ensembl, NCBI, and UCSC sets is the sequence you are aligning to. If you are going to visualize all your results in the UCSC Genome Browser using the hg19 assembly, then go with the UCSC one. The NCBI and Ensembl sets might have newer human genome reference assemblies, or assemblies that include supercontigs and mitochondria. If you have the space, you can download all of them and compare what differs between them (they will take up quite a bit of space, since most are about 10 GB compressed).
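If you do download more than one set, a quick way to see how the reference sequences differ is to compare sequence names and lengths between the two FASTA files. Below is a minimal Python sketch of that idea, not an official iGenomes tool. The function names are my own, and stripping a leading "chr" from names is only an assumption about how the naming might differ between sets; names that differ in any other way will simply be reported as present in one file only.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a FASTA file (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def normalise(name):
    """Strip a leading 'chr' so that 'chr1' and '1' compare as equal.

    Assumption: this is the only systematic naming difference handled here.
    """
    return name[3:] if name.startswith("chr") else name


def compare_references(path_a, path_b):
    """Report sequences unique to each file and shared names whose lengths differ."""
    a = {normalise(n): size for n, size in fasta_lengths(path_a).items()}
    b = {normalise(n): size for n, size in fasta_lengths(path_b).items()}
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "length_mismatch": sorted(n for n in shared if a[n] != b[n]),
    }
```

You would call it as compare_references("ucsc_genome.fa", "ensembl_genome.fa"), where both file names are placeholders for wherever your downloaded genome FASTA files actually live. This only compares names and lengths, so two sequences of equal length but different content would not be flagged.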
I think you can use the most recent one. For example:
Ensembl GRCh37 9696 MB Oct 24 2011
NCBI build37.2 11786 MB Oct 24 2011