5.7 years ago
yliueagle
In ENCODE, there is sometimes a choice between "hg19" and "hg19 v19" when downloading the aligned RNA data. Is there a big difference between these versions? (See here for an example: https://www.encodeproject.org/experiments/ENCSR297UBP/)
In GEO, meanwhile, most of the descriptions read like "The reads were filtered, trimmed, and aligned in the UCSC reference human genome 19 (hg19)". I am wondering whether "hg19" is equivalent to "hg19 v19". (See here for a GEO example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3110715)
I would say that v19 is the version of the GENCODE annotation: https://www.gencodegenes.org/human/release_19.html
Thanks. Will there be a big difference if my focus is on gene expression analysis, between choosing UCSC hg19 and gencode v19?
No, there should be no difference in the results. You can read a bit about Ensembl and GENCODE here.
There are, however, differences in the formatting of some files. I am not sure which RNA-seq pipeline you will be using, but in Salmon, for example, you would want to consider using the flag --gencode during index generation if you choose the GENCODE reference.
I disagree. Given that GENCODE contains more genes than UCSC, you perform more comparisons during differential testing, and therefore the FDR-adjusted p-values might change. The difference might be limited, but stating there was no difference is imho incorrect.
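To make the multiple-testing point concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The p-values are made up for illustration, and `extra` stands in for hypothetical genes that exist only in the larger annotation; a real differential expression tool may also apply independent filtering, which this sketch ignores.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for genes present in both annotations.
shared = [0.001, 0.004, 0.03]
# Hypothetical extra genes that only the larger annotation contains.
extra = [0.5, 0.6, 0.7, 0.8, 0.9]

small = bh_adjust(shared)
large = bh_adjust(shared + extra)[: len(shared)]

for p, s, l in zip(shared, small, large):
    print(f"raw={p:.3f}  adjusted(3 tests)={s:.3f}  adjusted(8 tests)={l:.3f}")
```

The raw p-values of the shared genes are identical in both runs; only the number of tests differs, and that alone raises their adjusted values. In this toy case one gene drops out at a 0.05 cutoff.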
Hi ATpoint
You are absolutely correct. Apologies for the confusion. For some reason, I misinterpreted the question as asking whether there would be a difference between GENCODE and Ensembl, where I wouldn't expect much of one. For UCSC, what you describe is indeed true.
[Note: in retrospect, I have no idea what the source of my confusion was :) ]. Again, thanks for the correction.
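On the file-formatting point raised earlier in the thread: as far as I know, Salmon's --gencode option makes the indexer split each transcript name at the first "|" character, because GENCODE FASTA headers pack several identifiers into one line. A small Python sketch of that splitting, where the header is only an illustrative GENCODE-style example and `strip_gencode_name` is a hypothetical helper, not part of Salmon:

```python
def strip_gencode_name(header):
    """Return the part of a FASTA header before the first '|'."""
    # Drop the leading '>' and keep only the first pipe-separated field.
    return header.lstrip(">").split("|", 1)[0]


# Illustrative GENCODE-style header (fields separated by '|').
header = ">ENST00000456328.2|ENSG00000223972.4|DDX11L1-002|DDX11L1|1657|processed_transcript|"

print(strip_gencode_name(header))
```

Without this trimming, transcript names in the quantification output would carry the full pipe-separated header, which may then fail to match the transcript IDs in a GTF or a transcript-to-gene table.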