Hello,
I am analyzing RNA sequencing results. There are many sources from which to download the human reference genome, and they provide different versions of the .fa and .gtf files. I ran into some questions when choosing among them:
1. Is there a best, most common, or default version of the human reference genome for general RNA-seq analyses?
2. Is it always good to adopt the most recent version of the reference genome?
3. Some .gtf files for the human reference genome do not contain the genes on the mitochondrial DNA. Is it better to take the mitochondrial genes into account by using a .gtf file that includes them, even if the analyses do not focus on these genes?
As for questions 1 and 2, you can find all the necessary files here. It is generally a good idea to use the latest genome version and annotation, since these contain the most comprehensive and up-to-date information. Fewer gaps, and therefore more complete sequences, in the genome lead to more accurate read mapping, which improves all kinds of downstream analyses.
Regarding the mitochondrial genes: if you are not interested in their expression, it does not matter whether they are included in the annotation file. If they are, you will get expression values for them; if they are not, you won't. However, it is important that the reference genome contains the mitochondrial chromosome sequence, so that reads originating from the mitochondrial genome map to their correct location and not somewhere else. The reference genome includes the mitochondrial chromosome, so that is not a problem.
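If you want to check this for the files you downloaded, here is a minimal Python sketch. It assumes the mitochondrial chromosome is named "chrM" (UCSC/GENCODE style) or "MT" (Ensembl style); other sources may use a different name, so adjust the MITO_NAMES set as needed. The function names and the toy data are made up for illustration.

```python
import io

# Assumed naming conventions for the mitochondrial chromosome;
# extend this set if your source uses another name.
MITO_NAMES = {"chrM", "MT", "chrMT", "M"}


def fasta_sequence_names(handle):
    """Collect sequence names from the '>' header lines of a FASTA file."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_sequence_names(handle):
    """Collect the names in column 1 (seqname) of a GTF file."""
    names = set()
    for line in handle:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def has_mito(names):
    """True if any assumed mitochondrial chromosome name is present."""
    return bool(names & MITO_NAMES)


# Toy in-memory data standing in for real files.
toy_fasta = io.StringIO(">chr1 toy\nACGT\n>chrM toy\nGATC\n")
toy_gtf = io.StringIO(
    '#!toy header\nchr1\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "g1";\n'
)

fasta_names = fasta_sequence_names(toy_fasta)
gtf_names = gtf_sequence_names(toy_gtf)

print("FASTA has mitochondrial sequence:", has_mito(fasta_names))
print("GTF has mitochondrial annotation:", has_mito(gtf_names))
```

For real files, pass an open file handle instead of the toy data, for example open(path) for plain text or gzip.open(path, "rt") for .gz files. In the toy data the FASTA contains chrM but the GTF does not, which is the situation described above: reads can still map to the mitochondrial sequence, but no expression values are reported for its genes.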