Which reference do you use when mapping human RNA-Seq reads?
(I have recently been quite frustrated by inconsistencies between different databases, and even within the same database. For example, when I download data chromosome by chromosome using http://useast.ensembl.org/biomart/martview, I get different results than when I download genes for the whole genome in bulk.)
The important features for me right now are:
- completeness; I want ALL genes, including Y-chromosomal genes (e.g. XKRY is often missing)
- the sequence and its corresponding chromosome must be listed for each gene
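
Both requirements can be spot-checked against a candidate annotation by scanning it for a few genes that tend to go missing. This is only a minimal sketch. It assumes the annotation is a GTF file whose `gene` records carry a `gene_name` attribute (Ensembl and GENCODE GTFs do, as far as I know). The names `find_genes` and `open_maybe_gzip`, and the file path in the usage comment, are hypothetical.

```python
import gzip
import re

# Assumption: attributes look like: gene_id "..."; gene_name "XKRY"; ...
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def find_genes(gtf_lines, wanted):
    """Return {gene_name: chromosome} for the wanted genes found in 'gene' records."""
    wanted = set(wanted)
    found = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level records
        match = GENE_NAME_RE.search(fields[8])
        if match and match.group(1) in wanted:
            found[match.group(1)] = fields[0]  # column 1 is the chromosome
    return found


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


# Usage (hypothetical path):
# with open_maybe_gzip("annotation.gtf.gz") as handle:
#     found = find_genes(handle, ["XKRY"])
# missing = {"XKRY"} - set(found)
```

Any gene left in `missing` is absent from that annotation, at least under that name; a gene listed under a different symbol would need to be looked up by its ID instead.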
I also looked at the RefSeq genes here: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene/ where it looks like I am supposed to concatenate the 8 files before working with them (???)
I would be happy if you could share which reference you use in your RNA-Seq experiments.
Also, those 8 RefSeq files together contain only 5598 entries.
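
An entry count like this can be reproduced without concatenating anything by hand. The sketch below assumes the downloaded files are FASTA (plain or gzip-compressed), so that each entry starts with a `>` header line; the function name `count_fasta_records` and the file names in the usage comment are hypothetical.

```python
import gzip


def count_fasta_records(paths):
    """Count FASTA header lines (lines starting with '>') across all given files."""
    total = 0
    for path in paths:
        # Assumption: a '.gz' suffix means the file is gzip-compressed.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            total += sum(1 for line in handle if line.startswith(">"))
    return total


# Usage (hypothetical file names):
# print(count_fasta_records(["part1.fna.gz", "part2.fna.gz"]))
```

If the files are in GenBank flat-file format rather than FASTA, counting lines that start with `LOCUS` would be the equivalent check.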