I have a very basic question. We map mRNA reads (sequenced as cDNA) to the reference genome and then, based on where the reads align, figure out which genes were expressed. As I understand it, the reference genome itself is revised continuously to arrive at the best possible assembly.
But no reference genome can be perfect. So if we map cDNA reads and say that a gene is differentially expressed, it is quite possible that the reference genome has a SNP at that position and the read under consideration is correct.
I am confused by this whole reference genome mapping. Can someone help me get a clearer picture?
I mean, what you've said is mostly true (though I'm going to guess you mean the human genome; it's less common to have intensive revision of, for example, bacterial genomes).
The 'answer' is that your reference should be the closest you can get to the origin of your RNA data. It's understood that there might be errors, there always will be, but it most likely doesn't matter except under very particular circumstances.
A difference of a SNP or two in a gene isn't going to radically alter how your reads map (unless you only allow perfect matches). A SNP might affect the actual transcription of the gene in some way though.
Thanks for the reply. Yes, I meant the human genome.
In a nutshell: the alignment algorithms can tolerate base-mismatches. Otherwise, very few reads would align to the reference genome due to sequencing error and also because the reference genomes themselves are not 'wholly representative', with many millions of base positions differing between individuals and populations.
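To make the mismatch tolerance concrete, here is a toy sketch in Python. It is not how real aligners such as STAR or HISAT2 work internally (they use indexed, gapped, splice-aware alignment); it only slides a read along a made-up reference and counts mismatches. The sequences, the `best_hit` function, and the `max_mismatches` parameter are all hypothetical, chosen for illustration.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement of
    `read` on `reference`, or None if every placement exceeds
    `max_mismatches`. Ties are resolved in favour of the leftmost hit."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return best


# Made-up reference sequence (assumption, for illustration only).
reference = "ACGTTAGCCGATAGGCTTACGATCGGATCCA"

# A read copied straight from the reference ...
read_exact = reference[10:22]
# ... and the same read carrying one SNP (C -> A at read position 5).
read_snp = read_exact[:5] + "A" + read_exact[6:]

print(best_hit(read_exact, reference, max_mismatches=0))
print(best_hit(read_snp, reference, max_mismatches=0))
print(best_hit(read_snp, reference, max_mismatches=2))
```

With perfect matching only, the SNP-carrying read finds no home; once a couple of mismatches are allowed, it lands at the same position as the exact read. So the read still counts toward the same gene, and the expression estimate is essentially unchanged.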
Note that for mRNA (RNA-seq), we can also use 'pseudo' aligners, such as Salmon and Kallisto, which align reads to a reference transcriptome. The reference transcriptome contains the mRNA nucleotide sequences of known genes.
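As a very rough illustration of the idea behind pseudo-alignment, the sketch below assigns a read to the transcripts that share its k-mers. This is a simplification and not the actual Salmon or Kallisto implementation; the two transcripts, their names, the choice of k = 5, and the rule of skipping k-mers that are absent from the index are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all read k-mers found in the index.
    K-mers missing from the index (e.g. those spanning a SNP or a
    sequencing error) are skipped in this toy version."""
    result = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        result = set(hits) if result is None else result & hits
    return result or set()


# Made-up transcript sequences (assumption, for illustration only).
transcripts = {
    "geneA": "ACGTTAGCCGATAGGCTTACGATCGGATCCA",
    "geneB": "TTGACCAGTAGGCATCCGTAAGCTAGCTTGA",
}
K = 5
index = build_index(transcripts, K)

# A read from geneA carrying one SNP (T -> G at read position 8).
read = transcripts["geneA"][8:24]
read_snp = read[:8] + "G" + read[9:]

print(compatible_transcripts(read_snp, index, K))
```

The k-mers overlapping the SNP are simply not found, but the remaining k-mers still point to `geneA`, so the read is assigned to the right transcript.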