Before answering your question:
However, in my count data, half of the samples are aligned to the reference genome using the mm10 genome build, and the other half are aligned using mm9 (bam files).
Why is that the case? Are these different experiments that you want to analyse together? If so, you have to take batch effects into account, and depending on the experimental design, it may be impossible to untangle batch effects from your factors of interest.
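To make the point about experimental design concrete, here is a minimal Python sketch of a confounding check. The function names, batch labels, and condition labels are all hypothetical and only serve as illustration. It reports a design as fully confounded when no batch contains more than one condition, which is the situation where a batch effect cannot be separated from the factor of interest.

```python
from collections import defaultdict


def conditions_per_batch(samples):
    """Map each batch label to the set of conditions observed in that batch."""
    table = defaultdict(set)
    for batch, condition in samples:
        table[batch].add(condition)
    return dict(table)


def is_fully_confounded(samples):
    """Return True when no batch contains more than one condition."""
    return all(len(conds) == 1 for conds in conditions_per_batch(samples).values())


# Hypothetical designs: each tuple is (batch, condition) for one sample.
confounded_design = [
    ("batch_1", "control"), ("batch_1", "control"),
    ("batch_2", "treated"), ("batch_2", "treated"),
]
balanced_design = [
    ("batch_1", "control"), ("batch_1", "treated"),
    ("batch_2", "control"), ("batch_2", "treated"),
]

print(is_fully_confounded(confounded_design))  # True
print(is_fully_confounded(balanced_design))    # False
```

In the balanced design each condition appears in both batches, so a batch term can be included in the model; in the confounded design it cannot be distinguished from the condition.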
Regarding your question:
The mm10 genome sequence is better (more bases and fewer errors) than mm9, and one generally gets more mapped reads when using mm10 as the reference genome.
In addition, and more importantly, the annotation has changed considerably, mostly with new genes added to mm10, but also with gene models changing between versions, pseudogenes and incorrect annotations being dropped, and some genes / transcripts changing names.
In summary, you have to map the original reads to the same reference genome to proceed with differential expression analysis.
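As a rough illustration of why the annotation differences matter, the following Python sketch compares the gene identifiers found in two count tables. It assumes tab-separated tables with the gene identifier in the first column and no header line; the function names and the toy identifiers (`geneA`, `geneOld`, `geneNew`) are placeholders, not real mm9 or mm10 genes.

```python
import csv
import io


def read_gene_ids(handle):
    """Collect first-column identifiers, skipping blank and '#' comment lines."""
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        if row and not row[0].startswith("#"):
            ids.add(row[0])
    return ids


def compare_gene_ids(ids_a, ids_b):
    """Split identifiers into those shared and those unique to each table."""
    return {
        "shared": ids_a & ids_b,
        "only_a": ids_a - ids_b,
        "only_b": ids_b - ids_a,
    }


# Toy count tables standing in for files produced with two annotations.
table_a = io.StringIO("geneA\t10\ngeneB\t5\ngeneOld\t3\n")
table_b = io.StringIO("geneA\t12\ngeneB\t7\ngeneNew\t4\n")

result = compare_gene_ids(read_gene_ids(table_a), read_gene_ids(table_b))
print(sorted(result["shared"]))  # ['geneA', 'geneB']
print(sorted(result["only_a"]))  # ['geneOld']
print(sorted(result["only_b"]))  # ['geneNew']
```

Identifiers that appear in only one table have no counterpart in the other half of the samples, and even shared identifiers may refer to different gene models, which is why remapping everything to one build with one GTF is the safer route.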
Yes, of course that will influence your analysis. You should realign the mm9 data to mm10, and then use the same annotation (GTF) file to produce the count files.
Thank you for your answer. They are from the same experiment; two different genome builds were simply used to map the reads. So I have mm9 and mm10 count files and want to analyse them together. At first I thought that even if the genome annotation is different in half of the samples, the overall result should be the same. Maybe I can create an additional factor to control for the batch effect in my analysis.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under @h.mon's answer.
You do have to align all reads to the same genome version, preferably mm10.
You don't need to correct for a batch effect, as there is none. I only asked because I considered it odd to have part of the samples mapped to mm9 and part mapped to mm10, and I reasoned it could be because the samples were sequenced at different times as part of different experiments.