Gene expression analysis on gene counts from different genome builds
6.3 years ago
skylinesky ▴ 10

Hello all, I am going to run a differential gene expression analysis with DESeq2. However, half of the samples in my count data were aligned to the mm10 genome build, and the other half were aligned to mm9 (BAM files). I obtained the gene counts for each sample using its respective genome build. I have merged all the count files and will run the differential expression analysis on the merged table. Can using count files from different genome builds influence the results?

Thanks!

RNA-Seq deseq2 gene count mm9 mm10 • 1.3k views

Yes, of course that will influence your analysis. You should realign the mm9 data to mm10, and then use the same annotation (GTF) file to produce the count files for all samples.

Thank you for your answer. The samples are from the same experiment; the reads were just mapped with two different genome builds. So I have mm9 and mm10 count files and want to analyse them together. At first I thought that even if the genome annotation differs for half of the samples, the overall result should be the same. Maybe I can add a factor to the design to control for a batch effect in my analysis.

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @h.mon's answer.

You do have to align all reads to the same genome version, preferably mm10.

You don't need to correct for a batch effect, as there is none. I only asked because I considered it odd to have part of the samples mapped to mm9 and part mapped to mm10, and I reasoned it could be because the samples were sequenced at different times as part of different experiments.

6.3 years ago
h.mon 35k

Before answering your question:

However in my count data, half of the samples are aligned to reference genome using mm10 genome build, and the other half is aligned using mm9(bam files).

Why is that? Are these different experiments that you want to analyse together? If so, you have to take batch effects into account and, depending on the experimental design, it may be impossible to untangle batch effects from your factors of interest.
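To illustrate the confounding problem, here is a minimal Python sketch. It assumes a hypothetical sample sheet in which each sample is labelled with the experiment (batch) it came from and its condition; the sample names, batch labels and helper functions are made up for illustration. If every batch contains only one condition, condition is fully determined by batch, and the two effects cannot be separated.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample name, batch, condition).
samples = [
    ("s1", "exp1", "control"),
    ("s2", "exp1", "control"),
    ("s3", "exp2", "treated"),
    ("s4", "exp2", "treated"),
]


def conditions_per_batch(sheet):
    """Map each batch to the set of conditions observed in it."""
    groups = defaultdict(set)
    for _name, batch, condition in sheet:
        groups[batch].add(condition)
    return dict(groups)


def fully_confounded(sheet):
    """True when no batch contains more than one condition."""
    return all(len(conds) == 1 for conds in conditions_per_batch(sheet).values())


print(conditions_per_batch(samples))
print(fully_confounded(samples))
```

With this sheet the check reports full confounding, because all controls come from one experiment and all treated samples from the other. A sheet in which each experiment contributes both controls and treated samples would not be flagged.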

Regarding your question:

The mm10 genome sequence is better (more bases and fewer errors) than mm9, and one generally gets more mapped reads when using mm10 as the reference genome.

In addition, and more importantly, the annotation has changed considerably, mostly through new genes being added in mm10, but also through gene models changing between versions, pseudogenes and incorrect annotations being dropped, and some genes/transcripts changing names.
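A quick way to see how much two annotations disagree is to compare the gene identifiers in the two sets of count files. The sketch below is only an illustration: it assumes tab-separated "gene, count" tables, and the gene IDs and counts are placeholders, not real mm9 or mm10 entries.

```python
def read_gene_ids(count_text):
    """Collect gene IDs from tab-separated 'gene<TAB>count' lines."""
    ids = set()
    for line in count_text.strip().splitlines():
        gene_id, _count = line.split("\t")
        ids.add(gene_id)
    return ids


# Placeholder count tables standing in for one mm9 and one mm10 sample.
mm9_counts = "geneA\t10\ngeneB\t5\ngeneC\t0\n"
mm10_counts = "geneA\t12\ngeneB\t7\ngeneD\t3\n"

ids_mm9 = read_gene_ids(mm9_counts)
ids_mm10 = read_gene_ids(mm10_counts)

shared = ids_mm9 & ids_mm10       # present in both annotations
only_mm9 = ids_mm9 - ids_mm10     # dropped or renamed in the newer annotation
only_mm10 = ids_mm10 - ids_mm9    # added or renamed in the newer annotation

print("shared:", sorted(shared))
print("mm9 only:", sorted(only_mm9))
print("mm10 only:", sorted(only_mm10))
```

Note that a shared identifier does not guarantee comparable counts, since the underlying gene model may still differ between the two annotation versions.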

In summary, you have to map the original reads to the same reference genome to proceed with differential expression analysis.
