Hi everyone,
This is my first post in this community as I'm still fairly new to bioinformatics. Please forgive me if I make any mistakes.
My current issue is that results for the same samples do not match between the single-end and paired-end runs. That is, when I perform differential gene expression analysis on the count matrix generated by each run, I get many genes with opposite expression trends. I was advised by our genomics center to just merge the count matrices; however, I do not feel comfortable doing so, given that the PCA shows separation of the samples by run and the single-end library has more reads than the paired-end one. I know that I would have to at least perform some type of batch correction, since the runs were done separately. So, I would really appreciate any insight on how best to troubleshoot this issue.
That being said, were the samples processed identically, with only the sequencing mode differing, or are there other potential sources of batch effects?
Thank you, rpolicastro! I will try to run the paired-end with only the R1 reads.
ATpoint I extracted the RNA samples, but the genomics core did the library preparation for single- and paired-end, so I'm not sure whether anything else was different. I'll ask them :)
Ask them if all samples were prepared the same. SE/PE is the same library, it's just a different mode on the sequencer.
I'm guessing they were not prepared the same way. Here's the updated graph:
The R1 samples are the R1 from the paired reads.
It does look like there was potentially some additional batch effect on top of the run type.
Mind remaking the plot but excluding the paired-end data when running PCA? Assuming CR and SE refer to your conditions, it might be possible to reduce this batch effect via a covariate in your model. You should also start investigating what could have caused this batch effect. Is there some difference in sample prep this could be attributed to, such as a different RNA purification kit or read length? Or perhaps it's something more subtle, like the samples being prepared at different times.
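To illustrate what "a covariate in your model" can do, here is a minimal sketch with toy numbers, not your data. It fits an ordinary least-squares model to the log expression of a single made-up gene, once with and once without a batch term. The sample layout, the effect sizes, and the names `condition`, `batch`, and `fit_ols` are all assumptions for illustration. A real analysis would do this inside a differential expression package, typically with a design along the lines of `~ batch + condition`.

```python
import numpy as np

# Hypothetical layout for 8 samples (assumed, not taken from the thread).
# condition: 0 = CR, 1 = SE; batch: 0 or 1 for the suspected batch.
# The design is deliberately unbalanced so batch and condition are correlated.
condition = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Made-up, noise-free log2 expression for one gene:
# baseline 5, condition effect +1, batch effect +3.
log_expr = 5.0 + 1.0 * condition + 3.0 * batch


def fit_ols(design, y):
    """Return least-squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


intercept = np.ones_like(condition)

# Model without the batch covariate: the condition estimate absorbs batch.
coef_naive = fit_ols(np.column_stack([intercept, condition]), log_expr)

# Model with the batch covariate: condition is estimated within batch.
coef_adjusted = fit_ols(np.column_stack([intercept, condition, batch]), log_expr)

print("condition effect, no batch term:  ", round(coef_naive[1], 3))
print("condition effect, with batch term:", round(coef_adjusted[1], 3))
```

In this toy setup the unadjusted model overestimates the condition effect because most SE samples sit in the second batch, while the adjusted model recovers the value that was simulated. Note that this only works when each batch contains samples from both conditions. If batch and condition are fully confounded, a covariate cannot separate them.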
I'm sorry for the late reply. A lot has happened in the past few days. It turns out the center mislabeled the single-end samples... Also, the read lengths are different between paired-end and single-end. Thank you!