Combining data from single- and paired-end sequencing performed on the same samples
19 months ago
tise.suz

Hi everyone,

This is my first post in this community as I'm still fairly new to bioinformatics. Please forgive me if I make any mistakes.

My issue is that counts from the same samples do not agree between the single-end and the paired-end runs.

[Figure: PCA of samples]

That is, when I perform differential gene expression analysis on the count matrix generated by each run, I get many genes with opposite expression trends. Our genomics center advised me to simply merge the count matrices; however, I'm not comfortable doing so, given that the PCA shows the samples separating by run and the single-end library has more reads than the paired-end one. I know I would at least have to perform some kind of batch correction, since the runs were done separately. I would really appreciate any insight into how best to troubleshoot this.
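Before merging, it can help to check on the merged matrix whether samples group by run rather than by condition. The sketch below is a minimal illustration, assuming both count matrices are genes x samples NumPy arrays with identical gene order (e.g. produced with the same annotation and quantifier). `log_cpm` and `pca_scores` are hypothetical helpers, not functions from any particular package, and the simulated counts only stand in for real data. This is a diagnostic, not a batch correction.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """Log2 counts-per-million for a genes x samples count matrix."""
    lib_sizes = counts.sum(axis=0)  # total reads per sample (column)
    return np.log2((counts + prior) / lib_sizes * 1e6)


def pca_scores(expr, n_components=2, n_top=500):
    """PCA on the n_top most variable genes of a genes x samples matrix.

    Returns per-sample scores (samples x components) and the fraction of
    variance explained by each returned component.
    """
    top = np.argsort(expr.var(axis=1))[::-1][:n_top]
    x = expr[top].T                 # samples x genes
    x = x - x.mean(axis=0)          # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Simulated stand-in for two count matrices with the same genes in the same
# order (hypothetical data; replace with your own SE and PE matrices).
rng = np.random.default_rng(0)
n_genes, n_per_run = 1000, 4
base = rng.gamma(2.0, 50.0, size=n_genes)
run_effect = np.ones(n_genes)
run_effect[:200] = 4.0  # pretend 200 genes are inflated in one run
se_counts = rng.poisson(np.repeat(base[:, None], n_per_run, axis=1))
pe_counts = rng.poisson(np.repeat((base * run_effect)[:, None], n_per_run, axis=1))

merged = np.hstack([se_counts, pe_counts])  # naive merge, no correction
runs = ["SE"] * n_per_run + ["PE"] * n_per_run
scores, explained = pca_scores(log_cpm(merged))
for run, (pc1, pc2) in zip(runs, scores):
    print(f"{run}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("variance explained:", np.round(explained, 3))
```

If PC1 splits the samples by run in your own data, a naive merge will carry that effect straight into the differential expression results.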

bulk Illumina RNA-seq System Ovation single-end NextSeq 550 paired-end Universal
19 months ago

If you want to avoid the headache of batch correction, you could consider using only the R1 reads when aligning/quantifying the paired-end data.
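Using only R1 amounts to treating the R1 FASTQ as single-end input for the aligner or quantifier. If the runs also differ in read length, one could additionally cut R1 down to the single-end length so that both runs are quantified from comparable reads. A minimal Python sketch, assuming plain four-line FASTQ records; `truncate_r1`, `truncate_r1_file`, the file names and the length of 75 are hypothetical placeholders, and a dedicated trimming tool would normally do this job.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        yield header, seq, plus, qual


def truncate_r1(in_handle: TextIO, out_handle: TextIO, max_len: int) -> int:
    """Copy R1 records, cutting sequence and quality to max_len bases."""
    n = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        out_handle.write(f"{header}\n{seq[:max_len]}\n{plus}\n{qual[:max_len]}\n")
        n += 1
    return n


def truncate_r1_file(in_path: str, out_path: str, max_len: int) -> int:
    """File wrapper: reads .gz or plain input and writes gzipped output."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        return truncate_r1(fin, fout, max_len)


# Hypothetical usage; paths and the single-end read length are placeholders:
# truncate_r1_file("sample1_R1.fastq.gz", "sample1_R1.trim75.fastq.gz", 75)
```

The R2 file is simply never read; everything downstream then runs in single-end mode for both runs.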


That said, were the samples processed identically, with only the sequencing mode differing, or are there other potential sources of batch effects?


Thank you, rpolicastro! I will try to run the paired-end with only the R1 reads.

ATpoint, I extracted the RNA samples, but the genomics core did the library preparation for both the single- and paired-end runs, so I'm not sure whether anything else was different. I'll ask them :)


Ask them whether all samples were prepared the same way. SE/PE is the same library; it's just a different mode on the sequencer.


I'm guessing they were not prepared the same way. Here's the updated graph:

[Figure: updated PCA plot]

The "R1" samples are the R1 reads from the paired-end run.


It does look like there was potentially some additional batch on top of the run type.

Mind remaking the plot but excluding the paired-end data when running PCA? Assuming CR and SE refer to your conditions it might be possible to reduce this batch effect via a covariate in your model.
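The covariate idea can be illustrated with a small linear-model sketch. It assumes log-scale expression (e.g. log-CPM) as a genes x samples array and fits plain least squares on simulated data; count-based tools such as DESeq2, edgeR or limma-voom would be the usual choice in practice, where the same idea is written as a design formula like `~ batch + condition`. `design_matrix` and `fit_per_gene` are hypothetical helpers, and the CR/SE and A/B labels are placeholders. The rank check matters: a batch covariate can only help when each batch contains samples from more than one condition.

```python
import numpy as np


def design_matrix(condition, batch):
    """Treatment-coded design: intercept + condition indicators + batch indicators."""
    cond_levels = sorted(set(condition))
    batch_levels = sorted(set(batch))
    cols = [np.ones(len(condition))]
    names = ["intercept"]
    for lvl in cond_levels[1:]:           # first level is the reference
        cols.append(np.array([c == lvl for c in condition], dtype=float))
        names.append(f"condition[{lvl}]")
    for lvl in batch_levels[1:]:
        cols.append(np.array([b == lvl for b in batch], dtype=float))
        names.append(f"batch[{lvl}]")
    return np.column_stack(cols), names


def fit_per_gene(expr, X):
    """Ordinary least squares for every gene; expr is genes x samples (log scale)."""
    if np.linalg.matrix_rank(X) < X.shape[1]:
        # e.g. every CR sample in one batch and every SE sample in the other
        raise ValueError("design is not full rank: batch is confounded with condition")
    coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)  # (n_coef, n_genes)
    return coef.T


# Hypothetical balanced layout: both conditions appear in both batches.
condition = ["CR", "CR", "SE", "SE", "CR", "CR", "SE", "SE"]
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
X, names = design_matrix(condition, batch)

rng = np.random.default_rng(1)
n_genes = 300
true_cond = np.zeros(n_genes)
true_cond[:50] = 1.0                      # simulated condition effect
expr = (rng.normal(5.0, 1.0, size=(n_genes, 1))
        + np.outer(true_cond, X[:, 1])    # condition[SE] column
        + 2.0 * X[:, 2]                   # simulated batch shift on all genes
        + rng.normal(0.0, 0.1, size=(n_genes, 8)))
coef = fit_per_gene(expr, X)
print(names)
print("mean condition effect, first 50 genes:", coef[:50, 1].mean().round(2))
print("mean batch effect:", coef[:, 2].mean().round(2))
```

Because the batch term absorbs the shared shift, the condition coefficient is estimated from within-batch differences only.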

You should also start investigating what could have caused this batch. Is there some difference in sample prep this could be attributed to such as a different RNA purification kit, or read length? Or perhaps it's something more subtle like the samples being prepared at different times.
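One quick way to start is to ask which recorded sample attribute best explains PC1. A minimal sketch, assuming you already have per-sample PC1 scores and a table of candidate covariates; the eta-squared score (between-group share of variance), the helper names and the example values are all hypothetical. With only a handful of samples this is a screening aid rather than a statistical test, and it cannot point to anything that was never recorded, such as a sample mix-up.

```python
import numpy as np


def eta_squared(scores, groups):
    """Share of variance in `scores` explained by a categorical grouping (one-way ANOVA R^2)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    grand_mean = scores.mean()
    total = np.sum((scores - grand_mean) ** 2)
    if total == 0:
        return 0.0
    between = sum(
        np.sum(groups == g) * (scores[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return float(between / total)


def rank_covariates(pc_scores, metadata):
    """Return (covariate, eta^2) pairs sorted from most to least associated."""
    ranked = [(name, eta_squared(pc_scores, values)) for name, values in metadata.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical PC1 scores and sample sheet for eight samples.
pc1 = np.array([-3.1, -2.9, -3.0, -3.2, 2.9, 3.1, 3.0, 3.2])
metadata = {
    "condition": ["CR", "SE", "CR", "SE", "CR", "SE", "CR", "SE"],
    "prep_date": ["day1"] * 4 + ["day2"] * 4,
    "read_length": ["75"] * 4 + ["150"] * 4,
}
for name, eta2 in rank_covariates(pc1, metadata):
    print(f"{name}\teta^2={eta2:.2f}")
```

In this made-up example `prep_date` and `read_length` tie because they split the samples identically; when two recorded factors are confounded like this, the data alone cannot say which one is responsible.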


I'm sorry for the late reply. A lot has happened in the past few days. It turns out the center mislabeled the single-end samples... Also, the read lengths are different between paired-end and single-end. Thank you!
