best way to combine RNAseq data from different sequencing batches
4.8 years ago
emi ▴ 80

Hi,

I have 40 samples which I have sent for RNA-seq to the depth of 30-60M reads per sample. The sequencing company have sequenced my samples in 3 different batches to achieve the required depth.

My questions are:

1. Is there a 'best practice' for how to combine the data from the 3 batches into one before DGE analysis? I am aware that BAM files can be merged using samtools, but I also know of people who convert the individual BAM files into counts and combine the counts afterwards. Both sound equally reasonable to me. Is there a difference between the two methods? If so, which is better?

2. At present, I have decided to try out the second option: convert the BAM files to counts and combine them. For DGE, should I add all the counts from the 3 batches together or use the average? (The sum seems to be the more logical way to analyse DGE at the required depth.)

Thank you. Appreciate all advice as I am new to this.

RNA-Seq rna-seq bam sequencing • 3.6k views

Are these technical replicates? Meaning you took one library and put a third of it in each run?


These are technical replicates, all from the same library prep. They were sequenced 3 times to achieve the required depth of 30-60M reads/sample.

4.8 years ago

It's unclear whether these are biological replicates or technical replicates split equally across three runs. For the former, assuming you use one of the commonplace DGE tools (like DESeq2 or edgeR), you can just include batch in your design formula so that differences due to batch are accounted for. There's no need to combine reads from biological replicates, and those same tools can collapse technical replicates if the latter is the case.

I'd recommend reading the DESeq2 or edgeR vignettes.


Thank you for your suggestion. I read the DESeq2 vignette and the manual page for the collapseReplicates function. Does collapseReplicates simply add the counts per gene together for all technical replicates (which I could do manually from the count matrix)?


If you take a look at the source code of collapseReplicates(), you will see that there is a rowSums() call summing up gene counts per "group", so yes you can do so manually too.
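To make the "sum per group" idea concrete, here is a rough plain-Python sketch of doing it by hand. It is not the DESeq2 implementation. The run names, gene names, counts and the `collapse_technical_replicates` helper are all made up for illustration. Summing rather than averaging keeps the values as raw integer counts, which is what DESeq2 and edgeR expect as input.

```python
from collections import defaultdict


def collapse_technical_replicates(counts, run_to_sample):
    """Sum per-gene counts over all runs that belong to the same sample.

    counts:        {run_id: {gene_id: raw_count}}
    run_to_sample: {run_id: sample_id}
    Both structures are hypothetical; adapt them to your own count table.
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for run_id, gene_counts in counts.items():
        sample_id = run_to_sample[run_id]
        for gene_id, n in gene_counts.items():
            collapsed[sample_id][gene_id] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in collapsed.items()}


# Made-up example: sample S1 was sequenced in three runs, S2 in two.
counts = {
    "S1_run1": {"geneA": 10, "geneB": 0},
    "S1_run2": {"geneA": 12, "geneB": 1},
    "S1_run3": {"geneA": 8, "geneB": 2},
    "S2_run1": {"geneA": 5, "geneB": 7},
    "S2_run2": {"geneA": 4, "geneB": 9},
}
run_to_sample = {
    "S1_run1": "S1", "S1_run2": "S1", "S1_run3": "S1",
    "S2_run1": "S2", "S2_run2": "S2",
}

collapsed = collapse_technical_replicates(counts, run_to_sample)
```

The collapsed table then has one column per sample, and that is what goes into the DGE tool.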

And you can also feed all of your FASTQ files (resulting from different runs) of a given sample to your aligner of choice at once, that way you will end up with one single BAM file. This will be analogous to summing up gene counts.

4.8 years ago

Running the same library multiple times does not add any technical error unless the instrument screws up. You can join the fastqs from the beginning, or at any point after.
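If you go the route of joining the FASTQs up front, a minimal sketch could look like the one below. It assumes gzip-compressed FASTQ files, which can be joined byte-for-byte because a series of gzip members is itself a valid gzip stream. The `concatenate_fastq` helper and the file names are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_fastq(run_files, merged_path):
    """Append the per-run FASTQ files of one sample into a single file.

    run_files:   paths to the per-run files for ONE sample and ONE read
                 direction, listed in a fixed run order (hypothetical names)
    merged_path: where the combined file is written
    """
    with open(merged_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                # Stream raw bytes; no decompression is needed.
                shutil.copyfileobj(src, out)
    return Path(merged_path)


# Hypothetical usage for a single-end sample sequenced in three runs:
# concatenate_fastq(
#     ["S1_run1.fastq.gz", "S1_run2.fastq.gz", "S1_run3.fastq.gz"],
#     "S1_merged.fastq.gz",
# )
```

For paired-end data, merge the R1 files and the R2 files separately and in the same run order, so that read pairs stay in sync between the two merged files.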
