Normalization of microRNA NGS data generated on different platforms
5.0 years ago
skjobs ▴ 190

I have microRNA read counts from NGS data generated on different platforms (Illumina NextSeq 500 and HiSeq 2000). Do I need to normalize the data before differential expression analysis, given that it comes from two different machines? If so, which method is best for normalizing the data before DE analysis?

rna-seq sequencing next-gen R • 1.6k views

I would first check by PCA whether there is an obvious batch effect. Different Illumina platforms are typically very similar and do not influence the result dramatically. You can transform the data with vst from DESeq2 and then use plotPCA. If there is no evidence of a batch effect, perform a normal DEG analysis as described in the manuals of the established DEG tools.
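A minimal sketch of that check, assuming DESeq2 is installed. The counts are simulated with makeExampleDESeqDataSet as a stand-in for real data, and the Condition and Machine column names are assumptions. Note that vst() fits on a subset of 1000 rows by default and stops if fewer rows qualify, which can happen with miRNA matrices, so the sketch calls varianceStabilizingTransformation directly.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a real miRNA count matrix (hypothetical data).
dds <- makeExampleDESeqDataSet(n = 500, m = 8)
dds$Condition <- factor(rep(c("Control", "Treatment"), each = 4))
dds$Machine   <- factor(rep(c("HiSeq", "NextSeq"), each = 2, times = 2))

# Full variance-stabilizing transformation; unlike vst(), it does not
# need at least 1000 rows. blind = TRUE ignores the design, as suits QC.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Label samples by condition and machine to see which one drives clustering.
plotPCA(vsd, intgroup = c("Condition", "Machine"))
```

If the samples separate by Machine rather than by Condition on the first components, that would suggest a batch effect worth modelling.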

5.0 years ago

It depends on which samples were sequenced on which machines. NextSeq 500 and HiSeq are similar technologies, so the effects may not be too strong. Hopefully you have samples from each of the conditions sequenced on each of the machines, in which case you can just include the machine as a covariate in your linear model.

So if you had a sample table that looked like:

Sample    Condition    Machine
1         Control      Hiseq
2         Control      Hiseq
3         Control      Nextseq
4         Control      Nextseq
5         Treatment    Hiseq
6         Treatment    Hiseq
7         Treatment    Nextseq
8         Treatment    Nextseq

Then you could use the design formula ~ Machine + Condition to correct for the effect of the different machines, the same as for any other batch effect. In an ideal world your design would be perfectly balanced, as in the example above, but you should get some benefit from this as long as your design isn't perfectly confounded (i.e. all the controls on one machine and all the treatments on the other).
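A minimal sketch of such a model in DESeq2, assuming raw integer counts in a matrix and a sample table laid out like the one above. The counts are simulated only as a placeholder, and the object names (cts, samples) are hypothetical.

```r
library(DESeq2)

set.seed(1)
# Placeholder for a real raw-count matrix (miRNAs x samples); simulated here.
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 8))

# Sample table matching the balanced layout in the example.
samples <- data.frame(
  Condition = factor(rep(c("Control", "Treatment"), each = 4)),
  Machine   = factor(rep(c("HiSeq", "NextSeq"), each = 2, times = 2)),
  row.names = colnames(cts)
)

# Machine is the nuisance term; Condition, placed last, is the effect tested.
dds <- DESeqDataSetFromMatrix(countData = cts, colData = samples,
                              design = ~ Machine + Condition)
dds <- DESeq(dds)

# Treatment vs Control, adjusted for the machine effect.
res <- results(dds, contrast = c("Condition", "Treatment", "Control"))
head(res[order(res$padj), ])
```

The counts go in untransformed; DESeq2 applies its own size-factor normalization inside DESeq().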

You can look for the effect of batch using dimensionality reduction, such as MDS or PCA. You'll be looking at whether the samples cluster by batch or by condition. MDS is the easiest, but it can be difficult to interpret. If you are going to use PCA, you'll want to do some sort of variance stabilization first, like DESeq2::rlog or DESeq2::vst, but you can then look for your batch effects in more than just two dimensions.
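One way to look past the first two components is to run prcomp on the transformed values yourself. This is only a sketch on simulated data; the Machine column is an assumption, and varianceStabilizingTransformation is used in place of vst because small miRNA matrices can have too few rows for vst's default subsampling.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data; 'condition' (levels A/B) comes with the example set.
dds <- makeExampleDESeqDataSet(n = 500, m = 8)
dds$Machine <- factor(rep(c("HiSeq", "NextSeq"), each = 2, times = 2))
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# prcomp expects samples in rows; centre each miRNA, no extra scaling.
pca <- prcomp(t(assay(vsd)), center = TRUE, scale. = FALSE)

# Proportion of variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained[1:4], 3)

# Pairwise plots of the first four PCs: colour = machine, symbol = condition.
pairs(pca$x[, 1:4],
      col = as.integer(dds$Machine),
      pch = c(16, 17)[as.integer(dds$condition)])
```

A machine effect that only shows up on, say, PC3 would be missed by a standard two-dimensional plot.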

If you do have perfect confounding, then there is not much you can do (consider an MDS plot: clustering by machine and clustering by condition are the same thing, so how could you, or any statistical method, tell the difference?). Given the similarity of the technologies, the results of this DE analysis might still be indicative in the absence of any correction, but I'd be nervous about basing any conclusion solely on this evidence.
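A quick way to check for confounding before fitting anything, sketched in base R. Both sample tables are hypothetical, and is_estimable is a helper name made up for this illustration.

```r
# Hypothetical, perfectly confounded layout: all controls on one machine.
confounded <- data.frame(
  Condition = factor(rep(c("Control", "Treatment"), each = 4)),
  Machine   = factor(rep(c("HiSeq", "NextSeq"), each = 4))
)

# Balanced layout, as in the example table above.
balanced <- transform(
  confounded,
  Machine = factor(rep(c("HiSeq", "NextSeq"), each = 2, times = 2))
)

# Empty off-diagonal cells mean machine and condition are the same split.
table(confounded$Condition, confounded$Machine)

# TRUE when every coefficient in ~ Machine + Condition can be estimated,
# i.e. the design matrix has full column rank.
is_estimable <- function(samples) {
  mm <- model.matrix(~ Machine + Condition, data = samples)
  qr(mm)$rank == ncol(mm)
}

is_estimable(confounded)  # FALSE
is_estimable(balanced)    # TRUE
```

With the confounded table the Machine and Condition columns of the design matrix are identical, which is the algebraic version of the MDS argument.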

Note all the above technically applies not just to analysis done on two different types of machine, but also on two different machines of the same type, or even two different lanes on the same machine.


Thanks. My samples are not as homogeneous as this:

Sample    Condition    Machine
1         Control      Hiseq
2         Control      Hiseq
3         Control      Nextseq
4         Control      Nextseq
5         Treatment    Hiseq
6         Treatment    Hiseq
7         Treatment    Nextseq
8         Treatment    Nextseq

A few samples were generated on the HiSeq and a few on the NextSeq 500. Can I use a simple statistical method for normalization (log2, LOESS, quantile, etc.) so that all the data are brought onto one scale?


Most DE methods require raw, unnormalized counts for their statistical models to be valid, so I wouldn't recommend any normalization beyond that provided by the DE package unless you know exactly what you are doing. The samples do not have to be homogeneous for batch correction in the manner described to work, as long as the batches are not perfectly confounded with the condition. If there is perfect confounding, then I can't see that any method is going to help.

If there is perfect confounding, and the only difference is the sequencing machine, you'd almost certainly be better off not doing any normalization. In theory the machine used for sequencing should make little or no difference as long as everything else was the same (library prep kits, extraction protocols, people doing the preps...).


If these are different libraries, or were made by different people, then I would also add that variable to your model. It would be interesting to see whether sequencer type contributes anything to the batch effect you see (if there is one).
