I'm running the TopHat protocol and have Cufflinks output files. When I plot the normalised values as boxplots, I see quite a lot of variation among the samples. The data come from three different sources. What kind of batch correction can I use to reduce that variation?
Any help or suggestions would be highly appreciated.
But to use RUVSeq there needs to be an ERCC spike-in mix. The samples I'm using don't have ERCC, so I cannot use RUVSeq.
Please read the documentation. There are 4 different types of normalization and ERCC is just one of them.
So can I use the Cufflinks data for the normalisation?
You need to use the raw counts. As you don't have spike-in controls, you can estimate the systematic effects from the least differentially expressed genes (the example in the vignette uses all genes except the top 5,000 ranked by differential expression as empirical controls) and use them to normalize the data. The documentation is pretty clear.
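A minimal sketch of that empirical-control approach, modelled on the RUVSeq vignette. The count matrix here is simulated, and the names counts, condition and n_exclude are placeholders for illustration only; replace counts with your own raw count matrix (genes in rows, samples in columns). The vignette excludes the top 5,000 genes on a full transcriptome; the toy data below uses 500 out of 2,000.

```r
library(edgeR)
library(RUVSeq)

set.seed(1)

# Simulated raw counts: 2000 genes x 6 samples (placeholder for real data)
n_genes <- 2000
condition <- factor(rep(c("ctl", "trt"), times = 3))
counts <- matrix(
  rnbinom(n_genes * 6, mu = 200, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))
)

# First-pass differential expression with edgeR to rank the genes
design <- model.matrix(~ condition)
y <- DGEList(counts = counts, group = condition)
y <- calcNormFactors(y, method = "upperquartile")
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)
top <- topTags(lrt, n = nrow(counts))$table  # sorted by p-value

# Empirical controls: every gene except the most differentially expressed ones
n_exclude <- 500  # assumption: scaled down from the vignette's 5000 for toy data
empirical <- setdiff(rownames(counts), rownames(top)[seq_len(n_exclude)])

# Estimate k = 1 factor of unwanted variation from the control genes
ruv <- RUVg(counts, empirical, k = 1)

head(ruv$W)                 # estimated unwanted-variation factor per sample
head(ruv$normalizedCounts)  # adjusted counts, for plots and exploration
```

For differential expression, the RUVSeq vignette adds the W columns to the design matrix and keeps using the original counts; the normalized counts are mainly for visualization. The choice of k is a judgment call and is worth checking with boxplots or PCA before and after.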
Yes, that's the other way to do it, and I have done it with one of the samples that has spike-in controls. So I cannot use the Cufflinks data for batch correction?
Instead of providing the Cufflinks-calculated FPKM values, you could calculate the FPKM/RPKM from the raw counts using the edgeR rpkm() function, and then correct for batches. If you know the batches, you can use removeBatchEffect() (a limma function, available once edgeR is loaded); otherwise try RUVSeq. I am not sure if there is any package that accepts already-normalized values and corrects for batches. You will have to tune the settings to make sure you are not over-normalizing the data.
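A rough sketch of the known-batch route, assuming you have a raw count matrix, gene lengths, and a batch label per sample. Everything below is simulated; counts, gene_length, batch and condition are placeholder names, not objects from your pipeline. removeBatchEffect() is provided by limma, which edgeR loads as a dependency.

```r
library(edgeR)
library(limma)

set.seed(1)

# Simulated raw counts: 200 genes x 6 samples (placeholder for real data)
n_genes <- 200
counts <- matrix(
  rnbinom(n_genes * 6, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))
)
gene_length <- sample(500:5000, n_genes, replace = TRUE)  # hypothetical lengths in bp

# Three data sources treated as batches, with both conditions in each batch
batch <- factor(c("A", "A", "B", "B", "C", "C"))
condition <- factor(c("ctl", "trt", "ctl", "trt", "ctl", "trt"))

# log2 RPKM from raw counts, using TMM-normalized library sizes
y <- DGEList(counts = counts)
y <- calcNormFactors(y)
log_rpkm <- rpkm(y, gene.length = gene_length, log = TRUE)

# Remove the batch term while protecting the condition effect via the design
design <- model.matrix(~ condition)
corrected <- removeBatchEffect(log_rpkm, batch = batch, design = design)

# Compare the distributions before and after correction
boxplot(log_rpkm, main = "log2 RPKM, before")
boxplot(corrected, main = "log2 RPKM, after removeBatchEffect")
```

The corrected matrix is meant for plots and clustering. For differential expression testing, the limma documentation recommends putting batch into the design matrix and working from the counts rather than testing on batch-corrected values.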