Hello everyone!
My RNA-seq data was sequenced in 5 batches. Using Mann-Whitney (MW) tests, I found significant differences in primary-aligned read counts between batches. Since the data are grouped by severity, I compared the mild samples of batch 1 with the mild samples of each other batch, and did the same for the moderate and severe categories. The aim was to make sure that the between-batch differences in primary-aligned reads (within mild, moderate and severe) reflect only technical variability, because I can't risk losing a genuine biological difference between any of these groups.
These within-severity comparisons of primary-aligned read counts were significant for some batch pairs (e.g. mild in batch 1 vs. mild in another batch), but not consistently for all of them.
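A minimal sketch of the within-severity comparisons described above, using only the Python standard library. It assumes the per-sample values (e.g. primary-aligned read %) are available as `(batch, severity, value)` tuples; the function names are hypothetical. The p-value uses the normal approximation with a tie correction and no continuity correction, so it will differ somewhat from `scipy.stats.mannwhitneyu` defaults, especially for small groups. With 5 batches and 3 severity levels this is up to 30 tests, so a Bonferroni-adjusted p-value is also returned.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U (normal approximation, tie-corrected variance)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def batch_tests_within_severity(records):
    """Compare every pair of batches separately inside each severity group.

    records: iterable of (batch, severity, value) tuples (hypothetical layout).
    Returns (severity, batch_a, batch_b, U, p, p_bonferroni) tuples.
    """
    cells = defaultdict(list)
    for batch, severity, value in records:
        cells[(severity, batch)].append(value)
    raw = []
    for severity in sorted({s for s, _ in cells}):
        batches = sorted(b for s, b in cells if s == severity)
        for a, b in combinations(batches, 2):
            u, p = mann_whitney(cells[(severity, a)], cells[(severity, b)])
            raw.append((severity, a, b, u, p))
    m = len(raw)
    return [r + (min(1.0, r[4] * m),) for r in raw]
```

For an omnibus test across all 5 batches at once (within one severity level), Kruskal-Wallis is the usual rank-based alternative to running every pairwise comparison.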
Given this, I need to remove the batch effect. Downstream, I am looking at coverage breadth (the question is "how much of the gene body is covered by my sequencing reads?"), and for this no dedicated tool seems to be available. I read about DESeq2, limma and other packages being used for the same purpose, so I thought of applying one of them.
Could anyone please help me understand how limma, or a more suitable tool, performs this batch correction (data normalisation), and how I can do it?
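As I understand limma's implementation, `removeBatchEffect` fits a linear model per feature with the biological design plus batch (batch coded with sum-to-zero contrasts) and subtracts only the fitted batch component, so group differences such as mild vs. severe are kept. The Python/NumPy sketch below mimics that idea on a features x samples matrix of log-scale values; it is an illustration, not limma's code, and `remove_batch_effect` is a hypothetical name. limma's documentation recommends using such corrected values for plots like PCA, and including batch as a covariate in the model for formal testing rather than correcting first.

```python
import numpy as np


def remove_batch_effect(values, batch, group):
    """Limma-style batch removal sketch (hypothetical helper, not limma itself).

    values: features x samples array on a log-like scale
    batch, group: per-sample labels; `group` is the biology to preserve
    """
    y = np.asarray(values, dtype=float)
    n = y.shape[1]
    # Biological design: intercept plus one indicator per non-reference group
    groups = sorted(set(group))
    x_bio = np.ones((n, 1))
    for g in groups[1:]:
        col = np.array([1.0 if s == g else 0.0 for s in group])
        x_bio = np.column_stack([x_bio, col])
    # Batch design with sum-to-zero coding: the last batch is -1 in every column
    levels = sorted(set(batch))
    x_batch = np.zeros((n, len(levels) - 1))
    for j, b in enumerate(batch):
        k = levels.index(b)
        if k == len(levels) - 1:
            x_batch[j, :] = -1.0
        else:
            x_batch[j, k] = 1.0
    design = np.hstack([x_bio, x_batch])
    # Least-squares fit for all features at once (one column of y.T per feature)
    beta, *_ = np.linalg.lstsq(design, y.T, rcond=None)
    beta_batch = beta[x_bio.shape[1]:, :]
    # Subtract only the fitted batch component; group effects stay in the data
    return y - (x_batch @ beta_batch).T
```

Because severity is in the design, the batch coefficients are estimated after accounting for the uneven mix of mild, moderate and severe samples across batches; correcting on batch alone could remove real severity differences if a batch is enriched for one group.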
Thanks in advance!
Can you post your experimental layout, i.e. which samples belong to which group and batch?
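One quick way to produce the layout being asked for is a batch x severity count table. The standard-library sketch below assumes a hypothetical `(sample_id, batch, severity)` input format; it also lists severity levels missing from a batch, since a batch with no samples of some group makes the batch and severity effects harder to separate.

```python
from collections import Counter


def layout_table(samples):
    """samples: iterable of (sample_id, batch, severity) tuples (hypothetical format).

    Returns (rows, missing): a table with a header row, one row per batch
    (counts per severity plus a total), and the (batch, severity) cells with 0 samples.
    """
    counts = Counter((batch, severity) for _, batch, severity in samples)
    batches = sorted({b for b, _ in counts})
    severities = sorted({s for _, s in counts})
    rows = [["batch"] + severities + ["total"]]
    for b in batches:
        row = [counts[(b, s)] for s in severities]
        rows.append([b] + row + [sum(row)])
    missing = [(b, s) for b in batches for s in severities if counts[(b, s)] == 0]
    return rows, missing


def print_table(rows):
    """Print the table as tab-separated text."""
    for row in rows:
        print("\t".join(str(cell) for cell in row))
```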
Sure. I am trying to find out, via coverage breadth, how much of the gene body (reference) is covered by my sequencing reads. Illumina paired-end RNA-seq was performed in 5 batches on 112 samples (45, 46 and 21 in the mild, moderate and severe categories). For coverage breadth I used bedtools.
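Coverage breadth here means the fraction of gene-body bases covered by at least one read (or by at least a chosen minimum depth). A minimal sketch, assuming per-base depth in the three-column `chrom  pos  depth` format that `samtools depth -a` writes (the `-a` flag keeps zero-depth positions) and a BED-style interval; the helper names and the `min_depth` parameter are illustrative, and this is not how bedtools computes it internally. For spliced transcripts, whether the interval is the whole gene or only its exons changes the result a lot, since intronic bases are usually far less covered.

```python
def read_depth(lines):
    """Parse 'chrom<TAB>pos<TAB>depth' lines (1-based positions) into a dict."""
    depth = {}
    for line in lines:
        chrom, pos, dp = line.split("\t")[:3]
        depth[(chrom, int(pos))] = int(dp)
    return depth


def coverage_breadth(depth, chrom, start, end, min_depth=1):
    """Fraction of bases in the BED interval [start, end) with depth >= min_depth.

    BED starts are 0-based, so the interval spans 1-based positions start+1..end.
    Positions absent from `depth` are treated as zero depth.
    """
    length = end - start
    if length <= 0:
        raise ValueError("empty interval")
    covered = sum(
        1 for pos in range(start + 1, end + 1)
        if depth.get((chrom, pos), 0) >= min_depth
    )
    return covered / length
```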
To make sure the read count had no influence on coverage, I calculated sample-wise read counts with samtools and Qualimap, and got exactly the same figures.
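Two tools agreeing on the read counts confirms the counts themselves; whether breadth depends on sequencing depth is a separate question. One simple check is a Spearman rank correlation across samples between total (or primary-aligned) reads and breadth. A standard-library sketch, with hypothetical helper names:

```python
import math


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    rx, ry = midranks(list(x)), midranks(list(y))
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```

Called as `spearman(read_counts, breadths)` with one value per sample, a strong positive result would suggest breadth tracks depth, in which case depth differences between batches feed directly into the breadth comparison.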
Next, I observed highly significant MW results for the percentage of primary-aligned reads (taken without classification...). So as not to lose any biological effect, I then computed MW significance between batches within each severity level (mild of one batch vs. mild of another, and likewise for the moderate and severe categories).
I still see high significance in most of the groups, though not in all. Now I am looking to remove these batch effects, or at least to normalise my read-count data somehow.
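For gene-level read counts, a common way to normalise for sequencing depth is the median-of-ratios size factor used by DESeq2: each sample's factor is the median, over genes with no zero counts, of its count divided by that gene's geometric mean across samples. The standard-library sketch below reproduces that calculation (helper names are hypothetical; in practice `DESeq2::estimateSizeFactors` does this). It corrects for depth, not for batch. Breadth itself cannot be rescaled this way; if breadth needs to be comparable across samples, one option is to subsample the BAMs to the same number of reads before computing it.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c == 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is non-zero in every sample")
    # Median of log-ratios, then back-transform
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts):
    """Divide every count by its sample's size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(gene, sf)] for gene in counts]
```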
Below is the classification:
[Figure: Primary-aligned read count (%) across batches, by severity]
How did you compare the read counts, can you tell us? I mean this part: "I found significance for primary aligned read counts between batches. Since the data is for severity, I compared mild of batch 1 with mild of another batch and likewise for mod and severe categories."
"I found significance in primary aligned read count when I compared mild of batch 1 with another batch and likewise... but this wasn't consistent for all." Please share your MultiQC results; that would help others validate what you are saying.
Mann-Whitney.
Can you upload your MultiQC report, or just a screenshot? That would be helpful for everyone.
Meanwhile, what EDA (exploratory data analysis) have you done on your data, if any? Please let us know.
"Can anyone please help me understand how limma or another more suitable tool performs this batch correction (data normalisation), and how I can do this?" There are many solved issues on this that you can find on Biostars or Bioconductor. Your approach seems complicated and convoluted to me; either you haven't been given the right protocol or you are doing this for the first time. If I were you, I wouldn't compare the read counts first. I would apply a transformation using DESeq2, edgeR or limma (any of them) and go ahead with EDA, starting with PCA, clustering and correlation, just to get an idea of the underlying data.
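A rough Python/NumPy sketch of the EDA suggested above: log2 counts-per-million as a simple stand-in for the DESeq2/edgeR/limma transformations (it is not `vst` or `voom`), followed by PCA via SVD with samples as observations. Plotting PC1 vs. PC2 coloured by batch and then by severity shows whether samples group by batch more than by biology. The function name and the pseudocount of 1 are assumptions.

```python
import numpy as np


def logcpm_pca(counts, n_components=2, pseudocount=1.0):
    """counts: genes x samples raw counts.

    Returns (scores, explained_variance_ratio), where scores is
    samples x n_components.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample
    logcpm = np.log2(counts / lib_size * 1e6 + pseudocount)
    # Samples are the observations: centre each gene across samples
    x = logcpm.T - logcpm.T.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    var = s ** 2
    return scores, var[:n_components] / var.sum()
```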
[Screenshot: multibamqc_ss]
Thank you for your response. Yes, I'm doing this for the first time.
Uploading screenshots of my MultiQC report.