Large BCV (biological coefficient of variation): is there any point in continuing the differential gene expression analysis?
21 months ago
Ann ▴ 40
  1. Is it possible to perform differential gene expression analysis on data with this dispersion, BCV, and MDS plot? (Fig. A, B)

    y_disp_design <- estimateDisp(y_filtered, design = design)

    y_disp_design$common.dispersion

    0.3251901

  2. Is it possible to perform differential gene expression analysis on data with such experimental design?

I work with data from a non-model invertebrate in which a particular organ (a syncytial structure) develops in its tissues.

  • A - "normal" body tissues before organ development
  • B,C,D - 3 consecutive development stages of this organ

Each sample B,C,D contains contamination by "normal" tissues (sample A) as it was impossible to separate them.

It was assumed that the proportion of "normal" tissues (sample A) would be approximately the same in all samples, but, as I understand from the location of A samples on the MDS plot and the high values of BCV, this was not achieved.

The aim of the study was to identify some of the signaling pathways involved in the development of the organ of interest.

  3. Are there any ways to analyze such data? Or do the problems described above make any statistical analysis impossible?

My pipeline: Trimming -> Trinity -> CD-HIT -> TransRate -> Salmon -> tximport -> edgeR

[Figure: plotMDS and plotBCV output (panels A and B)]
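
For scale, edgeR defines the BCV as the square root of the negative binomial dispersion, so the common dispersion printed above can be converted directly. A minimal base-R sketch (the variable names are illustrative, not part of the original script):

```r
# Common dispersion as printed by y_disp_design$common.dispersion above
common_dispersion <- 0.3251901

# BCV is the square root of the dispersion
bcv <- sqrt(common_dispersion)
round(bcv, 2)  # 0.57
```

A BCV of about 0.57 corresponds to roughly 57% variation in expression between replicates. The edgeR user guide quotes typical values of about 0.4 for human data and 0.1 for genetically identical model organisms, so this value is on the high side.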

21 months ago
LChart 4.7k

Variable levels of 'normal' contamination are a perennial issue in cancer tumor sequencing, and a number of methods have been developed to address it. You could try using something like IsoPureR (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0597-x) or ContamDE (https://academic.oup.com/bioinformatics/article/36/8/2492/5698700) to estimate the 'normal' profile and the stage-specific profile for each contaminated sample, which would 'decontaminate' B, C, and D. There are other methods, and you may need to provide marker genes for major cell types so that profiles can be modeled as linear combinations of cell types, but these approaches should be applicable in this setting.
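
The idea of modeling a contaminated sample as a linear combination of profiles can be illustrated with a toy calculation. This is only a sketch in base R with made-up numbers: it assumes the contaminating fraction `p_normal` is already known, which is exactly what the tools mentioned above have to estimate, and it does not use their APIs.

```r
# Hypothetical expression profiles for three genes (made-up values)
normal_profile <- c(geneA = 100, geneB = 50, geneC = 10)
organ_profile  <- c(geneA = 20, geneB = 200, geneC = 10)

# Assumed fraction of "normal" tissue in the mixed sample (hypothetical)
p_normal <- 0.4

# Observed mixed profile: weighted sum of the two pure profiles
mixed_profile <- p_normal * normal_profile + (1 - p_normal) * organ_profile

# Invert the mixture to recover the organ-specific profile
recovered <- (mixed_profile - p_normal * normal_profile) / (1 - p_normal)
all.equal(recovered, organ_profile)  # TRUE
```

In real data neither the pure organ profile nor the mixing fraction is observed, so both have to be inferred jointly, and the fraction may differ from sample to sample.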

Thank you very much for your answer! I didn't even know about this approach. I will try it.

21 months ago

You hardly ever want to tell the people who paid for the experiment that there was "no point".

But it is awfully underpowered. Two replicates per condition really isn't enough, to say nothing of your other concerns.

Initially there were three replicates for each sample, but I was forced to drop the third replicates from the analysis because I doubted that they had been dissected in the same way as the others. They were also sampled a year apart from the other replicates and clearly showed a batch effect, and the BCV for the full dataset was even higher.

The experiment was planned and carried out without my participation, so now I can only work with the data I received. It seems to me that the most honest approach would be to redirect the focus of the research to a de novo transcriptome analysis (for example, a comparison with transcriptomes from other species or something like that). However, I want to be sure that there is no other way.
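
On the batch effect mentioned above: if the third replicates were kept, one option would be to include the sampling batch as a covariate in the design instead of dropping them. A minimal sketch with base R's `model.matrix`, assuming a hypothetical layout of four groups with three replicates each, where the third replicate of every group was collected a year later. The factor names and levels are illustrative, and this does not address the doubts about dissection.

```r
# Hypothetical sample layout: groups A-D, three replicates each
group <- factor(rep(c("A", "B", "C", "D"), each = 3))

# Assumption: replicates 1 and 2 form the first batch, replicate 3 the second
batch <- factor(rep(c("first", "first", "second"), times = 4))

# Additive design: batch effect plus group effects relative to A
design <- model.matrix(~ batch + group)

# Full column rank means batch and group effects are separately estimable
qr(design)$rank == ncol(design)  # TRUE
```

A design like this could be passed to `estimateDisp(..., design = design)` in the same way as in the original call, provided the count object contains all twelve samples in the matching order.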
