Question

Accidental flip of "control" and "comparison" groups for differential gene expression analysis


15 months ago

ivingan • 0

I recently discovered that my spatial transcriptomics samples were mislabeled, which caused me to assign each sample to the wrong condition. Thankfully, I only have one experimental sample and one control sample, so spotting the error and correcting the file names should be easy.

I have already heavily processed and analyzed these data, and it will be a pain to go back and correct everything (especially because my lab is losing access to gcloud, so I will have to run most of the Seurat processing on my local device). My question is: is that necessary? For the differential expression analyses, is it as simple as multiplying the log2 fold change column by -1? Are there any other steps in the process where the control vs. comparison designation matters and would need correcting? For example, does it matter at the normalization and integration steps? Or are the differential expression fold change values and the file names the only things affected? (I'm really hoping this is the answer.)

Spatial-Transcriptomics RNA-seq • 835 views


score 0 · Answer 1 · 2023-08-25


15 months ago

Istvan Albert 101k

Statistical methods do not treat the control sample differently from the comparison sample; the designation is just a label.

I believe that your final list of differentially expressed genes would be the same.
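To illustrate why the labels do not matter for the test itself, here is a minimal Python sketch. It assumes an exact two-sided permutation test on the difference in group means as a stand-in for whatever test was actually run (Seurat's `FindMarkers` defaults to a Wilcoxon rank-sum test, not this), and the per-spot expression values for one gene are made up. Because the statistic is the absolute difference in means, swapping which group is called "control" leaves the p-value unchanged.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_1, group_2):
    """Exact two-sided permutation test on the difference in group means.

    Every split of the pooled values into groups of the original sizes is
    enumerated; the p-value is the fraction of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_1) + list(group_2)
    n_1 = len(group_1)
    observed = abs(mean(group_1) - mean(group_2))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-spot expression values for one gene (made-up numbers).
control = [2.1, 3.4, 2.8, 3.0]
treated = [5.2, 4.8, 6.1, 5.5]

p_correct = permutation_p_value(treated, control)  # intended labels
p_flipped = permutation_p_value(control, treated)  # swapped labels
print(p_correct, p_flipped)
```

A one-sided test would be different: its direction would need to be flipped along with the labels, which is another reason to state the comparison direction explicitly.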

That being said, this sort of after-the-fact manual relabeling is risky, and you may create inconsistencies if you report other data. Instead of manually relabeling and multiplying the values by this or that, use them as they are.

The fold change is one condition divided by the other. Typically we report B/A, but as long as you clearly define it as A/B, nothing needs to be done, since log2(A/B) = -log2(B/A) and the two directions differ only in sign. Whatever fold change you have will reflect the correct comparison.
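The relationship between the two directions can be sketched in a few lines of Python. The mean-ratio definition and the pseudocount of 1 are illustrative assumptions, not necessarily how any particular tool computes its fold change, and the expression values are made up.

```python
import math
from statistics import mean

# Hypothetical per-spot expression values for one gene (made-up numbers).
control = [2.1, 3.4, 2.8, 3.0]
treated = [5.2, 4.8, 6.1, 5.5]


def log2_fold_change(numerator, denominator, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount is an illustrative assumption."""
    return math.log2((mean(numerator) + pseudocount) / (mean(denominator) + pseudocount))


lfc_b_over_a = log2_fold_change(treated, control)  # the usual "B/A" direction
lfc_a_over_b = log2_fold_change(control, treated)  # the flipped "A/B" direction

# Flipping the labels only flips the sign: log2(x / y) == -log2(y / x).
print(round(lfc_b_over_a, 4), round(lfc_a_over_b, 4))
assert math.isclose(lfc_b_over_a, -lfc_a_over_b)
```

In a results table this corresponds to multiplying the log2 fold change column by -1; alternatively, state the direction (A/B) explicitly and leave the values unchanged.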


I appreciate the lesson on data hygiene and reporting, for lack of a better term. I was never formally trained in much of any of this; tutorials and forums have gotten me further in my transcriptomics research than my PhD mentors have.

Aside from differential expression, is there any other step in a spatial transcriptomics (or single-cell, since the data structures are very similar) pipeline that depends on correctly designating control and comparison? The only step I can think of is integration in Seurat, but I'm not sure whether it normalizes across all samples equally or treats one as the baseline and the other as the comparison. Do you have any insight on this?

15 months ago by ivingan • 0