Deseq2 questions
1
0
Entering edit mode
17 months ago
UserA • 0

Hello everyone, I would like to perform a differential gene analysis on my dataset. It is composed of 8 pH points 5 different bacterial cocultures 3 biological triplicates = 120 samples. I have performed a de novo metatranscriptome with rnaSPAdes and now I am preparing my data to use Deseq2.

I'd like to carry out two types of analysis:

  1. Within my samples I have cultures of 2 bacteria and cultures of 3 bacteria (the same 2 + 1 to see the impact of adding this 3rd bacteria). I want to compare the differential gene expression of the 3-bacteria samples with the control, which would be my 2-bacteria sample. But I'm wondering if this is possible, given that the samples that always have one more bacteria (3-bacteria) than the control (2-bacteria) will inevitably have extra genes not present in the control?
  2. I want to compare gene expression between pH points for each 3-bacteria and 2-bacteria sample.

What tool should I use to obtain the counts of each gene per sample in order to have a matrix of counts for Deseq2? Is RSEM followed by tximport ok?

Do I have to use the RSEM tool (or other) to obtain a matrix of counts by groups of samples that I'm going to analyze together under Deseq2 or can I use it on all 120 samples and then cut up my matrix to use only groups of samples that I want to compare (the groups mentioned above, by groups of bacteria for the same pH and by pH points for the same co-culture) with Deseq2?

More theoretical question: Does it make sense to remove from the count matrix obtained the genes that with Blastx against a database give low % identities or should I do it earlier in the analysis like removing them from the metatranscriptome obtained directly?

deseq2 transcriptome • 1.7k views
ADD COMMENT
0
Entering edit mode
17 months ago
Asaf 10k
  1. You can normalize the the read counts using a list of single copy genes in the two common bacteria. You can obtain such a list from MusicC github page (the file uscg_76_kegg_min_2313.lst). After mapping the KOs to your genes it should be straightforward.
  2. Why are you mapping to the assembled transcriptome rather than the genomes? They should have a better quality and it should reduce the amount of noise, especially if you are dealing with closely related taxa.
  3. You can choose any mapping method you like, there are plenty out there. Once you have the full matrix of samples X genes, import it to R and start asking your questions with DESeq, I personally like to analyze as much data as I can together so I'll have better dispersion estimation.

Good luck

ADD COMMENT
0
Entering edit mode

Thank you for your reply. I now have my counts by transcript for each sample. So I'd like to use DESeq2 to make comparisons between my samples and my controls. I have a dataset including 2 variables, pH (8 values) and strains consortium (5 values) all in triplicate. My ColDesign for Deseq2 looks like this :

enter image description here

I would like to answer two questions: 1- For a given strains consortium do I have gene expression differences at any pH point compared to my control pH point (J1) ? 2- For a given pH point, do I have gene expression differences between my strains consortium compared to the control consortium (straincontrol) ?

1- I would like to have for straincontrol :

pHJ1_vs_pH6.45  pHJ1_vs_pH6.26  pHJ1_vs_pH6.00  pHJ1_vs_pH5.73  pHJ1_vs_pH5.37  pHJ1_vs_pH5.02 

pHJ1_vs_pH4.6

For strain1 :

pHJ1_vs_pH6.45  pHJ1_vs_pH6.26  pHJ1_vs_pH6.00  pHJ1_vs_pH5.73  pHJ1_vs_pH5.37  pHJ1_vs_pH5.02 

pHJ1_vs_pH4.6

And those comparisons for each of the strains.

2- I would like to have for pH J1 :

straincontrol_vs_strain2  straincontrol_vs_strain3  straincontrol_vs_strain4  straincontrol_vs_strain5

For pH 4.6 :

straincontrol_vs_strain2  straincontrol_vs_strain3  straincontrol_vs_strain4  straincontrol_vs_strain5

And those comparisons for each of the pH points.

I would like then generate one heatmap per pH comparison and per strain comparison since in each one I compare the samples to the same control. It would generate 13 heatmaps (8 with the pH and 5 with the strains consortium).

it is possible with DESeq2 to block one of the 2 variables, for example comparing for the consortium of strain1 the differents pH points to the control pH or should I split my data set to keep only the results of counts of genes corresponding to the consortium of strain1 and use DESeq2 with a single variable, the pH?

I tried differents designs such as :

dds <- DESeqDataSetFromTximport(txi, colData=colDesign, design= ~strain+pH)
dds <- DESeqDataSetFromTximport(txi, colData=colDesign, design= ~strain+pH+strain:pH)

but it doesn't give me the comparisons I want.

ADD REPLY
0
Entering edit mode

You would probably want to test if strain:pH is significant, you can test it with LRT test having the reduced formula strain + pH, this will tell you if the strains influence how pH changes gene expression. If you then want to test a specific strain for pH effect then you can use results to get the effect of a specific interaction term or compare one strain:pH to the reference strain:pH, which is different since it will include the pH overall (not strain specific) effect.

ADD REPLY
0
Entering edit mode

Thxs for the answer! I tried doing this :

files <- file.path("path_to_abundance_files/","Salmon_ORFs", samples$sample, "quant.sf")
names(files) <- samples$sample
txi <- tximport(files, type='salmon', txOut=TRUE, dropInfReps=TRUE)
colDesign <- data.frame(row.names=samples$sample, souche=samples$strain, pH=samples$pH, replicate=samples$replicate )

colDesign$strain<- factor(colDesign$strain, levels=c("straincontrol", "strain2", "strain3", "strain4", "strain5"))
colDesign$pH <- factor(colDesign$pH, levels=c("J1", "6.45", "6.26", "6.00", "5.73", "5.37", "5.02", "4.6"))

dds <- DESeqDataSetFromTximport(txi, colData=colDesign, design= ~strain+pH+strain:pH)
dds <- DESeq(dds, test="LRT", reduced = ~ strain+ pH)
resultsNames(dds)

I get these comparisons :

strain2_vs_straincontrol  strain3_vs_straincontrol   strain4_vs_straincontrol   strain5_vs_straincontrol   pH6.45_vs_pHJ1   pH6.26_vs_pHJ1   pH6.00_vs_pHJ1   pH5.73_vs_pHJ1   pH5.37_vs_pHJ1   pH5.02_vs_pHJ1   pH4.6_vs_pHJ1

and each strain except the straincontrol agaisnt each pH except J1.

I did this to extract the comparisons I want :

res = results(dds, contrast=list("strain2.pH6.45", "strain2.pH6.26"))

but as I don't have the comparison of each strain against the pHcontrol which is J1 I can't extract the comparisons I really want. And same for the controlstrain. How can I also get the comparisons with J1 and the straincontrol.pHxx ?

ADD REPLY
0
Entering edit mode

I didn't really understand what comparison you're looking for. You can use contrast=list(..,..) and put any pair of interacting strain and pH to compare them so you can compare a strain in different pH or two strains in the same pH. You can use the contrast=c(strain, 'strain1', 'strain2') to compare the overall effect of one strain against the control (regardless of the pH). If you can't find what you're looking for with this you can always design a vector the same length as the resultsNames with 1, 0 and -1 for the numerator, not relevant and denominator respectively.

Another thing you might want to consider is treating the pH as a continuous feature (and translate J1 to the actual pH onviously), you might want to add the squared pH for a LQ (linear quadratic) model to account for the case that the effect is not entirely linear. If the pH effect is vastly non-linear then forget what I've just said but I doubt it's the case. This will reduce the size of the design matrix.

ADD REPLY
0
Entering edit mode

To make a comparison I want, for exemple this one :

respH6.45_pHJ1 <- results(dds, contrast=list("strain2.pH6.45","strain2.pHJ1"))

I get this error : Error in checkContrast(contrast, resNames) : all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

When I run the command :

resultsNames(dds)

Comparisons of each strain against the pHcontrol which is J1 are not displayed. I have : strain2.pH6.45, strain2.pH6.26, strain2.pH6.00, strain2.pH5.73, strain2.pH5.37, strain2.pH5.02, strain2.pH4.6 but not strain2.pHJ1. however, I'd like to compare each pH point of the same strain to point J1 of that strain. Like this:

respH6.45_pHJ1 <- results(dds, contrast=list("strain2.pH6.45","strain2.pHJ1"))
respH6.26_pHJ1 <- results(dds, contrast=list("strain2.pH6.26","strain2.pHJ1"))
respH6.00_pHJ1 <- results(dds, contrast=list("strain2.pH6.00","strain2.pHJ1"))
respH5.73_pHJ1 <- results(dds, contrast=list("strain2.pH5.73","strain2.pHJ1"))
respH5.37_pHJ1 <- results(dds, contrast=list("strain2.pH5.37","strain2.pHJ1"))
respH5.02_pHJ1 <- results(dds, contrast=list("strain2.pH5.02","strain2.pHJ1"))
respH4.6_pHJ1 <- results(dds, contrast=list("strain2.pH4.6","strain2.pHJ1"))
ADD REPLY
0
Entering edit mode

If I do it like this :

respH6.45_pHJ1 <- results(dds, contrast=list(c("pH_6.45_vs_J1", "strain2.pH6.45")))

Does it give me the DEG between pH 6.45 and pH J1 for the strain2 ? I seen this possibility here : https://github.com/tavareshugo/tutorial_DESeq2_contrasts/blob/main/DESeq2_contrasts.md but I'm really not sure. Sorry I'm new to bioinformatics!

And is it normal to not have controlstrain.pHxx displayed after running the command resultsNames(dds) ?

ADD REPLY
0
Entering edit mode

You can add a 0 + strain + ... to the formula, this way there won't be an intercept and all the factors should appear in the resultsNames.

ADD REPLY
0
Entering edit mode

thxs for the help! with :

dds <- DESeqDataSetFromTximport(txi, colData=colDesign, design= ~0+souche+pH+souche:pH)
dds <- DESeq(dds, test="LRT", reduced = ~0+souche+pH)
resultsNames(dds)

I get :

controlstrain, strain2, strain3, strain4, strain5, pH6.45, pH6.26, pH6.00, pH5.73, pH5.37, pH5.02, pH4.6 and eachstrain.eachpH except pHJ1 (control) and controlstrain so I'm still not able to get the DEG of the controlstrain for eachpH against controlpH or maybe I miss something.

ADD REPLY
0
Entering edit mode

That's weird, the controls should be there

ADD REPLY

Login before adding your answer.

Traffic: 2116 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6