design

Question

How to consider batch effect and multiple variable to identify differential gene expressions for a given Phenotype in DESeq2

0

Entering edit mode

3.3 years ago

pierre • 0

Hello All,

I am working on RNAseq data for which one I have the Phenotype (DPN or PDPN), Gender (female or Male), Age and the batch (1st_round or 2nd_round ):

ID  PiNS.ID Phenotype   PiNS    Gender  Age batch
PINS_0112   112 PDPN    PINS_0112   Female  64  1st_round
PINS_0171   171 DPN PINS_0171   Male    74  2nd_round

Basically, I want to get the differential gene expression regarding the Phenotype (DPN or PDPN). However, I did a PCA and I have a strong batch effect. So, I would like to design my analysis to get the differential gene expression for the Phenotype but also considering the batch effect (by removing it), Gender and the Ages.

So far, I designed my analysis as follow:

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ batch + Phenotype)
dds$Phenotype <- factor(dds$Phenotype, levels = c("DPN","PDPN"))
dds <- DESeq(dds, full=design(dds))#, reduced = ~ batch)

First, I am not sure if it is the correct way to remove/consider the batch effect? Secondly, I am not sure how to write my design to also consider the Gender and the Age!

Many thanks in advance for your help.

Kind regards, Pierre

DESeq2 • 2.6k views

ADD COMMENT • link 3.3 years ago by pierre • 0

0

Entering edit mode

What is your colData ?

ADD REPLY • link 3.3 years ago by Carlo Yague 8.9k

0

Entering edit mode

Hi Carlo,

Thanks a lot for you answer. Really sorry, here you go:

head(colData) 
             PiNS.ID Phenotype  PiNS Gender Age     batch
PINS_0112     112      PDPN PINS_0112 Female  64 1st_round
PINS_0162     162       DPN PINS_0162   Male  49 1st_round
PINS_0168     168      PDPN PINS_0168   Male  59 1st_round
PINS_0170     170       DPN PINS_0170   Male  27 1st_round
PINS_0172     172      PDPN PINS_0172   Male  70 1st_round
PINS_0174     174      PDPN PINS_0174   Male  63 1st_round

The 2nd_round starts at the half of my design.tsv.

Thanks, Pierre

ADD REPLY • link 3.3 years ago by pierre • 0

0

Entering edit mode

Can you post the full colData? Is age supposed to be categorical (like young/old) or continuous?

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

Entering edit mode

Sorry ATpoint, it seems that my answer did not appear, I reposts it!

So here is the full colData:

          PiNS.ID Phenotype      PiNS Gender Age     batch
PINS_0112     112      PDPN PINS_0112 Female  64 1st_round
PINS_0162     162       DPN PINS_0162   Male  49 1st_round
PINS_0168     168      PDPN PINS_0168   Male  59 1st_round
PINS_0170     170       DPN PINS_0170   Male  27 1st_round
PINS_0172     172      PDPN PINS_0172   Male  70 1st_round
PINS_0174     174      PDPN PINS_0174   Male  63 1st_round
PINS_0200     200      PDPN PINS_0200   Male  67 1st_round
PINS_0202     202       DPN PINS_0202   Male  76 1st_round
PINS_0208     208      PDPN PINS_0208   Male  64 1st_round
PINS_0210     210       DPN PINS_0210   Male  59 1st_round
PINS_0212     212       DPN PINS_0212   Male  74 1st_round
PINS_0213     213      PDPN PINS_0213 Female  55 1st_round
PINS_0217     217       DPN PINS_0217   Male  64 1st_round
PINS_0218     218      PDPN PINS_0218   Male  49 1st_round
PINS_0222     222       DPN PINS_0222 Female  68 1st_round
PINS_0231     231       DPN PINS_0231 Female  74 1st_round
PINS_0233     233      PDPN PINS_0233 Female  70 1st_round
PINS_0235     235      PDPN PINS_0235   Male  57 1st_round
PINS_0239     239      PDPN PINS_0239   Male  82 1st_round
PINS_0245     245       DPN PINS_0245 Female  70 1st_round
PINS_0247     247       DPN PINS_0247   Male  67 1st_round
PINS_0248     248       DPN PINS_0248   Male  72 1st_round
PINS_0249     249       DPN PINS_0249   Male  48 1st_round
PINS_0251     251      PDPN PINS_0251   Male  75 1st_round
PINS_0253     253       DPN PINS_0253   Male  64 1st_round
PINS_0256     256      PDPN PINS_0256   Male  66 1st_round
PINS_0259     259       DPN PINS_0259   Male  81 1st_round
PINS_0265     265      PDPN PINS_0265   Male  75 1st_round
PINS_0278     278      PDPN PINS_0278   Male  44 1st_round
PINS_0291     291      PDPN PINS_0291 Female  69 1st_round
PINS_0292     292       DPN PINS_0292   Male  64 1st_round
PINS_0293     293       DPN PINS_0293 Female  58 1st_round
PINS_0294     294      PDPN PINS_0294   Male  58 1st_round
PINS_0302     302      PDPN PINS_0302   Male  59 1st_round
PINS_0306     306       DPN PINS_0306   Male  61 1st_round
PINS_0311     311      PDPN PINS_0311   Male  76 1st_round
PINS_0318     318      PDPN PINS_0318   Male  75 1st_round
PINS_0336     336      PDPN PINS_0336   Male  62 1st_round
PINS_0337     337       DPN PINS_0337   Male  62 1st_round
PINS_0338     338      PDPN PINS_0338   Male  41 1st_round
PINS_0346     346       DPN PINS_0346   Male  65 1st_round
PINS_0351     351       DPN PINS_0351   Male  59 1st_round
PINS_0363     363      PDPN PINS_0363   Male  57 1st_round
PINS_0364     364       DPN PINS_0364 Female  72 1st_round
PINS_0365     365       DPN PINS_0365   Male  77 1st_round
PINS_0375     375       DPN PINS_0375   Male  71 1st_round
PINS_0378     378       DPN PINS_0378   Male  73 1st_round
PINS_0385     385       DPN PINS_0385   Male  63 1st_round
PINS_0386     386      PDPN PINS_0386 Female  57 1st_round
PINS_0519     519       DPN PINS_0519 Female  70 1st_round
PINS_0526     526       DPN PINS_0526   Male  51 1st_round
PINS_0528     528      PDPN PINS_0528   Male  53 1st_round
PINS_0530     530      PDPN PINS_0530 Female  51 1st_round
PINS_0532     532      PDPN PINS_0532 Female  65 1st_round
PINS_0536     536       DPN PINS_0536   Male  56 1st_round
PINS_0538     538       DPN PINS_0538 Female  74 1st_round
PINS_0539     539       DPN PINS_0539   Male  56 1st_round
PINS_0548     548      PDPN PINS_0548 Female  47 1st_round
PINS_0550     550      PDPN PINS_0550 Female  61 1st_round
PINS_0553     553      PDPN PINS_0553 Female  69 1st_round
PINS_0556     556      PDPN PINS_0556   Male  68 1st_round
PINS_0560     560       DPN PINS_0560 Female  62 1st_round
PINS_0561     561       DPN PINS_0561   Male  72 1st_round
PINS_0567     567      PDPN PINS_0567   Male  69 1st_round
PINS_0568     568      PDPN PINS_0568 Female  69 1st_round
PINS_0569     569      PDPN PINS_0569   Male  61 1st_round
PINS_0572     572      PDPN PINS_0572 Female  54 1st_round
PINS_0574     574       DPN PINS_0574 Female  53 1st_round
PINS_0599     599      PDPN PINS_0599 Female  26 1st_round
PINS_0700     700      PDPN PINS_0700   Male  64 1st_round
PINS_0721     721      PDPN PINS_0721   Male  61 1st_round
PINS_0726     726       DPN PINS_0726 Female  65 1st_round
PINS_0739     739      PDPN PINS_0739   Male  60 1st_round
PINS_0742     742       DPN PINS_0742 Female  72 1st_round
PINS_0747     747      PDPN PINS_0747   Male  80 1st_round
PINS_0754     754      PDPN PINS_0754   Male  61 1st_round
PINS_0756     756       DPN PINS_0756   Male  84 1st_round
PINS_0757     757       DPN PINS_0757   Male  71 1st_round
PINS_0761     761      PDPN PINS_0761   Male  73 1st_round
PINS_0768     768      PDPN PINS_0768   Male  63 1st_round
PINS_0772     772       DPN PINS_0772   Male  71 1st_round
PINS_0798     798      PDPN PINS_0798 Female  80 1st_round
PINS_0799     799      PDPN PINS_0799 Female  86 1st_round
PINS_0800     800       DPN PINS_0800   Male  71 1st_round
PINS_0808     808      PDPN PINS_0808   Male  77 1st_round
PINS_0813     813       DPN PINS_0813   Male  68 1st_round
PINS_0814     814       DPN PINS_0814 Female  82 1st_round
PINS_0819     819       DPN PINS_0819   Male  70 1st_round
PINS_0821     821       DPN PINS_0821   Male  45 1st_round
PINS_0822     822       DPN PINS_0822   Male  69 1st_round
PINS_0826     826      PDPN PINS_0826   Male  80 1st_round
PINS_0827     827       DPN PINS_0827 Female  53 1st_round
PINS_0828     828      PDPN PINS_0828 Female  71 1st_round
PINS_0834     834      PDPN PINS_0834   Male  78 1st_round
PINS_0840     840      PDPN PINS_0840   Male  63 1st_round
PINS_0842     842       DPN PINS_0842 Female  73 1st_round
PINS_0845     845       DPN PINS_0845 Female  81 1st_round
PINS_0911     911      PDPN PINS_0911   Male  55 1st_round
PINS_0912     912       DPN PINS_0912 Female  73 1st_round
PINS_0920     920       DPN PINS_0920   Male  73 1st_round
PINS_0929     929       DPN PINS_0929   Male  66 1st_round
PINS_0932     932       DPN PINS_0932   Male  76 1st_round
PINS_0936     936       DPN PINS_0936   Male  70 1st_round
PINS_0938     938       DPN PINS_0938   Male  78 1st_round
PINS_0940     940       DPN PINS_0940   Male  72 1st_round
PINS_0945     945       DPN PINS_0945 Female  76 1st_round
PINS_0968     968      PDPN PINS_0968 Female  79 1st_round
PINS_0977     977       DPN PINS_0977   Male  38 1st_round
PINS_0982     982       DPN PINS_0982   Male  70 1st_round
PINS_0171     171       DPN PINS_0171   Male  74 2nd_round
PINS_0177     177      PDPN PINS_0177   Male  74 2nd_round
PINS_0187     187      PDPN PINS_0187   Male  68 2nd_round
PINS_0189     189      PDPN PINS_0189   Male  46 2nd_round
PINS_0197     197      PDPN PINS_0197   Male  41 2nd_round
PINS_0205     205      PDPN PINS_0205   Male  69 2nd_round
PINS_0207     207      PDPN PINS_0207   Male  64 2nd_round
PINS_0216     216       DPN PINS_0216 Female  59 2nd_round
PINS_0219     219      PDPN PINS_0219   Male  72 2nd_round
PINS_0226     226      PDPN PINS_0226 Female  68 2nd_round
PINS_0229     229       DPN PINS_0229   Male  57 2nd_round
PINS_0240     240       DPN PINS_0240   Male  70 2nd_round
PINS_0242     242      PDPN PINS_0242 Female  64 2nd_round
PINS_0252     252      PDPN PINS_0252 Female  71 2nd_round
PINS_0257     257      PDPN PINS_0257   Male  72 2nd_round
PINS_0261     261      PDPN PINS_0261 Female  74 2nd_round
PINS_0262     262      PDPN PINS_0262 Female  53 2nd_round
PINS_0264     264      PDPN PINS_0264   Male  79 2nd_round
PINS_0267     267       DPN PINS_0267 Female  53 2nd_round
PINS_0270     270      PDPN PINS_0270   Male  79 2nd_round
PINS_0273     273       DPN PINS_0273   Male  72 2nd_round
PINS_0274     274      PDPN PINS_0274   Male  77 2nd_round
PINS_0277     277      PDPN PINS_0277   Male  68 2nd_round
PINS_0279     279       DPN PINS_0279 Female  69 2nd_round
PINS_0280     280       DPN PINS_0280   Male  76 2nd_round
PINS_0281     281      PDPN PINS_0281   Male  55 2nd_round
PINS_0283     283       DPN PINS_0283   Male  78 2nd_round
PINS_0284     284      PDPN PINS_0284 Female  66 2nd_round
PINS_0286     286      PDPN PINS_0286   Male  84 2nd_round
PINS_0287     287      PDPN PINS_0287 Female  69 2nd_round
PINS_0296     296      PDPN PINS_0296   Male  63 2nd_round
PINS_0301     301       DPN PINS_0301   Male  73 2nd_round
PINS_0303     303      PDPN PINS_0303   Male  71 2nd_round
PINS_0307     307      PDPN PINS_0307   Male  49 2nd_round
PINS_0310     310       DPN PINS_0310 Female  76 2nd_round
PINS_0313     313       DPN PINS_0313   Male  71 2nd_round
PINS_0316     316      PDPN PINS_0316   Male  57 2nd_round
PINS_0320     320       DPN PINS_0320   Male  76 2nd_round
PINS_0321     321       DPN PINS_0321   Male  74 2nd_round

I got the data like that! Many thanks for your help! I am not in the field of RNAseq, I struggle a bit...

Pierre

ADD REPLY • link 3.3 years ago by pierre • 0

0

Entering edit mode

Hi all,

I did some modifications on my design.tsv. Instead of having Age as a continuous variable, I transformed it to be categorical (I got an Error regarding the numerical value):

          PiNS.ID Phenotype      PiNS Gender Age     batch
PINS_0112     112      PDPN PINS_0112 Female   E 1st_round
PINS_0162     162       DPN PINS_0162   Male   C 1st_round
PINS_0168     168      PDPN PINS_0168   Male   D 1st_round
PINS_0170     170       DPN PINS_0170   Male   A 1st_round
PINS_0172     172      PDPN PINS_0172   Male   F 1st_round
PINS_0174     174      PDPN PINS_0174   Male   E 1st_round

Then, I modified my previous code above to design my analysis to get the differential gene expression for the Phenotype but also considering the batch effect, Gender and the Ages:

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ Phenotype + batch + Gender + Age)# + Phenotype:batch)
dds <- DESeq(dds, full=design(dds))
resPheno  <- results(dds, name=("Phenotype_PDPN_vs_DPN"))

After running the code, I got this warning message :

21 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
.....
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Firstly, I am not sure to understand why I got this warning message. Is it because of my design?

If not, I was wondering if my design is the correct way to consider the batch effect, Gender and Age to identify differential gene expression regarding the Phenoype (PDPN vs DPN)?

At the end, I got 31 genes that have different expressions. However, for example, when I plot the top 6 of those genes (one example):

par(mfrow=c(2,3))
plotCounts(dds, gene="ENSG00000270641.1|TSIX", intgroup="Phenotype")

This gene (and the others) does not appear differentially expressed between the Phenotype (PDPN vs DPN) but more randomly across the samples. So, I am not sure if it's correct and it's because I don't have different genes expressed between my Phenotype or it is because I did not design my analysis correctly!

Any help or advice will be much appreciated. Many thanks in advance. Pierre

ADD REPLY • link 3.3 years ago by pierre • 0

0

Entering edit mode

Hi ! Your code and design look good. I think it is a good idea to use categorial rather than continuous Age factor. In the manual, it is stated that continuous covariates should only be used where a constant fold change is expected for each unit of the covariate, which probably is not the case with age.

Concerning the warning, it happens when you have rows with many zeros across your samples. You could perhaps pre-filter those, as describe here. Or as suggested in the warning, increase the number of iteration of the nbinomWaldTest function (maxit = 1000 for instance).

Once you fix this, if you still have only a few DEG, then either there is a very small phenotype effect compared to residual variance OR sequencing depth is insufficient to measure robust gene expression changes. For further diagnostic, performing a PCA will allow to assess the overall portion of variability explained by the Phenotype effect and a MA-plot might help you assess if you have sufficient read count.

ADD REPLY • link 3.3 years ago by Carlo Yague 8.9k

0

Entering edit mode

Hi Carlo,

Many thanks for your reply! I did the modification as suggested:

design

    dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~ Phenotype + batch + Gender + Age)# + Phenotype:batch) 
#filters
    dds <- estimateSizeFactors(dds)
    nc <- counts(dds, normalized=TRUE)
    filter <- rowSums(nc >= 10) >=2
    dds <- dds[filter,]

    dds <- estimateSizeFactors(dds)
    dds <- estimateDispersions(dds)
    dds <- nbinomWaldTest(dds, maxit=50500)

As you can see I have played with the maxit value (to increase it) because I am still getting the same warning message:

22 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

My code is not correct or someone knows what it did not solve the warning message?

Thanks again for your help, much appreciated. Pierre

ADD REPLY • link 3.3 years ago by pierre • 0

1

Entering edit mode

I do not have in-depth knowledge of what maxit is actually doing but I think your filter is basically not doing much other than removing all-zero rows as you are only requesting that at least two samples have more than 10 counts but you have almost 150 samples, so you could still have like 98% of samples having zeros. I would either increase the filtering, say that at least, e.g. 10 or 20% per group must have counts above 10, or you can use the filterByExpr function in edgeR which does filtering with respect to the sample size and design with defaults that seem to be generally well accepted. But first I would just try quick'n'dirty to require 10% of samples have some counts, which is roughly 15 samples here rather than just two, and then play with this theshold a bit and see whether it solves the issue:

filter <- rowSums(nc >= 10) >= ceiling(ncol(dds)/10) # 10% of all samples

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

Entering edit mode

Hi ATpoint,

Many thanks for your reply!

I did what you suggested, but unfortunately I am still getting the same warning message!

Not sure what to do more! I will try to find something!

Thanks again for your help, Pierre

ADD REPLY • link 3.3 years ago by pierre • 0