DESeq2 with unbalanced dataset and multifactor design
12 months ago
ari.bo • 0

Dear all,

I am running a differential gene expression analysis (DGEA) with DESeq2 on data imported with tximport. My dataset is unbalanced, as shown in the metadata below. With these counts and this metadata I would like to answer four questions.

  1. I would like to find the differentially expressed genes between the different Lines (e.g. Line "C" vs. Line "15", and all other combinations) while controlling for differences in Person_ID and Location.
  2. I would like to find the differentially expressed genes between the different Locations (e.g. Location "A" vs. Location "R", and all other combinations) while controlling for differences in Person_ID and Line.
  3. I would like to find the differentially expressed genes between two different Lines in the same Location, e.g. within Location A, what is the difference between Line 15 and Line 20?
  4. I would like to find the differentially expressed genes between two different Locations in the same Line, e.g. within Line 20, what is the difference between Location A and Location D?

Here is the code that I use to build the DESeq2 object:

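library(stringr)   # str_split_i()
library(tximport)
library(DESeq2)

dir <- "path/quantif"

# One sub-directory per sample, each holding a StringTie t_data.ctab file
samples <- list.files(dir)
files <- file.path(dir, samples, "t_data.ctab")
# Sample name = first 10 characters of the 6th "/"-separated path component
names(files) <- substr(str_split_i(files, "/", 6), 1, 10)
all(file.exists(files))

# Build the transcript-to-gene map from the first sample's table
tmp <- read.csv(files[1], sep = "\t")
head(tmp)
tx2gene <- tmp[, c("t_name", "gene_name")]
head(tx2gene)

txi <- tximport(files, type = "stringtie", tx2gene = tx2gene)
head(txi$counts)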

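samples_meta <- read.csv(file = "/path/metadata.csv", sep = ",", header = TRUE)

# The row names of samples_meta are row numbers (see the metadata below), so order
# the tximport matrices by Sample_ID instead. This assumes the names set on `files`
# are the Sample_ID values.
for (m in c("abundance", "counts", "length")) {
  txi[[m]] <- txi[[m]][, samples_meta$Sample_ID]
}
dds <- DESeqDataSetFromTximport(txi, colData = samples_meta, design = ~ Line)    # which design do I have to use?

dds$Line <- relevel(dds$Line, ref = "C") # do I need a reference level?
# Keep genes with at least 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
dds <- DESeq(dds)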

Metadata of the object:

     Sample_ID Person_ID Line Location
4  SA18082157    KK1451   15        A
7  SA18083382   KK1473N   15        S
12 SA18083387   KK1450N   15        R
14 SA18083360   KK1480N   15        S
19 SA18083365   KK1551N   15        R
25 SA18083368   KK1471N   15        A
27 SA18086317   KK1443N   15        D
32 SA18051387   KK1868N   15        D
38 SA18051384   KK1865N   15        S
41 SA18051386   KK1601N   15        A
18 SA18083364   KK1551N   18        R
23 SA18083366   KK1471N   18        A
33 SA18088686   KK1671N   18        A
36 SA18088660   KK1434N   18        R
2  SA18082155    KK1451   20        A
6  SA18083381   KK1473N   20        S
10 SA18083386   KK1450N   20        R
16 SA18083344   KK1480N   20        S
17 SA18083363   KK1551N   20        R
24 SA18082300   KK1471N   20        A
26 SA18086318   KK1443N   20        D
29 SA18086315   KK1374N   20        D
31 SA18051386   KK1868N   20        D
34 SA18086313   KK1671N   20        A
37 SA18051383   KK1865N   20        S
40 SA18051388   KK1601N   20        A
3  SA18083341    KK1451    C        A
8  SA18083346   KK1473N    C        S
11 SA18083343   KK1450N    C        R
15 SA18083345   KK1480N    C        S
20 SA18083342   KK1551N    C        R
1  SA18082156    KK1451    I        A
5  SA18083383   KK1473N    I        S
13 SA18083386   KK1480N    I        S
22 SA18083367   KK1471N    I        A
30 SA18086316   KK1374N    I        D
35 SA18086318   KK1671N    I        A
39 SA18051385   KK1865N    I        S
42 SA18051360   KK1601N    I        A
9  SA18083385   KK1450N    M        R
21 SA18082301   KK1471N    M        A
28 SA18088662   KK1443N    M        D
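
Before choosing a design, it may help to check which comparisons the metadata can support at all. The sketch below uses base R only and transcribes the Person_ID, Line and Location columns from the table above. It cross-tabulates Line by Location, checks whether any person was sampled in more than one Location, and tests whether a candidate design formula gives a full-rank model matrix, which DESeq2 requires. The helper name `full_rank` is hypothetical and introduced only for this illustration.

```r
# Person_ID, Line and Location transcribed from the posted metadata table
meta <- read.table(header = TRUE, colClasses = "character", text = "
Person_ID Line Location
KK1451  15 A
KK1473N 15 S
KK1450N 15 R
KK1480N 15 S
KK1551N 15 R
KK1471N 15 A
KK1443N 15 D
KK1868N 15 D
KK1865N 15 S
KK1601N 15 A
KK1551N 18 R
KK1471N 18 A
KK1671N 18 A
KK1434N 18 R
KK1451  20 A
KK1473N 20 S
KK1450N 20 R
KK1480N 20 S
KK1551N 20 R
KK1471N 20 A
KK1443N 20 D
KK1374N 20 D
KK1868N 20 D
KK1671N 20 A
KK1865N 20 S
KK1601N 20 A
KK1451  C  A
KK1473N C  S
KK1450N C  R
KK1480N C  S
KK1551N C  R
KK1451  I  A
KK1473N I  S
KK1480N I  S
KK1471N I  A
KK1374N I  D
KK1671N I  A
KK1865N I  S
KK1601N I  A
KK1450N M  R
KK1471N M  A
KK1443N M  D
")
meta[] <- lapply(meta, factor)

# Number of samples in each Line x Location cell (empty cells cannot be compared)
print(table(Line = meta$Line, Location = meta$Location))

# Number of distinct Locations each person was sampled in
locs_per_person <- tapply(as.character(meta$Location), meta$Person_ID,
                          function(x) length(unique(x)))
print(locs_per_person)

# A design can only be fitted if its model matrix has full column rank
full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}
print(full_rank(~ Person_ID + Line, meta))
print(full_rank(~ Person_ID + Location + Line, meta))
```

In the table as posted, every Person_ID occurs in a single Location. Person_ID and Location therefore cannot both enter the design as fixed effects, and any comparison between Locations is a comparison between different people.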

What is the best design to answer all of these questions? Do I need a different design for each question?

Once the design is set, how do I extract the right contrasts?

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.3                 purrr_1.0.2                
 [5] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                tidyverse_1.3.1            
 [9] ggplot2_3.4.4               DESeq2_1.40.2               SummarizedExperiment_1.30.2 Biobase_2.60.0             
[13] MatrixGenerics_1.12.3       matrixStats_1.0.0           GenomicRanges_1.52.1        GenomeInfoDb_1.36.4        
[17] IRanges_2.34.1              S4Vectors_0.38.2            BiocGenerics_0.46.0         tximport_1.28.0            

loaded via a namespace (and not attached):
 [1] gtable_0.3.4            lattice_0.22-5          tzdb_0.4.0              vctrs_0.6.4             tools_4.3.1            
 [6] bitops_1.0-7            generics_0.1.3          parallel_4.3.1          fansi_1.0.5             pkgconfig_2.0.3        
[11] Matrix_1.6-1.1          dbplyr_2.3.4            readxl_1.4.3            lifecycle_1.0.3         GenomeInfoDbData_1.2.10
[16] compiler_4.3.1          munsell_0.5.0           codetools_0.2-19        RCurl_1.98-1.12         pillar_1.9.0           
[21] crayon_1.5.2            BiocParallel_1.34.2     DelayedArray_0.26.7     abind_1.4-5             rvest_1.0.3            
[26] tidyselect_1.2.0        locfit_1.5-9.8          stringi_1.7.12          grid_4.3.1              colorspace_2.1-0       
[31] cli_3.6.1               magrittr_2.0.3          S4Arrays_1.0.6          utf8_1.2.3              broom_1.0.5            
[36] withr_2.5.1             scales_1.2.1            backports_1.4.1         timechange_0.2.0        lubridate_1.9.3        
[41] modelr_0.1.11           XVector_0.40.0          httr_1.4.7              cellranger_1.1.0        hms_1.1.3              
[46] haven_2.5.3             rlang_1.1.1             Rcpp_1.0.11             glue_1.6.2              DBI_1.1.3              
[51] xml2_1.3.5              reprex_2.0.2            rstudioapi_0.15.0       jsonlite_1.8.7          R6_2.5.1               
[56] fs_1.6.3                zlibbioc_1.46.0
12 months ago
jv ★ 1.8k

Your questions are covered in the DESeq2 vignette: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

I recommend reviewing the sections "Multi-factor designs" and "Contrasts".
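
As a rough illustration of what those two sections describe, here is a minimal sketch on simulated data, not a definitive analysis of the dataset in the question. Everything in it is an assumption made for the example: the toy metadata (six hypothetical persons `P1` to `P6`, each sampled in three Lines and each belonging to one Location), the simulated counts from `makeExampleDESeqDataSet()`, and the combined `group` factor. It shows one multi-factor design per type of question and how a contrast is written as `c(factor, numerator, denominator)`.

```r
library(DESeq2)

set.seed(1)

# Toy metadata (hypothetical): 6 persons x 3 lines, one location per person
meta <- expand.grid(Line = c("C", "15", "20"),
                    Person_ID = paste0("P", 1:6),
                    stringsAsFactors = FALSE)
meta$Location  <- factor(ifelse(meta$Person_ID %in% c("P1", "P2", "P3"), "A", "R"))
meta$Person_ID <- factor(meta$Person_ID)
meta$Line      <- relevel(factor(meta$Line), ref = "C")
# One level per Line/Location combination, e.g. "15_A"
meta$group     <- factor(paste(meta$Line, meta$Location, sep = "_"))

# Simulated counts, used only so that the sketch runs end to end
cts <- counts(makeExampleDESeqDataSet(n = 500, m = nrow(meta)))
rownames(meta) <- colnames(cts)

# Line vs. Line, paired by person (Location is absorbed by Person_ID)
dds_line <- DESeqDataSetFromMatrix(cts, colData = meta, design = ~ Person_ID + Line)
dds_line <- DESeq(dds_line)
res_15_vs_C  <- results(dds_line, contrast = c("Line", "15", "C"))
res_20_vs_15 <- results(dds_line, contrast = c("Line", "20", "15"))

# Location vs. Location, adjusted for Line; Person_ID is left out because
# each person has a single location
dds_loc <- DESeqDataSetFromMatrix(cts, colData = meta, design = ~ Line + Location)
dds_loc <- DESeq(dds_loc)
res_R_vs_A <- results(dds_loc, contrast = c("Location", "R", "A"))

# Specific Line/Location cells compared through the combined factor
dds_group <- DESeqDataSetFromMatrix(cts, colData = meta, design = ~ group)
dds_group <- DESeq(dds_group)
res_lines_in_A <- results(dds_group, contrast = c("group", "15_A", "20_A"))
res_locs_in_20 <- results(dds_group, contrast = c("group", "20_A", "20_R"))

resultsNames(dds_line)
head(res_lines_in_A)
```

The reference level only sets the default comparison. With the `contrast` argument any pair of levels can be compared, so one fitted model serves all pairwise contrasts of that factor. The designs without Person_ID treat repeated samples from the same person as independent, which is a simplification to keep in mind when interpreting the Location results.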
