Question

How to make a gene counts matrix ? (input edgeR/DESeq2)

0

Entering edit mode

4.7 years ago

sunnykevin97 ▴ 990

Hi I used abundance_estimates_to_matrix.pl from trinity package and generate the gene count matrix for all ~32 samples. and how do I combine into one matrix so that I feed it as a input into edgeR/DESeq2 for differential gene expression analysis. I got struck, help me.

1-8 --> Odor  --> Experiment1
9-16 --> Model --> Experiment2
17-24 --> Real --> Experiment3
24--32 --> Control --> Experiment4
gene.counts.matrix 
    adundance
TRINITY_DN0_c0_g1   12717.86
TRINITY_DN100000_c0_g1  93.65
TRINITY_DN100001_c0_g1  110.58
TRINITY_DN100002_c0_g1  80.76
TRINITY_DN100003_c0_g1  386.84
TRINITY_DN100004_c0_g1  137.85
TRINITY_DN100005_c0_g1  52.14
TRINITY_DN100006_c0_g1  87.19
TRINITY_DN100009_c0_g1  27.88

RNA-Seq rna-seq Assembly gene • 2.6k views

ADD COMMENT • link updated 4.7 years ago by Alex Nesmelov ▴ 200 • written 4.7 years ago by sunnykevin97 ▴ 990

0

Entering edit mode

Well, I'm out of my mind, instead of using complicated r packages. I though it simpler way to do it. I called all the csv's in a directory

>setwd("/home/cmd/csv")
>filelist = list.files(pattern = "*.csv$")
> list.files()
 [1] "1_S4.csv"   "10_S9.csv"  "11_S13.csv" "12_S17.csv" "13_S21.csv" "14_S26.csv" "15_S30.csv" "16_S2.csv"  "17_S6.csv"  "18_S10.csv" "19_S14.csv" "2_S8.csv"   "20_S18.csv" "21_S22.csv"
[15] "22_S27.csv" "23_S31.csv" "24_S23.csv" "25_S28.csv" "26_S32.csv" "27_S3.csv"  "28_S7.csv"  "29_S11.csv" "3_S12.csv"  "30_S15.csv" "31_S19.csv" "32_S24.csv" "4_S16.csv"  "5_S20.csv" 
[29] "6_S25.csv"  "7_S29.csv"  "8_S1.csv"   "9_S5.csv"   "merge.r"

How do i merge all the csv in to one using merge function (merge(S1,s2,by="GeneID",all = TRUE) ?

Head Sample1 GeneID ,1_S4 TRINITY_DN0_c0_g1,7491.29 TRINITY_DN100000_c0_g1,45.16 TRINITY_DN100001_c0_g1,95.2 TRINITY_DN100002_c0_g1,47.17 TRINITY_DN100003_c0_g1,94.85

ADD REPLY • link 4.7 years ago by sunnykevin97 ▴ 990

0

Entering edit mode

Please don't post comments as answers. Rather, use the Add Reply or Add Comment options as appropriate.

ADD REPLY • link 4.7 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Agreed, I posted as reply. Thanks.

ADD REPLY • link 4.7 years ago by sunnykevin97 ▴ 990

0

Entering edit mode

I answered this here: https://support.bioconductor.org/p/132018/#132020 Please do not cross-post.

ADD REPLY • link 4.7 years ago by ATpoint 87k

score 1 · Answer 1 · 2020-06-17

1

Entering edit mode

4.7 years ago

jared.andrews07 ★ 18k

tximport provides many, many examples of how to do this from almost any file format. This is similar to the salmon examples there.

ADD COMMENT • link 4.7 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

HI, instead of using tximport I used trinity abundance matrix generation scripts for all the samples and generated a matrix. I had ~32 individual matrixes generated how do I combine them into one matrix ? I'm not so clear with tximport so confusing for me. How do i do it, can please make to so simple for me. I'm working with a non model organism doesn't have any information in public domain. salmon.gene.counts.matrix

GeneID                               10_sample
TRINITY_DN0_c0_g1   12717.85
TRINITY_DN100000_c0_g1  93.65
TRINITY_DN100001_c0_g1  110.58
TRINITY_DN100002_c0_g1  80.76
TRINITY_DN100003_c0_g1  386.84
TRINITY_DN100004_c0_g1  137.85
TRINITY_DN100005_c0_g1  52.14
TRINITY_DN100006_c0_g1  87.19
TRINITY_DN100009_c0_g1  27.88

ADD REPLY • link 4.7 years ago by sunnykevin97 ▴ 990

0

Entering edit mode

Biostars is not a coding service - we will not write your code from scratch for you. We can help you troubleshoot, but you need to show some effort. What have you tried? What issues are you running in to? The vignette I linked is quite easy to follow, especially if you are using the "salmon" method from the trinity script.

ADD REPLY • link 4.7 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Well, the non-model organism I'm working with doesn't have a reference genome or GTF file available in public domain. I'm out of my mind, instead of using complicated r packages. I though it is simpler way to do it. I called all the csv's in a directory

>setwd("/home/cmd/csv")
>filelist = list.files(pattern = "*.csv$")
> list.files()
 [1] "1_S4.csv"   "10_S9.csv"  "11_S13.csv" "12_S17.csv" "13_S21.csv" "14_S26.csv" "15_S30.csv" "16_S2.csv"  "17_S6.csv"  "18_S10.csv" "19_S14.csv" "2_S8.csv"   "20_S18.csv" "21_S22.csv"
[15] "22_S27.csv" "23_S31.csv" "24_S23.csv" "25_S28.csv" "26_S32.csv" "27_S3.csv"  "28_S7.csv"  "29_S11.csv" "3_S12.csv"  "30_S15.csv" "31_S19.csv" "32_S24.csv" "4_S16.csv"  "5_S20.csv" 
[29] "6_S25.csv"  "7_S29.csv"  "8_S1.csv"   "9_S5.csv"   "merge.r"

How do i merge all the csv in to one using merge function (merge(S1,s2,by="GeneID",all = TRUE) ?

Head Sample1 GeneID ,1_S4 TRINITY_DN0_c0_g1,7491.29 TRINITY_DN100000_c0_g1,45.16 TRINITY_DN100001_c0_g1,95.2 TRINITY_DN100002_c0_g1,47.17 TRINITY_DN100003_c0_g1,94.85

ADD REPLY • link 4.7 years ago by sunnykevin97 ▴ 990

0

Entering edit mode

1) In colloquial English, "out of my mind" indicates an extreme amount of distress, which I don't think is your intended meaning. 2) I understand wanting to implement an import scheme that you understand, but if things are not working right, the virtue of using those R packages is that it is far easier for people to understand what you did. Also, the writers of those packages have thought of and solved possible problems that your homemade way might not address.

ADD REPLY • link 4.7 years ago by swbarnes2 14k

score 0 · Answer 2 · 2020-06-23

0

Entering edit mode

4.7 years ago

Alex Nesmelov ▴ 200

You may easily merge all tables, if you'll put them all in a list and then use purrr::reduce function. Try this:

library(tidyverse)
counts_files_pattern = "\\.csv$"

counts_table <-
   list.files(pattern = counts_files_pattern) %>%
  #apply an import function to the listed files
   map(function(x) { 
      print(x)
      read_csv(x,
            #rename second column in accordance with the file name
            col_names = c("ID", x),
             col_names = F)
  }
  ) %>%
 #merge all tables into one
 reduce(left_join, by = "ID") %>% 
 # clean up column names
 rename_all(gsub,
         pattern = counts_files_pattern,
         replacement = '')

Depending on your csv delimiter, you may need to replace read_csv function by read_csv2 - just try on a single file and make sure that import is correct. I can't understand, whether your files already contain column headers corresponding to sample ID. If they do, you can skip setting column names in the import function above and replace "ID" by "GeneID" in reduce function.

ADD COMMENT • link 4.7 years ago by Alex Nesmelov ▴ 200

0

Entering edit mode

I tried it shows error

Rscript rscript.r 
 [1] "1_S4.csv"   "10_S9.csv"  "11_S13.csv" "12_S17.csv" "13_S21.csv"
 [6] "14_S26.csv" "15_S30.csv" "16_S2.csv"  "17_S6.csv"  "18_S10.csv"
[11] "19_S14.csv" "2_S8.csv"   "20_S18.csv" "21_S22.csv" "22_S27.csv"
[16] "23_S31.csv" "24_S23.csv" "25_S28.csv" "26_S32.csv" "27_S3.csv" 
[21] "28_S7.csv"  "29_S11.csv" "3_S12.csv"  "30_S15.csv" "31_S19.csv"
[26] "32_S24.csv" "4_S16.csv"  "5_S20.csv"  "6_S25.csv"  "7_S29.csv" 
[31] "8_S1.csv"   "9_S5.csv"   "rscript.r" 
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   1.0.0
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
[1] "1_S4.csv"
Error in read_csv(x, col_names = c("GeneID", x), col_names = F) : 
  formal argument "col_names" matched by multiple actual arguments
Calls: %>% ... eval -> _fseq -> freduce -> <Anonymous> -> map -> .f
Execution halted

  1 setwd("/home/data/csv")
  2 filelist = list.files(pattern = "*.csv$")
  3 list.files()
  4 library(tidyverse)
  5 counts_files_pattern = "\\.csv$"
  6 
  7 counts_table <-
  8    list.files(pattern = counts_files_pattern) %>%
  9   #apply an import function to the listed files
 10    map(function(x) {
 11       print(x)
 12       read_csv(x,
 13             #rename second column in accordance with the file name
 14             col_names = c("GeneID", x),
 15              col_names = F)
 16   }
 17   ) %>%
 18  #merge all tables into one
 19  reduce(left_join, by = "GeneID") %>%
 20  # clean up column names
 21  rename_all(gsub,
 22          pattern = counts_files_pattern,
 23          replacement = '')
 24 


    head 10_S9.csv 
    GeneID,                 10_S9
    TRINITY_DN0_c0_g1,12717.85
    TRINITY_DN100000_c0_g1,93.65
    TRINITY_DN100001_c0_g1,110.58
    TRINITY_DN100002_c0_g1,80.76
    TRINITY_DN100003_c0_g1,386.84
    TRINITY_DN100004_c0_g1,137.85
    TRINITY_DN100005_c0_g1,52.14
    TRINITY_DN100006_c0_g1,87.19
    TRINITY_DN100009_c0_g1,27.88

ADD REPLY • link 4.7 years ago by sunnykevin97 ▴ 990