Question

featureCounts to DESeq2 - How to merge biological replicates?

0

8.2 years ago

Sreeraj Thamban ▴ 310

Hi, I have four aligned BAM files for two conditions (treated and untreated), each with two biological replicates. I can run featureCounts on the four BAM files separately, but I don't know how to make the output files compatible with downstream differential gene expression analysis using DESeq2. Can anyone help, please? Thank you.

DESeq2 RNA-Seq featurecounts • 8.9k views

ADD COMMENT • link updated 2.8 years ago by DareDevil ★ 4.4k • written 8.2 years ago by Sreeraj Thamban ▴ 310

2

Everything is explained in the DESeq2 manual.

1) Get the counts (featureCounts is a good tool for that).
2) Provide 3 variables:
   a. A count matrix (rows = genes, columns = samples).
   b. A table with all the information on your samples (names, group = control/case, etc.).
   c. A design (e.g. ~group).
3) Run DESeq2 following the manual.

Regarding the output of featureCounts: for each sample you should get a data frame with one row per gene (plus annotation columns), the last column being the count for that gene. You can simply combine the outputs by column to get the count matrix required by DESeq2.
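The three inputs listed above can be sketched in R as follows. This is a minimal illustration, not the manual's own example: the gene names, sample names and counts are made up, and the `condition` column name is an assumption. The DESeq2 call is guarded so the snippet still runs when the package is not installed.

```r
# Toy count matrix: rows = genes, columns = samples (all values are made up).
counts <- matrix(
  c( 10L, 12L, 30L, 28L,
      0L,  1L,  0L,  2L,
    100L, 90L, 40L, 55L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untreated_1", "untreated_2", "treated_1", "treated_2"))
)

# Sample table: one row per column of `counts`, in the same order.
# Replicates stay as separate columns; the design groups them by condition.
coldata <- data.frame(
  condition = factor(c("untreated", "untreated", "treated", "treated"),
                     levels = c("untreated", "treated")),
  row.names = colnames(counts)
)

# Sanity check: sample order must match between the two objects.
stopifnot(identical(rownames(coldata), colnames(counts)))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ condition)
  # With a real, genome-wide matrix one would continue with:
  #   dds <- DESeq2::DESeq(dds)
  #   res <- DESeq2::results(dds)
}
```

Setting the factor levels explicitly makes "untreated" the reference level, so fold changes would be reported as treated versus untreated.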

ADD REPLY • link 8.2 years ago by VHahaut ★ 1.2k

1

featureCounts produces a matrix directly when you feed it multiple BAM files (provide them in the order you want the samples to be in the matrix columns).

ADD REPLY • link 8.2 years ago by GenoMax 152k

0

Dear Genomax, can you tell me how to feed multiple BAM files to featureCounts?

ADD REPLY • link 8.2 years ago by Sreeraj Thamban ▴ 310

2

featureCounts [options] -o counts.txt file1.bam file2.bam file3.bam etc

ADD REPLY • link 8.2 years ago by GenoMax 152k

1

You need a matrix file with genes as your rows and samples as your columns. No need to merge the replicates.

ADD REPLY • link 8.2 years ago by GenoMax 152k

0

Hi Genomax2, I have the matrix files for all the BAM files, but how do I load these files into DESeq2? I have been using the kallisto - tximport - DESeq2 pipeline, and there it was straightforward. Thanks

ADD REPLY • link 8.2 years ago by Sreeraj Thamban ▴ 310

0

help(read.delim)
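As a rough sketch of what that looks like for a multi-sample featureCounts table: the file written below is a tiny made-up stand-in so the example runs on its own, and the sample names are assumptions. It relies on the usual featureCounts layout, where a leading `#` comment line is followed by six annotation columns (Geneid, Chr, Start, End, Strand, Length) and then one count column per BAM file.

```r
# Write a small stand-in for a featureCounts output file (values are made up).
counts_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts; Command:(elided)",
  paste("Geneid", "Chr", "Start", "End", "Strand", "Length",
        "untreated_1.bam", "untreated_2.bam", "treated_1.bam", "treated_2.bam",
        sep = "\t"),
  paste("geneA", "chr1", "100", "900", "+", "801", "10", "12", "30", "28", sep = "\t"),
  paste("geneB", "chr1", "2000", "2500", "-", "501", "0", "1", "0", "2", sep = "\t")
), counts_file)

# read.delim does not skip comment lines by default, so set comment.char;
# check.names = FALSE keeps the BAM file names unchanged as column names.
fc <- read.delim(counts_file, comment.char = "#", check.names = FALSE)

# Drop the six annotation columns and keep the per-sample counts as a matrix.
count_matrix <- as.matrix(fc[, -(1:6)])
rownames(count_matrix) <- fc$Geneid
colnames(count_matrix) <- sub("\\.bam$", "", colnames(count_matrix))
```

`count_matrix` then has genes as rows and samples as columns, which is the shape the `countData` argument of `DESeqDataSetFromMatrix` expects.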

ADD REPLY • link 8.2 years ago by Devon Ryan 105k

0

Hi, just to check my understanding: read.delim can only read one text file at a time. I have used it with edgeR and limma for differential gene expression. I am also wondering how to merge several Rsubread-generated text files for DESeq2. With edgeR and limma I have used readDGE, but I don't understand it very well. For example, the import shown in the edgeR and limma manuals I have is:

files <- c("files1", "files2", "files3", "files4")
x <- readDGE(files, columns=c(1,3))

I thought that x was the count matrix, but it turns out it is not. Using colnames(x), I saw that x already contains my sample names as column names. I don't know how it works, so I am also interested in this question. Feeding Rsubread several BAM files looks like a good approach, but if I already have the individual text files from Rsubread, what is the solution?

ADD REPLY • link 5.0 years ago by Kai_Qi ▴ 130

0

Combine read.delim with lapply. Have a look at how readDGE works internally.
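A minimal base R sketch of that idea, assuming each file is a single-sample featureCounts table with the count in the last column. `write_demo` is a hypothetical helper that only creates made-up input files so the example is self-contained.

```r
# Hypothetical helper: write a one-sample featureCounts-style file (made-up values).
write_demo <- function(sample, counts) {
  path <- file.path(tempdir(), paste0(sample, ".txt"))
  writeLines(c(
    "# Program:featureCounts; Command:(elided)",
    paste("Geneid", "Chr", "Start", "End", "Strand", "Length",
          paste0(sample, ".bam"), sep = "\t"),
    paste("geneA", "chr1", 100, 900, "+", 801, counts[1], sep = "\t"),
    paste("geneB", "chr1", 2000, 2500, "-", 501, counts[2], sep = "\t")
  ), path)
  path
}

files <- c(write_demo("untreated_1", c(10, 0)),
           write_demo("treated_1",   c(30, 2)))

# Read every file with the same read.delim settings.
tables <- lapply(files, read.delim, comment.char = "#", check.names = FALSE)

# Binding columns is only safe if all files list the same genes in the same order.
gene_ids <- tables[[1]]$Geneid
stopifnot(all(vapply(tables, function(x) identical(x$Geneid, gene_ids), logical(1))))

# Take the last column (the counts) of each table and bind them side by side.
count_matrix <- do.call(cbind, lapply(tables, function(x) x[[ncol(x)]]))
rownames(count_matrix) <- gene_ids
colnames(count_matrix) <- sub("\\.bam$", "",
                              vapply(tables, function(x) names(x)[ncol(x)], character(1)))
```

If the gene order could differ between files, merging on `Geneid` is the safer choice than `cbind`.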

ADD REPLY • link 5.0 years ago by Devon Ryan 105k

score 3 · Answer 1 · 2022-10-13

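Using R, you can merge the individual featureCounts output files into one:

library(tidyverse)  # attaches readr, dplyr and purrr

# Collect the per-sample featureCounts output files
f_files <- list.files("path/to/your/countfolder", pattern = "featureCount.txt", full.names = TRUE)

# Read one featureCounts table and keep only Geneid plus the count column(s)
read_in_feature_counts <- function(file) {
  cnt <- read_tsv(file, col_names = TRUE, comment = "#")
  cnt %>% dplyr::select(-Chr, -Start, -End, -Strand, -Length)
}

raw_counts <- map(f_files, read_in_feature_counts)
# Join on the gene ID explicitly so that count columns are never used as join keys
raw_counts_df <- purrr::reduce(raw_counts, inner_join, by = "Geneid")
# Skip row names and quoting so the header lines up with the data columns
write.table(raw_counts_df, "raw_counts.txt", sep = "\t", quote = FALSE, row.names = FALSE)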