Question

combining quantification (featureCounts) result files into a single dataset

1

Entering edit mode

7.7 years ago

HK ▴ 40

Hey All,

I have almost 160 output files in a folder from the featurecounts (quantification of RNA-Seq) and now i want to put that in one datframe to be use for DESeq.

The format of the fetaure counts result is havinf 7 columns:

Geneid   Chr    Start   End Strand  Length  Sample1

What is need is a dataframe with 1st column as geneid and rest taking the 7th column from r=each file

Geneid   Sample1   Sample 2  Sample 3    Sample4 .......................

The code that i have used is :

library(limma)
library(edgeR)
myfiles=list.files(pattern = "*.txt" )  #reading files from the directory having 160 samples.
x <- readDGE(files, columns=c(1,7))
counts=as.data.frame(x$counts)

But i get the error

" Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names" after running x <- readDGE(files, columns=c(1,7))

featurecounts matix combine • 14k views

ADD COMMENT • link updated 2.7 years ago by DareDevil ★ 4.4k • written 7.7 years ago by HK ▴ 40

1

Entering edit mode

FeatureCounts can take multiple alignment files as input.

featureCounts [options] -o counts.txt file1.bam file2.bam file3.bam

ADD REPLY • link 7.7 years ago by Sej Modha 5.3k

0

Entering edit mode

@sej i already have the result from the featurecounts and have multiple txt files, i need to combine all the txt files in one dataframe.

ADD REPLY • link 7.7 years ago by HK ▴ 40

0

Entering edit mode

You can merge multiple dataframes by rownames assuming that unique geneids are defined as rownames. https://stackoverflow.com/questions/7739578/merge-data-frames-based-on-rownames-in-r

ADD REPLY • link 7.7 years ago by Sej Modha 5.3k

0

Entering edit mode

While you could do what @Sej has already suggested, rerunning featureCounts with the right order of BAM files (so you should not need to mess with the matrix later) will be much easier to get a single matrix file.

ADD REPLY • link 7.7 years ago by GenoMax 151k

0

Entering edit mode

If you have run featurecounts on each BAM file separately and you would like to merge the counts from all individual files, check this article for merging the counts from all individual files https://www.reneshbedre.com/blog/featurecounts-matrix.html

ADD REPLY • link 3.8 years ago by Renesh ★ 2.2k

score 7 · Answer 1 · 2022-10-13

7

Entering edit mode

2.7 years ago

DareDevil ★ 4.4k

You can try bash script as well:

ls -1  *featureCount.txt | parallel 'cat {} | sed '1d' | cut -f7 {} > {/.}_clean.txt' 
ls -1  *featureCount.txt | head -1 | xargs cut -f1 > genes.txt
paste genes.txt *featureCount_clean.txt > output.txt

ADD COMMENT • link 2.7 years ago by DareDevil ★ 4.4k

score 2 · Answer 2 · 2017-10-11

2

Entering edit mode

7.7 years ago

Devon Ryan 105k

Below is the function I tend to use to read in multiple featureCounts outputs (one per sample):

DESeqDataSetFromFeatureCounts <- function (sampleTable, directory = ".", design, ignoreRank = FALSE, ...) 
{
    if (missing(design)) 
        stop("design is missing")
    l <- lapply(as.character(sampleTable[, 2]), function(fn) read.table(file.path(directory, fn), skip=2))
    if (!all(sapply(l, function(a) all(a$V1 == l[[1]]$V1)))) 
        stop("Gene IDs (first column) differ between files.")
    tbl <- sapply(l, function(a) a$V7)
    colnames(tbl) <- sampleTable[, 1]
    rownames(tbl) <- l[[1]]$V1
    rownames(sampleTable) <- sampleTable[, 1]
    dds <- DESeqDataSetFromMatrix(countData = tbl, colData = sampleTable[, 
        -(1:2), drop = FALSE], design = design, ignoreRank, ...)
    return(dds)
}

You'll note that it's essentially identical to DESeqDataSetFromHTSeqCount() or whatever that's called, just tweaked slightly to work with featureCounts output. You'll need a sampleTable, just as with the standard functions built into DESeq2.

ADD COMMENT • link 7.7 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks @Devon. Can you please explain ths "You'll need a sampleTable, just as with the standard functions built into DESeq2." Should i just use this list.files() function and have all the .txt files saved in myfiles variable . And then pass in the function?? myfiles=list.files(pattern = "*.txt" )

ADD REPLY • link 7.7 years ago by HK ▴ 40

0

Entering edit mode

help(DESeqDataSetFromHTSeqCount)