Question

featureCounts output changes the underscores in the sample names into dots

0

5.5 years ago

wangdp123 ▴ 340

Hi there,

With the latest version of the Rsubread package, I ran featureCounts on a few samples, but the output has changed the underscores in the sample names into dots, which is very inconvenient.

For example,

the original bam file name: sample_1.bam

the new sample name: sample.1.bam

Is there any way to avoid this conversion?

Many thanks,

Tom

featurecounts • 1.6k views

ADD COMMENT • link updated 5.5 years ago by Gordon Smyth ★ 8.2k • written 5.5 years ago by wangdp123 ▴ 340

0

featureCounts() of Rsubread does not generate bam files but uses them as input.

You are probably using the align() function from the same package. If that is the case, you might want to check the output_file argument. Just change the default output_file = paste(readfile1,"subread",output_format,sep=".") to output_file = paste(readfile1,"subread",output_format,sep="_").

ADD REPLY • link 5.5 years ago by Haci ▴ 740

1

OP uses featureCounts and the issue is real: it replaces underscores with dots. The column names in the count matrix are taken from the names of the bam files. Will move this to comment.

ADD REPLY • link 5.5 years ago by ATpoint 88k

score 2 · Answer 1 · 2020-01-25

No, you can't avoid the conversion. featureCounts is trying to protect the names from systems that can't handle punctuation in variable names, but I agree it is not necessary here.

Of course you had to input the file names to featureCounts in the first place:

fc <- featureCounts(files, ...)

so you can easily put them back at the end:

colnames(fc$counts) <- files

I personally like to use

colnames(fc$counts) <- limma::removeExt(basename(files))
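To see the renaming step in isolation, here is a minimal sketch that does not need any bam files. The matrix below is a mock stand-in for fc$counts, and the file paths are hypothetical. It uses tools::file_path_sans_ext() from base R in place of limma::removeExt(), on the assumption that the two behave alike when all files share the same extension.

```r
# Hypothetical input paths; in practice this is the vector passed to featureCounts().
files <- c("/data/bam/sample_1.bam", "/data/bam/sample_2.bam")

# Mock stand-in for fc$counts, with the dotted column names described in the question.
counts <- matrix(c(10L, 0L, 5L, 7L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("sample.1.bam", "sample.2.bam")))

# The columns come back in the same order as the input files,
# so the names can be restored straight from the input vector.
stopifnot(ncol(counts) == length(files))

# Drop the directory part, then the extension.
colnames(counts) <- tools::file_path_sans_ext(basename(files))

print(colnames(counts))
```

The counts themselves are untouched; only the column labels change.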

score 0 · Answer 2 · 2020-01-25

Why don't you simply grep the sample names from disk and then put them as colnames back into the output? Something like this:

library(Rsubread)

# Anchor the pattern so that index files such as sample_1.bam.bai are not picked up
tmp.files <- list.files(path = "~/Desktop/", pattern = "\\.bam$", full.names = TRUE)

# basename() drops the directory part of each path
tmp.names <- basename(tmp.files)

countmatrix <- featureCounts(files = tmp.files,
                             annot.ext = "~/Desktop/test.saf")

# Columns are in the same order as the input files
colnames(countmatrix$counts) <- tmp.names
colnames(countmatrix$stat)   <- c("Status", tmp.names)
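If you want a safety net before overwriting the names, something like the sketch below could work. restore_colnames() is a hypothetical helper, not part of Rsubread, and the matrix is a mock stand-in for countmatrix$counts. It assumes that featureCounts only swaps punctuation for dots and keeps the columns in input order, so it compares the two sets of names with all punctuation mapped to dots and stops if they do not line up.

```r
# Hypothetical helper: rename columns only if they line up with the input files.
restore_colnames <- function(mat, files) {
  stopifnot(ncol(mat) == length(files))
  new.names <- basename(files)
  # Map every non-alphanumeric character to a dot, so "_" vs "." is ignored.
  norm <- function(x) gsub("[^A-Za-z0-9]", ".", x)
  # endsWith() tolerates column names that still carry a converted directory prefix.
  ok <- endsWith(norm(colnames(mat)), norm(new.names))
  if (!all(ok)) {
    stop("columns do not match 'files': ",
         paste(colnames(mat)[!ok], collapse = ", "))
  }
  colnames(mat) <- new.names
  mat
}

# Mock data with hypothetical paths, for illustration only.
mock.files  <- c("~/Desktop/sample_1.bam", "~/Desktop/sample_2.bam")
mock.counts <- matrix(1:4, nrow = 2,
                      dimnames = list(c("geneA", "geneB"),
                                      c("sample.1.bam", "sample.2.bam")))

mock.counts <- restore_colnames(mock.counts, mock.files)
print(colnames(mock.counts))
```

If the file vector is accidentally reordered between the featureCounts call and the renaming, the helper stops with an error instead of silently mislabelling the samples.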