Hi there,
How can I import around 100 RSEM result files into R and combine their TPM values (column names should be the file names) into a single matrix with simple code?
This will do it.
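# Read gene_id and TPM from every RSEM gene-level result file in the working directory.
# Assumes all files list the same genes in the same order.
DF <- do.call(cbind,
  lapply(list.files(pattern = "\\.genes\\.results$"),
    FUN = function(x) {
      aColumn <- read.table(x, header = TRUE)[, c("gene_id", "TPM")]
      colnames(aColumn)[2] <- x  # name the TPM column after its file
      aColumn
    }
  )
)
# Drop the repeated gene_id columns, keeping only the first one
DF <- DF[, !duplicated(colnames(DF))]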
Result:
gene_id GSM2537147.genes.results GSM2537148.genes.results
1 ENSMUSG00000000001 31.44 29.18
2 ENSMUSG00000000003 0.00 0.00
3 ENSMUSG00000000028 1.30 1.93
4 ENSMUSG00000000031 0.82 0.32
5 ENSMUSG00000000037 0.71 0.43
6 ENSMUSG00000000049 0.29 0.71
GSM2537149.genes.results GSM2537150.genes.results GSM2537151.genes.results
1 32.22 30.51 28.42
2 0.00 0.00 0.00
3 0.04 2.17 1.34
4 0.00 0.39 0.05
5 0.66 0.72 0.53
6 0.00 1.33 0.41
GSM2537152.genes.results GSM2537153.genes.results GSM2537154.genes.results
1 34.46 28.95 32.44
2 0.00 0.00 0.00
3 2.95 1.46 1.34
4 0.18 0.74 0.00
5 0.43 0.50 0.34
6 0.14 0.72 0.38
GSM2537155.genes.results GSM2537156.genes.results GSM2537157.genes.results
1 27.64 30.24 26.87
2 0.00 0.00 0.00
3 1.96 2.20 1.40
4 0.13 0.19 0.44
5 0.76 1.46 0.43
6 0.83 0.30 0.95
GSM2537158.genes.results GSM2537159.genes.results GSM2537160.genes.results
1 27.96 29.52 28.74
2 0.00 0.00 0.00
3 2.01 1.18 1.81
4 0.19 0.25 0.35
5 0.42 0.88 0.67
6 0.25 0.27 0.41
GSM2537161.genes.results
1 31.17
2 0.00
3 2.24
4 0.11
5 0.40
6 0.83
A general solution will be something along the following lines:
# file names to read
files <- c("file1", "file2")
# start with an empty object; cbind() builds the matrix column by column
data <- NULL
# iterate through the file names
for (f in files) {
  # read each file
  file <- read.table(f)
  # append the column of interest (here the first one) to the matrix
  data <- cbind(data, file[, 1])
}
# name the columns after the files
colnames(data) <- files
If there are many files, you can write their names into a separate file, and read the names from that file.
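Reading the names from a separate file could look roughly like the sketch below. The list file samples.txt and the two toy input files are made up for illustration, and the sketch assumes every file lists the same genes in the same order.

```r
# Toy setup (hypothetical): two small RSEM-like files plus a list of their names
dir <- tempfile("rsem_demo_"); dir.create(dir)
for (s in c("sampleA", "sampleB")) {
  toy <- data.frame(gene_id = c("g1", "g2", "g3"),
                    TPM = if (s == "sampleA") c(1, 0, 2.5) else c(3, 0.5, 0))
  write.table(toy, file.path(dir, paste0(s, ".genes.results")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
writeLines(c("sampleA.genes.results", "sampleB.genes.results"),
           file.path(dir, "samples.txt"))

# Read the file names, one per line, from the list file
files <- readLines(file.path(dir, "samples.txt"))

# Read the TPM column of each file; sapply() binds the columns into a matrix
# and uses the file names as column names
tpm <- sapply(files, function(f) {
  read.table(file.path(dir, f), header = TRUE, sep = "\t")$TPM
})
```

With real data, only the readLines() and sapply() calls would be needed, pointed at your own list file.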
Just a small comment concerning your code: it's not good practice to put a cbind within a loop (it's not very efficient). It's faster to create a list (data <- list()) before the loop, then replace data<-cbind(data,file[,1]) with data[[i]] <- file[,1], and do a data <- do.call(cbind,data) after the loop.
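Put together, the list-based variant might look like this sketch. The toy files file1 and file2 are written to a temporary directory purely for illustration, and the list is pre-allocated with vector("list", n) instead of list(), which is a small variation on the suggestion.

```r
# Toy setup (hypothetical): two small tab-separated files with a TPM column
dir <- tempfile("rsem_demo_"); dir.create(dir)
write.table(data.frame(gene_id = c("g1", "g2"), TPM = c(1.5, 0)),
            file.path(dir, "file1"), sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(gene_id = c("g1", "g2"), TPM = c(2, 4)),
            file.path(dir, "file2"), sep = "\t", quote = FALSE, row.names = FALSE)
files <- c("file1", "file2")

# Pre-allocate a list with one slot per file
data <- vector("list", length(files))
for (i in seq_along(files)) {
  file <- read.table(file.path(dir, files[i]), header = TRUE, sep = "\t")
  # store the column in the list instead of growing a matrix with cbind()
  data[[i]] <- file[, "TPM"]
}

# Bind all columns in a single call after the loop
data <- do.call(cbind, data)
colnames(data) <- files
```

Growing an object with cbind() inside the loop copies it on every iteration, whereas filling a list and binding once avoids those repeated copies.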
Paste all the files side by side, then import:
library(data.table)
# cmd= makes it explicit that the string is a shell command (needs a Unix-like shell)
myData <- fread(cmd = "paste *.genes.results")
Make the most of the list.files() command, and then import the files via a for or foreach loop. foreach can be parallelised when used with the %dopar% operator. The actual command to read in could be read.table() or fread(). Then, Bob's your uncle.
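A minimal sketch of the foreach route, assuming the foreach package is installed. The toy files are invented for illustration, and the loop runs sequentially with %do%; switching to %dopar% additionally requires registering a parallel backend first (for example with the doParallel package).

```r
library(foreach)  # assumption: the foreach package is installed

# Toy setup (hypothetical): three small RSEM-like files in a temporary directory
dir <- tempfile("rsem_demo_"); dir.create(dir)
for (k in 1:3) {
  toy <- data.frame(gene_id = c("g1", "g2"), TPM = c(k, 10 * k))
  write.table(toy, file.path(dir, paste0("s", k, ".genes.results")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Collect the result files
files <- list.files(dir, pattern = "\\.genes\\.results$", full.names = TRUE)

# Read the TPM column of each file and combine the columns with cbind()
# (assumes every file lists the same genes in the same order)
tpm <- foreach(f = files, .combine = cbind) %do% {
  read.table(f, header = TRUE, sep = "\t")$TPM
}
colnames(tpm) <- basename(files)
```

For around 100 small files the sequential version is probably fast enough; parallel reading mostly pays off when the individual files are large.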