How to sum columns to collapse duplicated row names in RSEM output?
6.6 years ago
John ▴ 270

Hi,

I have an RSEM output table with 64 columns and 24833 rows. The row names are gene names and the column names are sample names. Some gene names are duplicated, and I want to collapse each set of duplicated rows into a single row by summing their values across all 64 columns. I am new to R; could you please help me with R code for this?

> all <-read.table(file="tpmat.xls",header=T)
> dim(all)
[1] 24833    64
R RNA-Seq • 24k views

How to sum up the duplicated values while keeping the other columns?

Play with suggestions in this thread. It should work.

6.6 years ago

Using dplyr you can use group_by and summarise_all.

Here's an example:

library(dplyr)

> a
# A tibble: 7 x 4
  gene  sample1 sample2 sample3
  <chr>   <int>   <int>   <int>
1 A           1       1       1
2 B           1       1       1
3 B           1       1       1
4 C           1       1       1
5 C           1       1       1
6 C           1       1       1
7 D           1       1       1

    # funs() is deprecated in current dplyr; pass the function directly.
    a %>%
      group_by(gene) %>%
      summarise_all(sum)

# A tibble: 4 x 4
  gene  sample1 sample2 sample3
  <chr>   <int>   <int>   <int>
1 A           1       1       1
2 B           2       2       2
3 C           3       3       3
4 D           1       1       1
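
If you would rather not depend on dplyr, the same collapse can be sketched in base R with rowsum(). This is only a minimal illustration: the toy data frame tpm and its column names are made up here and stand in for the real RSEM table, with the gene names held in an ordinary column.

```r
# Toy stand-in for the RSEM table (hypothetical data, not the real file).
# Gene names sit in a regular column because row names must be unique.
tpm <- data.frame(
  gene    = c("A", "B", "B", "C", "C", "C", "D"),
  sample1 = c(1, 2, 3, 4, 5, 6, 7),
  sample2 = c(10, 20, 30, 40, 50, 60, 70),
  stringsAsFactors = FALSE
)

# rowsum() sums every numeric column within each group of rows.
# Drop the gene column from the values and use it as the grouping vector.
collapsed <- rowsum(tpm[, -1], group = tpm$gene)

# The result has one row per gene, with the gene names as row names.
print(collapsed)
```

Because the gene names are unique after collapsing, they can safely be used as row names, which is what rowsum() does by default.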

Thanks a lot, it helped.

6.6 years ago
Zhilong Jia ★ 2.2k

Actually, duplicated row names are not allowed in the data frame that read.table returns, so the gene names have to be stored in an ordinary column.

The main idea is to use dplyr::group_by, which groups the rows by the column holding the duplicated names, and dplyr::summarise_all(sum), which sums all values within each group.

Example code follows:

# rowname_duplicated is the column holding the duplicated gene names.
# Attaching dplyr makes the %>% pipe available.
library(dplyr)
all %>% group_by(rowname_duplicated) %>% summarise_all(sum)
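
Before and after collapsing, it can be worth checking which names were duplicated and that no signal was lost. The sketch below is an assumption-laden illustration in base R: tpm and its columns are hypothetical toy data, and aggregate() is used in place of dplyr only to keep the example free of package dependencies.

```r
# Hypothetical toy table standing in for the real data.
tpm <- data.frame(
  gene    = c("A", "B", "B", "C", "C", "C", "D"),
  sample1 = c(1, 2, 3, 4, 5, 6, 7),
  sample2 = c(10, 20, 30, 40, 50, 60, 70),
  stringsAsFactors = FALSE
)

# Which gene names occur more than once?
dup_genes <- unique(tpm$gene[duplicated(tpm$gene)])

# Sum every other column by gene. Note that the formula interface
# drops rows containing NA by default.
collapsed <- aggregate(. ~ gene, data = tpm, FUN = sum)

# Sanity checks: no duplicated names remain, and the per-sample
# totals are unchanged by the collapse.
no_dups_left <- !anyDuplicated(collapsed$gene)
totals_match <- isTRUE(all.equal(colSums(tpm[, -1]), colSums(collapsed[, -1])))

print(dup_genes)
print(c(no_dups_left = no_dups_left, totals_match = totals_match))
```

If the per-sample totals differ after collapsing, missing values in the input are a likely cause and are worth inspecting first.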