A possible script request for a task probably cannot be done in Excel
5
0
Entering edit mode
7.7 years ago
xiaoyonf ▴ 60

Dear all,

I have a dumb question that bothers me a lot. I have a set of genes with calculated ChIP-seq binding intensities. Notably, most of these genes have more than one value (i.e., because of multiple binding I thought). Now I want to clean these data by calculating the average of biding for each gene. The head of the data likes the following,

A2M -3.63
A2M -18.46
ACE 1.68
ACYP2   8.23
ACYP2   6.09
ADAMTS12    1.77
ADAMTS12    1.65
ADAMTS6 5.36
ADIPOQ  4.06
ADIPOQ  1.71
AGR2    13.71
AGR2    8.92
AGR2    5.05
AGR2    3.02
AGR2    2.23
AGR2    -4.22

I have these data in excel file. Obviously, it is annoying to do manually in Excel because of thousands of genes. How do I do this? I guess a simple script may help. But I have no experience in programming. Your help is much appreciated! Thank you! Xiaoyong Fu Houston, TX

excel • 3.1k views
ADD COMMENT
3
Entering edit mode

I hate excel but I think i can be easily done with "pivot table" isn't it ? https://support.office.com/en-us/article/Calculate-values-in-a-PivotTable-report-11f41417-da80-435c-a5c6-b0185e59da77

ADD REPLY
4
Entering edit mode

@Pierre recommending using excel for something. Well .. that is a first!

ADD REPLY
1
Entering edit mode

No I didn't ! No I didn't ! That wasn't me !

ADD REPLY
2
Entering edit mode

No worries everyone, I made a screenshot.

ADD REPLY
0
Entering edit mode

enter image description here

ADD REPLY
0
Entering edit mode

You are such a meme person!

ADD REPLY
0
Entering edit mode

Hi Pierre, Thank you so much! I think this is what I need to look for. I will check on it. BTW, is it possible to do so in R? Best, Xiaoyong

ADD REPLY
2
Entering edit mode

Everything can be done in Excel. Unless you go over 64k lines or whatever the latest Microsoft boss thinks is sufficient. But otherwise, everything.

ADD REPLY
7
Entering edit mode

I definitely love the convenience function of converting gene names to dates. Wouldn't know how to work without it!

ADD REPLY
0
Entering edit mode

The only thing you have to do when importing a file is to select the column containing the gene names and set data type to Text.

ADD REPLY
0
Entering edit mode

Yes, that I know. But it's an error which happens easily, silent and irreversible. That shouldn't be possible.

ADD REPLY
0
Entering edit mode

I agree, Excel is a powerful tool to work with. However, it takes some time to really grab its essence and logic. Since Excel 2007 tables can be "1,048,576 rows by 16,384 columns".

ADD REPLY
1
Entering edit mode

Hmm, I wasn't aware of that. But it seems that Excel is incapable of plotting more that 64k lines in a graph. It doesn't really matter to me how many you can put in a table, if you can't show them in a graph; data that can't be visualized may as well not exist.

"1,048,576 rows by 16,384 columns" is still completely arbitrary. And even if you can paste data in those fields, you can't use it in a graph because Microsoft either lacks skilled programmers or managers. There is no sane reason to restrict people to those limits. If I want to paste a billion lines into Excel, I should be able to do so (if I have sufficient RAM), and it baffles me why Microsoft is incapable of developing tools that can handle the needs of anyone outside of pedestrian users. They are stuck in the "640k is enough for anyone" mindset.

ADD REPLY
0
Entering edit mode

I love Excel for copying and pasting data into a sensible matrix so I can then plot it in something pretty (R - ggplot/plotly) ;) also if you're supervisor is among the less tech savvy, they inevitably want all experiments to produce and excel table they can play with

ADD REPLY
0
Entering edit mode

You say some of your genes have multiple values, but your example data doesn't. Can you give us a more representative example input file?

ADD REPLY
0
Entering edit mode

R dplyr: group_by -> summarise

ADD REPLY
3
Entering edit mode
7.7 years ago

Assuming the tab delimited file:

sort -k1,1 in.txt | groupBy -g 1 -c 2 -o mean > out.txt

out.txt:

A2M -11.045
ACE 1.68
ACYP2   7.16
ADAMTS12    1.71
ADAMTS6 5.36
ADIPOQ  2.885
AGR2    4.785

groupBy from bedtools

ADD COMMENT
0
Entering edit mode

Hi Goutham, This looks so cool! I will try it definitely. Thank you so much! Xiaoyong

ADD REPLY
0
Entering edit mode

The bedtools groupby command works beautifully to perform such work. Thank you for quick answer! Xiaoyong

ADD REPLY
1
Entering edit mode
7.7 years ago
steve ★ 3.5k

You can also do this easily in R.

# reading in the data; insert your data import method here:
gene_stats <- "gene\tvalue
A2M\t-3.63
A2M\t-18.46
ACE\t1.68
ACYP2\t8.23
ACYP2\t6.09
ADAMTS12\t1.77
ADAMTS12\t1.65
ADAMTS6\t5.36
ADIPOQ\t4.06
ADIPOQ\t1.71
AGR2\t13.71
AGR2\t8.92
AGR2\t5.05
AGR2\t3.02
AGR2\t2.23
AGR2\t-4.22
"
gene_stats_df_df <- read.table(textConnection(gene_stats), header = TRUE, sep = '\t')

# get the mean for each gene in the table
aggregate(value ~ gene, data = gene_stats_df_df, FUN = mean)

output:

      gene   value
1      A2M -11.045
2      ACE   1.680
3    ACYP2   7.160
4 ADAMTS12   1.710
5  ADAMTS6   5.360
6   ADIPOQ   2.885
7     AGR2   4.785
ADD COMMENT
1
Entering edit mode
7.7 years ago
aquaq ▴ 40

You can also copy the first column (A) to another column(D), then apply remove duplicates to this column. Then, in the E1 cell, you only have to use =AVERAGEIF(A:A, D1, B:B). With double click on the lower right corner of the E1 cell, you can copy this function for all the E column.

ADD COMMENT
0
Entering edit mode

Thank you, aquaq. This is I thought the quickest way to solve my problem!

ADD REPLY
1
Entering edit mode
7.7 years ago
arnstrm ★ 1.9k

In Excel, select the table, go to Inset, PivotTable, choose the location (Existing Worksheet and select a blank cell). A side bar section PivotTable Fiedls will appear. Drag the Gene field name to Rows and Value to the values. The default will be the sum of values for each gene. To change it click on the drop down menu and then value field settings and choose average

(suggesting only because Pierre mentioned this as the option :P)

screenshot

ADD COMMENT
0
Entering edit mode
7.7 years ago
theobroma22 ★ 1.2k

X <- aggregate ( x, data, mean)

ADD COMMENT

Login before adding your answer.

Traffic: 2231 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6