I have a data frame in which I'd like to compare the contents of two columns: PathwayName and ExpressionData. This comparison will be done across many rows (10,695,840 entries) using R.
Here are the first few lines of my data frame where the contents inside are only separated by white space.
PathwayName ExpressionData
1 41bbPathway BLACK 215538_at 210671_x_at... 215538_at na 28.566616...
2 ace2Pathway BLACK 214533_at 215184_at... 215538_at na 28.566616...
3 acetPathway BLACK 215184_at 01502_s_at... 215184_at na 4.2084746...
4 achPathway BLACK 211570_s_at 215184_at... 215184_at na 4.2084746...
5 hoPathway BLACK 201968_at 214578_s_at... 201968_at na 472.4969...
As a final product, I want to compare the two columns, copy the matching entries and save them into a new file, where the output should look like this:
PathwayName ExpressionData
1 41bbPathway 215538_at 215538_at
2 acetPathway 215184_at 215184_at
3 achPathway 215184_at 215184_at
4 hoPathway 201968_at 201968_at
Everything I have tried so far has failed, because most approaches compare whole rows and not the contents inside each cell.
Hope there are people who can help.
Thank you
You will have to specify what comparison you'd like to perform. Also, is it that your data frame is
Also, it is unclear what you want to get as a final product. Could you please clarify?
My data frame is like this, with only 2 columns and many rows.
The contents inside each cell are separated only by whitespace.
The final product that I want is a new table that contains the pathway name and the expression data entry that matches one of the probe IDs listed in the PathwayName column, like below.
It seems that you have two data sets and are trying to combine them in the most non-intuitive way.
As Sam already stated, the expression data (which you say is one entry) is a set of probe-ID and value entries.
The first part needs to be disentangled. You need one entry for each pathway-name/probe-ID combination. Once you have two tables, you can join them:
Pathway table:
Expression table
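A minimal sketch of the two tables and the join, in base R. It assumes that each PathwayName cell has the layout "name colour probe1 probe2 ..." and each ExpressionData cell has the layout "probeID description value"; that layout is inferred from the truncated rows in the question, and the names `df`, `pathway`, `expression` and `result` are only illustrative.

```r
# Toy data in the question's layout: every cell is one whitespace-separated string.
# Rows are copied from the truncated example, so real rows will be longer.
df <- data.frame(
  PathwayName = c("41bbPathway BLACK 215538_at 210671_x_at",
                  "ace2Pathway BLACK 214533_at 215184_at",
                  "acetPathway BLACK 215184_at 01502_s_at"),
  ExpressionData = c("215538_at na 28.566616",
                     "215538_at na 28.566616",
                     "215184_at na 4.2084746"),
  stringsAsFactors = FALSE
)

# Pathway table: one row per pathway-name / probe-ID combination.
# Assumption: token 1 is the pathway name, token 2 the colour, the rest are probe IDs.
path_tokens <- strsplit(df$PathwayName, "\\s+")
probes <- lapply(path_tokens, function(x) x[-(1:2)])
pathway <- data.frame(
  Pathway = rep(vapply(path_tokens, `[`, character(1), 1), lengths(probes)),
  ProbeID = unlist(probes),
  stringsAsFactors = FALSE
)

# Expression table: one row per distinct probe-ID / value pair.
# Assumption: token 1 is the probe ID and token 3 the expression value.
expr_tokens <- strsplit(df$ExpressionData, "\\s+")
expression <- unique(data.frame(
  ProbeID = vapply(expr_tokens, `[`, character(1), 1),
  Value = as.numeric(vapply(expr_tokens, `[`, character(1), 3)),
  stringsAsFactors = FALSE
))

# Inner join: keeps only probe IDs present in both tables.
result <- merge(pathway, expression, by = "ProbeID")
print(result)
```

Note that this joins every pathway to every probe that has expression data, regardless of which row the two originally shared, so on this toy data ace2Pathway is matched through 215184_at. The result can be saved with `write.table()`.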
Sometimes you get a lot of values per probe ID; then you have to proceed according to the underlying experiment. In the case of x test and y control samples, you can compute the log fold-change or the differential expression. If there is only one underlying condition, you can compute the mean or median. In any case, try to understand the data.
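As a rough illustration of those two cases, here is a small base R sketch. The table, its `Group` column and all values are made up for the example; whether a median, a mean or a fold-change is appropriate depends on the experiment.

```r
# Hypothetical long-format expression table: several values per probe ID,
# each labelled as coming from a test or a control sample (values are invented).
expression <- data.frame(
  ProbeID = rep(c("215538_at", "215184_at"), each = 4),
  Group = rep(c("test", "test", "control", "control"), times = 2),
  Value = c(24, 40, 8, 8, 4, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Only one underlying condition: collapse to a single median per probe ID.
probe_median <- aggregate(Value ~ ProbeID, data = expression, FUN = median)

# Test vs control: mean per probe ID and group, then the log2 fold-change.
group_mean <- tapply(expression$Value,
                     list(expression$ProbeID, expression$Group),
                     mean)
log2_fc <- log2(group_mean[, "test"] / group_mean[, "control"])

print(probe_median)
print(log2_fc)
```

Either summary gives one value per probe ID, which can then be merged with the pathway table as before.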
[Found a typo in the merge command]