Question

Subset data from data frame

0

8.6 years ago

1769mkc ★ 1.3k

I want to subset my data frame using a logical condition while keeping the column that contains my gene names. Rows whose values are all 0 should be removed from the data frame. I am already doing this, but I'm not sure I'm doing it correctly.

gene       value_1     value_2
<chr>      <dbl>       <dbl>
ERCC-00003 2.17523e+02 2.62037e+02
ERCC-00004 1.54077e+03 1.89043e+03
ERCC-00009 1.07257e+02 1.31688e+02
ERCC-00012 4.08964e-02 0.00000e+00
ERCC-00013 1.95994e-01 1.92254e-01
ERCC-00014 6.20654e-01 5.46050e-01
ERCC-00016 0.00000e+00 0.00000e+00
ERCC-00017 0.00000e+00 2.61275e-02
ERCC-00019 4.05462e+00 5.89595e+00

Can anyone tell me how to do that? Help would be highly appreciated.

R • 2.4k views

ADD COMMENT • link updated 8.6 years ago by jonasmst ▴ 420 • written 8.6 years ago by 1769mkc ★ 1.3k

score 1 · Answer 1 · 2016-12-22

1

8.6 years ago

jonasmst ▴ 420

You can use the > operator to keep only the rows whose value is above 0. Assuming that your data frame is called DF:

DF <- DF[DF$value_2 > 0,]

EDIT: To keep rows where at least one of the two columns is above 0 (i.e. to remove only the rows where both are 0), you can combine two expressions:

DF <- DF[DF$value_2 > 0 | DF$value_1 > 0,]
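
With more than two value columns, the same filter can be written without naming each column. This is only a minimal sketch, assuming every non-gene column is numeric, non-negative, and free of NA values; the small data frame and the names value_cols, keep, and DF_nonzero are hypothetical stand-ins, not part of the original data:

```r
# Hypothetical stand-in for DF, using three rows from the question
DF <- data.frame(
  gene    = c("ERCC-00012", "ERCC-00016", "ERCC-00017"),
  value_1 = c(4.08964e-02, 0, 0),
  value_2 = c(0, 0, 2.61275e-02),
  stringsAsFactors = FALSE
)

# Select the numeric columns so the gene column is left out of the test
value_cols <- sapply(DF, is.numeric)

# TRUE for rows where at least one value column is above 0
keep <- rowSums(DF[, value_cols, drop = FALSE] > 0) > 0

# Row subsetting keeps every column, including gene
DF_nonzero <- DF[keep, ]
DF_nonzero$gene
```

Only ERCC-00016, which is 0 in both columns, is dropped here; the gene column comes along because the subset is taken by row.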

EDIT2: Here's the full example, including printouts. Creating the data frame:

gene <- c("ERCC-00003", "ERCC-00004", "ERCC-00009", "ERCC-00012", "ERCC-00013", "ERCC-00014", "ERCC-00016", "ERCC-00017", "ERCC-00019")
value_1 <- c(2.17523e+02, 1.54077e+03, 1.07257e+02, 4.08964e-02, 1.95994e-01, 6.20654e-01, 0.00000e+00, 0.00000e+00, 4.05462e+00)
value_2 <- c(2.62037e+02, 1.89043e+03, 1.31688e+02, 0.00000e+00, 1.92254e-01, 5.46050e-01, 0.00000e+00, 2.61275e-02, 5.89595e+00)

DF <- data.frame(gene, value_1, value_2)
DF

Output:

        gene     value_1     value_2
1 ERCC-00003 2.17523e+02 2.62037e+02
2 ERCC-00004 1.54077e+03 1.89043e+03
3 ERCC-00009 1.07257e+02 1.31688e+02
4 ERCC-00012 4.08964e-02 0.00000e+00
5 ERCC-00013 1.95994e-01 1.92254e-01
6 ERCC-00014 6.20654e-01 5.46050e-01
7 ERCC-00016 0.00000e+00 0.00000e+00
8 ERCC-00017 0.00000e+00 2.61275e-02
9 ERCC-00019 4.05462e+00 5.89595e+00

Keeping only rows where value_1 or value_2 (or both) is above 0, which removes the rows that are 0 in both columns:

DF <- DF[DF$value_2 > 0 | DF$value_1 > 0,]
DF

Output:

        gene     value_1     value_2
1 ERCC-00003 2.17523e+02 2.62037e+02
2 ERCC-00004 1.54077e+03 1.89043e+03
3 ERCC-00009 1.07257e+02 1.31688e+02
4 ERCC-00012 4.08964e-02 0.00000e+00
5 ERCC-00013 1.95994e-01 1.92254e-01
6 ERCC-00014 6.20654e-01 5.46050e-01
8 ERCC-00017 0.00000e+00 2.61275e-02
9 ERCC-00019 4.05462e+00 5.89595e+00

EDIT3: My bad, the previous code used the & operator, which keeps only the rows where value_1 AND value_2 are both above 0. That drops every row in which either value is 0, resulting in the incorrect removal of e.g. ERCC-00017. I changed the code to use the | operator instead, which keeps rows where at least one value is above 0 and so does what you want, namely removing only the rows where both values are 0. Having not used R before, this looked unintuitive to me at first, but & and | behave in R as in other languages: the condition describes which rows to keep, not which rows to remove, so "remove if both are 0" turns into "keep if either is above 0".
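
The difference between the two operators can be checked directly on a few values. A small sketch, assuming non-negative values with no NA; the four values per vector are taken from the example rows ERCC-00012, ERCC-00016, ERCC-00017, and ERCC-00019, and the variable names are only illustrative:

```r
value_1 <- c(4.08964e-02, 0, 0, 4.05462e+00)
value_2 <- c(0, 0, 2.61275e-02, 5.89595e+00)

# TRUE only when neither value is 0
both_positive <- value_1 > 0 & value_2 > 0

# TRUE when at least one value is above 0
either_positive <- value_1 > 0 | value_2 > 0

# Negating "both are 0" gives the same mask as either_positive
not_all_zero <- !(value_1 == 0 & value_2 == 0)

both_positive     # FALSE FALSE FALSE  TRUE
either_positive   #  TRUE FALSE  TRUE  TRUE
identical(either_positive, not_all_zero)   # TRUE
```

Using both_positive as the row filter would drop ERCC-00012 and ERCC-00017 as well, which is the mistake described in EDIT3.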

ADD COMMENT • link 8.6 years ago by jonasmst ▴ 420

0

Okay, but I want to use it for both value_1 and value_2, since one is the control and the other is the condition. Can I do it simultaneously?

And if I do that, can I still keep the column that holds my list of genes? Those are characters, and I want the genes whose expression is 0 to be removed from the list. How do I do that?

ADD REPLY • link 8.6 years ago by 1769mkc ★ 1.3k

0

@krushnach80, I updated my answer to include how to filter on both value_1 and value_2. However, I do not understand what you mean about the list of genes. The filtering method removes whole rows, so the matching entries disappear from all columns (i.e. gene, value_1, and value_2).

ADD REPLY • link 8.6 years ago by jonasmst ▴ 420

0

Yes, I got it. I was a little confused; I did that and it removed the genes whose expression values are all 0. Now I want to plot a heatmap from the subset data above using pheatmap. Can you tell me how to keep the names of all the genes in my heatmap?

ADD REPLY • link 8.6 years ago by 1769mkc ★ 1.3k

0

I'm afraid I can't help you with pheatmap, as I've never used it (frankly, I've never used R). I suggest you post another question asking about pheatmap, as it's a different problem from this question. I'd also consider asking on Stack Overflow. Also, when phrasing your question, elaborate on what exactly you're looking for. It's still unclear to me what you mean by keeping the gene names. To me, it seems like you want to remove these entries, not keep them. EDIT: Previous code was erroneous, see EDIT3.

ADD REPLY • link 8.6 years ago by jonasmst ▴ 420

0

I would not recommend posting the same question on two forums. I would rather suggest that the OP search for how to generate heatmaps from this kind of data (which is very easy to find with Google). I strongly feel that asking others to solve the next step of an analysis as soon as the previous problem is solved is not a good idea, unless there is a specific problem with the code.

ADD REPLY • link 8.6 years ago by venu 7.1k

0

I did plot with just the values: I took out both value columns, converted them into a data matrix, and plotted that with pheatmap. It makes the plot, but without the gene list. I will try to make the plot and let you know if I have issues. Thank you for all your inputs.

ADD REPLY • link 8.6 years ago by 1769mkc ★ 1.3k
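
One way to keep the gene names when converting the value columns to a matrix is to assign them as row names, since a numeric matrix cannot hold the character column itself. A minimal sketch, assuming the pheatmap package is installed and using a hypothetical three-row stand-in for the filtered data frame; show_rownames is the pheatmap argument that controls whether row labels are drawn:

```r
# Hypothetical stand-in for the filtered data frame
DF <- data.frame(
  gene    = c("ERCC-00003", "ERCC-00004", "ERCC-00009"),
  value_1 = c(2.17523e+02, 1.54077e+03, 1.07257e+02),
  value_2 = c(2.62037e+02, 1.89043e+03, 1.31688e+02),
  stringsAsFactors = FALSE
)

# Numeric matrix of the two value columns only
mat <- as.matrix(DF[, c("value_1", "value_2")])

# Carry the gene names over as row names so they can label the heatmap rows
rownames(mat) <- DF$gene

# Draw the heatmap only if pheatmap is available
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat, show_rownames = TRUE)
}
```

With many genes the row labels may overlap, so the label font size might need adjusting for a real data set.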