You can use the > operator to keep only values above 0. Assuming that your data frame is called DF:
DF <- DF[DF$value_2 > 0,]
EDIT:
To keep rows where at least one of the two columns is above 0 (that is, to drop only the rows where both are 0), you can combine two expressions with |:
DF <- DF[DF$value_2 > 0 | DF$value_1 > 0,]
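If there are more than two expression columns, chaining | gets unwieldy. Here is a minimal sketch of one alternative, assuming all expression columns are numeric: rowSums() over the logical matrix counts, per row, how many columns are above 0. The toy data and the names expr_cols, keep and filtered are made up for illustration and are not part of the original question.

```r
# Toy data frame (hypothetical values); column names mirror the ones in this answer.
DF <- data.frame(
  gene    = c("A", "B", "C", "D"),
  value_1 = c(1.5, 0, 0, 2.0),
  value_2 = c(0.7, 0.3, 0, 0)
)

# Columns to test (assumption: every expression column is numeric).
expr_cols <- c("value_1", "value_2")

# DF[, expr_cols] > 0 is a logical matrix; rowSums() counts the TRUEs per row.
# Keep rows where at least one expression column is above 0.
keep <- rowSums(DF[, expr_cols] > 0) >= 1
filtered <- DF[keep, ]

print(filtered)  # rows for genes A, B and D remain; C (all zeros) is dropped
```

For two columns this selects the same rows as DF$value_1 > 0 | DF$value_2 > 0; it only saves typing when the list of columns grows.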
EDIT2: Here's the full example, including printouts. Creating the data frame:
gene <- c("ERCC-0003", "ERCC-0004", "ERCC-0009", "ERCC-00012", "ERCC-00013", "ERCC-00014", "ERCC-00016", "ERCC-00017", "ERCC-00019")
value_1 <- c(2.17523e+02, 1.54077e+03, 1.07257e+02, 4.08964e-02, 1.95994e-01, 6.20654e-01, 0.00000e+00, 0.00000e+00, 4.05462e+00)
value_2 <- c(2.62037e+02, 1.89043e+03, 1.31688e+02, 0.00000e+00, 1.92254e-01, 5.46050e-01, 0.00000e+00, 2.61275e-02, 5.89595e+00)
DF <- data.frame(gene, value_1, value_2)
DF
Output:
gene value_1 value_2
1 ERCC-0003 2.17523e+02 2.62037e+02
2 ERCC-0004 1.54077e+03 1.89043e+03
3 ERCC-0009 1.07257e+02 1.31688e+02
4 ERCC-00012 4.08964e-02 0.00000e+00
5 ERCC-00013 1.95994e-01 1.92254e-01
6 ERCC-00014 6.20654e-01 5.46050e-01
7 ERCC-00016 0.00000e+00 0.00000e+00
8 ERCC-00017 0.00000e+00 2.61275e-02
9 ERCC-00019 4.05462e+00 5.89595e+00
Keeping only rows with a value above 0 in value_1 or value_2:
DF <- DF[DF$value_2 > 0 | DF$value_1 > 0,]
DF
Output:
gene value_1 value_2
1 ERCC-0003 2.17523e+02 2.62037e+02
2 ERCC-0004 1.54077e+03 1.89043e+03
3 ERCC-0009 1.07257e+02 1.31688e+02
4 ERCC-00012 4.08964e-02 0.00000e+00
5 ERCC-00013 1.95994e-01 1.92254e-01
6 ERCC-00014 6.20654e-01 5.46050e-01
8 ERCC-00017 0.00000e+00 2.61275e-02
9 ERCC-00019 4.05462e+00 5.89595e+00
EDIT3: My bad, the previous code used the & operator, which is a logical AND: it kept only rows where value_1 AND value_2 were both above 0, resulting in the incorrect removal of e.g. ERCC-00017 (value_1 is 0, but value_2 is not). I changed the code to use the | operator (logical OR) instead, which does what you want, namely removing only rows where both values are 0. What tripped me up is that "remove rows where both values are 0" is the same as "keep rows where either value is above 0"; the operators themselves behave as they do in other programming languages.
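To make the difference between the two operators concrete, here is a small sketch on made-up vectors (the values are hypothetical and assume expression values are never negative). It compares &, |, and the negated "both are zero" condition:

```r
# Hypothetical expression values for four genes.
value_1 <- c(2.5, 0,   0, 1.0)
value_2 <- c(3.0, 0.4, 0, 0)

# Logical AND: TRUE only where both values are above 0.
both_above <- value_1 > 0 & value_2 > 0

# Logical OR: TRUE where at least one value is above 0.
either_above <- value_1 > 0 | value_2 > 0

# "Not (both are zero)" selects the same rows as the OR version.
not_all_zero <- !(value_1 <= 0 & value_2 <= 0)

print(both_above)    # TRUE FALSE FALSE FALSE
print(either_above)  # TRUE  TRUE FALSE  TRUE
print(identical(either_above, not_all_zero))  # TRUE
```

So the & version drops every gene that is 0 in one of the two columns, while the | version drops only genes that are 0 in both.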
Okay, but I want to use it for both value_1 and value_2, since one is the control and the other is the condition. Can I do it simultaneously?
And if I do that, can I still keep the column that holds my list of genes? Those are characters, and I want the genes whose expression is 0 to be removed from the list. How do I do that?
@krushnach80, I updated my answer to include how to filter on both value_1 and value_2. However, I do not understand what you mean about the list of genes. The filtering method removes entries from all columns (i.e. gene, value_1, and value_2).
Yes, I got it; I was a little confused. I did that and it removed the genes whose expressions are all 0. Now I want to plot a heatmap from the above subset of the data using pheatmap. Can you tell me how to keep the names of all the genes in my heatmap?
I'm afraid I can't help you with pheatmap, as I've never used it (frankly, I've never used R). I suggest you post another question asking about pheatmap, as it's a different problem from this question. I'd also consider asking on Stack Overflow. Also, when phrasing your question, elaborate on what exactly you're looking for. It's still unclear to me what you mean by keeping the gene names. To me, it seems like you want to remove these entries, not keep them. EDIT: Previous code was erroneous, see EDIT3.
I would not recommend posting the same question on two forums. I would rather suggest OP to do some search for generating heatmaps from this kind of data (which is very easy to find with google). I strongly feel asking to solve the next step of analysis once the previous problem is solved is not a good idea unless there is a typical problem with the code.
I did plot it with just the values: I took out both columns, converted them into a data matrix, and plotted that with pheatmap. It makes the plot, but without the gene list. I will try to make the plot and let you know if I have issues. Thank you for all your inputs.
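For what it's worth, a likely reason the gene list goes missing is that the matrix built from the two value columns has no row names. A minimal sketch, assuming the data frame has the gene, value_1 and value_2 columns used above and that the pheatmap package is installed (the call is skipped otherwise); the name mat is made up for illustration:

```r
# First three rows of the example data from the answer.
DF <- data.frame(
  gene    = c("ERCC-0003", "ERCC-0004", "ERCC-0009"),
  value_1 = c(2.17523e+02, 1.54077e+03, 1.07257e+02),
  value_2 = c(2.62037e+02, 1.89043e+03, 1.31688e+02)
)

# Build a numeric matrix from the value columns only.
mat <- as.matrix(DF[, c("value_1", "value_2")])

# Carry the gene names over as row names so they can be used as row labels.
rownames(mat) <- as.character(DF$gene)

# Draw the heatmap only if pheatmap is available (assumption: it is installed).
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat)
}
```

pheatmap labels rows with the row names of the matrix, so setting them before plotting should be enough to get the gene names onto the heatmap.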