Take out one percent of the tops of numbers from a column in a table
6.0 years ago

Hi all,

I have a table with seven columns. I want to take out one percent of the tops of numbers from the Z_W_FST column and save the result as a file.

CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST    Z_W_FST
1         1   40000        423    0.0432044 0.0385187 0.26855435
1     20001   60000        436    0.0330460 0.0308111 0.03553729
1     40001   80000        421    0.0356009 0.0337698 0.09414251
1     60001  100000        371    0.0398566 0.0369522 0.19176130
1     80001  120000        384    0.0540215 0.0477625 0.51668091
1    100001  140000        370    0.0602620 0.0462757 0.6598277

What is the best way to do this?

SNP R • 1.6k views

Please clarify what you mean by

One percent of the tops of numbers from Z_W_FST


What is the best idea?

What? Have? You? Tried?

6.0 years ago
zx8754 12k

Something like this: keep only the rows that match the maximum of Z_W_FST, then take a sample fraction, using set.seed(1) to make the sampling reproducible:

library(dplyr)

set.seed(1)  # make the sampling reproducible
myData %>%
  filter(Z_W_FST == max(Z_W_FST)) %>%  # rows at the column maximum
  sample_frac(0.01)                    # random 1% of those rows
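
If the goal is instead the rows whose Z_W_FST falls in the top 1% of the column, one possible sketch uses a quantile cutoff in base R. Filtering on the maximum usually leaves a single row, so sampling 1% of it may return nothing. The toy data frame `myData`, the seed, and the output file name `top1pct_Z_W_FST.txt` are assumptions made for illustration, not part of the original post.

```r
# Toy data standing in for the real table (assumption: 1000 windows).
set.seed(1)
myData <- data.frame(
  BIN_START = seq(1, by = 20000, length.out = 1000),
  Z_W_FST   = rnorm(1000)
)

# Cutoff at the 99th percentile of Z_W_FST.
cutoff <- quantile(myData$Z_W_FST, probs = 0.99, na.rm = TRUE)

# Keep rows at or above the cutoff, i.e. roughly the top 1% of windows.
top1 <- myData[!is.na(myData$Z_W_FST) & myData$Z_W_FST >= cutoff, ]

# Save the result as a tab-delimited file (hypothetical file name).
write.table(top1, file = "top1pct_Z_W_FST.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)
```

Using a cutoff rather than a fixed row count means that tied values at the threshold are all kept, which may or may not be what is wanted.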

Ok, many thanks.

Now I have five tables, and I want to connect the columns Z_W_FST from all the tables.

> head (x)
 CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST    Z_W_FST
CM009840.1         1   40000        423    0.0432044 0.0385187 0.26855435
CM009840.1     20001   60000        436    0.0330460 0.0308111 0.03553729
CM009840.1     40001   80000        421    0.0356009 0.0337698 0.0941425
> head (K)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST   MEAN_FST    Z_W_FST
CM009840.1         1   40000        354  0.005307040 0.00827061 -0.5935275
CM009840.1     20001   60000        370 -0.000567439 0.00141123 -0.7318390
CM009840.1     40001   80000        374  0.003076420 0.00491623 -0.6460463
> head (R)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST   Z_W_FST
CM009840.1         1   40000        380    0.0374885 0.0378809 0.6056258
CM009840.1     20001   60000        393    0.0367538 0.0360242 0.5838103
CM009840.1     40001   80000        402    0.0416729 0.0410071 0.7298739
> head (N)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST    MEAN_FST    Z_W_FST
CM009840.1         1   40000        330  -0.01317130 -0.00878714 -0.8229139
CM009840.1     20001   60000        349  -0.01877280 -0.01629740 -1.0109362
CM009840.1     40001   80000        366  -0.01627070 -0.01314870 -0.9269497
> head (M)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST   Z_W_FST
CM009840.1         1   40000        327    0.0301674 0.0377050 0.2614902
CM009840.1     20001   60000        342    0.0254195 0.0327188 0.1296059
CM009840.1     40001   80000        344    0.0266245 0.0303194 0.1630776

What is your suggestion?


connect the columns

Sorry, what? Do you wish to combine the values in the column across the objects? You can just use c(x$Z_W_FST, K$Z_W_FST, ...). Please invest some effort in trying to solve your problem, and in expressing yourself a little more clearly.
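
As a minimal sketch of that suggestion, using small stand-in data frames built from the first Z_W_FST values shown for x, K and R above (the real tables would be used in practice, and the names `all_z` and `combined` are illustrative):

```r
# Stand-ins for three of the five tables, using the values shown above.
x <- data.frame(Z_W_FST = c(0.26855435, 0.03553729, 0.0941425))
K <- data.frame(Z_W_FST = c(-0.5935275, -0.7318390, -0.6460463))
R <- data.frame(Z_W_FST = c(0.6056258, 0.5838103, 0.7298739))

# Concatenate the columns into one numeric vector, in table order.
all_z <- c(x$Z_W_FST, K$Z_W_FST, R$Z_W_FST)

# One-column data frame holding the combined values.
combined <- data.frame(Z_W_FST = all_z)
print(combined)
```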


I want to paste the columns Z_W_FST from all the tables.

For example, for the five tables above:

Z_W_FST
0.26855435
0.03553729
0.0941425
-0.5935275
-0.7318390
-0.6460463
0.6056258
0.5838103
.
.
.

Read about rbind and, as general advice, please invest some time in learning the basics of R.
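
A sketch of how rbind could be applied here, assuming all the tables share the same column names. The stand-in data frames and the extra `SOURCE` column are illustrative additions, not something from the thread:

```r
# Stand-ins for two of the tables (same columns in each).
x <- data.frame(CHROM = "CM009840.1", BIN_START = c(1, 20001),
                Z_W_FST = c(0.26855435, 0.03553729))
K <- data.frame(CHROM = "CM009840.1", BIN_START = c(1, 20001),
                Z_W_FST = c(-0.5935275, -0.7318390))

# Put the tables in a named list so each row can be traced to its table.
tables <- list(x = x, K = K)

# Tag each table with its name (hypothetical SOURCE column), then stack.
tagged <- Map(function(df, name) cbind(df, SOURCE = name),
              tables, names(tables))
stacked <- do.call(rbind, tagged)
rownames(stacked) <- NULL

stacked[, c("SOURCE", "Z_W_FST")]
```

Keeping a source label is optional, but it makes it possible to tell which comparison a given window came from after the tables are stacked.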

6.0 years ago

I think you could extract your column as a vector (Z_W_FST = table[["Z_W_FST"]]), then sort it in decreasing order and take the top N values you are interested in: sort(Z_W_FST, decreasing = TRUE)[1:N]

So if you have 2000 values and you want the top one percent, you take the first 20 numbers, and so on.
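
That arithmetic can be sketched as follows. The simulated vector stands in for the real column, and rounding N up with ceiling() is an assumption about how to treat a count that is not a whole number:

```r
# Simulated stand-in for the Z_W_FST column (assumption: 2000 values).
set.seed(1)
Z_W_FST <- rnorm(2000)

# Number of values that make up the top 1% (rounded up).
N <- ceiling(0.01 * length(Z_W_FST))

# Sort from largest to smallest and keep the first N values.
top_values <- sort(Z_W_FST, decreasing = TRUE)[1:N]

N
length(top_values)
```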


Many thanks for your reply.

Please allow me to correct my question: if, in the Z_W_FST column, the tops of numbers are the values of 3 and higher, how can I take out one percent of the tops of numbers?


Correct me if I am wrong:

So you have a vector like this: (1,2,1,2,3,4,5,1,2,1,2,1,3,1,0,1,2,1,9,1,3)

You want to create a subset of the numbers that are 3 or higher and then take the top one percent of the values among them?

Then what I would do is create the subset with a loop before taking the top 1%:

subset_vector <- c()  # start with an empty vector

for (x in Z_W_FST) {
  if (x >= 3) {
    # keep values that are 3 or higher
    subset_vector <- c(subset_vector, x)
  }
}
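
The same subset can also be built without a loop through logical indexing. A sketch using the example vector from this reply, again assuming the count N is rounded up:

```r
# Example vector from the reply above.
Z_W_FST <- c(1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 1, 3, 1, 0, 1, 2, 1, 9, 1, 3)

# Keep only the values that are 3 or higher.
subset_vector <- Z_W_FST[Z_W_FST >= 3]

# Top 1% of that subset, rounding the count up so at least one value is kept.
N <- ceiling(0.01 * length(subset_vector))
top_values <- sort(subset_vector, decreasing = TRUE)[1:N]

subset_vector
top_values
```

With only six values in the subset, one percent rounds up to a single value, so this toy example simply returns the largest number.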

What do you mean by "tops of numbers" ?


The largest numbers in the column Z_W_FST
