Question

Take out One percent of the tops of numbers from the column in the table

6.6 years ago

mostafarafiepour ▴ 180

Hi all,

I have a table containing seven columns. I want to take out One percent of the tops of numbers from Z_W_FST column and save the result as a file.

CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST    Z_W_FST
1         1   40000        423    0.0432044 0.0385187 0.26855435
1     20001   60000        436    0.0330460 0.0308111 0.03553729
1     40001   80000        421    0.0356009 0.0337698 0.09414251
1     60001  100000        371    0.0398566 0.0369522 0.19176130
1     80001  120000        384    0.0540215 0.0477625 0.51668091
1    100001  140000        370    0.0602620 0.0462757 0.6598277

What is the best idea?

SNP R • 2.0k views

ADD COMMENT • link updated 6.6 years ago by zx8754 12k • written 6.6 years ago by mostafarafiepour ▴ 180

Please clarify what you mean by

One percent of the tops of numbers from Z_W_FST

ADD REPLY • link 6.6 years ago by zx8754 12k

What is the best idea?

What? Have? You? Tried?

ADD REPLY • link 6.6 years ago by WouterDeCoster 48k

6.6 years ago

maxime.policarpo ▴ 200

I think you could extract your column as a vector ( Z_W_FST <- table$Z_W_FST ), then sort it in decreasing order and take the N top values you are interested in -> sort(Z_W_FST, decreasing = TRUE)[1:N]

So if you have 2000 values and you want the highest one percent, you take the first 20 numbers, and so on.
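As a rough sketch of the suggestion above, in base R: the toy data frame `fst` and the output file name are placeholders for illustration, not the poster's real objects. N is computed as one percent of the row count, rounded up, and the top rows are written to a file.

```r
# Toy table standing in for the poster's data (values copied from the question).
fst <- data.frame(
  BIN_START = c(1, 20001, 40001, 60001, 80001, 100001),
  Z_W_FST   = c(0.26855435, 0.03553729, 0.09414251,
                0.19176130, 0.51668091, 0.6598277)
)

# Number of rows in the top 1%; ceiling() guarantees at least one row.
n_top <- ceiling(0.01 * nrow(fst))

# Sort rows by Z_W_FST, largest first, and keep the first n_top rows.
top_rows <- head(fst[order(fst$Z_W_FST, decreasing = TRUE), ], n_top)

# Save as a tab-separated file (hypothetical file name, written to a temp dir).
out_file <- file.path(tempdir(), "top1pct_Z_W_FST.txt")
write.table(top_rows, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Using order() on the whole data frame, rather than sort() on the vector alone, keeps the other columns attached to each selected value.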

ADD COMMENT • link 6.6 years ago by maxime.policarpo ▴ 200

Many thanks for your reply.

Please allow me to correct my question: if, in the Z_W_FST column, the tops of numbers are 3 and higher, how can I take out one percent of the tops of numbers?

ADD REPLY • link 6.6 years ago by mostafarafiepour ▴ 180

Correct me if I am wrong:

So you have a vector like this: (1,2,1,2,3,4,5,1,2,1,2,1,3,1,0,1,2,1,9,1,3)

You want to create a subset of the numbers that are 3 or higher and then take the top one percent of the values among them?

Then what I would do is create the subset with a loop before taking the top 1%:

# Start with an empty vector and append every value that is at least 3
subset_vector <- c()

for (x in Z_W_FST) {
  if (x >= 3) {
    subset_vector <- c(subset_vector, x)
  }
}
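For what it is worth, the same subset can be built without a loop using logical indexing. This is only a sketch on the example vector from this reply; the threshold of 3 and the rounding up with ceiling() are assumptions about what is wanted.

```r
# Example vector from the reply above.
Z_W_FST <- c(1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 1, 3, 1, 0, 1, 2, 1, 9, 1, 3)

# Keep only the values that are 3 or higher (vectorised, no loop needed).
subset_vector <- Z_W_FST[Z_W_FST >= 3]

# Take the top 1% of that subset; ceiling() keeps at least one value.
n_top <- ceiling(0.01 * length(subset_vector))
top_values <- head(sort(subset_vector, decreasing = TRUE), n_top)
```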

ADD REPLY • link 6.6 years ago by maxime.policarpo ▴ 200

What do you mean by "tops of numbers" ?

ADD REPLY • link 6.6 years ago by zx8754 12k

The largest numbers in the column Z_W_FST

ADD REPLY • link 6.6 years ago by mostafarafiepour ▴ 180

score 1 · Accepted Answer · 2018-11-14

6.6 years ago

zx8754 12k

Something like this: keep only the rows that match the max of Z_W_FST, then take a sample fraction, using set.seed(1) to make the sampling reproducible:

library(dplyr)

set.seed(1)
myData %>%
  filter(Z_W_FST == max(Z_W_FST)) %>%
  sample_frac(0.01)
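Note that filter(Z_W_FST == max(Z_W_FST)) keeps only the rows tied for the single largest value. If what is wanted is instead every row at or above the 99th percentile of Z_W_FST (one possible reading of the question, and an assumption here), a base R sketch might look like this. The myData below is simulated for illustration, not the poster's table.

```r
# Simulated stand-in for the poster's table (1000 windows, random Z scores).
set.seed(1)
myData <- data.frame(
  BIN_START = seq(1, by = 20000, length.out = 1000),
  Z_W_FST   = rnorm(1000)
)

# 99th percentile of the column; names = FALSE returns a plain number.
cutoff <- quantile(myData$Z_W_FST, probs = 0.99, names = FALSE)

# Keep every row whose Z_W_FST is at or above that cutoff.
top1 <- myData[myData$Z_W_FST >= cutoff, ]
```

The resulting top1 can then be saved with write.table() in the usual way.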

ADD COMMENT • link 6.6 years ago by zx8754 12k

OK, many thanks.

Now, I have five tables, and I want to connect the columns Z_W_FST from all the tables.

> head (x)
 CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST    Z_W_FST
CM009840.1         1   40000        423    0.0432044 0.0385187 0.26855435
CM009840.1     20001   60000        436    0.0330460 0.0308111 0.03553729
CM009840.1     40001   80000        421    0.0356009 0.0337698 0.0941425
> head (K)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST   MEAN_FST    Z_W_FST
CM009840.1         1   40000        354  0.005307040 0.00827061 -0.5935275
CM009840.1     20001   60000        370 -0.000567439 0.00141123 -0.7318390
CM009840.1     40001   80000        374  0.003076420 0.00491623 -0.6460463
> head (R)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST   Z_W_FST
CM009840.1         1   40000        380    0.0374885 0.0378809 0.6056258
CM009840.1     20001   60000        393    0.0367538 0.0360242 0.5838103
CM009840.1     40001   80000        402    0.0416729 0.0410071 0.7298739
> head (N)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST    MEAN_FST    Z_W_FST
CM009840.1         1   40000        330  -0.01317130 -0.00878714 -0.8229139
CM009840.1     20001   60000        349  -0.01877280 -0.01629740 -1.0109362
CM009840.1     40001   80000        366  -0.01627070 -0.01314870 -0.9269497
> head (M)
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST  MEAN_FST   Z_W_FST
CM009840.1         1   40000        327    0.0301674 0.0377050 0.2614902
CM009840.1     20001   60000        342    0.0254195 0.0327188 0.1296059
CM009840.1     40001   80000        344    0.0266245 0.0303194 0.1630776

What is your suggestion?

ADD REPLY • link 6.6 years ago by mostafarafiepour ▴ 180

connect the columns

Sorry, what? Do you wish to combine the values in the column across the objects? You can just use c(x$Z_W_FST, K$Z_W_FST,...). Please invest some effort in trying to solve your problem, as well as in expressing yourself a little more clearly.
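A minimal sketch of that c() call, using three of the tables with only the first rows shown in the question. The data frames are rebuilt by hand here, and only the Z_W_FST column is included for brevity.

```r
# Hand-built stand-ins for three of the poster's tables (Z_W_FST column only).
x <- data.frame(Z_W_FST = c(0.26855435, 0.03553729, 0.0941425))
K <- data.frame(Z_W_FST = c(-0.5935275, -0.7318390, -0.6460463))
R <- data.frame(Z_W_FST = c(0.6056258, 0.5838103, 0.7298739))

# c() concatenates the three vectors end to end, in the order given.
combined <- data.frame(Z_W_FST = c(x$Z_W_FST, K$Z_W_FST, R$Z_W_FST))
```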

ADD REPLY • link 6.6 years ago by Ram 45k

I want to paste the columns Z_W_FST from all the tables.

For example, for the five tables above:

Z_W_FST
0.26855435
0.03553729
0.0941425
-0.5935275
-0.7318390
-0.6460463
0.6056258
0.5838103
.
.
.

ADD REPLY • link 6.6 years ago by mostafarafiepour ▴ 180

Read about rbind and, as general advice, please invest some time in learning the basics of R.
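To illustrate what rbind does here, a small sketch with two cut-down tables. The SOURCE column is an addition of this example, not something from the question; it records which table each row came from.

```r
# Cut-down stand-ins for two of the poster's tables.
x <- data.frame(CHROM = "CM009840.1", BIN_START = c(1, 20001),
                Z_W_FST = c(0.26855435, 0.03553729))
K <- data.frame(CHROM = "CM009840.1", BIN_START = c(1, 20001),
                Z_W_FST = c(-0.5935275, -0.7318390))

# Label each table so the origin of every row survives the merge.
x$SOURCE <- "x"
K$SOURCE <- "K"

# rbind() stacks data frames with matching column names on top of each other.
stacked <- rbind(x, K)

# The pasted Z_W_FST column is then just one column of the stacked table.
all_z <- stacked$Z_W_FST
```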

ADD REPLY • link 6.6 years ago by zx8754 12k