Hello,
I have two files has the calculation of nucleotide diversity (pi) for each population (I have two population) I am trying to get the ratio of PI between the two population ( PI pop1 / PI pop2), in each window and chromosome (they should have the same start in BIN_START
in another word, I want to have the ratio of Pi from the two files (divide Pi in file 1 over Pi in file 2) if the CHROM and BIN_start columns in file 1 = to CHROM and BIN_start in file 2
Example: Here is population one:
CHROM BIN_START BIN_END N_VARIANTS PI
Dexi_CM05836_chr01A 120001 170000 14 3.4207e-05
Dexi_CM05836_chr01A 130001 180000 54 0.000130133
Dexi_CM05836_chr01A 140001 190000 99 0.000356213
Dexi_CM05836_chr01A 150001 200000 162 0.000652928
Dexi_CM05836_chr01A 160001 210000 174 0.000690181
Dexi_CM05836_chr01A 170001 220000 179 0.000721402
Dexi_CM05836_chr01A 180001 230000 244 0.00111935
Dexi_CM05836_chr01A 190001 240000 369 0.0016301
Dexi_CM05836_chr01A 200001 250000 402 0.0017491
Dexi_CM05836_chr01A 210001 260000 454 0.00205598
Dexi_CM05836_chr01A 220001 270000 498 0.00227197
CHROM BIN_START BIN_END N_VARIANTS PI
Dexi_CM05836_chr01A 20001 70000 1 1.20063e-07
Dexi_CM05836_chr01A 30001 80000 1 1.20063e-07
Dexi_CM05836_chr01A 40001 90000 1 1.20063e-07
Dexi_CM05836_chr01A 50001 100000 1 1.20063e-07
Dexi_CM05836_chr01A 60001 110000 1 1.20063e-07
Dexi_CM05836_chr01A 120001 170000 17 2.30029e-05
Dexi_CM05836_chr01A 130001 180000 51 9.37882e-05
Dexi_CM05836_chr01A 140001 190000 61 0.00010983
I am not sure how I would do that in bash ( might be for loop and awk) or R. How I should create an identifier? I need help
Thanks,
That's amazing! Thank you very much. It came to my mind another idea using classical bash command lines:
1- sort files:
awk 'BEGIN{FS="\t"}{print $1"-"$2"\t"$0}' < filename1 | sort -k1,1 > output1.sorted
awk 'BEGIN{FS="\t"}{print $1"-"$2"\t"$0}' < filename3 | sort -k1,1 > output3.sorted
2-join files col 1 and 2 from both file will be considered to create the new file:
join -j1 -o1.2,1.3,1.4,1.6,2.6 output1.sorted output2.sorted > joined_output1_2.txt
awk '{printf "%s\t%s\t%s\t%f\n", $1, $2, $3, $5/$4}' < joined_output1_2 > output_ratio.txt