Keeping only the model with the minimum p-value for each SNP in a file
4.8 years ago
khn ▴ 130

Hello

My file lists millions of SNPs with their p-values under the additive, dominant, and recessive models. For each SNP, I want to keep only the row for the model with the lowest p-value.

Original file.txt

ID,y,z,P-value
rs1,a,b,0.22
rs1,a,b,0.35
rs1,a,b,0.45
rs2,c,d,0.06
rs2,c,d,0.20
rs2,c,d,0.46
rs3,e,f,0.002
rs3,e,f,0.98
rs3,e,f,1.0

The file.txt that I want is:

ID,y,z,P-value
rs1,a,b,0.22
rs2,c,d,0.06
rs3,e,f,0.002

How can I do this? Thanks!

SNP gene • 1.0k views

A p-value is uniformly distributed under the null hypothesis, and that property is completely broken by choosing the minimum across the 3 models. I warn you not to do this: you won't have any basis for further analysis after such a selection.
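
To make the size of the effect concrete, here is a rough calculation under a simplifying assumption that is not stated in the thread: that the three model p-values are independent and each Uniform(0,1) under the null. In real data they are computed from the same genotypes and are correlated, so the true inflation is smaller than this, but it does not disappear.

```latex
\[
P\bigl(\min(p_{\mathrm{add}},\, p_{\mathrm{dom}},\, p_{\mathrm{rec}}) \le \alpha\bigr)
  = 1 - (1 - \alpha)^{3},
\qquad
\alpha = 0.05 \;\Rightarrow\; 1 - 0.95^{3} \approx 0.143 .
\]
```

So under that assumption a nominal 5% threshold applied to the minimum behaves more like a 14% threshold.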


Thank you for your comment. I just wanted to draw a Manhattan plot based on this, but I think I should instead show three Manhattan plots, one per model.


Yep, a Manhattan plot would not make any sense if you choose the min =/ Three plots sounds good!

4.8 years ago
khn ▴ 130
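head -n 1 original_file > final_file   # write the header line first
tail -n +2 original_file > txt         # data rows only, without the header
sort -t, -k4,4g txt | awk -F, '!visited[$1]++' | sort -t, -k2,2 -k3,3 >> final_file   # sort by p-value, keep the first (lowest) row per ID, order by columns 2 and 3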
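
As a possible alternative to the sort-based pipeline above, the same selection can be sketched as a single awk pass that avoids sorting the whole file. This is only a sketch: the file names `original_file.csv` and `final_file.csv` are placeholders, the sample rows are the ones from the question, and it assumes the p-value is always the fourth comma-separated column. It keeps one row per SNP in memory, which may matter with millions of SNPs.

```shell
# Sample input copied from the question (hypothetical file name).
cat > original_file.csv <<'EOF'
ID,y,z,P-value
rs1,a,b,0.22
rs1,a,b,0.35
rs1,a,b,0.45
rs2,c,d,0.06
rs2,c,d,0.20
rs2,c,d,0.46
rs3,e,f,0.002
rs3,e,f,0.98
rs3,e,f,1.0
EOF

# Single pass over the file:
#   - print the header unchanged
#   - on the first row for an ID, remember the row and its p-value
#   - on later rows, replace the stored row if the p-value is smaller
#   - at the end, print the stored rows in order of first appearance
awk -F, '
NR == 1          { print; next }
!($1 in best)    { order[++n] = $1; best[$1] = $0; min[$1] = $4 + 0; next }
$4 + 0 < min[$1] { best[$1] = $0; min[$1] = $4 + 0 }
END              { for (i = 1; i <= n; i++) print best[order[i]] }
' original_file.csv > final_file.csv

cat final_file.csv
```

The `$4 + 0` forces a numeric comparison, so values such as `0.002` and `0.06` are compared as numbers rather than as strings.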

Please format your code, and add some description on what each step is doing.
