Hi, I'd like to write a loop that extracts individuals from my PLINK .fam file into separate .txt files, one per family ID / population code, using only bash. I'm pretty stumped, so I would appreciate any suggestions.
For example, this is part of my dataset:
PUR HG01247 0 0 0 1
PUR HG01248 0 0 0 1
CLM HG01250 0 0 0 1
CLM HG01251 0 0 0 1
CLM HG01253 0 0 0 1
CLM HG01254 0 0 0 1
CLM HG01256 0 0 0 1
CLM HG01257 0 0 0 1
CLM HG01259 0 0 0 1
CLM HG01260 0 0 0 1
CLM HG01269 0 0 0 1
CLM HG01271 0 0 0 1
CLM HG01272 0 0 0 1
CLM HG01275 0 0 0 1
CLM HG01277 0 0 0 1
CLM HG01280 0 0 0 1
CLM HG01281 0 0 0 1
CLM HG01284 0 0 0 1
PUR HG01286 0 0 0 1
PUR HG01302 0 0 0 1
PUR HG01303 0 0 0 1
PUR HG01305 0 0 0 1
PUR HG01308 0 0 0 1
PUR HG01311 0 0 0 1
PUR HG01312 0 0 0 1
PUR HG01323 0 0 0 1
PUR HG01325 0 0 0 1
PUR HG01326 0 0 0 1
GBR HG01334 0 0 0 1
What I'd like to do is extract the second column (the individual IDs) for each population separately and write just those IDs to a .txt file named after the population. For example, PUR.txt would contain only:
HG01247
HG01248
HG01286
HG01302
HG01303
HG01305
HG01308
HG01311
HG01312
HG01323
HG01325
HG01326
In the GBR.txt file, you'd just see this:
HG01334
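Output in exactly this shape can also be produced without an explicit loop, because awk can choose the output file from column 1 on every line. This is a minimal sketch, assuming a whitespace-delimited .fam file where column 1 is the population code, as in the sample above. The name `example.fam` is hypothetical, and the heredoc only writes a few lines from the sample so the script runs on its own.

```shell
#!/bin/sh
set -eu

fam="example.fam"   # hypothetical file name; use your own .fam file here

# A few lines copied from the sample dataset so the sketch runs on its own.
cat > "$fam" <<'EOF'
PUR HG01247 0 0 0 1
PUR HG01248 0 0 0 1
CLM HG01250 0 0 0 1
CLM HG01251 0 0 0 1
PUR HG01286 0 0 0 1
GBR HG01334 0 0 0 1
EOF

# For every line, send the individual ID (column 2) to "<column 1>.txt".
# awk truncates each output file when it first opens it and keeps it open,
# so later lines for the same population are appended in input order.
awk '{ out = $1 ".txt"; print $2 > out }' "$fam"
```

If there are very many populations, some awk implementations may hit a limit on simultaneously open files; looping over the population codes one at a time avoids that.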
I figured out how to do it manually for a single population with:
grep -w '^PUR' filename.fam | awk '{print $2}' > PUR.txt
However, I can't wrap my head around writing a loop to automate this, and I need one because I'm dealing with many populations.
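A minimal sketch of such a loop, assuming a whitespace-delimited .fam file whose column 1 holds the population code. The name `example.fam` is hypothetical, and the heredoc only writes a few sample lines from the question so the script runs on its own; point `fam` at your real file and drop the heredoc. The loop collects the distinct codes from column 1 with `sort -u`, then applies the manual command's idea to each code, using an exact match on column 1 instead of `grep`.

```shell
#!/bin/sh
set -eu

fam="example.fam"   # hypothetical file name; point this at your own .fam file

# A few lines copied from the sample dataset so the sketch runs on its own;
# drop this heredoc when using a real file.
cat > "$fam" <<'EOF'
PUR HG01247 0 0 0 1
PUR HG01248 0 0 0 1
CLM HG01250 0 0 0 1
CLM HG01251 0 0 0 1
GBR HG01334 0 0 0 1
EOF

# Loop over the distinct population codes in column 1 (here CLM, GBR, PUR).
for pop in $(awk '{print $1}' "$fam" | sort -u); do
  # Exact match on column 1, so the code appearing elsewhere on a line is ignored;
  # write column 2 (the individual ID) to <pop>.txt.
  awk -v p="$pop" '$1 == p {print $2}' "$fam" > "${pop}.txt"
done
```

Each iteration rereads the whole file, which is fine for a typical .fam file. The unquoted `$(...)` in the `for` line relies on the codes containing no whitespace, which holds for .fam family IDs since the format is whitespace-delimited.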
Thank you for your help!