I'm trying to populate the sixth column of a .fam file, as outlined here:
https://www.cog-genomics.org/plink/1.9/formats#fam
The column has to be binary ('1' or '2') depending on case/control status: '1' is control, '2' is case.
I have the .fam file with all of my samples, but the case/control column is currently just flagged as 'missing' (coded as '-9'). So the head of the .fam file looks like this:
<sample_1> <sample_1> 0 0 0 -9
<sample_2> <sample_2> 0 0 0 -9
<sample_3> <sample_3> 0 0 0 -9
<sample_4> <sample_4> 0 0 0 -9
<sample_5> <sample_5> 0 0 0 -9
<sample_6> <sample_6> 0 0 0 -9
I have a separate file listing the samples that I know are 'case'. These samples need to be coded as '2' in the sixth column, and all other samples therefore need to be coded as '1'.
Head of my 'case' samples file:
<sample_2>
<sample_8>
<sample_34>
<sample_47>
...etc
Is there a quick way to do this in bash?
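A minimal sketch of one possible approach in bash with awk. It assumes that the case file holds one sample ID per line and that those IDs match column 2 (the within-family ID, IID) of the .fam file. The function name `set_fam_pheno` and the file names are hypothetical. awk first loads the case IDs into an array, then rewrites column 6 of every .fam line to 2 if the IID is in that array and to 1 otherwise:

```shell
#!/usr/bin/env bash
# Sketch: set column 6 (phenotype) of a PLINK .fam file to 2 for samples
# listed in a case file, and to 1 for all other samples.
# Assumption: the case file has one ID per line, matching column 2 (IID).
# set_fam_pheno is a hypothetical helper name, not a PLINK command.
set_fam_pheno() {
    local cases_file=$1 fam_file=$2
    awk '
        # First file: remember every case ID.
        # Strip a trailing carriage return (Windows line endings) so IDs still match.
        FILENAME == ARGV[1] { sub(/\r$/, ""); if ($1 != "") cases[$1]; next }
        # Second file: overwrite column 6. Assigning a field makes awk rebuild
        # the line with single spaces, which PLINK accepts as whitespace.
        { $6 = ($2 in cases) ? 2 : 1; print }
    ' "$cases_file" "$fam_file"
}
```

For example, `set_fam_pheno cases.txt data.fam > data_pheno.fam`. Write to a new file rather than redirecting onto the input .fam, because the shell truncates the output file before awk reads it. The check uses `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom so that an empty case file does not cause the .fam lines to be read as case IDs; in that situation every sample simply becomes a control.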
Related post: