Hello,
I have been trying to get familiar with the --set-all-var-ids flag in PLINK 2.0 so that I can update the rsIDs in a .bim file based on chromosome and end position, but it doesn't seem to be working. Could somebody point me in a better direction? I have tried:
system("./plink2 --bim originalFile.bim --set-all-var-ids @_# --make-just-bim --out newBim")
...but it seems to overwrite the rsID column with the chromosome end positions, so the newBim file now has two end-position columns and no rsID column. I have tried swapping the columns of the file containing the updated rsIDs and end positions, and I have tried swapping the columns of the original .bim file. The first swap doesn't change how --set-all-var-ids updates the IDs. The second swap doesn't do anything, as PLINK forces the column-swapped .bim file back into its original column order.
Additionally, I have tried PLINK 1.9's command:
system("./plink --bim oldBim --update-name updateRSIDsChrome.txt 2 4 --make-just-bim --out newBim")
...but it's not updating any rsIDs. Again, any help would be greatly appreciated.
Thank you. I have isolated the chromosomes, rsIDs and chromosome positions from the original .bim file into their own file (I'm not sure if that is the step 1 you suggested), and I have extracted the corresponding chromosomes, rsIDs and chromosome positions from the updated UCSC database.
I then used awk to join the two files on chromosome and chromosome end position (string concatenated) into a new file:
awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' updated.txt fromOriginalBim.txt > combined.txt
...which did not produce 3 columns.
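For comparison, here is a minimal sketch of what a join keyed on chromosome and position together might look like. The file names and toy data are made up for illustration, and the column order (chromosome, rsID, position, tab-separated) is an assumption about both files rather than something confirmed above. Rows without a match get NA in the added column.

```shell
# Toy inputs (hypothetical data); assumed column order: chromosome, rsID, position.
printf '1\trs111\t1000\n1\trs222\t2000\n2\trs333\t1000\n' > updated_demo.txt
printf '1\tOLD_A\t1000\n2\tOLD_B\t1000\n2\tCNVI0001\t5000\n' > bim_demo.txt

# First file (FNR == NR): index the updated rsIDs by "chromosome:position".
# Second file: print each original row plus the matching rsID, or NA if none.
awk 'BEGIN { FS = OFS = "\t" }
     FNR == NR { a[$1 ":" $3] = $2; next }
     { key = $1 ":" $3; print $0, ((key in a) ? a[key] : "NA") }' \
    updated_demo.txt bim_demo.txt > combined_demo.txt

cat combined_demo.txt
```

The difference from the command above is the array key: `a[$1]` indexes on chromosome alone, so every row of a chromosome overwrites the previous one, whereas `$1 ":" $3` keeps one entry per chromosome and position pair.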
As a sanity check, I compared the first 20 rows of the two files for matching chromosome and end-position pairs and did not find a single match. Would you know what I may be doing wrong, given that the number of rsIDs in the .bim file is quite small compared to the UCSC file? In addition to rsIDs, the .bim ID column contains IDs for common variant numbers with a prefix of 'CNVI' and IDs with a prefix of 'MITO'. I am new to genetic data preprocessing, so forgive me if this is a novice mistake.
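The sanity check could also be run over the whole files rather than the first 20 rows. The sketch below counts how many rows of the .bim extract share a chromosome and position key with the UCSC extract. The file names and data are hypothetical, the column order (chromosome, rsID, position) is assumed, and stripping a leading "chr" from the chromosome name is only a guess at one way the two files might differ in naming.

```shell
# Toy inputs (hypothetical data); assumed column order: chromosome, rsID, position.
printf 'chr1\trs111\t1000\nchr1\trs222\t2000\nchr2\trs333\t1000\n' > ucsc_demo.txt
printf '1\tOLD_A\t1000\n2\tOLD_B\t1000\n2\tCNVI0001\t5000\n' > orig_demo.txt

# Build the same "chromosome:position" key for every row of both files,
# dropping a leading "chr" (assumption) so that "chr1" and "1" compare equal.
# Rows of the first file are remembered; rows of the second file are counted.
awk 'BEGIN { FS = "\t" }
     { chrom = $1; sub(/^chr/, "", chrom); key = chrom ":" $3 }
     FNR == NR { seen[key] = 1; next }
     { total++; if (key in seen) hits++ }
     END { printf "%d of %d rows matched\n", hits, total }' \
    ucsc_demo.txt orig_demo.txt > match_count_demo.txt

cat match_count_demo.txt
```

A count of zero on the real files would suggest that the keys are built differently in the two files (naming, coordinate convention, or genome build) rather than that the join itself is at fault.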