So I've moved this question to a new post. I am new to genetic data preprocessing so forgive me if this is a novice mistake.
I've been trying to update the rsIDs in an ADNI bim file based on chromosome and chromosome end position.
I've extracted the rsIDs, chromosomes, and chromosome end positions from the ADNI bim file and put them in a separate file.
I have also gone to UCSC and downloaded every rsID with its chromosome and chromosome position via these commands:
curl -O https://hgdownload.gi.ucsc.edu/goldenPath/hg38/database/snp151Common.txt.gz
gunzip -c snp151Common.txt.gz | cut -f 2,4,5,12,17 | grep single.exact | cut -f 1-3 > onlySNPs.tsv
sort -k3 -u onlySNPs.tsv | sort -k1,2 -u > onlySNPs.uniqLocAndId.tsv
I then tried to join the two files on chromosome and chromosome position by concatenating the two fields into a single key in both files:
awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' updatedonlySNPS.uniqLocAndId.txt fromOriginalBim.txt > combined.txt
Unfortunately, no third column is generated, which suggests that there is no overlap in chromosome and chromosome end position between the two files. As a sanity check I searched for matches for the first 30 rows, and indeed there are none. Does anybody know what I may be doing wrong? I have also noticed that, in addition to rsIDs, the bim ID column contains IDs with common variant numbers prefixed with 'CNVI' and IDs prefixed with 'MITO'. Any help/education would be deeply appreciated.
Use join: https://linux.die.net/man/1/join
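A minimal sketch of what that could look like, on made-up toy data rather than the real files. Assumptions here: the UCSC file has chrom, chromEnd, rsID columns (as the cut above would produce), the bim file uses the standard PLINK layout (chrom, ID, cM, bp, A1, A2), and the bim chromosome codes have no "chr" prefix while the UCSC ones do. All demo_* file names are hypothetical. Note that join needs both inputs sorted on the key, and that if the bim file is on a different genome build than hg38 the positions will not match however the join is done.

```shell
#!/bin/sh
TAB="$(printf '\t')"

# Toy stand-ins for the two real files (contents are invented for illustration).
# UCSC-style rows: chrom, chromEnd, rsID.
printf 'chr1\t1000\trs111\nchr1\t2000\trs222\nchr2\t1500\trs333\n' > demo_ucsc.tsv
# PLINK .bim rows: chrom, variant ID, cM, bp position, allele 1, allele 2.
printf '1\tSNP_A\t0\t1000\tA\tG\n1\tSNP_B\t0\t2500\tC\tT\n2\tSNP_C\t0\t1500\tG\tA\n' > demo.bim

# Build a "chrom:pos" key in column 1 of each file. The "chr" prefix is
# stripped on the UCSC side so both keys use the same chromosome naming.
awk 'BEGIN{FS=OFS="\t"} {sub(/^chr/, "", $1); print $1 ":" $2, $3}' demo_ucsc.tsv \
  | LC_ALL=C sort -t "$TAB" -k1,1 > demo_ucsc.keyed.tsv
awk 'BEGIN{FS=OFS="\t"} {print $1 ":" $4, $2}' demo.bim \
  | LC_ALL=C sort -t "$TAB" -k1,1 > demo_bim.keyed.tsv

# join matches on column 1 by default and expects both inputs sorted on it.
# Output columns: key, old bim ID, rsID. Only matching keys are printed.
LC_ALL=C join -t "$TAB" demo_bim.keyed.tsv demo_ucsc.keyed.tsv > demo_combined.tsv
cat demo_combined.tsv
```

If the real files still give an empty result, compare a few keys from each keyed file by eye: a "chr" prefix on one side only, or positions from different builds, would both explain zero matches. The 'CNVI' and 'MITO' IDs simply won't match anything and drop out of the join output.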