Hi everyone,
I recently downloaded the latest dbSNP VCF, and when I opened the file I noticed that the #CHROM column contains RefSeq accessions instead of chr1, chr2, and so on. Here is what the VCF looks like:
#CHROM POS ID REF ALT QUAL FILTER INFO
NC_000001.11 10019 rs775809821 TA T . . RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
NC_000001.11 10039 rs978760828 A C . . RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
NC_000001.11 10043 rs1008829651 T A . . RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
I would like to convert each RefSeq accession to its corresponding chromosome name:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10019 rs775809821 TA T . . RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
chr1 10039 rs978760828 A C . . RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
chr1 10043 rs1008829651 T A . . RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
Is there a tool or script that can convert the RefSeq accessions? Thank you in advance.
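For illustration, here is a minimal Python sketch of the conversion asked about above. It is not the method from any answer in this thread. It assumes the usual human RefSeq numbering (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrial genome) and ignores the version suffix. The function names `refseq_to_chr`, `rename_line`, and `rename_stream` are made up for this example. Accessions it does not recognise (for example NT_ or NW_ scaffolds) are left unchanged, and header lines, including any ##contig lines, are passed through untouched, so they would still need updating separately.

```python
import re

# Assumption: NC_0000NN.v is a human nuclear chromosome (23 = X, 24 = Y)
# and NC_012920.v is the mitochondrial genome. The version (.v) is ignored.
_ACCESSION = re.compile(r"^NC_(\d{6})\.\d+$")


def refseq_to_chr(name):
    """Return a chr-style name for a RefSeq chromosome accession.

    Anything that does not match the assumed pattern is returned unchanged.
    """
    match = _ACCESSION.match(name)
    if match is None:
        return name
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    if number == 12920:
        return "chrM"
    return name


def rename_line(line):
    """Rewrite the first tab-separated field of a VCF record line."""
    if line.startswith("#"):
        # Header lines (## metadata and the #CHROM row) are not modified.
        return line
    chrom, sep, rest = line.partition("\t")
    return refseq_to_chr(chrom) + sep + rest


def rename_stream(src, dst):
    """Copy VCF text from src to dst, renaming the CHROM column."""
    for line in src:
        dst.write(rename_line(line))
```

To use it on an uncompressed VCF, one could call `rename_stream(sys.stdin, sys.stdout)` from a small script and pipe the file through it. For a file the size of dbSNP, a dedicated tool will likely be much faster than a line-by-line Python loop.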
Huge thanks to you! This is essential for anyone who wants to use the dbSNP VCF, yet the NCBI documentation does not mention it!
Hi @rrbutleriii, how did you manage to use multiple threads (in bedtools) to process that single task?
Hi rrbutleriii, brilliant answer. 4.4 years later this is still an issue (and the documentation is still unhelpful). Adding an update here for the most recent version, to get chromosome names as chrN rather than N: