Hi, I need to convert chromosomal locations to gene names (preferably HGNC nomenclature) in a txt file with 6 columns. From
1 chr1:183189 0 183189 G C
1 chr1:609407 0 609407 0 0
1 chr1:609434 0 609434 0 0
1 chr1:609435 0 609435 G G
to
1 genename 0 183189 G C
1 genename 0 609407 0 0
1 genename 0 609434 0 0
1 genename 0 609435 G G
The files are in GRCh38 coordinates. I searched this forum for a similar issue but could not find an appropriate solution. One option used bedtools, but unfortunately that only works with BED files. Another option relied on some outdated R packages, which I could not install.
Thank you!
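Use awk to reformat your input into BED (0-based start, 1-based end). Hint, with your own file name in place of your_input.txt:
awk '{split($2,a,/[:]/);printf("%s\t%d\t%s\n",a[1],int(a[2])-1,a[2]);}' your_input.txt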
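A slightly fuller sketch of the hint above, in case it helps. It assumes the input is whitespace-delimited with the chr:pos key in column 2, as in the example rows. The function name `to_bed` and the file names are placeholders I made up, not part of any tool. It also writes the original chr:pos key into the BED ID column, so the gene name can be matched back to the right row later.

```shell
# Hypothetical helper: turn the 6-column input into a 4-column BED stream.
# BED is 0-based, half-open, so a single position P becomes [P-1, P).
# Column 4 (the BED ID field) keeps the original "chr1:183189" key.
to_bed() {
  awk '{
         split($2, a, ":")    # a[1] = chromosome, a[2] = position
         printf("%s\t%d\t%s\t%s\n", a[1], a[2] - 1, a[2], $2)
       }' "$1"
}

# Usage (file names are placeholders):
#   to_bed your_input.txt | sort -k1,1 -k2,2n > positions.bed
# BEDOPS tools expect sorted input; its own sort-bed can be used instead of sort.
```

The sort step matters because bedmap assumes both files are sorted.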
Sorry, I tried that syntax but got an error.
Also, where can I get BED files with HGNC gene names? From the UCSC Table Browser I can only download a table of transcripts.
The first part of Alex Reynolds' answer below gives you that.
My answer can return multiple HGNC records for a gene that has multiple transcripts. More information would be needed to filter these down to canonical transcripts. Also, one genomic position can overlap different genes. But
bedmap --echo-map-id-uniq
will remove duplicate gene names, in either case.
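To get back to the 6-column layout from the question, here is a rough sketch of the last step. It rests on several assumptions: the positions are in a sorted BED file (called positions.bed here) whose fourth column holds the original chr:pos key, the gene annotation is a sorted BED file (called genes.bed here) with the gene name in its ID column, and bedmap was run as `bedmap --echo --echo-map-id-uniq --delim '\t' positions.bed genes.bed > mapped.txt`. The function name `add_gene_names`, the file names, and the "NA" placeholder for positions with no overlapping gene are my own choices.

```shell
# Hypothetical helper: put gene names into column 2 of the original file.
# $1 = bedmap output: chr, start, end, chr:pos key, gene name(s) (may be empty)
# $2 = original 6-column input with the chr:pos key in column 2
add_gene_names() {
  awk 'NR == FNR {
         # First file: remember the gene name(s) for each chr:pos key.
         gene[$4] = (NF >= 5 ? $5 : "NA")
         next
       }
       {
         # Second file: swap the key for its gene name(s), or NA if unknown.
         $2 = ($2 in gene ? gene[$2] : "NA")
         print
       }' "$1" "$2"
}

# Usage (file names are placeholders):
#   add_gene_names mapped.txt your_input.txt > with_genes.txt
```

When a position overlaps several genes, bedmap joins the names with semicolons, so column 2 can hold more than one name. The sketch also assumes mapped.txt is not empty.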