Hi all, I need your help.
I am trying to convert from the UCSC naming convention to the Ensembl one. I have a .bed file with an annotation and a .txt file with the equivalent names in two columns. The files look like:
.bed
chr2L 324 453
chr3R 65433 73563
chr4 5345 9854
... etc
.txt equivalence
chr2L 2L
chr3L 3L
chr4 4L
... etc
I know you can use sed 's/chr2L/2L/g' to replace the patterns. However, doing that by hand for all the chromosomes and scaffolds (approximately 2000 different ones) is not feasible.
I am looking for a script (I don't mind the programming language) or a tool that works as:
Read the equivalence file and store the strings, then read the .bed file and perform the string replacement in the chromosome field.
Thank you in advance, have a great day! Best,
Jordi
Using tsv-utils:
With awk:
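A minimal sketch of the usual two-file awk idiom for this kind of lookup, not necessarily the exact command posted in this answer. The file names (annotation.bed, equivalence.txt, annotation.ensembl.bed) are hypothetical, and it assumes both files are tab-separated; the sample rows are the ones from the question.

```shell
# Hypothetical file names; the sample rows come from the question.
# Assumption: both files are tab-separated.
printf 'chr2L\t324\t453\nchr3R\t65433\t73563\nchr4\t5345\t9854\n' > annotation.bed
printf 'chr2L\t2L\nchr3L\t3L\nchr4\t4L\n' > equivalence.txt

# While reading the first file (NR == FNR), store UCSC name -> Ensembl name.
# While reading the second file, rewrite column 1 only when a mapping exists;
# lines whose chromosome is not in the table are printed unchanged.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { map[$1] = $2; next }
     ($1 in map) { $1 = map[$1] }
     { print }' equivalence.txt annotation.bed > annotation.ensembl.bed
```

Because the lookup is an exact match on the first field, chr2 can never accidentally rewrite part of chr2L, which is a risk with a plain sed substitution. In the sample above chr3R has no entry in the equivalence file, so it passes through untouched; if you would rather drop or flag such lines, change the last rule accordingly.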
Thank you as well! AWK is awesome and terribly powerful. I will spend some time and try to master it.