Question

Substitution of multiple different words in a file

9.0 years ago

panda ▴ 10

Hey everybody

I'm pretty new to bioinformatics and I have a strange logic problem in a script. The goal is simply to substitute all chromosome names, from NC_... to chr1 etc.

#!/bin/sh
cat myfile.gff3>tmp
i=1
for ncNum in $(cat chr_NC_gi | awk '{ print $2 }' | grep "^NC"); do  # collecting the terms that have to be exchanged
        cat tmp | sed -E 's/$ncNum/chr${i}/' > tmp2
        cat tmp2 > tmp
        i=$(($i + 1))
done

I know this is a suboptimal solution for bigger files because it reads the file far too many times (so I'll try a new solution in Python using a dictionary). However, I'd like to know what my error was in the old approach. If someone spots the error without too much work, that would be great!

Thanks for your help!

rna-seq • 1.4k views

Well, at least my sed doesn't know "-E" (it does know "-e", however). More importantly, I think the single quotes in your sed command make the expression literal, i.e. sed is looking for the string $ncNum instead of whatever value the variable $ncNum holds (the same goes for ${i} in the replacement). Use double quotes so that the shell expands the variables before sed sees them. While "i=$(($i + 1))" is not wrong, you could replace it with "((i++))" if you run the script with bash (that form is not POSIX sh, and your shebang is #!/bin/sh). "mv tmp2 tmp" would probably be faster than "cat tmp2 > tmp" since (I think) it would just rename the file in the file system instead of actually creating a new copy of the data (not sure this is true for all file systems).
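Putting those points together, a minimal sketch of the loop might look like the one below. It is only an illustration: the function name rename_chromosomes is made up, it assumes that mktemp is available and that column 2 of chr_NC_gi holds the accessions as in your script, and it keeps your one-sed-pass-per-name design instead of switching to something faster.

```shell
#!/bin/sh
# Sketch only: rename NC_... accessions to chr1, chr2, ... in the order
# they appear in the mapping file. rename_chromosomes is a hypothetical name.
rename_chromosomes() {
    map_file=$1   # mapping file, e.g. chr_NC_gi (accession assumed in column 2)
    gff_file=$2   # annotation file, e.g. myfile.gff3
    tmp=$(mktemp) || return 1
    tmp2=$(mktemp) || return 1
    cp "$gff_file" "$tmp"
    i=1
    # awk does the filtering itself, so no separate cat or grep is needed.
    for ncNum in $(awk '$2 ~ /^NC/ { print $2 }' "$map_file"); do
        # Double quotes let the shell expand $ncNum and $i before sed runs.
        # No -E: the pattern needs no extended regex syntax.
        # Caveat: a "." inside the accession still acts as a regex wildcard.
        sed "s/$ncNum/chr$i/" "$tmp" > "$tmp2"
        # Rename instead of copying the data back.
        mv "$tmp2" "$tmp"
        i=$((i + 1))
    done
    cat "$tmp"
    rm -f "$tmp" "$tmp2"
}
```

It would be called as "rename_chromosomes chr_NC_gi myfile.gff3 > renamed.gff3", where renamed.gff3 is just a placeholder output name. Like your original, it replaces only the first match on each line; add the g flag to the s command if a name can occur more than once per line.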

9.0 years ago by 5heikki 11k