Substitution of multiple different words in a file
8.4 years ago
panda ▴ 10

Hey everybody

I'm pretty new to bioinformatics and I've run into a strange logic problem in a script. The goal is simply to replace all chromosome names of the form NC_... with chr1, chr2, etc.

#!/bin/sh
cat myfile.gff3>tmp
i=1
for ncNum in $(cat chr_NC_gi |awk '{ print $2 }'|grep "^NC");do      #collecting the terms that have to be exchanged
cat tmp|sed -E 's/$ncNum/chr${i}/' > tmp2
        cat tmp2 > tmp
        i=$(($i + 1))
done

I know this is a suboptimal solution for bigger files, since it reads the file far too many times (so I'll try a new solution in Python using a dictionary). However, I'd like to know what my error was in the old approach. If someone spots the error without too much work, that would be great!

Thanks for your help!

rna-seq • 1.3k views

Well, at least my sed doesn't know "-E" (it does know "-e", however). Also, I think the single quotes in your sed command make the expression literal, i.e. sed is looking for the text $ncNum instead of whatever value the ncNum variable holds (the same goes for the replacement, look here); double quotes would let the shell expand the variables. While "i=$(($i + 1))" is not wrong, in bash you could replace it with "((i++))" (that form is not available in plain POSIX sh, which your "#!/bin/sh" line asks for). "mv tmp2 tmp" would probably be faster than "cat tmp2 > tmp", since (I think) it just renames the file in the file system instead of actually creating a new copy of the data (not sure this is true for all file systems).
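To make the quoting point concrete, here is a minimal sketch of the same loop with double quotes around the sed expression. The two sample files are made up so the snippet runs on its own; in particular, I'm assuming chr_NC_gi has the NC accession in its second column, which is only a guess from the awk call in the question.

```sh
#!/bin/sh
# Hypothetical sample inputs (not the asker's real files), written to a scratch directory.
workdir=$(mktemp -d)
printf 'chr1\tNC_000001.11\nchr2\tNC_000002.12\n' > "$workdir/chr_NC_gi"
printf 'NC_000001.11\tsrc\tgene\t1\t100\nNC_000002.12\tsrc\tgene\t5\t50\n' > "$workdir/myfile.gff3"

cp "$workdir/myfile.gff3" "$workdir/tmp"
i=1
# Collect the terms that have to be exchanged (assumed to be in column 2).
for ncNum in $(awk '{ print $2 }' "$workdir/chr_NC_gi" | grep '^NC'); do
    # Double quotes: the shell expands $ncNum and $i before sed sees the expression.
    # With single quotes, sed would receive the literal text $ncNum and never match.
    sed "s/$ncNum/chr$i/" "$workdir/tmp" > "$workdir/tmp2"
    mv "$workdir/tmp2" "$workdir/tmp"   # rename instead of copying the data back
    i=$((i + 1))                        # POSIX arithmetic, works in plain sh
done
cat "$workdir/tmp"
```

Two caveats with this sketch: the "." in an accession is a regex wildcard inside the sed pattern, and without the "g" flag only the first match on each line is replaced. For a GFF3 file, where the sequence name sits in the first column, that is probably fine, but it is worth checking on the real data.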
