6.1 years ago
dllopezr
Hi everyone
I have a file like this
>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:2279220-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:1763831-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:844089-844916 Methanococcus maripaludis strain S2, complete sequence
in which I have already replaced the start coordinate with the word "ChrStart" to obtain this:
>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:ChrStart-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:ChrStart-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:ChrStart-844916 Methanococcus maripaludis strain S2, complete sequence
Now I want to replace the numbers between the "-" and the space before the species name with the word "ChrStop". I've tried sed with [[:blank:]], [[:space:]] and \s in this way:
sed -i 's/-.*[[:blank:]]/-ChrStop[[:blank:]]/g' filetxt
But the command always replaces more than I want, for example:
>NC_003037.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_007493.2:ChrStart-ChrStop[[:blank:]]sequence
>NC_007952.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_005791.1:ChrStart-ChrStop[[:blank:]]sequence
Can you help me with the correct way to match the space and replace only what lies between it and the "-"?
Thank you so much.
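For what it's worth, the overshoot seems to come from two things: ".*" is greedy, so it runs to the last blank on the line, and [[:blank:]] on the replacement side is written out as literal text. A minimal sketch, assuming GNU sed and using one of the headers above as a test string (the variable names are only for illustration):

```shell
# Sample header in the state shown above (start already replaced by ChrStart).
header='>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# Greedy: ".*" extends to the LAST blank on the line, and the replacement
# side treats [[:blank:]] as literal characters, not as a character class.
greedy=$(printf '%s\n' "$header" | sed 's/-.*[[:blank:]]/-ChrStop[[:blank:]]/')

# Restricted: only digits may follow the hyphen, and the blank is captured
# in a group so that \1 writes the same character back.
fixed=$(printf '%s\n' "$header" | sed 's/-[0-9]*\([[:blank:]]\)/-ChrStop\1/')

printf '%s\n' "$greedy" "$fixed"
```

The first line printed reproduces the unwanted result above; the second keeps the species name intact. It is probably worth testing without -i first, since -i overwrites the file in place.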
Hi @Kevin, thank you for your help!
If you don't mind, could you explain to me how this command works, especially this part?
ChrStart\-[0-9]*[[:blank:]]
What I really want to do is pass these numbers to different variables, say ChrStart = $1 and ChrStop = $2, to hand to another command. Will the use of "ChrStart" in the above code spoil this objective?
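If the goal is to get both coordinates into variables, one option might be to capture them from the original header before any placeholder is substituted. A rough sketch, assuming a POSIX shell and a header that still carries both numbers (the sample string and variable names here are illustrative, not part of the original command):

```shell
# Illustrative input: one original header with both coordinates intact.
header='>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# Capture the two digit runs on either side of the hyphen and print them
# separated by a single space; -n with the p flag prints only matching lines.
coords=$(printf '%s\n' "$header" | sed -n 's/^>[^:]*:\([0-9]*\)-\([0-9]*\)[[:blank:]].*/\1 \2/p')

# Split the pair into the positional parameters $1 and $2.
set -- $coords
ChrStart=$1
ChrStop=$2

echo "start=$ChrStart stop=$ChrStop"
```

Once the numbers have been replaced by the literal words "ChrStart" and "ChrStop" they can no longer be recovered from the header, so an extraction like this would have to run on the unmodified file.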
ChrStart is taken literally.
\- is taken as a hyphen (-). The backslash escapes its metacharacter behavior (not required here, as - is a metacharacter only within character classes, but better safe than sorry).
[0-9]* matches a run of digits between 0 and 9 of any length (including none).
[[:blank:]] matches a blank (a space or a tab).
ChrStart-100000 (followed by a blank) is thus broken into 4 matches like so: