I have a directory of FASTA files whose headers look like this:
> ID:WP_070393975.1 | [Moorea producens PAL-8-15-08-1] | PAL-8-15-08-1 | hypothetical protein | 351 | NZ_CP017599(9673108):5662931-5663281:-1 ^^ Moorea producens PAL-8-15-08-1 chromosome, complete genome.
First I wanted only the ID (WP_070393975.1, for example) from each file, which I extracted with the following command:
for file in *.fna; do cut -d '|' -f1 $file | grep ">" | sed 's/ID/ /g' | sed 's/[:>]//g' > "${file/.fna/_ids.txt}"; done
This gives me a list like the following. I would like to replace the last digit before ".1" with "[0-9]":
WP_012167065.1
WP_015214247.1
WP_015083735.1
WP_035159822.1
WP_096595623.1
WP_096613742.1
WP_096613838.1
WP_096694933.1
WP_015201116.1
WP_015173923.1
ADB95635.1
The desired output is the following list_ids.txt:
WP_01216706[0-9].1
WP_01521424[0-9].1
WP_01508373[0-9].1
WP_03515982[0-9].1
WP_09659562[0-9].1
WP_09661374[0-9].1
Then I want to run grep with the following command:
for file in *.gbk; do cat list_ids.txt | while read line; do grep -B 2 "$line" "$file"; done ; done
I hope you can help me.
Just add another sed command to your first long pipe, something like
s/./[0-9]./g
You may need to backslash-escape the square brackets because they have a special meaning to sed.
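A quick way to check how escaping behaves here, as a minimal sketch on a made-up string 'A.1' (not one of the OP's IDs): in the pattern part of a sed substitution an unescaped dot matches any character, while in the replacement part square brackets are plain text.

```shell
# Unescaped dot in the pattern: every character matches and gets replaced.
unescaped=$(echo 'A.1' | sed 's/./[0-9]./g')

# Escaped dot: only the literal period matches.
# The [0-9] in the replacement is inserted as literal text.
escaped=$(echo 'A.1' | sed 's/\./[0-9]./')

echo "$unescaped"
echo "$escaped"
```

So it is the dot in the pattern that needs the backslash, not the brackets in the replacement.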
output:
input:
That output is not what the OP is looking for, cpad. It needs to have the string
'[0-9]'
prepended before the period, is all.
jrj.healey, you are right. Amended code below:
input remains the same as OP above.
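A minimal sketch of one sed step that would produce the list the OP asked for, with the last digit before the version suffix replaced by the literal text [0-9]. The helper name to_pattern is hypothetical (not from the thread), and the sketch assumes one accession per line, possibly padded with the spaces that the earlier cut/sed pipeline leaves behind.

```shell
# Hypothetical helper: read accessions on stdin, write grep patterns on stdout.
to_pattern() {
  # 1) strip leading whitespace, 2) strip trailing whitespace,
  # 3) replace the digit just before ".<version>" with the literal text [0-9].
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/[0-9](\.[0-9]+)$/[0-9]\1/'
}

# Example with two IDs from the question (the first one padded with spaces).
printf '%s\n' ' WP_012167065.1 ' 'ADB95635.1' | to_pattern
```

In the OP's loop this could be appended as one more stage of the pipe before the redirect. The resulting list_ids.txt can also be passed to grep as a pattern file, for example grep -B 2 -f list_ids.txt "$file", instead of reading it line by line.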