I have a FASTA file in which I want to replace something in the middle of each header. The records are in order, i.e. the peg numbers are sequential: peg_1, peg_2, peg_3, and so on.
>S_griseus__Contig_0001__peg_1__799__35__negative
atgcatgc
>S_griseus__Contig_0001__peg_2__1655__3444__posetive
gtcgtacg
Previously I used seqkit replace to solve a similar problem, but this time I can't adapt it, because I didn't fully understand that command.
The command was:
seqkit replace -p '.+' -r 'Contig_{nr}' --nr-width 3 assembly.fasta -o renamed.fasta
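Here is what each part of that seqkit command does: -p '.+' is a regular expression that matches the whole header, -r 'Contig_{nr}' replaces it with "Contig_" followed by {nr}, the record number counted from 1, and --nr-width 3 zero-pads that number to 3 digits (1 becomes 001). The awk function below imitates the same effect so the logic can be followed step by step. It is only an illustration, not how seqkit works internally, and the name contig_rename is hypothetical.

```shell
# Hypothetical helper that imitates:
#   seqkit replace -p '.+' -r 'Contig_{nr}' --nr-width 3
contig_rename() {
  awk '
    /^>/ {                            # header line
      n++                             # running record number, like {nr}
      printf(">Contig_%03d\n", n)     # %03d pads with zeros to width 3
      next                            # do not also print the old header
    }
    { print }                         # sequence lines pass through unchanged
  ' "$@"
}
```

For example: contig_rename assembly.fasta > renamed.fasta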
Now I want to change peg_1 to peg_0001 in these headers (and likewise peg_2 to peg_0002, and so on). What command would do this?
Thank you for your time. Please also explain the command you use, so that I can understand it and learn from it.
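A minimal sketch of one possible approach (not one of the commands from this thread), assuming every header contains "peg_" followed by digits: awk's match() finds that part, and only the digits are replaced by a zero-padded copy, so the rest of the header, including the double underscores, is left exactly as it was. Unlike {nr} in seqkit, this reuses the number already in the header, so it does not depend on the records being in order. The function name pad_peg and the width variable are hypothetical.

```shell
# Hypothetical helper: zero-pad the number after "peg_" in FASTA headers only.
pad_peg() {
  awk -v width=4 '
    /^>/ && match($0, /peg_[0-9]+/) {
      # RSTART = position of "peg_", RLENGTH = length of "peg_" plus digits
      num = substr($0, RSTART + 4, RLENGTH - 4)    # just the digits
      $0 = substr($0, 1, RSTART - 1) "peg_" sprintf("%0" width "d", num) substr($0, RSTART + RLENGTH)
    }
    { print }                                      # print every line
  ' "$@"
}
```

For example: pad_peg assembly.fasta > padded.fasta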
See if this works for you:
If you want the headers exactly as they were, only with the padding, try this:
Since the numbers run up to around 10000, pad to a width of 4 digits (peg_0001), not 3.
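A quick way to see what the padding width means, assuming a POSIX shell: the format %04d pads a number with leading zeros up to 4 digits. The helper name pad4 is hypothetical.

```shell
# Hypothetical helper: prefix a number with "peg_" and zero-pad it to 4 digits.
pad4() {
  printf 'peg_%04d\n' "$1"
}

# Demonstration: padding adds zeros up to the width but never truncates.
for n in 1 35 999 10000; do
  pad4 "$n"
done
# prints peg_0001, peg_0035, peg_0999 and peg_10000, one per line
```

A number with more digits than the width is printed in full, so if some peg numbers reach 10000, a width of 5 would be needed for every header to have the same number of digits.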
Thank you for your time and suggestion. The command works perfectly. Could you please explain it?
It splits each header line (lines starting with ">") into fields (columns), using both "_" and "__" as delimiters. The 6th field, which is the peg number, is zero-padded and written back in place of the original 6th field. Finally, awk prints every line of the file: the modified headers and the unchanged sequence lines.
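A sketch that reconstructs the approach described above under stated assumptions (the exact command used may differ): "_" and "__" are both treated as delimiters through the regular expression FS='_+', and OFS='_' is used when the header is rebuilt. The first function prints the fields of each header so you can check that $6 is the peg number; the second pads $6. Both function names are hypothetical.

```shell
# Hypothetical helper: list the fields of each header, split on one or more "_".
show_fields() {
  awk -F'_+' '/^>/ { for (i = 1; i <= NF; i++) printf("$%d = %s\n", i, $i) }' "$@"
}

# Hypothetical helper: zero-pad field 6 (the peg number) in headers only.
# Assigning to $6 makes awk rebuild the header, joining fields with OFS.
pad_field6() {
  awk -F'_+' -v OFS='_' '/^>/ { $6 = sprintf("%04d", $6) } 1' "$@"
}
```

For the first header, show_fields lists >S, griseus, Contig, 0001, peg, 1, 799, 35 and negative as $1 to $9. Because the rebuilt header uses a single "_" between all fields, pad_field6 turns it into >S_griseus_Contig_0001_peg_0001_799_35_negative.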
The first replacement changes every "_" to "__" from the 2nd occurrence to the last. The second replacement changes "peg__" to "peg_", and the third changes "Contig__" to "Contig_".
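Applying those three replacements after the awk step gives a sketch of the full pipeline. It assumes the awk part is FS='_+' with OFS='_' (so every separator comes out as a single "_") and GNU sed: combining a number with the g flag (2g, meaning "from the 2nd match onward") is a GNU extension, and POSIX leaves that combination unspecified. It also relies on the species name containing exactly one "_" (S_griseus), which is why the first underscore stays single. The function name pad_peg_exact is hypothetical.

```shell
# Hypothetical reconstruction of the "exactly as it was" version.
pad_peg_exact() {
  # Step 1: pad field 6; the header is rebuilt with single "_" separators.
  awk -F'_+' -v OFS='_' '/^>/ { $6 = sprintf("%04d", $6) } 1' "$@" |
    # Step 2: double every "_" from the 2nd on (GNU sed), then undo it
    # after "peg" and "Contig", where the original had a single "_".
    sed -e 's/_/__/2g' -e 's/peg__/peg_/' -e 's/Contig__/Contig_/'
}
```

For example: pad_peg_exact assembly.fasta > padded.fasta. Sequence lines contain no "_", so the sed step leaves them unchanged.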
Thanks for the explanation.