7.0 years ago
Kenny
Hi all,
I have a scaffold FASTA file named "oenopla_scaffold_112117.fa" that contains 192947 sequences.
The scaffold IDs are:
grep ">" oenopla_scaffold_112117.fa | head -5
>scaffold_0
>scaffold_1
>scaffold_2
>scaffold_3
>scaffold_4
And the scaffold lengths are:
awk '/^>/ {if (seen) print c; seen=1; c=0; printf "%s\t", substr($0,2); next} {c+=length($0)} END {if (seen) print c}' oenopla_scaffold_112117.fa | head -5
scaffold_0 16608
scaffold_1 14918
scaffold_2 14554
scaffold_3 14024
scaffold_4 13894
What I want to do is add the sequence length to each ID, so the desired output would look like:
>scaffold_0_16608
>scaffold_1_14918
>scaffold_2_14554
>scaffold_3_14024
>scaffold_4_13894
...
>scaffold_192946_1500
How can I do this?
Best,
Kenny
It works perfectly. Thank you Pierre!
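For anyone landing here later, here is a minimal sketch of one way to get the headers asked for above. It is not necessarily the solution referred to in this thread. It assumes the scaffold ID is the first whitespace-delimited word of each header line and that IDs are unique. The function name add_lengths is made up for illustration. The file is read twice: the first pass sums the sequence length per ID, the second pass rewrites the headers.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): append each sequence's
# length to its FASTA header, e.g. >scaffold_0 becomes >scaffold_0_16608.
add_lengths() {
    # The same file is given to awk twice.
    # Pass 1 (NR == FNR): sum the residues of every sequence line per ID.
    # Pass 2: print rewritten headers and leave sequence lines unchanged.
    awk '
        NR == FNR {
            if (/^>/) { id = substr($1, 2) }     # drop the leading ">"
            else      { len[id] += length($0) }  # multi-line sequences add up
            next
        }
        /^>/ { id = substr($1, 2); print ">" id "_" (len[id] + 0); next }
        { print }
    ' "$1" "$1"
}
```

It could be called as: add_lengths oenopla_scaffold_112117.fa > renamed.fa (the output name renamed.fa is only a placeholder). Reading the file twice keeps memory use down to one number per scaffold, at the cost of a second pass. Anything after the first space in a header is dropped, and files with Windows line endings would have the trailing carriage return counted in the length.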