Hi guys,
I have a question about using awk or sed to split a big file into several files. The big file has 3 columns, in this format:
C SRR1_45/1 data...
U SRR2_34/2 data...
U SRR1_33/2 data...
C SRR3_22/1 data...
....
I want to extract the lines with SRR1 to SRR1.txt, the lines with SRR2 to SRR2.txt, ... and the lines with SRRn to SRRn.txt. The 'SRRi_' prefix should be removed from the output lines. We don't know in advance how many different n there are.
e.g. SRR1.txt will contain:
C 45/1 data...
U 33/2 data...
I know it's easy to write a Python or Perl script to do this, but is there a shell way to do it, taking advantage of awk or sed? Let me add some details: I have 10 such big files to process, and each has more than 1000M lines, so I need an efficient way. The values of n are arbitrary; they do not come from a sequential range.
Thanks! Tao
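For readers with the same problem, here is a minimal awk sketch of the split described above. It is only one possible approach, not a tuned solution. The input name big.txt is a placeholder, and the sketch assumes the prefix is everything in column 2 before the first underscore and contains no regex metacharacters.

```shell
# Sample input in the question's format (big.txt is a placeholder name).
printf '%s\n' \
  'C SRR1_45/1 data...' \
  'U SRR2_34/2 data...' \
  'U SRR1_33/2 data...' \
  'C SRR3_22/1 data...' > big.txt

# Process only lines whose second column contains an underscore.
awk '$2 ~ /_/ {
    prefix = $2
    sub(/_.*/, "", prefix)      # "SRR1_45/1" -> "SRR1"
    sub(prefix "_", "")         # drop the first "SRR1_" from the whole line
    print > (prefix ".txt")     # one output file per prefix, opened on first use
}' big.txt
```

Editing the whole line with sub() rather than assigning to $2 keeps the original spacing of the data column. Several input files can be passed to the same awk invocation as extra arguments. Since n is not known in advance, the number of simultaneously open output files may become large; some awk implementations can hit the open-file limit in that case, while gawk is documented to work around it by closing and reopening files as needed, at some cost in speed.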
Thanks Alex! Your answer is amazing, especially the parallel way you introduced to me. Thank you so much! Best, Tao