2.1 years ago
genomes_and_MGEs
Hi there,
I have a file with two columns as follows:
Yersinia_ruckeri_GCA_008086925.1.fna ./Yersinia_ruckeri_GCA_008086925.1.fna
Yersinia_ruckeri_GCA_017498685.1.fna ./Yersinia_ruckeri_GCA_017498685.1.fna
Yersinia_pestis_E_GCA_013421535.1.fna ./Yersinia_pestis_E_GCA_013421535.1.fna
Yersinia_pestis_E_GCA_013421545.1.fna ./Yersinia_pestis_E_GCA_013421545.1.fna
Yersinia_pestis_E_GCA_015274745.1.fna ./Yersinia_pestis_E_GCA_015274745.1.fna
...
My goal is to split the file into multiple individual files corresponding to each bacterial species, each one containing only the file names (col1) and file paths (col2) for the corresponding species. In the example above, I would like to create two files, one named Yersinia_ruckeri.txt, containing the two entries
Yersinia_ruckeri_GCA_008086925.1.fna ./Yersinia_ruckeri_GCA_008086925.1.fna
Yersinia_ruckeri_GCA_017498685.1.fna ./Yersinia_ruckeri_GCA_017498685.1.fna
And another file named Yersinia_pestis_E.txt, with the three entries
Yersinia_pestis_E_GCA_013421535.1.fna ./Yersinia_pestis_E_GCA_013421535.1.fna
Yersinia_pestis_E_GCA_013421545.1.fna ./Yersinia_pestis_E_GCA_013421545.1.fna
Yersinia_pestis_E_GCA_015274745.1.fna ./Yersinia_pestis_E_GCA_015274745.1.fna
Thanks in advance!
While someone will give you a ready answer ... have you thought of using cut to pull the columns out and then grep to separate the file names? You can then ultimately paste the ruckeri and pestis files back together if you actually want the two columns. Not sure what the significance of the two columns is.
Thanks for the quick reply. The thing is that I have multiple species with an uneven number of underscore-separated fields before 'GCA'. In the example above, the species names have two and three fields (Yersinia_ruckeri and Yersinia_pestis_E), respectively. I would have to find a way to group entries in the large file and split them into multiple files according to the fields shown before 'GCA'.
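One possible way to group on everything before 'GCA', however many underscore-separated parts the species name has, is to capture the prefix with a regular expression and collect lines per prefix. The sketch below is only an illustration, not a tested pipeline. It assumes every file name contains the literal string `_GCA_`, that the columns are whitespace-separated, and that lines without `_GCA_` can be skipped. The function names `group_by_species` and `split_by_species` are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the species label is everything before the first "_GCA_".
SPECIES_RE = re.compile(r"^(.+?)_GCA_")


def group_by_species(lines):
    """Group input lines by the text preceding '_GCA_' in the first column."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        first_col = line.split()[0]
        match = SPECIES_RE.match(first_col)
        if match is None:
            continue  # assumption: lines without a GCA accession are skipped
        groups[match.group(1)].append(line)
    return dict(groups)


def split_by_species(list_file, out_dir="."):
    """Write one <species>.txt per group, keeping both columns of each line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(list_file) as handle:
        groups = group_by_species(handle)
    for species, entries in groups.items():
        (out_dir / f"{species}.txt").write_text("\n".join(entries) + "\n")
    return groups
```

Calling `split_by_species("genome_list.txt")` (the input file name is hypothetical) on the example above should produce Yersinia_ruckeri.txt and Yersinia_pestis_E.txt in the current directory. If a species name could itself contain `_GCA_`, the pattern would need adjusting.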