Hi guys!
I have a file like this one (obtained from a BED file using awk):
scaffold00002b 209798 209823 HWUSI-EAS1825_0024_FC:8:1:6009:1105
scaffold00002b 209802 209838 HWUSI-EAS1825_0024_FC:8:1:6009:1105
scaffold00002d 43627 43652 HWUSI-EAS1825_0024_FC:8:1:8703:1105
scaffold00008e 22741 22767 HWUSI-EAS1825_0024_FC:8:1:14128:1104
scaffold00008e 22740 22768 HWUSI-EAS1825_0024_FC:8:1:14128:1104
(Note that rows 1-2 and rows 4-5 have the same value in the 4th field.)
I would like to convert it to a tab-delimited file like this one:
HWUSI-EAS1825_0024_FC:8:1:6009:1105 scaffold00002b 209798 209823 scaffold00002b 209802 209838
HWUSI-EAS1825_0024_FC:8:1:8703:1105 scaffold00002d 43627 43652
HWUSI-EAS1825_0024_FC:8:1:14128:1104 scaffold00008e 22741 22767 scaffold00008e 22740 22768
in which the fields from all lines that share the same value in column $4 are printed on a single row.
Since rows with the same 4th field are always consecutive, I tried to test whether the 4th field of the previous row is equal to the 4th field of the current row, and to repeat this test over all the rows of my input file.
BUT...
unfortunately I have no idea how to print the fields of the current row alongside the fields of the previous row (when the "==" condition is satisfied).
Any ideas?
Thanks in advance,
Luke
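For reference, the compare-with-previous-row idea described above could be sketched in Python roughly as follows. This is only a sketch, assuming whitespace-separated input with the read name in the 4th field and rows with the same read name always adjacent; the function name merge_consecutive and the sample list are made up for illustration. The key point is that the fields collected so far are written out only when the 4th field changes, and once more after the last line.

```python
def merge_consecutive(lines):
    """Yield one tab-separated row per run of consecutive lines sharing field 4."""
    current_key = None   # 4th field of the run being collected
    fields = []          # fields 1-3 of every line in the current run
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue     # skip blank or malformed lines
        key = parts[3]
        if key != current_key:
            # The 4th field changed: write out the finished run, start a new one.
            if current_key is not None:
                yield "\t".join([current_key] + fields)
            current_key = key
            fields = []
        fields.extend(parts[:3])
    # Write out the last run, which no later key change would trigger.
    if current_key is not None:
        yield "\t".join([current_key] + fields)


# Sample rows copied from the post above.
sample = [
    "scaffold00002b 209798 209823 HWUSI-EAS1825_0024_FC:8:1:6009:1105",
    "scaffold00002b 209802 209838 HWUSI-EAS1825_0024_FC:8:1:6009:1105",
    "scaffold00002d 43627 43652 HWUSI-EAS1825_0024_FC:8:1:8703:1105",
    "scaffold00008e 22741 22767 HWUSI-EAS1825_0024_FC:8:1:14128:1104",
    "scaffold00008e 22740 22768 HWUSI-EAS1825_0024_FC:8:1:14128:1104",
]

for row in merge_consecutive(sample):
    print(row)
```

To run it on a real file, the sample list can be replaced by an open file object, since iterating over a file yields its lines one at a time. Forgetting the final write-out after the loop, or not resetting the collected fields when the key changes, are two common ways for this kind of loop to drop or repeat rows.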
Dear Sean, thank you! I'm studying Python, it's a very powerful language! I tried your script, but it works correctly only for the first two records of my file. For the following rows it prints the first read, then prints the second read twice on the next row. Is it a problem with the iteration? Or with variable substitution?