Hi everyone, I'm very new to programming. My task is to transform this genetic data:
START END GENE
69346001 69366001 SMN2
140222001 140240001 PCDHA1 PCDHA@ PCDHA2 PCDHA3 PCDHA4 PCDHA5 PCDHA6 PCDHA7 PCDHA8 PCDHA9 PCDHA10
Into this:
START END GENE
69346001 69366001 SMN2
140222001 140240001 PCDHA1
140222001 140240001 PCDHA@
140222001 140240001 PCDHA2
140222001 140240001 PCDHA3
140222001 140240001 PCDHA4
140222001 140240001 PCDHA5
140222001 140240001 PCDHA6
140222001 140240001 PCDHA7
140222001 140240001 PCDHA8
140222001 140240001 PCDHA9
140222001 140240001 PCDHA10
So, basically, the script has to write each additional gene name, from column 4 to the last one, on a new line together with its respective genomic START and END positions. An awk or perl script would be nice. The input has ca. 250 lines, including lines with 3 or more gene names that need to be rearranged in the way I just showed.
Thanks in advance
If you think I'm not trying to solve the problem myself, you are very wrong. I'm in a hurry, so I just posted the problem, but if you need more information or PROOF that I'm working on it too, this is the AWK line that I'm trying to use, so far unsuccessfully:
#!/bin/bash
for i in {3..18}
do
gawk '{if ($i != "") print $1 "\t" $2 "\t" $i;}' input > output
done
The count starts at 3 because that is the first gene name and ends at 18 because the longest line has 18 columns, and the loop is meant to "walk" through the columns. But it's clear that I'm missing something, because the output is something else entirely:
69346001 69366001 69346001 69366001 SMN2
140222001 140240001 140222001 140240001 PCDHA1 PCDHA@ PCDHA2 PCDHA3 PCDHA4 PCDHA5 PCDHA6 PCDHA7 PCDHA8 PCDHA9 PCDHA10
Thanks for posting evidence that you have tried to solve the problem. Next time, just include the information in your question - you'll get a much better response.
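As for why the attempt fails: the awk program sits inside single quotes, so the shell never substitutes its loop variable. awk sees its own unset variable i instead, and $i becomes $0, the whole line. That is why every output line is START, END and then the complete input line again. On top of that, "> output" inside the loop overwrites the file on each pass. A minimal sketch of one way around both problems, assuming whitespace-separated columns with gene names from column 3 onward, is to let awk loop over the fields itself. The function name split_genes is just a hypothetical wrapper for illustration, and the sample lines are shortened from the question:

```shell
# Hypothetical helper: print one START/END/GENE row per gene name.
# Reads the files given as arguments, or standard input if there are none.
split_genes() {
    # NF is the number of fields on the current line; fields 3..NF are
    # assumed to be gene names. Output columns are tab-separated.
    awk '{ for (i = 3; i <= NF; i++) print $1 "\t" $2 "\t" $i }' "$@"
}

# Small demonstration on lines taken from the question.
split_genes <<'EOF'
START END GENE
69346001 69366001 SMN2
140222001 140240001 PCDHA1 PCDHA@ PCDHA2
EOF
```

On the real data this could be run as "split_genes input > output", with the redirection done once, outside any loop. Lines with fewer than three fields print nothing, and the output is tab-separated as in your own attempt.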