I have a tab-separated dataset, although the GO terms are comma-separated:
GENEID1 GO:XXXXX,GO:YYYYYY,GO:ZZZZZZ
I want to convert it into a tab-separated dataset where each GO term is on its own line, together with the gene identifier:
GENEID1 GO:XXXXX
GENEID1 GO:YYYYYY
GENEID1 GO:ZZZZZZ
Many thanks.
I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.
Is it all tab-separated? There are what look like both tabs and commas in your example input.
A Perl one-liner could be:
perl -ane '{print map {$F[0]."\t".$_."\n" } @F[1..$#F] }' your_input_file |sed -s 's/,$//'
This outputs the same as the input if your_input_file is created with
echo -e "GENEID1\tGO:XXXXX,GO:YYYYYY,GO:ZZZZZZ" > your_input_file
Might there be a slight typo?