Hello there, I have a .CSV file in this format:
"Sample" "Variant" "Haplogroup"
"KAsq0001" "146, 152, 195, 247, 249d, 309+CC, 315+C, 769, 825T, 1005, 1018, 1824, 2758, 2885, 3594, 3970, 4104, 4312, 6216, 6392, 7146, 7256, 7521, 7828, 8468, 8655, 8701, 9540, 10310, 10398, 10535, 10586, 10664, 10688, 10810, 10873, 10915, 11914, 12338, 12705, 13105, 13276, 13506, 13650, 13708, 13928C, 16129, 16187, 16189, 16203, 16223, 16230, 16278, 16291, 16304, 16311" "F2a"
"Kasq0002" "146, 152, 153, 195, 200, 247, 309+CC, 315+C, 489, 709, 769, 825T, 1018, 2758, 2885, 3594, 4104, 4312, 5108, 7146, 7220, 7256, 7521, 7867, 8200, 8468, 8655, 9527, 10400, 10664, 10688, 10810, 10915, 11914, 13105, 13276, 13506, 13650, 14569, 14783, 15043, 15301, 15323, 15497, 16129, 16184, 16187, 16189, 16214, 16230, 16278, 16311, 16362" "G1a3"
As you can see, I have three fields. The first is the name of the sequence sample, the second is the list of variants detected in that sample, and the third is the haplogroup the sample belongs to. I have to parse this file and produce output like this:
KAsq0001 146 F2a
KAsq0001 152 F2a
.
.
.
Kasq0002 146 G1a3
Kasq0002 152 G1a3
so that each variant is linked with its sample and haplogroup.
I am trying to do this with Perl, but as you can see the second field holds multiple values and can span more than one line. The field separator is a tab and the text delimiter is a double quote ("). Should I use a hash to get the output I want?
And just for the future, this seems to be a programming (Perl) question. stackoverflow.com is a very appropriate website for such questions.
...and this is an inappropriate question here at the BioStars forum? I think bioinformatics programming questions are completely valid here in this context.
I would have preferred to use Python over Perl, but knowing a little Perl doesn't hurt.
I am sorry if I didn't quite get that across. I didn't say anything about it being inappropriate here. I merely mentioned that it is more appropriate at stackoverflow than here, meaning that you have a better chance of getting better, nicer answers. For example, while it's nice to learn to write Perl code yourself to read a CSV file, most people would advise against reinventing the wheel and suggest using a package when one is available. But that's just what I think.
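To make the "use a package" suggestion concrete, here is a minimal sketch in Python (which came up earlier in the thread) using the standard csv module. It is only an illustration under some assumptions: the file really is tab-separated with double-quoted fields, it has exactly one header row, and any line break inside the variant field falls between variants rather than inside one. The function name expand_variants and the abbreviated in-memory sample data are made up for this example. No hash is needed here, because each row can be expanded as soon as it is read.

```python
import csv
import io


def expand_variants(handle):
    """Yield one (sample, variant, haplogroup) tuple per variant."""
    # The csv module handles the quoting, including quoted fields
    # that continue onto the next line.
    reader = csv.reader(handle, delimiter="\t", quotechar='"')
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        sample, variants, haplogroup = row
        for variant in variants.split(","):
            variant = variant.strip()  # drops spaces and stray newlines
            if variant:
                yield sample, variant, haplogroup


# Tiny in-memory stand-in for the real file (variant lists abbreviated,
# with one field deliberately wrapped onto a second line).
example = io.StringIO(
    '"Sample"\t"Variant"\t"Haplogroup"\n'
    '"KAsq0001"\t"146, 152,\n195"\t"F2a"\n'
    '"Kasq0002"\t"146, 309+CC"\t"G1a3"\n',
    newline="",
)

rows = list(expand_variants(example))
for sample, variant, haplogroup in rows:
    print(sample, variant, haplogroup, sep="\t")
```

For a real file, open it with open(path, newline="") and pass the handle to expand_variants. If you want to stay with Perl, the Text::CSV module from CPAN plays a similar role; the same loop (read a row, split the second field on commas, print one line per variant) should carry over.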