CSV File Parsing With Perl
12.4 years ago
annamantsoki ▴ 40

Hello there, I have a .CSV file in this format:

"Sample" "Variant" "Haplogroup"

"KAsq0001"    "146, 152, 195, 247, 249d, 309+CC, 315+C, 769, 825T, 1005, 1018, 1824, 2758, 2885, 3594, 3970, 4104, 4312, 6216, 6392, 7146, 7256, 7521, 7828, 8468, 8655, 8701, 9540, 10310, 10398, 10535, 10586, 10664, 10688, 10810, 10873, 10915, 11914, 12338, 12705, 13105, 13276, 13506, 13650, 13708, 13928C, 16129, 16187, 16189, 16203, 16223, 16230, 16278, 16291, 16304, 16311"    "F2a"

"Kasq0002"    "146, 152, 153, 195, 200, 247, 309+CC, 315+C, 489, 709, 769, 825T, 1018, 2758, 2885, 3594, 4104, 4312, 5108, 7146, 7220, 7256, 7521, 7867, 8200, 8468, 8655, 9527, 10400, 10664, 10688, 10810, 10915, 11914, 13105, 13276, 13506, 13650, 14569, 14783, 15043, 15301, 15323, 15497, 16129, 16184, 16187, 16189, 16214, 16230, 16278, 16311, 16362"    "G1a3"

As you can see, I have three fields. The first is the name of the sequence sample, the second lists the variants detected in that sample, and the third is the haplogroup the sample belongs to. I need to parse the file and produce output like this:

KAsq0001 146 F2a
KAsq0001 152 F2a
...
Kasq0002 146 G1a3
Kasq0002 152 G1a3

so that each variant is linked with its sample and haplogroup.

I am trying to do this with Perl, but as you can see, the second field holds multiple values and runs over more than one line. The field separator is a tab and the text delimiter is ". Should I use a hash to get the output I want?

perl variant • 6.5k views

For future reference, this seems to be a programming (Perl) question; stackoverflow.com is a very appropriate website for such questions.


...and this is an inappropriate question here at the BioStars forum? I think bioinformatic programming questions are completely valid here in this context.

I would have preferred to use Python over Perl, but knowing a little Perl doesn't hurt.


I am sorry if I didn't phrase that right. I didn't say anything about the question being inappropriate here. I merely mentioned that it is more appropriate at stackoverflow than here, meaning that you are more likely to get better answers there. For example, while it's nice to learn and write the Perl code to read a CSV file yourself, most people would advise against reinventing the wheel and would use a package when one is available. But that's just what I think.

12.4 years ago

My Perl is a bit rusty, but here is something close:

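#!/usr/bin/perl
use strict;
use warnings;

# Three-argument open; stop with a message if the file cannot be read.
open(FILE, '<', 'data.txt') or die "Cannot open data.txt: $!";

# Drop the first and last character (the surrounding double quotes).
sub removeQuotes {
    my $string = shift;
    return substr $string, 1, length($string) - 2;
}

while (<FILE>) {
    chomp $_;
    next unless /\S/;    # skip blank lines
    my ($id, $variants, $haplo) = split(/\t/, $_);
    $id = removeQuotes($id);
    $haplo = removeQuotes($haplo);
    $variants = removeQuotes($variants);
    # Emit one row per variant in the comma-separated list.
    my @variants = split(', ', $variants);
    for my $variant (@variants) {
        print "$id\t$variant\t$haplo\n";
    }
}
close(FILE);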

Sample output looks like:

KAsq0001    146    F2a
KAsq0001    152    F2a
KAsq0001    195    F2a
KAsq0001    247    F2a
KAsq0001    249d    F2a
KAsq0001    309+CC    F2a
KAsq0001    315+C    F2a
KAsq0001    769    F2a
KAsq0001    825T    F2a
KAsq0001    1005    F2a
KAsq0001    1018    F2a
KAsq0001    1824    F2a
KAsq0001    2758    F2a
KAsq0001    2885    F2a
KAsq0001    3594    F2a

Thank you so much. It works perfectly. The only thing I replaced was (&lt;FILE&gt;) with (<FILE>). Thanks again!


Looks good to me! However, just to be on the paranoid side, I tend to write that function slightly differently, so that it still returns something even if there are no quotes found (on the off-chance bits of the dataset are inconsistent):

sub extractQuote {
    my $sentence = $_[0];
    if ($sentence =~ /"(.+)"/) {
        $sentence = $1;
    }
    return $sentence;
}
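
To make the difference concrete, here is a small sketch that runs both functions from this thread on a quoted and an unquoted field. The test values are made up for illustration; only the two subs come from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# From the answer: blindly drops the first and last character.
sub removeQuotes {
    my $string = shift;
    return substr $string, 1, length($string) - 2;
}

# From this comment: only strips when a quoted part is actually found.
sub extractQuote {
    my $sentence = $_[0];
    if ($sentence =~ /"(.+)"/) {
        $sentence = $1;
    }
    return $sentence;
}

# Illustrative inputs: one quoted field, one field with the quotes missing.
for my $field ('"F2a"', 'F2a') {
    printf "%-6s removeQuotes=%-4s extractQuote=%s\n",
        $field, removeQuotes($field), extractQuote($field);
}
```

For the unquoted F2a, removeQuotes cuts off real characters and returns 2, while extractQuote returns F2a unchanged.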
12.4 years ago
Arun 2.4k

Or you could use Text::CSV; you can find a nicer usage example here.
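
A minimal sketch of what the Text::CSV route might look like, assuming the module is installed from CPAN (it is not part of core Perl) and that the file is tab-separated with double-quoted fields and a header row, as described in the question. The helper name expand_variants and the in-memory sample are hypothetical, chosen only so the sketch runs without an input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;    # CPAN module; assumed to be installed

# Hypothetical helper: read rows from a filehandle and return one
# "sample<TAB>variant<TAB>haplogroup" string per variant.
sub expand_variants {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1, sep_char => "\t", auto_diag => 1 });
    $csv->getline($fh);    # discard the header row
    my @rows;
    while (my $fields = $csv->getline($fh)) {
        next unless @$fields == 3;    # skip blank or malformed lines
        my ($id, $variants, $haplo) = @$fields;
        # Text::CSV has already removed the quotes; split the variant list.
        push @rows, map { "$id\t$_\t$haplo" } split /,\s*/, $variants;
    }
    return @rows;
}

# Made-up sample in the question's format, read through an in-memory handle.
my $sample = qq{"Sample"\t"Variant"\t"Haplogroup"\n}
           . qq{"KAsq0001"\t"146, 152, 249d"\t"F2a"\n};
open(my $in, '<', \$sample) or die "Cannot open in-memory sample: $!";
print "$_\n" for expand_variants($in);
close $in;
```

To use it on a real file, open data.txt in place of the in-memory sample. The module takes care of the quoting, so no removeQuotes step is needed.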
