Question

Perl, extract specific columns

9.3 years ago

elhamidihay ▴ 30

Please help.

I have two files (file1 and file2). I would like to extract the columns from file2 that have their IDs listed in file1. These are big files, with thousands of columns and lines.

file1

id123B

id124A

id125A

file2

Code sex id123B id127 id125A

desired output file:

id123B id125A

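Code I have tried:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # First argument: file with one column ID per line.
    open my $COLUMNS, '<', shift or die $!;
    chomp( my @columns = <$COLUMNS> );

    # Second argument: tab-separated data file whose first line is the header.
    open my $DATA, '<', shift or die $!;
    chomp( my $header_line = <$DATA> );    # strip the newline so the last column name matches
    my @header = split /\t/, $header_line;
    my %column_index;
    @column_index{ @header } = 0 .. $#header;

    # Keep only the requested IDs that actually occur in the header.
    @columns = grep exists $column_index{$_}, @columns;

    while (<$DATA>) {
        chomp( my @cells = split /\t/ );
        say join "\t", @cells[ @column_index{ @columns } ];
    }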

perl • 7.0k views

ADD COMMENT • link updated 9.3 years ago by novice ★ 1.1k • written 9.3 years ago by elhamidihay ▴ 30

I think this kind of task is easier in R than in Perl. If that is OK with you, use the following R code:

    # id_file.txt has no header line: one ID per line
    ids <- read.table("id_file.txt", header=FALSE, stringsAsFactors=FALSE)$V1
    column.file <- read.table("columns.txt", header=TRUE, sep="\t", check.names=FALSE)
    # keep only the IDs that exist as column names
    ids <- intersect(ids, colnames(column.file))
    write.table(column.file[, ids, drop=FALSE], file="result.txt", sep="\t", quote=FALSE, row.names=FALSE)

Here, result.txt contains the columns whose IDs are listed in id_file.txt.

ADD REPLY • link 9.3 years ago by venu 7.1k

Perl's PDL and Python's pandas should also handle this kind of task.

ADD REPLY • link 9.3 years ago by Echo ▴ 70

Thank you for the suggestion and for your help. It's just that R is much slower when dealing with large datasets.

ADD REPLY • link 9.3 years ago by elhamidihay ▴ 30

score 1 · Accepted Answer · 2016-04-30

9.3 years ago

novice ★ 1.1k

Your code should work fine, but you are not including the headers in the output. Just add say join "\t", @columns; before the while-loop.
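A minimal sketch of that fix, written as a small subroutine so it can be tried without input files. The name select_columns and the sample data row are made up for illustration and are not from the original post; the sketch also assumes the header line should be chomped so that the last column name can match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a reference to the wanted
# IDs and a reference to the lines of a tab-separated file whose first line
# is the header. Returns the selected columns as lines, header first.
sub select_columns {
    my ($ids, $lines) = @_;
    my @rows = @$lines;
    chomp @rows;
    return () unless @rows;

    # Map each header name to its column position.
    my $header_line = shift @rows;
    my @header = split /\t/, $header_line, -1;
    my %column_index;
    @column_index{@header} = 0 .. $#header;

    # Keep only the requested IDs that exist in the header, in request order.
    my @wanted = grep { exists $column_index{$_} } @$ids;
    my @idx    = @column_index{@wanted};

    # The header line goes out first: this is the line the question's code omitted.
    my @out = (join "\t", @wanted);
    for my $row (@rows) {
        my @cells = split /\t/, $row, -1;
        push @out, join "\t", map { defined $_ ? $_ : '' } @cells[@idx];
    }
    return @out;
}

# Example with the header from the question and one invented data row.
my @ids  = ('id123B', 'id124A', 'id125A');
my @data = ("Code\tsex\tid123B\tid127\tid125A\n", "c1\tF\t0\t1\t2\n");
print "$_\n" for select_columns(\@ids, \@data);
```

For very large files, the same logic can be applied line by line inside a while (<$DATA>) loop, as in the question's script, instead of holding all lines in memory.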

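BTW, split /\t/ gives the same result as a bare split only when no field is empty or contains a space, because a bare split breaks $_ on any run of whitespace. :)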
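A quick way to check how the two forms of split behave on a given line is to count the fields each one returns. This is only an illustrative sketch: the helper name field_counts and both sample lines are invented, and split ' ' is used here because it applies the same whitespace rule that a bare split applies to $_.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (fields when splitting on tabs,
#                              fields when splitting on runs of whitespace).
sub field_counts {
    my ($line) = @_;
    my @by_tab        = split /\t/, $line;
    my @by_whitespace = split ' ', $line;    # same rule as a bare split on $_
    return (scalar @by_tab, scalar @by_whitespace);
}

# One line with a space inside a field, one with an empty middle field.
for my $line ("sample 1\tF\t0.5", "a\t\tb") {
    my ($tabs, $ws) = field_counts($line);
    print "$tabs tab-separated fields, $ws whitespace-separated fields\n";
}
```

If the two counts match for every line of a file, either form gives the same cells; if they differ, split /\t/ is the one that preserves the column layout.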

ADD COMMENT • link 9.3 years ago by novice ★ 1.1k

@novice, thank you very much, it works :)

ADD REPLY • link 9.3 years ago by elhamidihay ▴ 30

Happy to help. You can click the tick to accept it.

ADD REPLY • link 9.3 years ago by novice ★ 1.1k