8.6 years ago
elhamidihay
Please help.
I have two files (file1 and file2). I would like to extract the columns from file2 that have their IDs listed in file1. These are big files, with thousands of columns and lines.
file1
Id123B
Id124A
Id125A
file2
Code sex id123B id127 id125A
desired output file:
id123B id125A
Code I have tried:
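#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# First argument: file with one column ID per line.
open my $COLUMNS, '<', shift or die $!;
chomp( my @columns = <$COLUMNS> );

# Second argument: tab-separated data file with a header line.
open my $DATA, '<', shift or die $!;
chomp( my $header_line = <$DATA> );   # strip the newline so the last column name matches
my @header = split /\t/, $header_line;

# Map each header name to its column position.
my %column_index;
@column_index{ @header } = 0 .. $#header;

# Keep only the requested IDs that exist in the header (the match is case-sensitive).
@columns = grep exists $column_index{$_}, @columns;

say join "\t", @columns;              # print the header of the selected columns
while (<$DATA>) {
    chomp;
    my @cells = split /\t/, $_, -1;   # -1 keeps trailing empty cells
    say join "\t", @cells[ @column_index{ @columns } ];
}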
I think these kinds of tasks are more easily done with R than with Perl. (If it is OK with you, use the following code in R.) Here, the result.txt file contains the columns whose IDs are present in id_file.txt. Perl's PDL and Python's pandas should also be fine for this kind of task.
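For the Python route, here is a minimal sketch that uses only the standard csv module rather than pandas, so the table is streamed one line at a time instead of being loaded whole. The function name extract_columns and its parameters are hypothetical, the data row in the demo is made up, and the case-insensitive matching is an assumption based on the example in the question, where file1 has Id123B and file2 has id123B.

```python
import csv
import io


def extract_columns(ids, table, delimiter="\t", ignore_case=True):
    """Yield rows of `table` restricted to the columns named in `ids`.

    `ids` is an iterable of ID strings and `table` an iterable of text
    lines (for example, open file handles). Hypothetical helper.
    """
    # Assumption: IDs may differ in case between the two files.
    norm = str.lower if ignore_case else (lambda s: s)
    wanted = {norm(line.strip()) for line in ids if line.strip()}

    # QUOTE_NONE treats quote characters as ordinary data.
    reader = csv.reader(table, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)

    # Positions of the header fields found in the ID list, in file order.
    keep = [i for i, name in enumerate(header) if norm(name) in wanted]

    yield [header[i] for i in keep]
    for row in reader:
        if not row:  # skip blank lines
            continue
        # One row at a time, so memory use does not grow with file length.
        yield [row[i] for i in keep]


if __name__ == "__main__":
    # Header taken from the question; the data row is invented for the demo.
    ids = io.StringIO("Id123B\nId124A\nId125A\n")
    table = io.StringIO(
        "Code\tsex\tid123B\tid127\tid125A\n"
        "c1\tM\t0\t1\t2\n"
    )
    for row in extract_columns(ids, table):
        print("\t".join(row))
```

With real files, the two StringIO objects would be replaced by open(...) handles for file1 and file2, and the printed lines redirected to the output file. Columns come out in the order they appear in file2, not in the order listed in file1.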
Thank you for the suggestion and for your help. It's just that R is much slower when dealing with large datasets.