Entering edit mode
9.3 years ago
fufuyou
▴
110
I want to extract gene expression values from a txt file based on gene names.
For example:
A 1500
B 2500
C 3500
D 3400
I want to extract A and C gene expression values using gene name file extract it.
A 1500
C 3500
I have written some perl code, but it is not work well. Could you help me modify it or give me a new code?
Thanks,
#!/usr/bin/perl -w
use strict;
open(IN,"file1");
open(NAME,"file2");
open(OUT,">result");
while(<IN>){
chomp;
@data=split / /,$_;
$hash{$data[0]}=$data[1];
}
while(<NAME>){
chomp;
print OUT "$_ $hash{$_}\n";
}
close IN;
close NAME;
close OUT;
you are reinventing the wheel: the solution is sorting both files and using linux join: http://linux.die.net/man/1/join
Thanks Pierre Lindenbaum and Zaag,
What is wrong of my perl code?
Please note that I vote for Pierre's answer! Nonetheless, I will try to shed some light on your programming experiments.
I am not sure, that
works as you expect. I believe
/ /
really tries to match $_ against the regular expression containing a single space. Using some parenthesis for your code, one gets:The reg-ex matching returns the result by means of 1 and '' (or something similar) which evaluate to boolean TRUE and FALSE in Perl. Consequently, Perl will evaluate the statement to:
or
depending on the content of $_, i.e., does $_ contain a single space. Finally, the line will be splitted by '1' or '' which is not what you want, right?!
Tip 1:
Try to debug your script using Data::Dumper but be aware that you might get a long list of results:
Tip 2:
If you really want to do it by yourself, think about loading the file into memory which is smaller (I guess these are the names). Applying your script to very large data sets may eventually crash your computer.
Tip 3:
Really try to go for Pierre's solution ;-)
Hello fufuyou!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62682
This is typically not recommended as it runs the risk of annoying people in both communities.