I am a beginner with Perl programming. The problem I am working on right now is how to get the gene length from a text file. The text file contains the gene name (column 10), start site (column 6), and end site (column 7). The length is the difference between column 7 and column 6. My problem is how to match each gene name (from column 10) with the corresponding length derived from columns 6 and 7. Thank you very much!
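use strict;
use warnings;

# Indices are 0-based, as in the original code: name = $data[10], start = $data[6], end = $data[7].
open(my $in, '<', 'Alu.txt') or die "Cannot open Alu.txt: $!";
open(my $out, '>', 'Alu_subfamlength3.csv') or die "Cannot open Alu_subfamlength3.csv: $!";
my %lengths;    # gene name => list of lengths seen for that name
while (my $line = <$in>) {
    chomp $line;
    my @data = split /\t/, $line;
    next if $line =~ /^#/ or @data < 11;    # skip header and short lines
    # Store each length under its gene name so the two stay paired.
    push @{ $lengths{ $data[10] } }, $data[7] - $data[6];
}
foreach my $sub (sort keys %lengths) {
    # One row per gene name: the name followed by all of its lengths.
    print {$out} join(',', $sub, @{ $lengths{$sub} }), "\n";
}
close($in);
close($out);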
If you are just starting out with programming and can choose which language to learn, Perl is not the best choice. It was amazing for bioinformatics 10 years ago, but nowadays Python or R would be a better choice.
Good advice from @WouterDeCoster.
Or at least plan to learn Perl alongside Python or R. I started off learning Perl (early 2000s), then Python and R. I would not recommend Perl today. Most data analytics work is now done in Python or R. If you want to learn programming (and possibly machine learning), learn Python. If you want to learn data analysis, statistics, and data visualization, learn R. R is not a good way to learn programming, because the language was designed for analyzing data rather than for teaching how to program. See: https://en.wikipedia.org/wiki/Python_(programming_language)#Features_and_philosophy.
If I understand your question right, why are you not processing the file line by line so you can keep track of gene name and associate it with the gene length for each line?
The line-by-line processing is what gives me the difference $data[7] - $data[6]. However, my problem is how to attach the corresponding gene name or gene ID, which is in column 10, to that difference. What I'm trying to figure out right now is how to loop over the lines or join the columns in order using Perl. Thank you!
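One way to keep the name and the length together is to compute both from the same line and print them as a single row. The following is only a minimal sketch: the helper name `name_and_length` and the two sample rows are made up for illustration, and it assumes the same 0-based indices as the posted code (6 for start, 7 for end, 10 for the name).

```perl
use strict;
use warnings;

# Hypothetical helper: takes one tab-separated line and returns "name,length".
# Assumes 0-based indices 6 (start), 7 (end), and 10 (name), as in the posted code.
sub name_and_length {
    my ($line) = @_;
    chomp $line;
    my @data = split /\t/, $line;
    return unless @data > 10;    # not enough columns on this line
    return join(',', $data[10], $data[7] - $data[6]);
}

# Made-up sample rows for illustration only; real input would come from the file.
my @sample = (
    join("\t", 0, 0, 0, 0, 0, 'chr1', 100,  400,  0, '+', 'AluY'),
    join("\t", 0, 0, 0, 0, 0, 'chr1', 1000, 1280, 0, '-', 'AluSx'),
);

foreach my $line (@sample) {
    my $row = name_and_length($line);
    print "$row\n" if defined $row;    # prints AluY,300 then AluSx,280
}
```

Because the name and the subtraction come from the same `@data` array, there is nothing to match up afterwards. To run this on the real file, the `foreach` over `@sample` could be replaced by a `while` loop over the opened filehandle.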
Hello genomics_student!
It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/57076449
This is typically not recommended as it runs the risk of annoying people in both communities.