To answer the second part of your question: I don't think it is possible to go from an NM_ record (RefSeq mRNA) to chromosomal coordinates using the record alone. RefSeq mRNAs represent the processed, mature mRNA, so the feature coordinates are relative to the spliced sequence, not to the genome.
Take a small example, NM_007102. Download it in GenBank format and try the following code:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Data::Dumper;

my @exons;
my $file  = "NM_007102.gb";
my $seqio = Bio::SeqIO->new(-file => $file, -format => "genbank");
my $seq   = $seqio->next_seq;

# Collect the location object of every exon feature in the record
for my $feat ($seq->get_SeqFeatures) {
    if ($feat->primary_tag eq "exon") {
        push(@exons, $feat->location);
    }
}
print Dumper @exons;
Result:
$VAR1 = bless( {
    '_strand' => 1,
    '_seqid' => 'NM_007102',
    '_location_type' => 'EXACT',
    '_start' => '1',
    '_end' => '120',
    '_root_verbose' => 0
}, 'Bio::Location::Simple' );
$VAR2 = bless( {
    '_strand' => 1,
    '_seqid' => 'NM_007102',
    '_location_type' => 'EXACT',
    '_start' => '121',
    '_end' => '307',
    '_root_verbose' => 0
}, 'Bio::Location::Simple' );
$VAR3 = bless( {
    '_strand' => 1,
    '_seqid' => 'NM_007102',
    '_location_type' => 'EXACT',
    '_start' => '308',
    '_end' => '597',
    '_root_verbose' => 0
}, 'Bio::Location::Simple' );
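You'll see that the exons are contiguous: the first ends at 120 and the second starts at 121, the second ends at 307 and the third starts at 308. The coordinates are positions along the spliced mRNA, and the record carries no intron coordinates, so from this information alone the exons cannot be mapped back to the chromosome.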
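If you want to check that contiguity without reading the dump by eye, here is a minimal sketch in plain Perl (no BioPerl needed). The coordinates are hard-coded from the output above, and exon_gaps is just a helper name made up for illustration, not part of any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Exon [start, end] pairs copied from the Dumper output above.
# These are positions on the mRNA, not on the chromosome.
my @exons = ([1, 120], [121, 307], [308, 597]);

# Hypothetical helper: number of bases between each exon and the next.
# A gap of 0 means the two exons abut, i.e. no intron is represented.
sub exon_gaps {
    my @ex = @_;
    return map { $ex[$_ + 1][0] - $ex[$_][1] - 1 } 0 .. $#ex - 1;
}

my @gaps = exon_gaps(@exons);
print "gaps between consecutive exons: @gaps\n";
```

This prints a gap of 0 for both junctions, which is what you would expect if the features describe the spliced transcript: the intron lengths simply are not there to recover.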
The only answer I have just now is that I feel your pain. I just spent a couple of hours trying to make this module do something... I hope to have an actual answer before too long!
Let's hope so...
Does anyone know if there is a BioPerl cookbook for these modules?
I have searched extensively: there are no cookbooks, how-tos or tutorials, only the sparse module documentation. There is also very little information on the mailing list, which suggests the module is not widely used. What little there is includes some discussion of a bug when converting between peptide and nucleotide coordinates, which may not have been fixed. You might be better off using another approach.
I have talked to the developers of these modules; they are used quite a bit, I just wish they were documented a bit more. There are some additional SlideShare docs that Aaron Mackey wrote up.
Maybe we just need to write up some docs with those basics in mind, and hopefully they will be added to over time.