Hi
I have been trying to estimate the Tajima-Nei distance for my data (if you want to see the files, the link is below).
I'm following this protocol from BioPerl: https://metacpan.org/pod/Bio::Align::DNAStatistics
I have 314 sequences in a fasta file and another file with the list of IDs. Fasta:
>AVP78031.1
----atgttgtttttcttgtttcttcagttcgccttagtaaactc---------------
------------------------------------ccagtgtgttaacttgacaggcag
a----------------accccactcaatcccaattat--actaattcttcacaaagagg
...
IDs:
AVP78031.1
...
I'm using this Perl script to calculate the Tajima-Nei distance for every pair of sequences (314 * 314):
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::DNAStatistics;
# Fall back to the default file names when an argument is missing or empty
my $file   = (defined $ARGV[0] && $ARGV[0] ne "") ? $ARGV[0] : "NT_MSA_S_protein.fasta";
my $idfile = (defined $ARGV[1] && $ARGV[1] ne "") ? $ARGV[1] : "NT_ID_S_protein.csv";
#### Reading the IDs from a file
my @contentIDS;
open(my $list, '<', $idfile) or die "Cannot open $idfile: $!";
while (my $l = <$list>) {
    $l =~ s/\n//g; # delete newline
    $l =~ s/\r//g; # delete CR
    next if (length($l) < 1);
    push @contentIDS, $l;
}
close $list;
#### .... IDs list
my $stats = Bio::Align::DNAStatistics->new();
my $alignin = Bio::AlignIO->new(-format => 'fasta', -file => $file); ### $file: MSA file
while (my $aln = $alignin->next_aln) {
    #print "reading...A\n"; ### DIAG
    my $matrix = $stats->distance(-align => $aln, -method => 'Tajima-Nei');
    #print "reading...B\n"; ### DIAG
    ### Obtaining values for each pair (DISTANCE!)
    WL1:
    foreach my $aaa (@contentIDS) { ### ID #1
        WL2:
        foreach my $baa (@contentIDS) { ### ID #2
            next (WL2) if ($aaa eq $baa);
            my $data = $matrix->get_entry($aaa, $baa);
            #($data = 0) if ($data < 0);
            print "DISTANCE\t$aaa\t$baa\t$data\n";
        } # END WL2
    } # END WL1
}
exit;
#
This script works when I try it with a small data set; however, when I run it on my real data I get this error message:
MSG: Must provide a DNA alignment to Bio::Align::DNAStatistics, you provided a protein
---------------------------------------------------
Can't locate object method "get_entry" via package "0" (perhaps you forgot to load "0"?) at Tajima-Nei_Distance_NV.pl lin$
This is weird because I reviewed my data for ambiguous characters: the characters are mostly "atcg" with an occasional "n", unless there are (maybe) other ambiguous characters that make a sequence look like protein. I really don't understand the message because the fasta file is clearly a nucleotide alignment.
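For reference, a character check like the one described above could be sketched in plain Perl as follows. This is only a minimal sketch under some assumptions: the sub name unexpected_chars is made up for illustration, it uses no BioPerl calls, and it treats a, c, g, t, n and the gap "-" as the expected characters. It lists, per sequence ID, every other character and how often it occurs.

```perl
use strict;
use warnings;

# Tally every character that is not a gap, whitespace, or one of a/c/g/t/n
# in each record of a FASTA file. Returns a hash ref: ID => { char => count }.
# (unexpected_chars is a hypothetical helper name, not part of BioPerl.)
sub unexpected_chars {
    my ($fh) = @_;
    my (%report, $id);
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;          # strip LF and CRLF line endings
        if ($line =~ /^>(\S+)/) {        # header line: remember the ID
            $id = $1;
            next;
        }
        next unless defined $id;         # skip anything before the first header
        for my $char (split //, lc $line) {
            next if $char =~ /[acgtn\-\s]/;
            $report{$id}{$char}++;
        }
    }
    return \%report;
}

# Assumed usage: perl check_chars.pl NT_MSA_S_protein.fasta
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $report = unexpected_chars($fh);
    close $fh;
    for my $id (sort keys %$report) {
        my $chars = $report->{$id};
        print join("\t", $id, map { "$_=$chars->{$_}" } sort keys %$chars), "\n";
    }
}
```

Any ID printed by this sketch has characters outside the assumed set, so those records might be worth inspecting first.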
Link: https://github.com/MauriAndresMU1313/Example_Tajima-Nei_Distance_Bioperl/tree/main
Does anyone have experience using BioPerl to estimate the Tajima-Nei distance?
Any comment or help is welcome! Thanks!