I have two matrices. The first one is a genotype matrix in which:
- Rows represent loci
- Columns represent samples
- Each value is a genotype, which can be P1/P1, P2/P2, P1/P2, or NA if the genotype is not determined.
The second matrix is a matrix of counts. As in the first one:
- Rows represent loci
- Columns represent samples
- Each value is the count at a given locus for a given sample.
I would like to use the genotype information to filter the second matrix. The aim is to replace a count with NA wherever the corresponding genotype is not determined (i.e. NA).
Here is an example of my two matrices:
- Genotype matrix
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
LOC105031933 P1/P1 P1/P1 P1/P1
LOC105031934 NA NA NA
LOC105031935 P1/P1 P1/P1 P1/P1
LOC105031937 NA NA NA
LOC105031938 P1/P1 P1/P1 P1/P1
- Count matrix
CDS BC1-III BC1-IV BC10-II
LOC105031928 175 181.5 99
LOC105031930 10 50 0
LOC105031931 401 691 572
LOC105031933 17 69 15.75
LOC105031934 0 0 0
LOC105031935 6 0 17
LOC105031937 0 0 0
LOC105031938 408 520.1 165
What my script should produce:
CDS BC1-III BC1-IV BC10-II
LOC105031928 175 181.5 99
LOC105031930 NA NA NA
LOC105031931 401 691 572
LOC105031933 17 69 15.75
LOC105031934 NA NA NA
LOC105031935 6 0 17
LOC105031937 NA NA NA
LOC105031938 408 520.1 165
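The rule itself is small once both matrices are in memory. Here is a minimal sketch, assuming each matrix has already been loaded into a hash of hashes keyed first by CDS and then by sample name. The sub name `mask_counts` and this data layout are illustrative choices, not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new hash ref shaped like $counts in which
# every cell whose genotype is 'NA' (or missing entirely) becomes 'NA'.
# Both arguments are hash refs laid out as $h->{$cds}{$sample}.
sub mask_counts {
    my ($geno, $counts) = @_;
    my %masked;
    for my $cds (keys %$counts) {
        for my $sample (keys %{ $counts->{$cds} }) {
            # look up the genotype without autovivifying a new CDS entry
            my $gt = exists $geno->{$cds} ? $geno->{$cds}{$sample} : undef;
            $masked{$cds}{$sample} =
                (defined $gt && $gt ne 'NA') ? $counts->{$cds}{$sample} : 'NA';
        }
    }
    return \%masked;
}

# Two loci and two samples taken from the example above.
my %geno = (
    LOC105031928 => { 'BC1-III' => 'P1/P2', 'BC1-IV' => 'P1/P2' },
    LOC105031930 => { 'BC1-III' => 'NA',    'BC1-IV' => 'NA'    },
);
my %counts = (
    LOC105031928 => { 'BC1-III' => 175, 'BC1-IV' => 181.5 },
    LOC105031930 => { 'BC1-III' => 10,  'BC1-IV' => 50    },
);

my $masked = mask_counts(\%geno, \%counts);
print "$masked->{LOC105031928}{'BC1-IV'}\n";   # 181.5
print "$masked->{LOC105031930}{'BC1-III'}\n";  # NA
```

Treating a genotype that is absent from the first matrix the same as NA is a conservative assumption; drop the `defined` check if you would rather keep such counts.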
I could read the genotype matrix line by line and link the two matrices using the CDS as the ID, but I want to make sure that each value is tied to its specific CDS and its sample. I am a beginner in Perl and I don't yet know how to extract the header and row information from a matrix and then associate them with a single value. Thanks for your help.
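Tying each value to both its CDS and its sample can be done by reading the header line first and then using a hash slice to map sample names onto the values of each row. A minimal sketch, assuming whitespace-separated files whose first column is the CDS ID; the sub name `read_matrix` is hypothetical:

```perl
use strict;
use warnings;

# Reads a whitespace-separated matrix whose first line is a header
# ("CDS sample1 sample2 ...") and whose first column is the CDS ID.
# Returns (\@samples, \%cell) where $cell{$cds}{$sample} holds one value,
# so every value is tied to exactly one CDS and one sample.
sub read_matrix {
    my ($fh) = @_;
    my $header = <$fh>;
    die "Empty matrix\n" unless defined $header;
    chomp $header;
    my (undef, @samples) = split ' ', $header;   # drop the "CDS" label

    my %cell;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        my ($cds, @values) = split ' ', $line;
        die "Wrong number of columns for $cds\n" unless @values == @samples;
        @{ $cell{$cds} }{@samples} = @values;    # hash slice: sample => value
    }
    return (\@samples, \%cell);
}

# Demo on an in-memory copy of two rows of the genotype matrix.
my $text = <<'END';
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
END
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";
my ($samples, $geno) = read_matrix($fh);
close $fh;

print "@$samples\n";                          # BC1-III BC1-IV BC10-II
print "$geno->{LOC105031928}{'BC10-II'}\n";   # P1/P2
```

The same sub can read the count matrix, because `split ' '` treats any run of spaces or tabs as one separator.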
PS: Here is what I have done so far:
```perl
use strict;
use warnings;

my $matrix_geno = $ARGV[0];   # path to the genotype matrix (assumed: first command-line argument)

open(my $geno_fh, '<', $matrix_geno) or die "Cannot open $matrix_geno: $!\n";
my %hash_Loc_line = ();
while (my $line = <$geno_fh>)
{
    chomp $line;
    next if ($line =~ /^CDS/);                # skip the header line
    my ($locus, @BC) = split(/\s+/, $line);   # first column = CDS, rest = genotypes
    # genotypes are stored in column order; sample names are not kept
    push @{$hash_Loc_line{$locus}}, @BC;
}
close($geno_fh);
```
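One way to finish this approach is to key the genotypes by sample name (taken from the header) instead of by position, then stream the count matrix and print each count or NA. This is a sketch under those assumptions: the sub name `mask_file`, the tab-separated output, and the two-argument command line are all illustrative choices, and a CDS missing from the genotype file is treated as NA.

```perl
use strict;
use warnings;

# Hypothetical end-to-end sketch: load the genotype matrix into
# $geno{$cds}{$sample}, then stream the count matrix and print each count,
# or NA where the genotype for that CDS/sample is NA or absent.
sub mask_file {
    my ($geno_fh, $count_fh, $out_fh) = @_;

    # 1) Genotype matrix: remember sample names from the header.
    my $g_header = <$geno_fh>;
    die "Empty genotype matrix\n" unless defined $g_header;
    my (undef, @g_samples) = split ' ', $g_header;
    my %geno;
    while (my $line = <$geno_fh>) {
        my ($cds, @gt) = split ' ', $line;
        next unless defined $cds;                 # skip blank lines
        @{ $geno{$cds} }{@g_samples} = @gt;
    }

    # 2) Count matrix: copy the header, then mask cell by cell.
    my $c_header = <$count_fh>;
    die "Empty count matrix\n" unless defined $c_header;
    my (undef, @c_samples) = split ' ', $c_header;
    print {$out_fh} join("\t", 'CDS', @c_samples), "\n";
    while (my $line = <$count_fh>) {
        my ($cds, @counts) = split ' ', $line;
        next unless defined $cds;
        my @out;
        for my $i (0 .. $#c_samples) {
            my $gt = exists $geno{$cds} ? $geno{$cds}{ $c_samples[$i] } : undef;
            push @out, (defined $gt && $gt ne 'NA') ? $counts[$i] : 'NA';
        }
        print {$out_fh} join("\t", $cds, @out), "\n";
    }
}

# Usage (file names are placeholders): perl mask.pl geno.txt counts.txt > masked.txt
if (@ARGV == 2) {
    open my $g, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    open my $c, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!\n";
    mask_file($g, $c, \*STDOUT);
}
```

Because cells are matched by sample name rather than column position, the two files do not need to list their samples in the same order.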
Have you tried anything? From your description, it seems you simply want NA in the same positions as in Table 1 and the values from Table 2 everywhere else. That is very easy to solve in R, for example.
Yes, I'm currently working on it but still can't find the right way to solve the problem. Unfortunately, I must write the script in Perl only. I'm going to add what I have done so far to my post. Thanks.
So this is an assignment? That's a pity, because it would be a one-liner in R.
Yes it is. Unfortunately :(