I am trying to write a function that extracts SNPs from a FASTA file containing aligned sequences.
So far, I have written a function that computes per-position base statistics for a list of N aligned sequences.
Using the Rcpp API, the code is:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix alin_stat(CharacterVector &alin, int &N_sequence)
{
    // The sequences are aligned, so they all share the length of the first one.
    int sequence_size = alin[0].size();
    NumericMatrix stat_m(5, sequence_size); // initialised to 0 by default
    for (int i = 0; i < N_sequence; i++) {
        for (int j = 0; j < sequence_size; j++) {
            switch (alin[i][j]) {
            case 'A': case 'a': // first row stores the frequency of A
                stat_m(0, j)++;
                break;
            case 'C': case 'c': // second row stores the frequency of C
                stat_m(1, j)++;
                break;
            case 'G': case 'g': // third row stores the frequency of G
                stat_m(2, j)++;
                break;
            case 'T': case 't': // fourth row stores the frequency of T
                stat_m(3, j)++;
                break;
            default:
                stat_m(4, j)++; // unidentified base: '-', 'N', ...
                break;
            }
        }
    }
    // Convert counts to frequencies element by element, so the result keeps its matrix dimensions.
    for (int k = 0; k < stat_m.size(); k++) {
        stat_m[k] /= N_sequence;
    }
    return stat_m;
}
Note that:
alin contains the aligned sequences (all of the same length)
N_sequence is the number of sequences
Questions:
- Which condition does a position have to meet to be selected as a SNP?
- Do the alignment procedure and its parameters affect the result?