I work on DNA k-mer counting and have prepared the following formulation, which solves the counting problem with a perfect hash table.
The code uses the Rcpp API (C++) so that it can be called from R:
#include <Rcpp.h>
#include <string>
using namespace Rcpp;
/*
To use this code in plain C++, replace IntegerVector with std::vector<int>.
*/
//************************************************
// Map a nucleotide to a 2-bit code: A=0, C=1, G=2; any other character
// (T, but also N or other symbols) falls through to 3.
inline int V(const char x) {
    switch (x) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        default:            return 3;
    }
}
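// Index of the last k-mer of A (positions n-k .. n-1), read as a base-4 number
// with the first base as the most significant digit.
// Assumes 1 <= k <= n and k <= 15.
inline unsigned int X0(const std::string& A, const int k, const int n) {
    unsigned int result = 0;
    for (int i = n - k; i < n; i++) {
        result = result * 4 + V(A[i]);   // Horner's rule; avoids floating-point pow()
    }
    return result;
}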
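// [[Rcpp::export]]
IntegerVector kmer4(const std::string& A, const int n, const int k)
{
    if (k < 1 || k > 15 || n < k || n > (int)A.size())
        stop("kmer4 requires 1 <= k <= 15 and k <= n <= length of A");
    IntegerVector P(1 << (2 * k));        // 4^k counters, zero-initialised
    int x = X0(A, k, n);                  // index of the last k-mer
    P[x]++;
    const int N = 1 << (2 * (k - 1));     // 4^(k-1): weight of the first base
    // Slide the window from right to left: drop the last base (x / 4)
    // and prepend A[i] as the new most significant digit.
    for (int i = n - k - 1; i > -1; i--) {
        x = N * V(A[i]) + x / 4;
        P[x]++;
    }
    return P;
}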
I have two questions:
Given the index x of a k-mer, the index of its complement is (4^k)-x-1. Can the index of the reversed k-mer be obtained with a similar numeric operation?
There are two running-time problems: the iteration over the string, and the creation of the count vector when k is greater than 8. Are there any ideas for solving them?
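On the first question, here is a minimal sketch, assuming the same encoding as V() in the post (A=0, C=1, G=2, T=3, two bits per base, first base most significant) and k <= 15 so that the index fits in 32 bits. The names complement_index, reverse_index and revcomp_index are hypothetical and not part of the original code. Reversal is a base-4 digit reversal rather than a single subtraction, so the sketch simply loops over the k digits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not in the original post). An index packs k bases,
// 2 bits each (A=0, C=1, G=2, T=3), with the first base most significant.

// Complement: every base-4 digit d becomes 3 - d, i.e. (4^k - 1) - x.
inline std::uint32_t complement_index(std::uint32_t x, int k) {
    assert(k >= 1 && k <= 15);
    const std::uint32_t mask = (std::uint32_t(1) << (2 * k)) - 1;  // 4^k - 1
    return mask - x;  // equivalent to (~x) & mask
}

// Reverse: peel digits off the low end of x and push them onto r.
inline std::uint32_t reverse_index(std::uint32_t x, int k) {
    assert(k >= 1 && k <= 15);
    std::uint32_t r = 0;
    for (int i = 0; i < k; ++i) {
        r = (r << 2) | (x & 3u);  // append the lowest digit of x to r
        x >>= 2;                  // drop that digit from x
    }
    return r;
}

// Reverse complement: complement every digit, then reverse the digit order.
inline std::uint32_t revcomp_index(std::uint32_t x, int k) {
    return reverse_index(complement_index(x, k), k);
}
```

For k = 3, ACG has index 6; its complement TGC is 57, its reverse GCA is 36, and its reverse complement CGT is 27.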
I'm not sure if you are doing this as a programming problem, or if you are more interested in the functionality of the software. If it is the latter, consider taking a look at the Bioconductor Biostrings package.
I'm also not sure about your goal. What are you trying to accomplish?
I am not very good at bit manipulation. I am looking for a way to get the index of the reverse complement of a k-mer starting from the original index. The complement is solved, but the reverse is more complicated. Second, I am hoping for help in speeding up the code by addressing its two main bottlenecks: string iteration and vector creation.
Are you just asking:
"Can you encode DNA, somehow, such that you can use a bitwise operator to reverse-complement it?"
Yes. How can we count k-mers for the sequence and its reverse complement at the same time?
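One possible way to do this in a single pass is sketched below, assuming the 2-bit encoding from the post (A=0, C=1, G=2, anything else = 3) and k <= 15. The names base_code and count_both_strands are hypothetical. The idea is to keep two rolling indices: the forward k-mer gets each new base appended on the right, while the reverse-complement k-mer gets the complemented base (3 - v) prepended on the left. This is plain C++ with std::vector rather than Rcpp, and it has not been benchmarked against the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same mapping as V() in the post: A=0, C=1, G=2, anything else (T, N, ...) = 3.
inline std::uint32_t base_code(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        default:            return 3;
    }
}

// Hypothetical function: counts every k-mer of seq and of its reverse
// complement in one left-to-right pass. Returns a dense table of 4^k counts.
inline std::vector<std::uint32_t> count_both_strands(const std::string& seq, int k) {
    assert(k >= 1 && k <= 15);
    const std::uint32_t mask = (std::uint32_t(1) << (2 * k)) - 1;  // 4^k - 1
    const int top = 2 * (k - 1);               // bit position of the first base
    std::vector<std::uint32_t> counts(std::size_t(mask) + 1, 0);
    std::uint32_t fwd = 0, rev = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        const std::uint32_t v = base_code(seq[i]);
        fwd = ((fwd << 2) | v) & mask;         // append the base on the right
        rev = (rev >> 2) | ((3u - v) << top);  // prepend its complement on the left
        if (i + 1 >= std::size_t(k)) {         // the window now holds k bases
            ++counts[fwd];
            ++counts[rev];
        }
    }
    return counts;
}
```

With this scheme a k-mer that is its own reverse complement (for example CG with k = 2) is incremented twice per occurrence. The dense table needs 4^k counters, so for large k a sparse container such as std::unordered_map may be worth trying instead.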