Finding the smallest sequences that distinguish each DNA sample from the others
5.9 years ago
melihoten • 0

Hi,

I have 5 different bacterial genome sequences, and I need to find the smallest sequences that distinguish each of them from the others. I need to do this in R. In the end, I should get 5 different short sequences, each specific to one genome and able to distinguish it from the other four. How can I solve this problem?

R sequence • 870 views
5.9 years ago
michael.ante ★ 3.9k

This sounds like homework to me...

The brute-force method would be to generate all occurring k-mers for each species (k = 1 ... n) and look for the smallest k at which every species has at least one k-mer that occurs in none of the others.
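To make the idea concrete, here is a minimal base-R sketch of the "k-mers found in one sample only" step for a single, fixed k. The function names `kmers` and `private_kmers` and the toy sequences are made up for illustration; they are not from any package, and the approach assumes each genome fits in memory as one character string.

```r
# Return the distinct k-mers occurring in one sequence string.
kmers <- function(s, k) {
  n <- nchar(s)
  if (k > n) return(character(0))
  starts <- seq_len(n - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# For each sample, keep only the k-mers that occur in no other sample.
# (Hypothetical helper name, not part of any library.)
private_kmers <- function(seqs, k) {
  sets <- lapply(seqs, kmers, k = k)
  out <- lapply(seq_along(sets), function(i) {
    others <- unique(unlist(sets[-i]))
    setdiff(sets[[i]], others)
  })
  names(out) <- names(seqs)
  out
}

# Toy sequences standing in for genomes.
toy <- c(A = "ACGTAC", B = "ACGGAC", C = "TTGTAC")
res <- private_kmers(toy, 3)
res
```

With these toy strings, every sample has at least one private 3-mer, while at k = 2 sample A has none. Note that this sketch only looks at one strand; whether reverse complements should also be considered depends on the use case.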


I am doing an internship right now, and my professor asked me to learn this.

How can I implement the brute-force approach in R? I couldn't find any package for it. Do you know of a way to do it?


Learn the basics of R.

Find a package that can count k-mers and install it.

Make a test set (a small part of each genome) to test your approach.

Start with a small k, compare the genomes' k-mers, and increase k if the required condition is not met.
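The last two steps could be sketched in base R roughly as follows. This is only an illustration under assumptions: `smallest_distinguishing_k`, its `max_k` cutoff, and the toy "genomes" are hypothetical, and a dedicated k-mer counting package would likely be needed for real genome sizes.

```r
# Distinct k-mers of one sequence string (base R only).
kmers <- function(s, k) {
  n <- nchar(s)
  if (k > n) return(character(0))
  starts <- seq_len(n - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# Hypothetical helper: try k = 1, 2, ... and stop at the first k where every
# sample has at least one k-mer that no other sample contains.
smallest_distinguishing_k <- function(seqs, max_k = 20) {
  for (k in seq_len(max_k)) {
    sets <- lapply(seqs, kmers, k = k)
    private <- lapply(seq_along(sets), function(i) {
      setdiff(sets[[i]], unlist(sets[-i]))
    })
    names(private) <- names(seqs)
    if (all(lengths(private) > 0)) {
      return(list(k = k, markers = private))
    }
  }
  NULL  # condition not met for any k up to max_k
}

# Toy stand-ins for genomes; real input would be read from sequence files.
genomes <- c(g1 = "ACGTACGGTA", g2 = "ACGGACGTTA", g3 = "TTGTACGGAC")

# Test set: only the first 6 bases of each genome.
test_set <- setNames(substr(genomes, 1, 6), names(genomes))

hit <- smallest_distinguishing_k(test_set)
hit$k        # smallest k that separates all samples in the test set
hit$markers  # candidate distinguishing k-mers per sample
```

A k that works on a small test set is not guaranteed to work on the full genomes, since a k-mer that looks private in the excerpt may occur elsewhere in another genome, so the search should be rerun on the complete sequences once the approach is validated.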
