I have a fairly large dataset of 23 nt DNA sequences. Can I use Bowtie2 to find the sequences in this dataset that differ from a given query by up to 4 mismatches? The query will also be a 23 nt sequence. I am new to Bowtie. Kindly help! Thanks in advance.
With Bowtie2's default maximum mismatch penalty of 6, four mismatches correspond to a minimum score of around -24, so --very-sensitive --score-min C,-24,0 -N 1 or something like that should work. You'll probably need to decrease the seed length to something like -L 15 and then play with some of the other settings. Are these miRNAs or do you really just have extremely short reads?
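To make the arithmetic behind the -24 threshold explicit, here is a minimal Python sketch. It assumes Bowtie2's default maximum mismatch penalty of 6 (--mp 6,2) and no gaps. The file names (db_index, reads.fa, hits.sam) and the helper functions are hypothetical, and which side is indexed and which is passed as reads is left to you. The script only assembles the argument list, it does not run Bowtie2.

```python
# Sketch only: derives the --score-min constant and assembles a Bowtie2 call.
# Assumption: default maximum mismatch penalty of 6 (--mp 6,2), no gaps.
MISMATCH_PENALTY = 6


def min_score(max_mismatches, penalty=MISMATCH_PENALTY):
    """Lowest end-to-end alignment score that still allows max_mismatches."""
    return -max_mismatches * penalty


def bowtie2_args(index, reads_fasta, out_sam, max_mismatches=4, seed_len=15):
    """Build the argument list; all file names passed in are hypothetical."""
    score = min_score(max_mismatches)
    return [
        "bowtie2",
        "--very-sensitive",
        "--score-min", f"C,{score},0",  # constant threshold, e.g. C,-24,0
        "-N", "1",                      # allow one mismatch in the seed
        "-L", str(seed_len),            # shorter seed for 23 nt sequences
        "-f",                           # input reads are in FASTA format
        "-x", index,
        "-U", reads_fasta,
        "-S", out_sam,
    ]


if __name__ == "__main__":
    print(min_score(4))  # -24
    print(" ".join(bowtie2_args("db_index", "reads.fa", "hits.sam")))
```

The printed command can be pasted into a shell once the index has been built with bowtie2-build. Treat the seed settings as a starting point to tune, not a guaranteed recipe.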
You may have to play around with the word size parameter (-n), but I think 4 should work.
I'm almost sure I saw a question similar to yours a few days ago, but I can't find it. In that question, SWARM was mentioned and the OP seemed happy with the results.
I have my own dataset with millions of DNA sequences (each of which is 23 nt), within which I would like to find sequences similar to a new query. Can this be used for that? Can you please explain the various parameters being used here?
-i input filename for db1 in fasta format, required
-i2 input filename for db2 in fasta format, required
-o output filename, required
-c sequence identity threshold, default 0.9
-n word_length, default 10, see user's guide for choosing it
You have to use a small word size, as you want somewhat low similarity and have very short sequences. Set your identity threshold according to the level of similarity you want (19/23, about 0.83, for up to 4 mismatches in 23 nt).
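As a rough illustration of how the identity threshold follows from the allowed number of differences, the Python sketch below turns "up to 4 mismatches in 23 nt" into a -c value and assembles a cd-hit-est-2d argument list using only the options listed above. The file names, the helper functions, and the choice of the query as db1 and the large dataset as db2 are assumptions. The threshold is rounded down so that a hit with exactly 19/23 identity is not excluded. Check the user's guide for which word size is accepted at a given threshold.

```python
import math


def identity_threshold(length, max_diffs, decimals=2):
    """Identity fraction for max_diffs differences, rounded down."""
    identity = (length - max_diffs) / length
    factor = 10 ** decimals
    return math.floor(identity * factor) / factor


def cd_hit_args(db1_fasta, db2_fasta, out_prefix,
                length=23, max_diffs=4, word_size=4):
    """Build the argument list; all file names passed in are hypothetical."""
    c = identity_threshold(length, max_diffs)
    return [
        "cd-hit-est-2d",
        "-i", db1_fasta,       # db1 (assumed here: the query)
        "-i2", db2_fasta,      # db2 (assumed here: the large dataset)
        "-o", out_prefix,
        "-c", f"{c:.2f}",      # sequence identity threshold
        "-n", str(word_size),  # word length, small for short sequences
    ]


if __name__ == "__main__":
    print(identity_threshold(23, 4))  # 0.82
    print(" ".join(cd_hit_args("query.fa", "dataset.fa", "hits")))
```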
written 8.9 years ago by h.mon, updated 4.9 years ago by Ram
Hi h.mon,
Can you please tell me whether it is possible to view the alignment between the query and the hits obtained with cd-hit-est-2d? It is difficult to work out the positions of mismatches and gaps from the current output format.
Thanks in advance!!