6.7 years ago
odoluca
Hello all,
I have millions of short DNA sequences (20-150 nt), and I suspect that some of them are similar to each other. I would like to compare them using a pairwise alignment method, but I am having trouble deciding which match score, mismatch penalty, and gap penalty (linear gap model) to use. My searching did not turn up any guidelines for sequences this short, so any help is appreciated.
Are these in FASTA format? You may want to use CD-HIT to reduce redundancy instead of trying to do pairwise alignments.
I looked at CD-HIT. Its web servers don't accept files larger than 50 MB, so I would have to compile it myself. However, I have already coded a pairwise comparison algorithm, and switching tools now would take longer, so I want to stick with it. Thanks anyway.
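For readers following the thread, here is a minimal sketch of what a pairwise comparison with a linear gap penalty might look like. It is not the poster's code, which is not shown. The function name `global_alignment_score` and the default values `match=1`, `mismatch=-1`, `gap=-2` are placeholders chosen for illustration only, not recommended settings for short sequences. It computes a Needleman-Wunsch global alignment score and keeps only two rows of the dynamic programming table.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    The default scores are illustrative placeholders, not tuned values.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`
    # costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` against the empty prefix of `b`.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Diagonal move: match or mismatch between a[i-1] and b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical or horizontal move: a gap in one of the sequences.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical example sequences; try different parameter sets here.
    print(global_alignment_score("ACGTACGT", "ACGTTCGT"))
    print(global_alignment_score("ACGTACGT", "ACGTTCGT", match=2, mismatch=-3, gap=-5))
```

Because the three parameters are plain keyword arguments, it is easy to rerun the same pairs under several scoring schemes and see how sensitive the results are. Note that an all-against-all comparison of millions of sequences with a quadratic-time alignment like this would be very slow, which is part of why a clustering tool was suggested above.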