I am running `bowtie` with the following parameters to look for up to, say, 10 exact matches of a 36-base nucleotide string against a GRCh37/hg19 index, _e.g._:
$ bowtie -S hg19 -v 0 -k 10 -f sequence.fa > hits.sam
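For comparing against BLAT and BLAST, it helps to reduce the SAM output to plain (chromosome, start, stop, strand) tuples. Here is a minimal sketch, assuming the standard SAM column layout and ungapped alignments (so the end is simply start plus read length minus one). The function name `sam_hits_to_coordinates` is made up for illustration:

```python
def sam_hits_to_coordinates(sam_lines):
    """Turn SAM alignment lines into (read, chrom, start, end, strand) tuples.

    Coordinates are 1-based and inclusive, as in SAM. Assumes ungapped
    alignments, so end = POS + len(SEQ) - 1.
    """
    coords = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unaligned
            continue
        start = int(fields[3])
        end = start + len(fields[9]) - 1
        strand = "-" if flag & 0x10 else "+"
        coords.append((fields[0], fields[2], start, end, strand))
    return coords
```

Usage would be something like `sam_hits_to_coordinates(open("hits.sam"))`, which gives one tuple per reported alignment.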
As a sanity check, I sample the 36-base sequence from the same hg19 assembly (using the same FASTA files that were used to build the bowtie index) in order to verify that I receive all matches, using UCSC BLAT and NCBI BLAST searches as confirmation.
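A brute-force scan of the FASTA sequence itself could serve as a fourth, tool-independent check. The sketch below is only an illustration, not part of my actual pipeline: it assumes one chromosome's sequence is already loaded as a single string, and the names `reverse_complement` and `find_exact_hits` are hypothetical. It upper-cases everything so that soft-masked (lower-case) bases still match, and it searches both strands.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N; case is preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_exact_hits(reference, query):
    """Return (strand, start, end) for every exact occurrence of query.

    Coordinates are 0-based, half-open, on the forward strand of reference.
    Overlapping occurrences are reported.
    """
    reference = reference.upper()
    query = query.upper()
    patterns = [("+", query)]
    rc = reverse_complement(query)
    if rc != query:                       # avoid double-counting palindromes
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        pos = reference.find(pattern)
        while pos != -1:
            hits.append((strand, pos, pos + len(pattern)))
            pos = reference.find(pattern, pos + 1)
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For example, `find_exact_hits("AAACCCGGGTTT", "AAACC")` finds the forward copy at 0-5 and the reverse-complement copy at 7-12. Note these are 0-based, so they need a +1 on the start before comparing with SAM positions.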
Some questions:
- The docs say that `bowtie` accepts read lengths with an upper bound of 1000 bp. In practice, what is the lower bound of query sequence lengths that it will accept and reliably align?
- Is the `-oneOff` parameter to the BLAT command-line tool used to limit mismatches to 0?
- Is there a way to translate accession code hits from an NCBI BLAST search to genomic coordinates (chromosome, start, stop)?
- Is there a parameter to limit BLAST+ command-line tool searches to hits that are the same length as the query sequence, or otherwise limit results to exact matches to the query sequence?
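Failing a built-in option, one fallback would be to post-filter the BLAST+ results. This is a rough sketch, assuming the search was run with `-outfmt 6` and the default column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and the `query_lengths` dictionary are hypothetical. It also assumes the subject IDs are already chromosome names, which would only hold if the BLAST database were built from the same hg19 FASTA files.

```python
def exact_full_length_hits(tabular_lines, query_lengths):
    """Keep only BLAST tabular hits that cover the whole query exactly.

    query_lengths maps each query ID to its length in bases. Returns
    (subject, start, end, strand) with 1-based inclusive coordinates.
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):   # blank or comment line
            continue
        f = line.split("\t")
        qseqid, sseqid = f[0], f[1]
        aln_len, mismatches, gap_opens = int(f[3]), int(f[4]), int(f[5])
        sstart, send = int(f[8]), int(f[9])
        full_length = aln_len == query_lengths[qseqid]
        if full_length and mismatches == 0 and gap_opens == 0:
            # Minus-strand hits are reported with sstart > send
            strand = "+" if sstart <= send else "-"
            hits.append((sseqid, min(sstart, send), max(sstart, send), strand))
    return hits
```

With a single 36-base query named `q1`, the call would be `exact_full_length_hits(open("hits.tsv"), {"q1": 36})`, where `hits.tsv` is a placeholder file name.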
Hmm... You know my post on the evaluation of finding all hits. I actually wrote that post in particular for you, but you do not trust me. Interesting.
Actually, I probably missed part of your post which addresses some of these questions. I apologize for my oversight and will take another look.
Hey, I re-read your question and you do address some aspects of my question. Thanks for reminding me about it.