I am trying to use BBMap to map some degenerate primers to an assembled reference genome. The primers are for the adenylation domain of NRPS biosynthetic gene clusters. I have used SeqKit amplicon with this primer pair and got the expected results (I can share if needed), however BBMap mapps nothing.
I am running BBMap twice, once for each primer, would it be better to run in a single go as forward and reverse pair?
Input
bbmap.sh ref=$refSeq in=$input out=$output
Forward primer: GCSTACSYSATSTACACSTCSGG
Reverse primer: SASGTCVCCSGTSCGGTA
Output
I am omitting the output about index generation.
------------------ Results ------------------
Genome: 1
Key Length: 13
Max Indel: 16000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 15755 (7876943 bases)
Mapping: 0.351 seconds.
Reads/sec: 44903.74
kBases/sec: 22450.28
Read 1 data: pct reads num reads pct bases num bases
mapped: 0.0000% 0 0.0000% 0
unambiguous: 0.0000% 0 0.0000% 0
ambiguous: 0.0000% 0 0.0000% 0
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 0.0000% 0 0.0000% 0
semiperfect site: 0.0000% 0 0.0000% 0
Match Rate: NA NA NaN% 0
Error Rate: NaN% 0 NaN% 0
Sub Rate: NaN% 0 NaN% 0
Del Rate: NaN% 0 NaN% 0
Ins Rate: NaN% 0 NaN% 0
N Rate: NaN% 0 NaN% 0
What could be causing this to occur as I assume it is something I am doing wrong?
These are degenerated sequences and very short, so my guess is that an NGS aligner such as BBMap will never align these as alignment scores are super small, as a consequence of short length and minimal matches. I think it is simply the wrong tool as not developed with this task in mind.
Okay this makes sense. Is it possible to force
BBMap
to allow such alignments? If not do you have any other tools you would recommend?What was wrong with SeqKit amplicon?
The end goal is to search within a metagenome for these domains. SeqKit returned nothing from a large polled metagenomic assembly so I am wanting to check with a different alignment tool.
I am not sure about BBMap but bowtie was developed for short ungapped alignments, but it does not work on degenerate sequences. You could write a little piece of code to (rather than the degenerate primers) create every possible sequence based on the degenerated sequences, and then run these sequences with bowtie against your metagenomes. bowtie, not bowtie2!
If you do write a program to generate all possible combinations of those IUPAC codes then BBMap can be tried with following parameters (which are normally used for miRNA)