BBMap not mapping reads to sequences known to exist in reference
1
0
Entering edit mode
3.7 years ago

I am trying to use BBMap to map some degenerate primers to an assembled reference genome. The primers are for the adenylation domain of NRPS biosynthetic gene clusters. I have used SeqKit amplicon with this primer pair and got the expected results (I can share if needed), however BBMap mapps nothing.

I am running BBMap twice, once for each primer, would it be better to run in a single go as forward and reverse pair?

Input

bbmap.sh ref=$refSeq in=$input out=$output

Forward primer: GCSTACSYSATSTACACSTCSGG

Reverse primer: SASGTCVCCSGTSCGGTA

Output

I am omitting the output about index generation.

   ------------------   Results   ------------------

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             15755   (7876943 bases)

Mapping:                0.351 seconds.
Reads/sec:              44903.74
kBases/sec:             22450.28


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                   0.0000%               0         0.0000%                  0
unambiguous:              0.0000%               0         0.0000%                  0
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:         0.0000%               0         0.0000%                  0

Match Rate:                   NA               NA            NaN%                  0
Error Rate:                  NaN%               0            NaN%                  0
Sub Rate:                    NaN%               0            NaN%                  0
Del Rate:                    NaN%               0            NaN%                  0
Ins Rate:                    NaN%               0            NaN%                  0
N Rate:                      NaN%               0            NaN%                  0

What could be causing this to occur as I assume it is something I am doing wrong?

alignment • 1.6k views
ADD COMMENT
0
Entering edit mode

These are degenerated sequences and very short, so my guess is that an NGS aligner such as BBMap will never align these as alignment scores are super small, as a consequence of short length and minimal matches. I think it is simply the wrong tool as not developed with this task in mind.

ADD REPLY
0
Entering edit mode

Okay this makes sense. Is it possible to force BBMap to allow such alignments? If not do you have any other tools you would recommend?

ADD REPLY
0
Entering edit mode

What was wrong with SeqKit amplicon?

ADD REPLY
0
Entering edit mode

The end goal is to search within a metagenome for these domains. SeqKit returned nothing from a large polled metagenomic assembly so I am wanting to check with a different alignment tool.

ADD REPLY
0
Entering edit mode

I am not sure about BBMap but bowtie was developed for short ungapped alignments, but it does not work on degenerate sequences. You could write a little piece of code to (rather than the degenerate primers) create every possible sequence based on the degenerated sequences, and then run these sequences with bowtie against your metagenomes. bowtie, not bowtie2!

ADD REPLY
0
Entering edit mode

If you do write a program to generate all possible combinations of those IUPAC codes then BBMap can be tried with following parameters (which are normally used for miRNA)

ambig=all vslow perfectmode maxsites=1000
ADD REPLY
0
Entering edit mode
3.7 years ago
GenoMax 147k

BBMap aligner is not the correct tool for this. You could try bbduk.sh in filter mode to see if that helps. Replace all IUPAC characters with N's in your query. Since you have a lot of degenerate characters this may not work at all.

From BBDuk guide. Matching degenerate sequences such as primers:

bbduk.sh in=reads.fq out=matching.fq literal=ACGTTNNNNNGTC copyundefined k=13 mm=f

This will clone the reference sequences to represent every possibility for the degenerate bases (Ns and other non-ACGT IUPAC symbols). For example, this would create ACGTTAAAAAGTC, ACGTTAAAACGTC, ACGTTAAAAGGTC, and so forth (all 1024 possibilities). If you are interested in seaching for new life by mining shotgun metagenomic reads for 16s sequences that do not quite match your primers... this (and hdist) might be a good place to start! But it's also useful for adapters with barcodes.

You can also try the tools mentioned in this thread: Automated Primer-Blast like funcionnality You will need to have your metagenome data in fasta format.

ADD COMMENT
0
Entering edit mode

It it possible to get the "mapping" locations in terms of bp on the contig through this method? I tried and it just give me back my input fasta sequence as the output

ADD REPLY
0
Entering edit mode

No. bbduk is not a mapping tool. It is simply matching sequence and recovering any reads that contain that sequence.

If you need mapping then you will need to create an input set of all degenerate sequences based on your motif. Have you tried any of primer tools in thread I linked in answer above?

ADD REPLY

Login before adding your answer.

Traffic: 1822 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6