Question

How to align DNA reads against a database of protein references

0

9.5 years ago

bioinfo ▴ 840

I was wondering whether Bowtie, BWA, etc. can map nucleotide reads to a protein reference database, or whether they are simply DNA aligners. I found a tool called PAUDA that could possibly be useful, but has any of you used it before?

alignment bowtie BWA • 7.4k views

updated 2.1 years ago by Ram 44k • written 9.5 years ago by bioinfo ▴ 840

Ram · Answer 1 · 2015-07-09

0

9.5 years ago

thackl ★ 3.0k

Bowtie2, BWA, etc. only do DNA-to-DNA alignment. I haven't used PAUDA, but judging from its documentation it sounds reasonable.

Update: I thought about reverse translation a bit more and would like to revise my original statement - it is probably not a good idea ;)

(My original idea was to convert the proteins to pseudo-transcripts by reverse-translating them to DNA and then try a standard mapper. But of course there are ambiguity issues: the genetic code is degenerate, so most amino acids can be encoded by several different codons. Still, a sensitive mapper, for example bwa mem, could work.)
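
To give a feel for the ambiguity issue, here is a minimal sketch that counts how many distinct DNA sequences encode a given peptide under the standard genetic code. The function name `count_reverse_translations` and the example peptide are made up for illustration; nothing here comes from an existing tool.

```python
from collections import Counter
from math import prod

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Number of codons that encode each amino acid (e.g. 6 for L, 1 for M).
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_reverse_translations(peptide):
    """Count the DNA sequences that translate to `peptide` (hypothetical helper)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


print(count_reverse_translations("MLSR"))  # 216 = 1 * 6 * 6 * 6
```

Even a four-residue peptide has hundreds of possible encodings, so a single arbitrary reverse translation of the protein will rarely match the codons actually present in a read.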

2

I wrote a tool for this purpose - TranslateSixFrames. It translates back and forth between amino acids and nucleotides. Theoretically, the way you would use it in order to do mapping with a nucleotide aligner is:

Translate the reads to proteins in all six frames.

Translate the aa-encoded reads back to nucleotides, selecting one canonical codon per amino acid (TranslateSixFrames does this automatically for aa->nt translation). So, for each initial read, you end up with 6 nucleotide reads.

Translate the proteins to nt-space.

Finally, map the double-translated reads to the translated proteins, and select the best mapping of each of the six read frames.

Theoretically... this should work fine, at least for RNA-seq reads. For DNA reads, most of them will be intronic, but the coding ones should still generally be OK. I wrote this with the intention of integrating it into BBMap to do this automatically, but I have not had time. I might do it in the future. You can still follow this workflow using translate6frames.sh as a standalone tool.
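
The workflow above can be sketched in plain Python as follows. This is only an illustration of the idea, not the TranslateSixFrames implementation: the helper names are hypothetical, the choice of canonical codon (the first one in TCAG order) is an assumption, and codons containing ambiguous bases are simply turned into X / NNN.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Assumption: the canonical codon of an amino acid is the first one listed for it.
CANONICAL = {}
for codon, aa in CODON_TABLE.items():
    CANONICAL.setdefault(aa, codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(read):
    """Step 1: translate a read in three forward and three reverse frames."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def back_translate(protein):
    """Steps 2 and 3: aa -> nt using one canonical codon per amino acid."""
    return "".join(CANONICAL.get(aa, "NNN") for aa in protein.upper())


def recode_read(read):
    """Return the six double-translated nucleotide versions of a read."""
    return [back_translate(p) for p in six_frames(read)]


reference_nt = back_translate("MLK")   # protein reference in nt-space
print(reference_nt)                    # ATGTTAAAA
print(recode_read("ATGCTGAAA")[0])     # ATGTTAAAA (CTG was recoded to TTA)
```

The last step (mapping the six recoded reads to the recoded proteins and keeping the best-scoring frame) is left to whichever nucleotide aligner you prefer.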

updated 2.1 years ago by Ram 44k • written 9.5 years ago by Brian Bushnell 20k

1

I very much like the nt*->aa->nt + aa->nt idea to get consistent codon usage.

updated 2.1 years ago by Ram 44k • written 9.5 years ago by thackl ★ 3.0k

score 0 · Answer 2 · 2015-07-09

The RTG metagenomics tools include a command called mapx, which we developed for use on the HMP project. It is analogous to (but orders of magnitude faster than) blastx. Internally, it translates the DNA reads into amino acids in the possible reading frames and performs protein alignment against your protein database (including support for protein scoring matrices such as BLOSUM, which your alternative approach would not permit).
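
To illustrate why a protein scoring matrix matters, here is a small sketch that scores ungapped peptide comparisons with a handful of BLOSUM62 entries. It is not how mapx works internally; the dictionary holds only the few matrix values needed for the example, and the function names are made up.

```python
# Excerpt of BLOSUM62: only the residue pairs used in the example below.
BLOSUM62_EXCERPT = {
    ("K", "K"): 5,
    ("L", "L"): 4,
    ("I", "L"): 2,   # conservative substitution
    ("D", "L"): -4,  # non-conservative substitution
}


def pair_score(a, b):
    """Look up a symmetric matrix entry; raises KeyError outside the excerpt."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


def ungapped_score(query, reference):
    """Sum of substitution scores for two equal-length peptides."""
    if len(query) != len(reference):
        raise ValueError("peptides must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query, reference))


print(ungapped_score("KLK", "KLK"))  # 14, identical
print(ungapped_score("KIK", "KLK"))  # 12, L -> I is chemically similar
print(ungapped_score("KDK", "KLK"))  # 6, L -> D is penalised
```

Both variants differ from the reference at a single residue, but the matrix ranks the conservative change far above the non-conservative one. A nucleotide aligner working on reverse-translated sequences only sees base mismatches and has no notion of amino acid similarity.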