Hi everyone, I have some binning data and I want to retrieve the 16S sequences from it. Each bin is supposed to be a pure single genome, but each one is composed of several contigs. That may cause problems if I submit them to a 16S rRNA identifier like RNAmmer, which requires a single-genome sequence file. Do you know of any program that serves this purpose? Many thanks :)~
Thank you. I don't want to remove them, I just want to extract them for phylogenetic classification.
I put a set of ribosomal kmers on Google Drive:
https://drive.google.com/file/d/0B3llHR93L14wS2NqRXpXakhFaEk/view?usp=sharing
I made it mainly from Silva. It's small (9MB) and you can use it with BBDuk like this:
It has roughly 99.94% sensitivity against the full Silva database.
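For readers who want to see what kmer-based matching amounts to, here is a minimal Python sketch of the general idea: a contig is flagged as ribosomal if it shares at least one kmer (on either strand) with the reference set. This is an illustration only, not BBDuk's implementation. The function names, the toy sequences, and the tiny k used in the demo are all hypothetical.

```python
# Illustrative sketch only: hypothetical helper names, not BBDuk's code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Yield the canonical form (the smaller of kmer and its reverse
    complement) of every kmer in seq, skipping kmers with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, revcomp(kmer))


def split_by_reference(records, ref_kmers, k):
    """Split (name, sequence) records into those sharing at least one
    kmer with ref_kmers and those sharing none."""
    matched, unmatched = [], []
    for name, seq in records:
        if any(kmer in ref_kmers for kmer in canonical_kmers(seq, k)):
            matched.append(name)
        else:
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    k = 5  # toy value so the example fits on screen
    reference = "ACGTTGCAAG"  # stand-in for a ribosomal reference
    ref_kmers = set(canonical_kmers(reference, k))
    contigs = [
        ("contig_forward", "TTTT" + "ACGTTGCA" + "TTTT"),
        ("contig_reverse", revcomp(reference)),
        ("contig_other", "GGGGGGGGGG"),
    ]
    hits, misses = split_by_reference(contigs, ref_kmers, k)
    print("matched:", hits)
    print("unmatched:", misses)
```

Using canonical kmers means a contig is caught regardless of which strand the rRNA gene sits on, which matters for assembled contigs.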
This works quite nicely! I would like to build a ribokmer-style reference for other genes as well. I tried it with kmercountexact from BBMap, but I cannot replicate the file provided on Google Drive. Can you shed some light on how you built this reference set of ribosomal kmers? Thanks!
The process was a little involved. I started with the Silva ribosomal database, and followed this procedure:
The file on my Google Drive is, I think, the version in which I kept only kmers present at least 3 times in the deduplicated Silva database.
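The "present at least 3 times in the deduplicated database" filter can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual procedure or tooling used to build the shared file: it assumes exact-duplicate removal, counts every occurrence of a canonical kmer across the remaining sequences, and converts U to T because rRNA databases are often distributed in the RNA alphabet. The function name `build_kmer_reference` and the toy input are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: hypothetical names, not the original pipeline.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def build_kmer_reference(sequences, k, min_count=3):
    """Deduplicate sequences exactly, count canonical kmers across the
    unique sequences, and keep kmers seen at least min_count times."""
    # Assumption: normalise case and treat RNA 'U' as DNA 'T'.
    unique = {s.upper().replace("U", "T") for s in sequences}
    counts = Counter()
    for seq in unique:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip ambiguous bases
                counts[canonical(kmer)] += 1
    return {kmer for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Three identical copies collapse to one before counting.
    toy_db = ["ACGUAC", "ACGUAC", "ACGUAC", "ACGTAG", "ACGTAT"]
    kept = build_kmer_reference(toy_db, k=4, min_count=3)
    print(sorted(kept))
```

Deduplicating before counting keeps heavily sampled organisms from pushing their private kmers over the threshold, so the surviving kmers are the ones conserved across distinct sequences.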
Do you know which release of Silva this file corresponds to? https://www.arb-silva.de/no_cache/download/archive
Would you recommend a different approach today, since Silva now provides non-redundant files?
Using those 2 files only takes an extra 2 minutes compared with the 9.5 MB file you shared, and it removes the same fraction of reads (4.7%).
Cool, is there any documentation or a publication for this tool?
There's no paper yet. Documentation is in the shell script (it is printed if you run it with no arguments). There's also a thread here explaining common uses.
Edit: There is also now a BBDuk usage guide in the /docs/ directory.
Sorry to resurrect a thread. Would you happen to have the information regarding organisms that each of those reads map to? @Brian Bushnell