2.5 years ago
Amro
Hi everyone,
I have a FASTA file containing reads generated by PacBio whole-genome sequencing of a mouse fecal sample. I need to extract the reads that belong to Anaeroplasma species and generate an assembled, circularized Anaeroplasma genome. I am familiar with Linux and R. I would very much appreciate your advice on the steps needed and the required tools or R packages.
Many thanks
I think there are multiple ways to solve this problem so please try to consider other solutions.
I would try to cluster the PacBio reads into bins using MetaBCC-LR. Then classify your bins with GTDB-Tk. Finally, run a long-read assembler (Canu, Flye, or metaFlye), using as input the long reads from the bins that GTDB-Tk classified as Anaeroplasma.
If a reasonably good/complete Anaeroplasma genome is available, you could align your reads to that genome using minimap2 and then keep only those reads that show more or less full-length alignments.
As far as filtering out the reads that belong to Anaeroplasma is concerned, bloomfilter.sh from BBTools might also work nicely, although I have never tested it with PacBio reads, so I can't vouch for it. Also make sure to have a look at /bbmap/docs/guides/TadpoleGuide.txt as one option for the assembly step.
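To make the minimap2 suggestion above a bit more concrete, here is a minimal sketch of the "keep only more or less full-length alignments" step. It assumes you ran minimap2 with PAF output (its default), for example with the map-pb or map-hifi preset depending on your read type, and it only relies on the standard PAF columns (query name, query length, query start, query end, and mapping quality in column 12). The function names and the thresholds (90% of the read aligned, MAPQ of at least 20) are my own assumptions for illustration, not recommended values, so tune them for your data.

```python
def full_length_read_ids(paf_lines, min_query_cov=0.9, min_mapq=20):
    """Return names of reads with at least one alignment covering
    min_query_cov of the read at or above min_mapq.

    Thresholds are illustrative assumptions, not recommended values.
    """
    keep = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        name = fields[0]
        qlen = int(fields[1])    # query (read) length
        qstart = int(fields[2])  # alignment start on the read, 0-based
        qend = int(fields[3])    # alignment end on the read
        mapq = int(fields[11])   # mapping quality
        if qlen == 0:
            continue
        if (qend - qstart) / qlen >= min_query_cov and mapq >= min_mapq:
            keep.add(name)
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield the lines of FASTA records whose ID (first word of the
    header) is in keep_ids. Handles multi-line sequences."""
    writing = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep_ids
        if writing:
            yield line
```

You would use it roughly like this (file names are hypothetical): open the PAF file and pass the file handle to full_length_read_ids, then open your reads FASTA, pass it with the returned set to filter_fasta, and write the yielded lines to a new FASTA that goes into the assembler. Note that this only checks how much of each read aligned; it does not look at identity, so reads from close relatives could still pass.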