Extract reads for Anaeroplasma species from whole genome sequencing data

2.5 years ago

Amro • 0

Hi everyone,

I have a FASTA file of reads from PacBio whole-genome sequencing of a mouse fecal sample. I need to extract the reads that belong to Anaeroplasma species and assemble them into a circularized Anaeroplasma genome. I am familiar with Linux and R. I would very much appreciate your advice on the steps involved and the tools or R packages required.

Many thanks

PacBio Assembly Mouse WGS • 816 views

updated 2.5 years ago by Matthias Zepper 5.0k • written 2.5 years ago by Amro • 0

There are multiple ways to solve this problem, so please consider other solutions as well.

I would cluster the PacBio reads into bins with MetaBCC-LR, then classify the bins with GTDB-Tk. Finally, run a long-read assembler (Canu, Flye, or metaFlye) on the reads from the bins that GTDB-Tk classifies as Anaeroplasma.
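To make the last step concrete, here is a minimal Python sketch of pulling the reads of selected bins out of a FASTA file before handing them to the assembler. It assumes the binner's output has been reduced to one bin label per read, in the same order as the reads in the FASTA file; that layout, the function names, and the bin labels are assumptions for illustration, so check what your binning tool actually writes.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reads_in_bins(fasta_lines, bin_labels, target_bins):
    """Return the FASTA records whose bin label is in target_bins.

    bin_labels is ASSUMED to hold one label per read, in read order
    (a hypothetical layout, not a documented tool format).
    """
    records = list(read_fasta(fasta_lines))
    if len(records) != len(bin_labels):
        raise ValueError("number of reads and number of bin labels differ")
    return [rec for rec, label in zip(records, bin_labels) if label in target_bins]


if __name__ == "__main__":
    # Toy input; with real data, pass open file handles instead of lists.
    fasta = [">r1", "ACGT", ">r2", "GGCC", "TTAA", ">r3", "AAAA"]
    labels = ["bin0", "bin1", "bin1"]
    for header, seq in reads_in_bins(fasta, labels, {"bin1"}):
        print(f">{header}\n{seq}")
```

The printed records can be redirected to a new FASTA file and used as assembler input.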

2.5 years ago by andres.firrincieli 3.8k

If a reasonably good and complete Anaeroplasma genome is available, you could align your reads to that genome using minimap2 and then keep only those reads that show more or less full-length alignments.
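One way to pick the "more or less full-length" alignments is to filter minimap2's PAF output on the fraction of each read that is aligned. The sketch below assumes the reads were mapped with something like `minimap2 -x map-pb ref.fa reads.fa > aln.paf` (file names are placeholders, and `map-hifi` would be the preset for HiFi reads). The 0.9 coverage cutoff and the function name are illustrative assumptions, not recommended values.

```python
def full_length_reads(paf_lines, min_query_cov=0.9, min_mapq=0):
    """Return names of reads with at least one near full-length alignment.

    Uses the standard PAF columns: 1 = query name, 2 = query length,
    3 = query start, 4 = query end, 12 = mapping quality.
    min_query_cov is an ASSUMED cutoff; tune it for your data.
    """
    keep = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        name = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        mapq = int(fields[11])
        if qlen == 0:
            continue
        coverage = (qend - qstart) / qlen  # fraction of the read aligned
        if coverage >= min_query_cov and mapq >= min_mapq:
            keep.add(name)
    return keep


if __name__ == "__main__":
    # Two toy PAF records: r1 aligns over 98% of its length, r2 over 30%.
    paf = [
        "r1\t1000\t10\t990\t+\tref\t5000\t100\t1080\t900\t980\t60",
        "r2\t1000\t0\t300\t-\tref\t5000\t200\t500\t280\t300\t60",
    ]
    print(sorted(full_length_reads(paf)))
```

The resulting set of read names can then be used to subset the original FASTA file before assembly.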

2.5 years ago by GenoMax 147k

For filtering out the reads that belong to Anaeroplasma species, bloomfilter.sh from BBTools might also work nicely, although I have never tested it with PacBio reads, so I can't vouch for it. Also have a look at /bbmap/docs/guides/TadpoleGuide.txt as one option for the assembly step.
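For intuition only, the general idea behind k-mer-based read filtering can be sketched in a few lines of Python. This is not BBTools' implementation: it uses an exact Python set where a Bloom filter would use a compact probabilistic structure, and the function names, the value of k, and any cutoff on the shared fraction are assumptions. Because long reads can carry many errors, a realistic cutoff would likely need to be low.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reference_kmers(ref_seq, k):
    """Collect k-mers from both strands of the reference."""
    return kmers(ref_seq, k) | kmers(revcomp(ref_seq), k)


def shared_kmer_fraction(read, ref_kmers, k):
    """Fraction of the read's distinct k-mers that occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    hits = sum(1 for km in read_kmers if km in ref_kmers)
    return hits / len(read_kmers)


if __name__ == "__main__":
    k = 4  # toy value; real data would use a much larger k
    ref = reference_kmers("ACGTACGGTCA", k)
    for read in ("ACGGTC", "TTTTTT"):
        print(read, shared_kmer_fraction(read, ref, k))
```

Reads whose shared fraction exceeds a chosen cutoff would be kept as candidate Anaeroplasma reads.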

2.5 years ago by Matthias Zepper 5.0k
