Extract reads for Anaeroplasma species from whole genome sequencing data
2.5 years ago
Amro • 0

Hi everyone,

I have a FASTA file containing reads generated by PacBio whole genome sequencing of a mouse feces sample. I need to extract the reads that belong to Anaeroplasma species and generate an assembled, circularized Anaeroplasma genome. I am familiar with Linux and R. I would very much appreciate your advice on the steps needed and the required tools or R packages.

Many thanks

PacBio Assembly Mouse WGS • 820 views

There are multiple ways to solve this problem, so please consider other solutions as well.

I would cluster the PacBio reads into bins using MetaBCC-LR, then classify the bins with GTDB-Tk. Finally, run a long-read assembler (Canu, Flye, or metaFlye) on the reads from the bins that GTDB-Tk classifies as Anaeroplasma.
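
The hand-off between the classification step and the assembler is a FASTA subsetting step. Here is a minimal sketch in Python, assuming you have already built a mapping from read ID to bin label; the `read_to_bin` dictionary, the bin names, and both function names are hypothetical, and how you derive the mapping depends on the output format of your binning tool.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reads_in_bins(
    lines: Iterable[str],
    read_to_bin: Dict[str, str],   # hypothetical: read ID -> bin label
    wanted_bins: Set[str],         # bins classified as Anaeroplasma
) -> List[Tuple[str, str]]:
    """Return the reads whose bin label is in wanted_bins, in input order."""
    return [
        (rid, seq)
        for rid, seq in read_fasta(lines)
        if read_to_bin.get(rid) in wanted_bins
    ]


def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (read_id, sequence) pairs to a FASTA file, one line per sequence."""
    with open(path, "w") as handle:
        for rid, seq in records:
            handle.write(f">{rid}\n{seq}\n")
```

To use it, open the reads file, pass the file handle as `lines`, and write the result with `write_fasta`; that output file is what you would give to the assembler.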


If a reasonably good/complete genome for an Anaeroplasma species is available, you could align your reads to that genome using minimap2 and then keep only those reads that show more or less full-length alignments.
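
One way to pick out the "more or less full-length" alignments is to parse minimap2's PAF output and keep reads whose aligned portion covers most of the read. This is only a sketch: the function name and the thresholds (90% of the read aligned, mapping quality of at least 20) are assumptions for illustration, not recommended values, and it relies on the standard PAF column order (query name, query length, query start, query end, ..., mapping quality in column 12). The PAF itself could come from something like `minimap2 -x map-pb ref.fa reads.fa > aln.paf`, where the file names are placeholders.

```python
from typing import Iterable, Set


def full_length_reads(
    paf_lines: Iterable[str],
    min_query_cov: float = 0.9,  # assumed threshold: fraction of read aligned
    min_mapq: int = 20,          # assumed threshold: minimum mapping quality
) -> Set[str]:
    """Return IDs of reads with at least one near full-length alignment."""
    keep: Set[str] = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        qname = fields[0]
        qlen, qstart, qend = int(fields[1]), int(fields[2]), int(fields[3])
        mapq = int(fields[11])
        if qlen == 0:
            continue
        # Fraction of the read spanned by this single alignment.
        query_cov = (qend - qstart) / qlen
        if query_cov >= min_query_cov and mapq >= min_mapq:
            keep.add(qname)
    return keep
```

The returned set of read IDs can then be used to subset the original FASTA file before assembly. Because the test is applied per alignment, a read that only aligns in several short pieces is not kept.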


For filtering out the reads that belong to Anaeroplasma species, bloomfilter.sh from BBTools might also work nicely, although I have never tested it with PacBio reads, so I can't vouch for it. Also have a look at /bbmap/docs/guides/TadpoleGuide.txt as one option for the assembly step.
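
To illustrate the general idea behind k-mer based read filtering, here is a small Python sketch. It is not how bloomfilter.sh is implemented and it does not use BBTools: an exact Python set stands in for the Bloom filter, and the function names and the choice of k are assumptions for illustration only. With error-prone long reads, the fraction of matching k-mers will be lower than with accurate reads, so any cutoff would need tuning.

```python
from typing import Set

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference: str, k: int) -> Set[str]:
    """Collect k-mers from both strands of a reference sequence."""
    reference = reference.upper()
    return kmers(reference, k) | kmers(revcomp(reference), k)


def matching_fraction(read: str, index: Set[str], k: int) -> float:
    """Return the fraction of the read's k-mer positions found in the index."""
    read = read.upper()
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(1 for i in range(total) if read[i:i + k] in index)
    return hits / total
```

A read would be kept when `matching_fraction` exceeds a cutoff of your choosing. For a real genome and read set, a memory-efficient structure such as an actual Bloom filter is the reason to use a dedicated tool instead of this sketch.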
