Inputs to GenomicConsensus arrow algorithm
6.1 years ago
peri.tobias ▴ 10

Firstly, apologies: this is a cross-post (linked below), as I was not sure whether I had posted to the right forum. If I get a useful answer I will make sure it appears on both platforms.

https://github.com/PacificBiosciences/pbcore/issues/118

I have assembled a de novo genome (1.98 Gb) with canu v1.6 using PacBio reads. I am at the polishing stage and have aligned the raw subreads.bam to the assembly with blasr in batches, because the job ran out of allocated walltime when all reads were submitted at once. I therefore have six large alignment BAM files of 342 GB, 284 GB, 240 GB, 117 GB, 154 GB, and 78 GB.

I was trying to merge the BAM files with pbmerge, but this too was a very long process and failed at one stage. Is it possible to run these individual alignment BAM files as inputs to the arrow algorithm and get six FASTA outputs? My thinking is that these would be much smaller files to merge, but I am not sure whether this is valid.
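(For reference, the merge step was essentially one pbmerge call over the six files, roughly like the sketch below; the file names are placeholders, not my exact paths.)

    import subprocess

    # placeholder names for the six blasr alignment BAMs
    bams = [f"alignment{i}.bam" for i in range(1, 7)]
    subprocess.run(["pbmerge", "-o", "merged.alignments.bam"] + bams, check=True)
    subprocess.run(["pbindex", "merged.alignments.bam"], check=True)  # .pbi index needed by arrow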

Alternatively, is there a more efficient way to run the genomic consensus? I had data from both the RS II and the Sequel as starting files.

pacbio assembly polish arrow • 1.9k views
5.6 years ago
harish ▴ 470

Hi,

Why not slice the BAM files so that each contig is submitted against its own reference to run arrow? Afterwards you can merge the per-contig sequences back into the genome FASTA.
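A rough sketch of that per-contig loop in Python is below, assuming samtools, pbindex, and the GenomicConsensus arrow executable are on your PATH and the reference has been indexed with samtools faidx; the file names (assembly.fasta, alignment1.bam ... alignment6.bam) are placeholders for your own data, and exact arrow options may vary between versions.

    import subprocess
    from pathlib import Path

    REFERENCE = "assembly.fasta"                             # placeholder: your canu assembly
    ALIGNMENTS = [f"alignment{i}.bam" for i in range(1, 7)]  # placeholder: the six blasr BAMs
    OUTDIR = Path("per_contig_consensus")
    OUTDIR.mkdir(exist_ok=True)

    def run(cmd):
        """Run a command and stop if it fails."""
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)

    # Contig names come from the reference index (created by `samtools faidx assembly.fasta`).
    contigs = [line.split("\t")[0] for line in open(REFERENCE + ".fai")]

    parts = []
    for contig in contigs:
        # 1. Slice each alignment BAM down to this contig (the BAMs need .bai indexes).
        slices = []
        for i, bam in enumerate(ALIGNMENTS):
            sl = OUTDIR / f"{contig}.{i}.bam"
            run(["samtools", "view", "-b", "-o", str(sl), bam, contig])
            slices.append(str(sl))

        # 2. Merge the per-contig slices -- far smaller than merging the whole BAMs.
        merged = OUTDIR / f"{contig}.merged.bam"
        run(["samtools", "merge", "-f", str(merged)] + slices)
        run(["pbindex", str(merged)])                        # arrow requires a .pbi index

        # 3. Polish just this contig, restricting arrow to it with -w.
        fasta = OUTDIR / f"{contig}.consensus.fasta"
        run(["arrow", str(merged), "-r", REFERENCE, "-w", contig, "-o", str(fasta)])
        parts.append(fasta)

    # 4. Concatenate the per-contig consensus sequences back into one polished genome.
    with open("polished_assembly.fasta", "w") as out:
        for part in parts:
            out.write(part.read_text())

This also parallelises naturally on a cluster (one job per contig, or batches of contigs), and the small per-contig FASTAs are trivial to concatenate at the end.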
