Inputs to GenomicConsensus arrow algorithm
6.1 years ago
peri.tobias ▴ 10

Firstly, apologies: this is a cross-post from the GitHub issue linked below, as I was not sure whether I had posted to the correct forum. If I get a useful answer I will make sure it appears on both platforms.

https://github.com/PacificBiosciences/pbcore/issues/118

I have assembled a de novo genome (1.98 Gb) with canu v1.6 using PacBio reads. I am at the polishing stage and have aligned the raw subreads.bam to the assembly with blasr in batches, because the process ran out of allocated walltime when all reads were submitted at once. I therefore have six large alignment BAM files of 342 GB, 284 GB, 240 GB, 117 GB, 154 GB and 78 GB.

I tried to merge the BAM files with pbmerge, but this too was a very long process and failed at one stage. Is it possible to run these individual alignment BAMs as separate inputs to the arrow algorithm and get six FASTA outputs? My thinking is that those would be much smaller files to merge, but I am not sure whether this is valid.
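To make the idea concrete, this is roughly what I have in mind for one of the six files; the file names are placeholders and I am assuming arrow will take a single aligned BAM plus the assembly FASTA, which is exactly what I have not been able to confirm:

    # Index the assembly and one batch of alignments (file names are placeholders)
    samtools faidx canu_assembly.fasta
    pbindex batch1.alignment.bam

    # Polish the whole assembly from this one batch and keep its consensus FASTA
    arrow -j 16 batch1.alignment.bam \
        -r canu_assembly.fasta \
        -o batch1.consensus.fasta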

Alternatively, is there a more efficient way to run the genomic consensus? My starting data came from both the RS II and the Sequel.

pacbio assembly polish arrow • 1.9k views
5.6 years ago
harish ▴ 470

Hi,

Why not slice the BAM files so that each contig is submitted with its own reference sequence to run arrow? Afterwards you can merge the per-contig consensus back into a genome FASTA. A sketch of that idea is below.
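For example, something along these lines, assuming samtools, pbindex and arrow are on your path. The contig name (tig00000001) and file names are placeholders, and I am assuming the samtools-sliced BAMs keep the PacBio read groups that arrow needs, so test one contig before scripting the rest:

    # Pull one contig's reference sequence, and its alignments from each batch BAM
    samtools faidx canu_assembly.fasta tig00000001 > tig00000001.fasta
    for b in batch1 batch2 batch3 batch4 batch5 batch6; do
        samtools index ${b}.alignment.bam                      # region queries need a .bai
        samtools view -b ${b}.alignment.bam tig00000001 > ${b}.tig00000001.bam
    done

    # Combine only this contig's alignments, index for arrow, and polish
    samtools merge tig00000001.merged.bam batch*.tig00000001.bam
    pbindex tig00000001.merged.bam                             # arrow wants a .pbi index
    arrow tig00000001.merged.bam -r tig00000001.fasta -o tig00000001.consensus.fasta

    # Once every contig has been polished, rebuild the genome
    cat tig*.consensus.fasta > polished_assembly.fasta

Each per-contig job is small, so it fits inside a normal walltime allocation, and the jobs can run in parallel across the cluster.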

