Aligning PacBio hifiasm CCS to reference genome using pbmm2
2
1
Entering edit mode
3.7 years ago
K ▴ 10

Hi,

I had consensus reads (ccs.bam) from a PacBio sequencing. I used the assembler hifiasm to create an assembly on these CCS. I got five .gfa files. I have transformed .gfa file using bandage and awk command. Contigs from de novo assemblies (.fasta file) were used in standalone blast against reference genome. I have the coordinates which mapped to the reference genome.

I am interested to align .fasta and reference genome using pbmm2

A. Generate index file for reference and reuse it to align reads

$ pbmm2 index xxx.fasta xxx.mmi

B. Align contig yyy.fasta to reference genome file (xxx.mmi)

pbmm2 align xxx.mmi xxx.fasta

However, I am getting this error.

|> 20210517 18:05:07.939 -|- WARN -|- CheckPositionalArgs -|- 0x11a162dc0|| -|- Input is FASTA. Output BAM file cannot be used for polishing with GenomicConsensus!

How to fix this issue.

Thanks

assembly pbmm2 pacbio hifiasm • 5.2k views
ADD COMMENT
4
Entering edit mode
3.7 years ago
gconcepcion ▴ 410

This is just a legacy warning (pre-HiFi data) and will not prevent the command from completing.

In general, assemblies generated from HiFi data do not need to be polished.

ADD COMMENT
0
Entering edit mode

I have assemblies from IPA as well as from Hifiasm. However, hifiasm are longer as compared to IPA therefore, I am proceeding with hifiasm.

However, the command is being killed

zsh: killed pbmm2 align xxx.fasta yyy.fasta

ADD REPLY
0
Entering edit mode

It might help to get a more detailed log from pbmm2. You can get a debug trace with --log-level DEBUG that might help you spot where/why pbmm2 is being killed.

https://github.com/PacificBiosciences/pbmm2#can-i-get-progress-output

ADD REPLY
0
Entering edit mode

pbmm2 align yyy.fasta xxx.fasta --log-level DEBUG

|> 20210518 22:31:50.529 -|- INFO -|- AlignSettings -|- 0x11407cdc0|| -|- Using 8 threads for alignments. |> 20210518 22:31:50.529 -|- WARN -|- CheckPositionalArgs -|- 0x11407cdc0|| -|- Input is FASTA. Output BAM file cannot be used for polishing with GenomicConsensus! |> 20210518 22:31:50.529 -|- INFO -|- CheckPositionalArgs -|- 0x11407cdc0|| -|- READ input file: CGD-scaffold_3.fasta |> 20210518 22:31:50.529 -|- INFO -|- CheckPositionalArgs -|- 0x11407cdc0|| -|- REF input file: ctg1.fasta |> 20210518 22:31:50.530 -|- INFO -|- Index -|- 0x11407cdc0|| -|- Start reading/building index |> 20210518 22:31:51.361 -|- INFO -|- Index -|- 0x11407cdc0|| -|- Finished reading/building index |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Minimap2 parameters based on preset: SUBREADS |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Kmer size : 19 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Minimizer window size : 10 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Homopolymer compressed : true |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Gap open 1 : 5 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Gap open 2 : 56 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Gap extension 1 : 4 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Gap extension 2 : 1 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Match score : 2 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Mismatch penalty : 5 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Z-drop : 400 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Z-drop inv : 50 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Bandwidth : 2000 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Max gap : 5000 |> 20210518 22:31:51.407 -|- DEBUG -|- PostInit -|- 0x11407cdc0|| -|- Long join flank ratio : 0.5

ADD REPLY
0
Entering edit mode

Sorry I missed this message. None of the messages in your log are fatal. These are all DEBUG/INFO/WARN messages. Is this the entire log before the command is killed?

ADD REPLY
2
Entering edit mode
3.5 years ago
Billy Rowell ▴ 360

For aligning assembly fasta to the reference, I'd recommend minimap2.

minimap2 -t 16 -L --secondary=no --eqx -ax asm5 ref.fasta asm.fasta \
    | samtools sort -@ 3 > asm.aln.sorted.bam
ADD COMMENT
0
Entering edit mode

Can you tell me if it's bettter to assemby with minimap2 or with pbmm2 (wrapper)?

ADD REPLY
0
Entering edit mode

Hi Alison, I think we need some clarification on what you are trying to do. minimap2 or the wrapper pbmm2 don't do assembly, they do mapping (also known as alignment) of reads or sequences to a reference genome.

If you want to assemble a dataset, then there is an entirely different suite of algorithms / programs that handle that type of analysis. If you have HiFi data, I would suggest you use hifiasm or HiCanu. If you have Continuous Long Reads (CLR) then I would suggest you use Falcon / Canu or Flye to de novo assemble your genome of interest.

ADD REPLY

Login before adding your answer.

Traffic: 3397 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6