Question

pbmm2 alignment of long reads

0

Entering edit mode

5 months ago

priya.bmg ▴ 60

Hello,

I have been using the pbmm2 tool for alignment of long reads. It takes input of the movie.bam files and does the alignment

I get this error:

>|> 20240621 15:14:23.782 -|- INFO -|- AlignSettings -|- 0x2b85319d8340|| -|- Using 6 threads for alignments, 2 threads for sorting, and 1.5G bytes RAM for sorting.
>|> 20240621 15:14:23.783 -|- INFO -|- CheckPositionalArgs -|- 0x2b85319d8340|| -|- READ input file: smrtcells/ready/0006-104/m84177_240228_161820.hifi_reads.bam
>|> 20240621 15:14:23.783 -|- INFO -|- CheckPositionalArgs -|- 0x2b85319d8340|| -|- REF  input file: reference/human_GRCh38_no_alt_analysis_set.fasta
>|> 20240621 15:14:23.866 -|- FATAL -|- CheckPositionalArgs -|- 0x2b85319d8340|| -|- pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

Has anyone solved this issue. I tried converting the hifi_reads_movie bam file to fasta and fastq formats to check. Deletion or inclusion of fastq and fasta files does not change anything. It will be helpful if someone can share their experience with this issue

Thank you

Long-reads pbmm2 alignment • 1.3k views

ADD COMMENT • link 5 months ago by priya.bmg ▴ 60

1

Entering edit mode

Have you checked the bam file with samtools view to make sure it looks ok. Running samtools quickcheck -vvvv may also be useful.

ADD REPLY • link 5 months ago by GenoMax 147k

0

Entering edit mode

Thank you. I ran with samtools, the bam file looks fine

samtools quickcheck -vvvv pacbio/smrtcells/ready/04/m84177.hifi_reads.bam

           verbosity set to 4
           checking pacbio/smrtcells/ready/04/m84177.hifi_reads.bam
           opened pacbio/smrtcells/ready/04/m84177.hifi_reads.bam
           pacbio/smrtcells/ready/04/m84177.hifi_reads.bam
           is sequence data
          pacbio/smrtcells/ready/04/m84177.hifi_reads.bam
          had no targets in header.
          pacbio/smrtcells/ready/04/m84177.hifi_reads.bam
          has good EOF block.
          pacbio/smrtcells/ready/04/m84177.hifi_reads.bam

ADD REPLY • link 5 months ago by priya.bmg ▴ 60

0

Entering edit mode

Which command did you use with pbmm2?

ADD REPLY • link 5 months ago by WouterDeCoster 47k

0

Entering edit mode

I run the command in snakemake pipeline. Please find the attached snapshot that is shown as error in the rules

enter image description here

log file is posted along with question above

This is rules given in snakemake file:

> 
> ruleorder: pbmm2_align_ubam > pbmm2_align_fastq
> 
> 
> rule pbmm2_align_ubam:
>     input:
>         reference = config['ref']['fasta'],
>         ref_index = config['ref']['index'],
>         query = lambda wildcards: ubam_dict[wildcards.sample][wildcards.movie]
>     output:
>         bam = f"samples/{{sample}}/aligned/{{movie}}.{ref}.bam",
>         bai = f"samples/{{sample}}/aligned/{{movie}}.{ref}.bam.bai"
>     log: f"samples/{{sample}}/logs/pbmm2/align/{{movie}}.{ref}.log"
>     benchmark: f"samples/{{sample}}/benchmarks/pbmm2/align/{{movie}}.{ref}.tsv"
>     params:
>         sample = lambda wildcards: wildcards.sample,
>         preset = "CCS",
>         extra = "--sort --unmapped -c 0 -y 70",
>         loglevel = "INFO"
>     threads: 24
>     conda: "envs/pbmm2.yaml"
>     shell:
>         """
>         (pbmm2 align --num-threads {threads} \
>             --preset {params.preset} \
>             --sample {params.sample} \
>             --log-level {params.loglevel} \
>             {params.extra} \
>             {input.reference} \
>             {input.query} \
>             {output.bam}) > {log} 2>&1
>         """
> 
> 
> rule pbmm2_align_fastq:
>     input:
>         reference = config['ref']['fasta'],
>         ref_index = config['ref']['index'],
>         query = lambda wildcards: fastq_dict[wildcards.sample][wildcards.movie]
>     output:
>         bam = f"samples/{{sample}}/aligned/{{movie}}.{ref}.bam",
>         bai = f"samples/{{sample}}/aligned/{{movie}}.{ref}.bam.bai"
>     log: f"samples/{{sample}}/logs/pbmm2/align/{{movie}}.{ref}.log"
>     benchmark: f"samples/{{sample}}/benchmarks/pbmm2/align/{{movie}}.{ref}.tsv"
>     params:
>         sample = lambda wildcards: wildcards.sample,
>         preset = "CCS",
>         extra = "--sort --unmapped -c 0 -y 70",
>         loglevel = "INFO"
>     threads: 24
>     conda: "envs/pbmm2.yaml"
>     shell:
>         """
>         (pbmm2 align --num-threads {threads} \
>             --preset {params.preset} \
>             --sample {params.sample} \
>             --log-level {params.loglevel} \
>             {params.extra} \
>             {input.reference} \
>             {input.query} \
>             {output.bam}) > {log} 2>&1
>         """

.

ADD REPLY • link 5 months ago by priya.bmg ▴ 60

1

Entering edit mode

Posting errors as screenshots makes thing very difficult to see. If is unclear if there are weird characters in your command or you appear to have tried to redact part of the sample names (which is fine to do and easier when dealing with text). Looks like the error is in your rule. I am not a snakemake person so someone else may chime in.

ADD REPLY • link 5 months ago by GenoMax 147k

score 1 · Answer 1 · 2024-06-23

1

Entering edit mode

5 months ago

Billy Rowell ▴ 330

This snakemake workflow has been deprecated and no longer receives updates or support. There is a new, supported and maintained PacificBiosciences/HiFi-human-WGS-WDL workflow. Please email support@pacb.com with support questions.

If you're trying to distinguish between an input problem or a pbmm2 problem, try pulling the most recent release of pbmm2 (v.1.13.1) directly from GitHub and directly running the command, like:

./pbmm2 align --num-threads 24 \
    --preset CCS \
    --sample 0006-104 \
    --log-level DEBUG \
    --sort --unmapped -c 0 -y 70 \
    reference/human_GRCh38_no_alt_analysis_set.fasta \
    smrtcells/ready/0006-104/m84177_240228_161820.hifi_reads.bam \
    samples/0006-104/aligned/m84177_240228_161820.GRCh38.bam \
    > pbmm2_manual_debug.log

If this fails, please provide the debug log generated above, as well as the BAM header, in your email to support@pacb.com. You can extract the header with:

samtools view -H smrtcells/ready/0006-104/m84177_240228_161820.hifi_reads.bam > header.sam

I also notice in your initial error message that you're trying to use 6 threads total, with only ~1.5GB for sorting. pbmm2 alignment for a full human WGS ~30x Revio BAM takes ~4 hours on 24 threads with 96GB RAM, so with the resources you've provided, it would run >24 hours.

ADD COMMENT • link 5 months ago by Billy Rowell ▴ 330

0

Entering edit mode

Thank you Billy, I will contact the PacBio Support. I could see the issue is in running the data sequenced from Revio in snakemake workflow. We are almost nearing the completion of the project, where we sequenced 8 families in Sequel2 and had no issues running in the snakemake pipeline. For one family, we have data sequenced from sequel2 as well as from Revio and we have problem running the revio data in the snakemake workflow. Data structure of the Hifi reads bam files looks same (from Sequel2 and Revio). Is it okay to combine the reads from sequel2 and Revio and run it snakemake/cromwell workflow? Thank you

ADD REPLY • link 5 months ago by priya.bmg ▴ 60

1

Entering edit mode

I think it's likely that the version of the snakemake workflow you're running pre-dates Revio data and some of the tool versions used are not compatible, which is probably causing the problem in your initial error message.

The WDL workflow can combine reads from Sequel / Sequel II / Sequel IIe / Revio without any issues.