Sniffles process never finishes and reports warnings
2.6 years ago
Maxine ▴ 50

Python 3.7.9, Sniffles 2.0.6

I run Sniffles on a cluster with a BAM file and its CSI index. I don't know why the process always gets stuck at "building of index for XXX failed". Here are the commands and the output:

module load python/3.7
virtualenv --no-download p37
source p37/bin/activate
pip install --no-index --upgrade pip
pip install ~/sniffles-2.0.6-py3-none-any.whl

sniffles --input /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam \
    --vcf cy201704.vcf.gz \
    --snf cy201704.snf \
    --tandem-repeats /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed \
    --reference /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa \
    -t 31

Output:

Running Sniffles2, build 2.0.6
  Run Mode: call_sample
  Start on: 2022/05/04 23:40:41
  Working dir: /scratch/maxine91/call.dir
  Used command: /home/maxine91/p37/bin/sniffles --input /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam --vcf cy201704.vcf.gz --snf cy201704.snf --tandem-repeats /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed --reference /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa -t 31
==============================
Opening for reading: /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed (tandem repeat annotations for 746 contigs)
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa
Opening for writing: cy201704.vcf.gz (single-sample, sorted, bgzipped, tabix-indexed)
Opening for writing: cy201704.snf
Info: 746 of 747 contigs in the input sample have associated tandem repeat annotations.

Analyzing 54732154 alignments total...

 3452674/54732154 alignments processed (6%, 15013/s); 737/747 tasks done; parallel 10/31; 457309 candidates. 99616 SVs. 
54732154/54732154 alignments processed (100%, 34536/s); 747/747 tasks done; parallel 0/31; 7416274 candidates. 1740031 SVs. 
Took 1584.74s.

WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals). SVCall=SVCall(contig='original_scaffold_2172_pilon', pos=-8, id='DEL.1B4S131', ref='N', alt='<DEL>', qual=58, filter='GT', info={'STDEV_POS': 9.451631252505216, 'STDEV_LEN': 47.57099956906519, 'AF': 0.1282051282051282}, svtype='DEL', svlen=-1039, end=1031, genotypes={0: (0, 0, 44, 34, 5, None)}, precise=False, support=5, rnames=None, qc=True, nm=-1, postprocess=None, fwd=2, rev=3, coverage_upstream=None, coverage_downstream=43, coverage_start=None, coverage_center=37, coverage_end=41)
ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks
Generating index for cy201704.vcf.gz...
[E::hts_idx_check_range] Region 872550..587513178 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/home/maxine91/p37/bin/sniffles", line 613, in <module>
    Sniffles2_Main(config.from_cmdline(),processes)
  File "/home/maxine91/p37/bin/sniffles", line 588, in Sniffles2_Main
    pysam.tabix_index(config.vcf,preset="vcf",force=True)
  File "pysam/libctabix.pyx", line 1035, in pysam.libctabix.tabix_index
OSError: building of index for cy201704.vcf.gz failed

The process always gets stuck at "building of index for cy201704.vcf.gz failed". It also never exits on its own; I have to kill it manually.

So my questions are:

  1. Do the warning (WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals).) and the error (ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks) matter?
  2. Why does the index generation fail?
  3. Does the process hang because the index building failed?

Thanks very much for helping.

Maxine

SV calling sniffles • 1.1k views
2.6 years ago

I think the relevant error is: Region 872550..587513178 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6.

The problem is that many genomic tools were written with human or mouse genomes in mind, which do not have really huge chromosomes or contigs. The authors therefore assumed that a maximum length of 2^29 bases (512 Mbp, about 537 million) per chromosome would be sufficient, and that is the limit of the TBI and BAI index formats. Your region ends at 587513178, which is beyond it. The same applies to samtools, so when running samtools index /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam (which builds a BAI index by default) you should get a similar error saying samtools index: failed to create index for aln.sorted.cy201704.bam: Numerical result out of range.
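To make the numbers concrete, here is a small sketch of the arithmetic. It assumes the usual htslib binning scheme, where the largest addressable coordinate is 2^(min_shift + 3 * n_lvls); the function name is just something I made up for illustration.

```python
# Largest coordinate a binning index can address, assuming the
# htslib scheme: 2 ** (min_shift + 3 * n_lvls).
def max_indexable_length(min_shift: int = 14, n_lvls: int = 5) -> int:
    return 1 << (min_shift + 3 * n_lvls)

# TBI and BAI use fixed parameters min_shift=14, n_lvls=5.
TBI_LIMIT = max_indexable_length()

# End coordinate taken from the hts_idx_check_range error message.
region_end = 587_513_178

print(f"TBI/BAI limit: {TBI_LIMIT:,}")
print(f"fits in TBI: {region_end <= TBI_LIMIT}")
print(f"fits in CSI with n_lvls=6: {region_end <= max_indexable_length(14, 6)}")
```

With the default parameters the limit comes out at 536,870,912, so the region from the error message does not fit. One more level (n_lvls=6), as the message suggests, is enough.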

In your case, the tabix index cannot be built. In a GitHub issue about this topic, the author of Sniffles recently mentioned that they simply use htslib for indexing, so if you provide a pre-generated CSI index for your BAM, it might work. See the --csi option of tabix.
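If you prefer to index the output VCF yourself from Python, something along these lines might work. This is only a sketch: it assumes a pysam release whose tabix_index accepts the csi keyword, and index_suffix and index_vcf are hypothetical helper names, not part of Sniffles or pysam.

```python
from pathlib import Path


def index_suffix(csi: bool) -> str:
    """Extension of the index file written next to the VCF."""
    return ".csi" if csi else ".tbi"


def index_vcf(vcf_path: str, csi: bool = True) -> Path:
    """Index a sorted, bgzipped VCF and return the expected index path."""
    # Imported lazily so the helper above works without pysam installed.
    # Assumption: the installed pysam supports the csi= keyword.
    import pysam

    # Same call as in the traceback, with csi=True added.
    pysam.tabix_index(vcf_path, preset="vcf", force=True, csi=csi)
    return Path(vcf_path + index_suffix(csi))
```

For example, index_vcf("cy201704.vcf.gz") should leave a cy201704.vcf.gz.csi next to the VCF instead of failing on the TBI limit.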


Thank you for the answer! The problem is indeed caused by the huge chromosomes, as you mentioned. The developer suggested adding csi=True to the pysam.tabix_index call on line 588 of the sniffles script, and pysam version 0.17.0+ is necessary (see the GitHub issue). I ran Sniffles again with both changes. The CSI index was generated successfully, but the warning and the error appear again:

WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals). SVCall=SVCall(contig='original_scaffold_2172_pilon', pos=-8, id='DEL.1B4S131', ref='N', alt='<DEL>', qual=58, filter='GT', info={'STDEV_POS': 9.451631252505216, 'STDEV_LEN': 47.57099956906519, 'AF': 0.1282051282051282}, svtype='DEL', svlen=-1039, end=1031, genotypes={0: (0, 0, 44, 34, 5, None)}, precise=False, support=5, rnames=None, qc=True, nm=-1, postprocess=None, fwd=2, rev=3, coverage_upstream=None, coverage_downstream=43, coverage_start=None, coverage_center=37, coverage_end=41)
ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks

What do you think about these? Do they matter?


I am sadly not familiar with Sniffles, toad genome assemblies, or structural variation, but if the run otherwise finishes normally, I would not be overly concerned.

Judging solely by the message, Sniffles detected a deletion with the ID DEL.1B4S131 that is supposedly 1039 bases long but ends at position 1031 of the contig original_scaffold_2172_pilon. That puts its start at 1031 - 1039 = -8. Since a negative start position is not valid, Sniffles ignores this call.
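The values in the message are consistent with that reading. A quick check, assuming the end coordinate is computed as start plus the absolute SV length (my guess from the numbers, not something I verified in the Sniffles source):

```python
def deletion_start(end: int, svlen: int) -> int:
    """Start implied by a deletion's end coordinate and its (negative) length."""
    return end - abs(svlen)


# end and svlen copied from the SVCall(...) in the warning.
start = deletion_start(end=1031, svlen=-1039)

print(start)      # -8, matching pos=-8 in the warning
print(start < 0)  # True: the call starts before the beginning of the contig
```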

I have no idea about the quality of the reference genome you are using, but as the call refers to a deletion at the very end of an unplaced contig, I think it can be safely ignored. If Sniffles doesn't finish the run because of this call, removing those alignments from the source BAM file is probably the most straightforward solution. On the other hand, it might be interesting to see where the first part of those reads maps, because that could shed light on a possible position of the unplaced contig.
