I am setting up a Snakemake
pipeline for sequencing reads alignment and variants calling. But the samtools index
rule is not activated, and the subsequent haplotype caller
rule fail.
I think it is because the samtools index
rule is not perceived as necessary to execute the output of rule all
by Snakemake
, but not sure about it.
Has anyone experienced the same issue?
import os
import snakemake.io
import glob
import pandas as pd
###### Config file and sample sheets #####
configfile: "config/config.yaml"
# Load the samples table:
table=pd.read_csv("config/table_reads.tsv", delim_whitespace=True, header=None, index_col=False)
# Use the samples table to make lists of samples names/lists:
SAMPLES=table.loc[:, 0].to_list()
READS=["1","2"]
# The first rule (here rule all) specifies the files that you would like to create during your snakemake workflow.
rule all:
input:
"calls/all.vcf"
#expand("mapped/{sample}.bam", sample=SAMPLES)
# ALIGNMENT:
rule fastqc:
input:
expand("reads/{sample}.{read}.fastq.gz", sample=SAMPLES, read=READS)
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params:
extra = "--quiet"
log:
"logs/fastqc/{sample}.log"
threads: 1
resources:
mem_mb = 1024
wrapper:
"v2.6.0/bio/fastqc"
rule bwa_index:
input:
config["reference"]
#"{genome}.fasta",
output:
idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
log:
"logs/bwa_index/{genome}.log",
params:
algorithm="bwtsw",
wrapper:
"v2.6.0/bio/bwa/index"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sorting="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.9.1/bio/bwa/mem"
# CALLING VARIANTS
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/samtools/sort.html
rule samtools_sort:
input:
"mapped/{sample}.bam",
output:
"mapped/{sample}.sorted.bam",
log:
"{sample}.log",
params:
extra="-m 4G",
threads: 8
wrapper:
"v2.10.0/bio/samtools/sort"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/samtools/
rule samtools_index:
input:
"mapped/{sample}.sorted.bam",
output:
"mapped/{sample}.sorted.bam.bai",
log:
"logs/samtools_index/{sample}.log",
params:
extra="", # optional params string
threads: 4 # This value - 1 will be sent to -@
wrapper:
"v2.10.0/bio/samtools/index"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/haplotypecaller.html
rule haplotype_caller_gvcf:
input:
# single or list of bam files
bam="mapped/{sample}.sorted.bam",
ref=config["reference"],
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
# bam="{sample}.assemb_haplo.bam",
log:
"logs/gatk/haplotypecaller/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
threads: 4
resources:
mem_mb=1024,
wrapper:
"v2.6.0/bio/gatk/haplotypecaller"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/genomicsdbimport.html
rule genomics_db_import:
input:
gvcfs=expand("calls/{sample}.g.vcf", sample=SAMPLES),
output:
db=directory("db"),
log:
"logs/gatk/genomicsdbimport.log",
params:
intervals=config["reference"],
db_action="create", # optional
extra="", # optional
java_opts="", # optional
threads: 2
resources:
mem_mb=lambda wildcards, input: max([input.size_mb * 1.6, 200]),
wrapper:
"v2.6.0/bio/gatk/genomicsdbimport"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/genotypegvcfs.html
rule genotype_gvcfs:
input:
gvcf="db", # combined gvcf over multiple samples
# N.B. gvcf or genomicsdb must be specified
# in the latter case, this is a GenomicsDB data store
ref=config["reference"],
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024
wrapper:
"v2.6.0/bio/gatk/genotypegvcfs"
It seems that your first solution works well, and does not break the wrapper. I have added
idx="mapped/{sample}.sorted.bam.bai"
to the inputs of the haplotype caller, and it works well. Thank you for the information!