HIsat2 bam and summary output - multiple output file type
2
0
Entering edit mode
5.4 years ago
Payal ▴ 160

Hello,

The general command for Hisat2 alignment is:

hisat2 --threads 8 -t -p 8 --summary-file summary_txt -x /hisat2_index/index  -1 read1.fastq.gz -2 read2.fastq.gz -S aligned.sam

I use the following script to run my Hisat2 pipelines. The bam files I am getting are fine. My question is how can I get the summary file for alignment too? How to frame the output to capture the summary.txt files for each alignment?

HISAT2_INDEX_PREFIX = "/hisat2_index/index"

SAMPLES, *_=glob_wildcards('/samples/{sample}.read1.fastq.gz')

rule align_all_samples:
    input: expand("samples/{sample}.bam", sample=SAMPLES)

rule align_hisat2:
input:
    hisat2_index=expand(f"{HISAT2_INDEX_PREFIX}.{{ix}}.ht2", ix=range(1, 9)),
    fastq1="/samples/{sample}.read1.fastq.gz",
    fastq2="/samples/{sample}.read2.fastq.gz",
output:
    bam = "/samples/align_hisat2_update1/{sample}.bam",
log: "/samples/snakemake_log.txt"
threads: 8
shell:
    "hisat2 -p {threads} -x {HISAT2_INDEX_PREFIX}"
    " -1 {input.fastq1} -2 {input.fastq2}  |"
    "samtools sort -@ {threads} -o {output.bam}"

I tried taking another output variable like summary file under the outputs section, but that gives me error! Can I club two different types of output files in the rule_all?

Thanks, Payal

RNA-Seq snakemake sequencing • 4.6k views
ADD COMMENT
1
Entering edit mode

Isn't that printed to stderr? So just redirect that to a new file.

ADD REPLY
0
Entering edit mode

I am sorry, didn't understand. Can you please elaborate?

ADD REPLY
1
Entering edit mode

Please google what stderr is in the Unix world and how you can save things printed to screen to a file. https://askubuntu.com/questions/625224/how-to-redirect-stderr-to-a-file

ADD REPLY
0
Entering edit mode

Thank you Devon and AT.

I see what you are saying. I will try this. Will this output all sample summaries in one summary file like a log file or individual samples will output their individual summary report?

ADD REPLY
1
Entering edit mode
5.4 years ago

The stderr method suggested by Devon should work, but it is cleaner to just use the built-in --summary-file option. Just modify the "shell" part of your script into something like this:

"hisat2 -p {threads} --summary-file {sample}.summary.txt -x {HISAT2_INDEX_PREFIX} -1 {input.fastq1} -2 {input.fastq2}  | samtools sort -@ {threads} -o {output.bam}"
ADD COMMENT
0
Entering edit mode

Thank you Carlo. I tried this but it gives me

NameError:
The name 'sample' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

How should I fix this?

ADD REPLY
1
Entering edit mode

I'm not really familiar with snakemake, so I'm not sure, but you can perhaps try this:

rule align_hisat2:
input:
    hisat2_index=expand(f"{HISAT2_INDEX_PREFIX}.{{ix}}.ht2", ix=range(1, 9)),
    fastq1="/samples/{sample}.read1.fastq.gz",
    fastq2="/samples/{sample}.read2.fastq.gz",
output:
    bam = "/samples/align_hisat2_update1/{sample}.bam",
    txt = "/samples/align_hisat2_update1/{sample}.txt",
log: "/samples/snakemake_log.txt"
threads: 8
shell:
    "hisat2 -p {threads} -x {HISAT2_INDEX_PREFIX}"
    " -1 {input.fastq1} -2 {input.fastq2}  --summary-file {output.txt} |"
    "samtools sort -@ {threads} -o {output.bam}"

The alignment summary should (hopefully) be saved in the "/samples/align_hisat2_update1/{sample}.txt" file.

ADD REPLY
1
Entering edit mode

Thank you very much Carlo. I am also new to snakemake, so I test and periodically try to improve my script.

Actually, I just tested my scirpt using the same logic that you have described here. It works perfectly now. I was going to post my updated script, coincidentally you have already posted it with the same logic.

Cheers! Payal

ADD REPLY
0
Entering edit mode
2.9 years ago

Hi, may I know how do you interpret the output using HISAT2? I use the data from the public paper (Pertea et al. - 2016 - Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown) to learn RNAseq analysis.

The command is 'hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam'. and the log is shown below.

1321477 reads; of these: 1321477 (100.00%) were paired; of these:

121980 (9.23%) aligned concordantly 0 times
1183228 (89.54%) aligned concordantly exactly 1 time
16269 (1.23%) aligned concordantly >1 times
----
121980 pairs aligned concordantly 0 times; of these:
  4166 (3.42%) aligned discordantly 1 time
----
117814 pairs aligned 0 times concordantly or discordantly; of these:
  235628 mates make up the pairs; of these:
    119930 (50.90%) aligned 0 times
    113038 (47.97%) aligned exactly 1 time
    2660 (1.13%) aligned >1 times

95.46% overall alignment rate

Sample id Uniquely aligned reads (%) Multimapped reads (%) Overall alignment rate (%)

ERR188044 73.1 21.9 95.0

ADD COMMENT
0
Entering edit mode

Please post this as a new question and mention which part you do not understand.

ADD REPLY

Login before adding your answer.

Traffic: 2045 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6