Question

Getting Successfully Aligned Percentage for htseq-count output

0

Entering edit mode

2.6 years ago

nishat04bmb15 • 0

I am using htseq-count for analyzing some plant RNA-seq data. After running it on a sortedBam file with-

htseq-count -m union -f bam -c htseq_output.csv filename.bam predicted_transcript_exons.gtf

my final output line in the terminal is-

4082373 alignments  processed

When I check the last few line of the output csv file, it shows-

enter image description here

As far as I understand, successfully assigned read percentage for this file is {4082373 - (1845051+433107)} / 4082373 , that is 44.19%

But if I can't keep track of the command-line and I want to count successfully aligned read percentage from the output file solely, what should be the way? The way I thought of is at first getting the total read number present in the file with-

cat htseq_output.csv | awk -F\\t '{sum += $2} END {print sum}' Which should give me the number 4082373 (successful+unsucessfully assigned reads). But I am getting the number 2278407 as output. I thought that I could use this number as,

[2278407 (total_reads_in_this_file) - {1845051 (no_feature) + 433107 (alignment_not_unique)}] / 2278407 (total_reads_in_this_file)

that gives me close to 0% successfully assigned read percentage.

What am I missing?

Sorry if the question is too dumb and too long. Thanks in advance.

transcriptomics htseq htseq-count rna-seq workflow • 703 views

ADD COMMENT • link updated 2.6 years ago by Matthias Zepper 5.0k • written 2.6 years ago by nishat04bmb15 • 0

score 0 · Answer 1 · 2022-04-27

0

Entering edit mode

2.6 years ago

Matthias Zepper 5.0k

Before starting to process your output with a custom awk script, I would give MultiQC a try. It has a module for HTSeq and the reports are really helpful, in particular if you have log files from multiple tools.

ADD COMMENT • link 2.6 years ago by Matthias Zepper 5.0k