Getting Successfully Aligned Percentage for htseq-count output
1
0
Entering edit mode
2.6 years ago

I am using htseq-count for analyzing some plant RNA-seq data. After running it on a sortedBam file with-

htseq-count -m union -f bam -c htseq_output.csv filename.bam predicted_transcript_exons.gtf

my final output line in the terminal is-

4082373 alignments  processed

When I check the last few line of the output csv file, it shows-

enter image description here

As far as I understand, successfully assigned read percentage for this file is {4082373 - (1845051+433107)} / 4082373 , that is 44.19%

But if I can't keep track of the command-line and I want to count successfully aligned read percentage from the output file solely, what should be the way? The way I thought of is at first getting the total read number present in the file with-

cat htseq_output.csv | awk -F\\t '{sum += $2} END {print sum}' Which should give me the number 4082373 (successful+unsucessfully assigned reads). But I am getting the number 2278407 as output. I thought that I could use this number as,

[2278407 (total_reads_in_this_file) - {1845051 (no_feature) + 433107 (alignment_not_unique)}] / 2278407 (total_reads_in_this_file)

that gives me close to 0% successfully assigned read percentage.

What am I missing?

Sorry if the question is too dumb and too long. Thanks in advance.

transcriptomics htseq htseq-count rna-seq workflow • 703 views
ADD COMMENT
0
Entering edit mode
2.6 years ago

Before starting to process your output with a custom awk script, I would give MultiQC a try. It has a module for HTSeq and the reports are really helpful, in particular if you have log files from multiple tools.

ADD COMMENT

Login before adding your answer.

Traffic: 2341 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6