Question

Interpreting HTSeq output file

5.8 years ago

makwana.kd ▴ 60

I have an output file (text format) which I imported into an Excel spreadsheet. I see three columns, but I do not see any numeric values for the counts. Is this normal?

Column1 Column2 Column3

"   XF"     Z   __ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]

"   XF"     Z   __ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]

"   XF"     Z   __alignment_not_unique

"   XF"        Z    __alignment_not_unique

"   XF"     Z   __alignment_not_unique

"   XF"       Z __alignment_not_unique

"   XF"       Z __no_feature

"   XF"       Z __no_feature

"   XF"     Z   __alignment_not_unique

RNA-Seq • 3.4k views

updated 5.8 years ago by brianj.park ▴ 60 • written 5.8 years ago by makwana.kd ▴ 60

Three (five) words: "never use Excel" (for this).

Importing this kind of data file into Excel can often cause unexpected behaviour.

You're better off processing this file on the command line in your Linux environment. I assume you ran the previous steps on the command line as well, no?

Now for your specific issue: can you post the output of head <your htseq output file>?

5.8 years ago by lieven.sterck 15k

Hi Lieven, the following is the command I used:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o counread.text

The BAM file is name-sorted.

The head command gives me the following output:

XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__no_feature
XF:Z:__no_feature
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique

5.8 years ago by makwana.kd ▴ 60

From which file is this the head?

It does not look like it is from counread.text, is it? If it is, then the output from your htseq command is not correct.

5.8 years ago by lieven.sterck 15k

Sorry, there was a misspelling in the above-mentioned command. This is the corrected one:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o countread.text

Yes, the head command output was for countread.text

krishna@dntdaretouchit:/mnt/e/cannon$ head countread.text
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__no_feature
XF:Z:__no_feature
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
krishna@dntdaretouchit:/mnt/e/cannon$

5.8 years ago by makwana.kd ▴ 60

Did you not mention in a previous post that you converted the BAM file to SAM format? If so, then you need to change your htseq command accordingly.

In any case, the content of your countread.text file is not correct (it looks like a kind of SAM format?).
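
One way to check what such a file actually holds is to tally its XF:Z: values. The sketch below is only an illustration, not part of HTSeq: the function name tally_xf is made up here, and the sample lines are the ten lines shown by head earlier in this thread. It counts records, not fragments, so for paired-end data the totals need not match the table that htseq-count itself reports.

```python
from collections import Counter


def tally_xf(lines):
    """Count how often each XF:Z: value occurs in a list of text lines.

    Works both for full SAM records and for lines that contain only the
    XF tag, as in the output posted above. (Hypothetical helper name.)
    """
    counts = Counter()
    for line in lines:
        for field in line.split():
            if field.startswith("XF:Z:"):
                counts[field[len("XF:Z:"):]] += 1
                break  # only the first XF tag on a line is counted
    return counts


# The ten lines printed by `head` earlier in this thread.
AMBIGUOUS = "XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]"
sample = (
    [AMBIGUOUS] * 2
    + ["XF:Z:__alignment_not_unique"] * 4
    + ["XF:Z:__no_feature"] * 2
    + ["XF:Z:__alignment_not_unique"] * 2
)

for value, n in tally_xf(sample).most_common():
    print(n, value)
```

If every line carries an XF:Z: tag and none has a trailing integer column, the file is the per-read assignment output rather than a count table.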

5.8 years ago by lieven.sterck 15k

That was a different BAM file, which was giving me an error, so I converted it to a SAM file and ran it through HTSeq. That file gave me the following output:

chr1 3206084 255 1S139M = 3206084 -139 NTACAGTTAACCAACTTATACAGTTAACCAACTCCTACACTAGGTTCCTGAGCATTTCCTTAAACTTGCTAGTTCTGGTTTCCTGGCATGTGAGAGTAAGTCACATGGTAGGAGGCTGCCTTTCTATC JJJFJJFJJJAJFJJJFJFFJFJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJA<<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJFFJJJJJFJFJJJJJJJJFJJJJJJJJJJJJJJAFAAA NH:i:1 HI:i:1 AS:i:276 nM:i:0 XF:Z:ENSMUSG00000051951.5
GWNJ-0965:181:GW180227920:7:2124:9881:65265 163 chr1 3206084 255 139M1S = 3206084 139 TACAGTTAACCAACTTATACAGTTAACCAACTCCTACACTAGGTTCCTGAGCATTTCCTTAAACTTGCTAGTTCTGGTTTCCTGGCATGTGAGAGTAAGTCACATGGTAGGAGGCTGCCTTTCTATCATTCAATTTTAGN <

Because I wanted to bypass the BAM-to-SAM conversion step (I have 36 files, and each SAM file would be around 40,000,000 KB), I wanted to try a different BAM file. I therefore generated a new name-sorted BAM file with the STAR aligner and ran it through HTSeq. This is the file that is giving me the output mentioned above in this post.

5.8 years ago by makwana.kd ▴ 60

score 2 · Answer 1 · 2019-03-21

OK, clearly I was not paying attention :/

You are looking at the wrong file: countread.text, despite its name, will contain the SAM alignments (the file given to -o), not the read counts. The read count output is written to STDOUT (== your screen in this case); you will have to capture that in a file to get the read count table.

Try a command line as follows:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o alns.sam > countread.text

Now your read count table will be in the file countread.text.
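
Once the counts are captured, the table is plain tab-separated text (feature ID, then count), with the special counters such as __no_feature listed under names that start with two underscores. A minimal sketch for reading it without Excel follows. The function name read_htseq_counts is made up for illustration, and the numbers in the sample are invented, not taken from the files in this thread.

```python
import csv
import io


def read_htseq_counts(handle):
    """Split an htseq-count table into gene counts and special counters.

    Assumes tab-separated rows with the feature ID in the first column
    and the integer count in the last column. (Hypothetical helper name.)
    """
    genes, special = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature, count = row[0], int(row[-1])
        if feature.startswith("__"):
            special[feature] = count
        else:
            genes[feature] = count
    return genes, special


# Invented example table; real input would be open("countread.text").
example = io.StringIO(
    "ENSMUSG00000051951.5\t12\n"
    "ENSMUSG00000098178.1\t0\n"
    "__no_feature\t2\n"
    "__ambiguous\t2\n"
    "__too_low_aQual\t0\n"
    "__not_aligned\t0\n"
    "__alignment_not_unique\t6\n"
)

genes, special = read_htseq_counts(example)
print("genes with reads:", sum(1 for c in genes.values() if c > 0))
print("special counters:", special)
```

For a real run, pass an open file handle for the captured counts file instead of the in-memory example.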