Understanding the output from HTseq-count
4.4 years ago
r.barton17 • 0

Hello,

I'm currently analysing some RNAseq data for differential gene expression analysis. I have completed the alignment stage using RNA STAR and have counted the reads mapped to each gene using HTseq-count, but I am having real trouble understanding the output files. The file extension is ".counts", which doesn't help much, but I can open them in R as a tab-delimited file. However, the files don't seem to have any column names, so I have no idea what each of the columns contains. I've looked through the documentation and online and I can't find anything that tells me exactly what these are. This is the first line of the file, i.e. the first entry for each of the 16 columns:

A00917:211:H35WCDSXY:2:1156:23773:13855 99 chr1 14481 255 150M chr1 14616 285 GGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCACCCCAGCTGTGTGGCCTCAGGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCGCACAGTGCTGGTTCCATCACCCCCACCCAGGGAAGCAGGTCTGAGCAGCTTGTCCT FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:282 nM:i:8 XF:Z:ENSG00000227232

Does anyone have experience using HTseq-count and can tell me what this information means? The files are also massive (~24 GB); is that normal? Thank you!

RNA-Seq • 2.6k views
4.4 years ago
2nelly ▴ 350

This looks like the alignment file. Are you sure that you are looking at the HTseq output? Normally, you should get a two-column file (gene name or ID, and counts per gene).
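For comparison, here is a minimal Python sketch of reading the kind of two-column file described above. It assumes a tab-separated file with no header, where htseq-count's summary rows (such as `__no_feature` and `__ambiguous`) start with a double underscore. The function name `read_htseq_counts` and the counts in the example are made up for illustration; only the gene ID `ENSG00000227232` comes from the question.

```python
import csv
import io


def read_htseq_counts(handle):
    """Split two-column htseq-count output into gene counts and summary counters."""
    gene_counts = {}
    summary = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        feature, count = row[0], int(row[1])
        if feature.startswith("__"):
            # special counters such as __no_feature or __ambiguous
            summary[feature] = count
        else:
            gene_counts[feature] = count
    return gene_counts, summary


# Invented example content; real files have one row per annotated gene.
example = io.StringIO(
    "ENSG00000223972\t0\n"
    "ENSG00000227232\t12\n"
    "__no_feature\t340\n"
    "__ambiguous\t7\n"
)
gene_counts, summary = read_htseq_counts(example)
print(gene_counts)
print(summary)
```

A file in this shape is typically a few hundred kilobytes, so a 24 GB file is a strong hint that it holds alignments rather than counts.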

4.4 years ago
Shalu Jhanwar ▴ 540

The output file is huge, which is not normal for HTseq-count output. Could you share the command line you used to generate the counts with HTseq?

4.4 years ago

Despite the name, that file is a SAM file. It's supposed to be massive. (It also really should be converted to a compressed BAM file.)
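To make the 16 columns in the question concrete, here is a rough Python sketch that names them. The first 11 are the mandatory SAM fields; the rest are optional `TAG:TYPE:VALUE` fields. `NH`, `HI`, `AS` and `nM` are attributes written by STAR, and `XF` is the feature assignment that htseq-count adds when it is asked to write annotated SAM output (its `-o`/`--samout` option). The function name `parse_sam_line` is hypothetical, and the record is the one from the question with SEQ and QUAL replaced by `*` to keep the example short.

```python
# The 11 mandatory SAM columns, in order.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line):
    """Return a dict of the mandatory SAM fields plus a 'tags' dict of optional fields."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    tags = {}
    for field in parts[11:]:
        # optional fields look like TAG:TYPE:VALUE, e.g. NH:i:1
        tag, value_type, value = field.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    record["tags"] = tags
    return record


# Record from the question; SEQ and QUAL abbreviated to "*" for brevity.
sam_line = "\t".join([
    "A00917:211:H35WCDSXY:2:1156:23773:13855", "99", "chr1", "14481", "255",
    "150M", "chr1", "14616", "285", "*", "*",
    "NH:i:1", "HI:i:1", "AS:i:282", "nM:i:8", "XF:Z:ENSG00000227232",
])
record = parse_sam_line(sam_line)
print(record["RNAME"], record["POS"], record["CIGAR"])
print(record["tags"]["XF"])
```

So the last column of each line is the gene that read was assigned to, which is why the file has one line per alignment rather than one line per gene.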

The gene counting of STAR is supposed to mimic HTSeq-count, so you shouldn't have to run them both, unless you are doing something really clever when running HTSeq-count.

Does your STAR command line have --outSAMtype BAM or --quantMode GeneCounts?
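If you do run STAR with --quantMode GeneCounts, it writes a ReadsPerGene.out.tab file. Here is a small Python sketch for reading it, assuming the four-column layout described in the STAR manual: gene ID, then counts for unstranded data, for reads on the same strand as the gene, and for reads on the reverse strand, with summary rows whose names start with `N_` at the top. The function name `read_star_gene_counts` and all numbers in the example are invented for illustration.

```python
import io


def read_star_gene_counts(handle, column=1):
    """Read STAR gene counts.

    column: 1 = unstranded, 2 = same strand as gene, 3 = reverse strand.
    """
    star_counts = {}
    star_summary = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[column])
        if name.startswith("N_"):
            # summary rows such as N_unmapped or N_noFeature
            star_summary[name] = count
        else:
            star_counts[name] = count
    return star_counts, star_summary


# Invented example content in the assumed layout.
star_example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t30\t400\t35\n"
    "N_ambiguous\t5\t1\t4\n"
    "ENSG00000227232\t12\t0\t12\n"
)
star_counts, star_summary = read_star_gene_counts(star_example, column=3)
print(star_counts)
print(star_summary)
```

Which column to use depends on the strandedness of the library prep, so it is worth comparing the N_noFeature values across the three columns before picking one.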
