Hello,
I'm currently analysing some RNA-seq data for differential gene expression. I have completed the alignment stage using RNA STAR and have counted the reads mapped to each gene using htseq-count, but I am having real trouble understanding the output files. The file extension is ".counts", which doesn't help much, but I can open the files in R as tab-delimited files. However, they don't seem to have any column names, so I have no idea what each of the columns contains. I've looked through the documentation and online and I can't find anything that tells me exactly what these are. This is the first line of the file, i.e. the first entry for each of the 16 columns:
A00917:211:H35WCDSXY:2:1156:23773:13855 99 chr1 14481 255 150M chr1 14616 285 GGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCACCCCAGCTGTGTGGCCTCAGGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCGCACAGTGCTGGTTCCATCACCCCCACCCAGGGAAGCAGGTCTGAGCAGCTTGTCCT FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:282 nM:i:8 XF:Z:ENSG00000227232.
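To make the example line above easier to inspect, here is a minimal Python sketch that splits it on tabs and labels the fields. It assumes (not confirmed in this post) that the line follows the SAM alignment layout: 11 mandatory columns followed by optional TAG:TYPE:VALUE fields. The function name `parse_line` is hypothetical, the sequence and quality strings are shortened placeholders, and the full stop after the last field is treated as sentence punctuation rather than data.

```python
# Names of the 11 mandatory SAM columns (assumption: the line is SAM-formatted).
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_line(line):
    """Split one tab-delimited line into named columns plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    tags = {}
    for item in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE, e.g. NH:i:1
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    record["tags"] = tags
    return record


# The example line from the post, with SEQ and QUAL shortened for readability.
example = "\t".join([
    "A00917:211:H35WCDSXY:2:1156:23773:13855", "99", "chr1", "14481", "255",
    "150M", "chr1", "14616", "285",
    "GGAGCCGTCC",   # shortened placeholder for the 150-base sequence
    "FF:FFFFFFF",   # shortened placeholder for the quality string
    "NH:i:1", "HI:i:1", "AS:i:282", "nM:i:8", "XF:Z:ENSG00000227232",
])

record = parse_line(example)
print(len(example.split("\t")))            # number of tab-separated fields
print(record["RNAME"], record["POS"], record["CIGAR"])
print(record["tags"])
```

Under that assumption, the last field (`XF`) would be the one carrying the gene identifier.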
Does anyone have experience with htseq-count and can tell me what this information means? The files are also massive (~24 GB). Is that normal? Thank you!