Hey guys! (new to RNA-seq) I'm using featureCounts on the command line to try and quantify my aligned .SAM reads (generated in bowtie2) and I'm trying to process the resulting output files. However, it looks like my output .txt file looks nothing like what I'm seeing online. the following was my command:
$ featureCounts -T 20 -a PATH/to/Homo_sapiens.GRCh38.93.chr.gtf -t exon -g gene_id -d 35 -D 105 -M -O -o ctrl_GRCh38_counts_multimap_overlap.txt PATH/to/ctrl_GRCh38_alignment.sam
The .txt file looked like the following:
ENSG00000223972 1;1;1;1;1;1;1;1;1 11869;12010;12179;12613;12613;12975;13221;13221;13453 12227;12057;12227;12721;12697;13052;13374;14409;13670 +;+;+;+;+;+;+;+;+ 1735 0
ENSG00000227232 1;1;1;1;1;1;1;1;1;1;1 14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534
Firstly, is that a normal .txt file? It looks nothing like the tab delimited outputs I'm seeing online. Is there some sort of clean up process I have to do here. Secondly, what can I use/how can I get data I can represent in the form of a graph/table with regards to the output file?
Thanks in advance!!!
Looks pretty normal to me. The output is:
It looks unintuitive because you summarize at the gene level, so all the start/end positions, the chromosome and strands of all exons are included. The 1735 is the length of the feature, and 0 is the total gene coverage. It is tab-delim, even though it may not look like it. The question now is what you want to do with the matrix, but that depends on your scientific question and only you can answer it. Please give some details. To have it a bit more organized, you could vizualize it in R, e.g.:
Thank you so much ATpoint, I see it now!! (btw my input file was .sam) I've now combined all of my .sam files (i did them seperately before) to get a new output.txt. I'm sequencing tRNAs so I was expecting multimapping and overlapping. I'm trying to generate a figure of some sort that shows which genes my reads are aligning to. Specifically, I want to see the difference in the no. of reads pre and post treatment and show there are changes in sequence length of the two sets of reads pre and post treatment (length of the tRNA fragment and hence read length should increase post-treatment).
Did you add the
-M
flag to count multi-mappers and-p
if this was a paired-end dataset?Yes. I used
-M
but not-p
since I'm working with single end data