Question

featureCounts output processing (command line)

0

Entering edit mode

6.3 years ago

2405592M ▴ 150

Hey guys! (new to RNA-seq) I'm using featureCounts on the command line to try and quantify my aligned .SAM reads (generated in bowtie2) and I'm trying to process the resulting output files. However, it looks like my output .txt file looks nothing like what I'm seeing online. the following was my command:

$ featureCounts -T 20 -a PATH/to/Homo_sapiens.GRCh38.93.chr.gtf -t exon -g gene_id -d 35 -D 105 -M -O -o ctrl_GRCh38_counts_multimap_overlap.txt PATH/to/ctrl_GRCh38_alignment.sam

The .txt file looked like the following:

ENSG00000223972 1;1;1;1;1;1;1;1;1       11869;12010;12179;12613;12613;12975;13221;13221;13453   12227;12057;12227;12721;12697;13052;13374;14409;13670   +;+;+;+;+;+;+;+;+       1735    0
ENSG00000227232 1;1;1;1;1;1;1;1;1;1;1   14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534

Firstly, is that a normal .txt file? It looks nothing like the tab delimited outputs I'm seeing online. Is there some sort of clean up process I have to do here. Secondly, what can I use/how can I get data I can represent in the form of a graph/table with regards to the output file?

Thanks in advance!!!

RNA-Seq command line featureCounts • 6.6k views

ADD COMMENT • link updated 6.3 years ago by h.mon 35k • written 6.3 years ago by 2405592M ▴ 150

1

Entering edit mode

Looks pretty normal to me. The output is:

Geneid Chr Start End Strand Length BAM_File

It looks unintuitive because you summarize at the gene level, so all the start/end positions, the chromosome and strands of all exons are included. The 1735 is the length of the feature, and 0 is the total gene coverage. It is tab-delim, even though it may not look like it. The question now is what you want to do with the matrix, but that depends on your scientific question and only you can answer it. Please give some details. To have it a bit more organized, you could vizualize it in R, e.g.:

require(data.table)
tmp <- fread("~/count_matrix.txt, header = T, skip = 1)

View(tmp)

ADD REPLY • link 6.3 years ago by ATpoint 85k

1

Entering edit mode

Thank you so much ATpoint, I see it now!! (btw my input file was .sam) I've now combined all of my .sam files (i did them seperately before) to get a new output.txt. I'm sequencing tRNAs so I was expecting multimapping and overlapping. I'm trying to generate a figure of some sort that shows which genes my reads are aligning to. Specifically, I want to see the difference in the no. of reads pre and post treatment and show there are changes in sequence length of the two sets of reads pre and post treatment (length of the tRNA fragment and hence read length should increase post-treatment).

ADD REPLY • link 6.3 years ago by 2405592M ▴ 150

1

Entering edit mode

Did you add the -M flag to count multi-mappers and -p if this was a paired-end dataset?

ADD REPLY • link 6.3 years ago by GenoMax 147k

0

Entering edit mode

Yes. I used -M but not -p since I'm working with single end data

ADD REPLY • link 6.3 years ago by 2405592M ▴ 150

score 2 · Answer 1 · 2018-07-24

2

Entering edit mode

6.3 years ago

h.mon 35k

Yes, it is a regular text file, just with some particular formatting regarding tabs and columns - see some explanation at featureCounts output interpretation .

If you pass all your bams at once to featureCounts, it will output a complete table with counts for all samples. Typically one use the first and last (or n-last, if you counted n samples simultaneously) columns for differential gene expression, the most common downstream analysis.

ADD COMMENT • link 6.3 years ago by h.mon 35k

0

Entering edit mode

Thank you h.mon!! I've now combined all of my .sam files to get a new output.txt. I'm sequencing tRNAs so I was expecting multimapping and overlapping reads. I'm trying to generate a figure of some sort that shows which genes my reads are aligning to. Specifically, I want to see the difference in the no. of reads pre and post treatment (I think this is referred to as coverage) and show there are changes in sequence length of the two sets of reads pre and post treatment (length of the tRNA fragment and hence read length should increase post-treatment).

ADD REPLY • link 6.3 years ago by 2405592M ▴ 150

1

Entering edit mode

You would want to do some sort of normalization before you start comparing numbers.

show there are changes in sequence length of the two sets of reads pre and post treatment (length of the tRNA fragment and hence read length should increase post-treatment).

This part is going to be tricky to demonstrate since you only have short reads and I assume they won't span the region you are looking at.

ADD REPLY • link 6.3 years ago by GenoMax 147k