Question

featureCounts with 3' tag sequencing

5.6 years ago

luca ▴ 70

Hi everyone, I ran an RNA-seq experiment on mice using 3' tag sequencing. I mapped the reads to the mouse genome with STAR (on average >70% of reads mapped uniquely) and I want to get raw counts with featureCounts. This is the command I am using:

featureCounts -a Mus_musculus.GRCm38.95.gtf -t exon -g gene_id --primary -T 16 -o counts_w_extraAttributes-primary.txt E12.5/Aligned.sortedByCoord.out.bam E14.5/Aligned.sortedByCoord.out.bam E18.5/Aligned.sortedByCoord.out.bam...

The output from featureCounts looks strange (to me at least): it reports "Successfully assigned alignments" at about 40% on average. That seems quite low, so I was wondering whether I am doing something wrong.
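One way to see where the unassigned alignments end up is to read the `.summary` file that featureCounts writes next to the counts table. The following is a minimal sketch, assuming the summary is the usual tab-separated table with a `Status` column followed by one column per BAM file. The function name `summary_percent` is made up for illustration.

```shell
# Print every non-zero status category as a percentage of all alignments,
# one line per BAM column: <bam name> <status> <percent>.
# Assumes a tab-separated summary: a header row, then one row per status.
summary_percent() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        {
            status[NR] = $1
            for (i = 2; i <= NF; i++) { count[NR, i] = $i + 0; total[i] += $i }
            nrow = NR
        }
        END {
            for (r = 2; r <= nrow; r++)
                for (i = 2; i <= ncol; i++)
                    if (total[i] > 0 && count[r, i] > 0)
                        printf "%s\t%s\t%.1f\n", name[i], status[r], 100 * count[r, i] / total[i]
        }
    ' "$1"
}

# Usage (the file name follows from the -o value in the command above):
# summary_percent counts_w_extraAttributes-primary.txt.summary
```

A large `Unassigned_NoFeatures` share would point at reads falling outside annotated exons, while a large `Unassigned_MultiMapping` share would point at multi-mapping reads being skipped.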

Thanks for your helpful replies, Best Luca

RNA-Seq alignment • 1.7k views

ADD COMMENT • link 5.6 years ago by luca ▴ 70

Have you tried adding the -M option to see how the counts change? Also keep in mind that although STAR may have mapped a certain % of reads, reads will not be counted unless a feature is defined for the region they map to. If your 3'-tag protocol captures a specific strand (top/bottom), you should specify that as well (-s option). By default featureCounts treats data as unstranded (-s 0).

Edit: I am going to edit this post since I have hit my post limit for the day.

If your kit is stranded then definitely use the right -s option (it sounds like -s 1 is that option).
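If the strandedness of a kit is unclear, one rough check is to count a single BAM file under each -s setting and compare the number of assigned alignments. This is only a sketch: the function name `compare_strandedness` and the `strand_test_s*.txt` output names are invented here, and the other flags are copied from the command in the question.

```shell
# Run featureCounts once per strandedness setting on one BAM file and print
# the "Assigned" count from each summary file.
# -s 0 = unstranded, -s 1 = stranded, -s 2 = reversely stranded.
compare_strandedness() {
    gtf=$1
    bam=$2
    for s in 0 1 2; do
        featureCounts -a "$gtf" -t exon -g gene_id --primary -s "$s" -T 16 \
            -o "strand_test_s${s}.txt" "$bam"
        # The summary has one row per status; column 2 is the single BAM file.
        printf 's=%s\t' "$s"
        awk -F'\t' '$1 == "Assigned" { print $2 }' "strand_test_s${s}.txt.summary"
    done
}

# Usage:
# compare_strandedness Mus_musculus.GRCm38.95.gtf E12.5/Aligned.sortedByCoord.out.bam
```

For a stranded library, one of -s 1 or -s 2 would be expected to assign clearly more alignments than the other, with -s 0 somewhere near the higher of the two.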

ADD REPLY • link 5.6 years ago by GenoMax 147k

Dear genomax, thanks for your reply. I tried adding the -M option and the percentage of successfully assigned alignments increases by 15-20% on average. I had not specified a strand with the -s option, but the kit is strand-specific. I checked, and the best results are with -s 1. Do you think I should include -M and also count the multi-mapping reads?

ADD REPLY • link 5.6 years ago by luca ▴ 70

Thanks genomax! Regarding the multi-mapping reads, is there a "gold standard" procedure (i.e. to include them or exclude them)? Thanks Luca

ADD REPLY • link 5.6 years ago by luca ▴ 70

Multi-mapped reads are generally excluded since you can't be sure of the gene/region they originated from. Some aligners allow you to place them at a random spot out of all the places that they map to.
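Within featureCounts itself, the choice comes down to a few flags. The sketch below wraps them in a helper so the three behaviours can be compared on the same data. The helper name `count_with_policy` and the policy names are invented for illustration; `-s 1` is taken from what worked best for the asker, and the remaining flags come from the command in the question.

```shell
# Count with one of three multi-mapper policies (policy names are illustrative):
#   unique     - default behaviour, multi-mapping reads are not counted
#   all        - -M, every reported alignment of a multi-mapper counts as 1
#   fractional - -M --fraction, each of x reported alignments counts as 1/x
count_with_policy() {
    policy=$1; gtf=$2; out=$3
    shift 3
    case "$policy" in
        unique)     mm_flags="" ;;
        all)        mm_flags="-M" ;;
        fractional) mm_flags="-M --fraction" ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    # $mm_flags is left unquoted on purpose so it splits into separate options.
    featureCounts -a "$gtf" -t exon -g gene_id -s 1 $mm_flags -T 16 -o "$out" "$@"
}

# Usage:
# count_with_policy fractional Mus_musculus.GRCm38.95.gtf counts_fraction.txt \
#     E12.5/Aligned.sortedByCoord.out.bam E14.5/Aligned.sortedByCoord.out.bam
```

Note that `--fraction` produces non-integer counts, which some downstream tools that expect raw integer counts may not accept.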

There are alternative strategies (e.g. mapping instead of alignment with salmon, https://salmon.readthedocs.io/en/latest/ ) which can be used to deal with them. Since you have 3'-end-specific data, I am not sure you can use that option.