Question

FeatureCounts output and downstream analysis

3.7 years ago

kb_93 ▴ 10

Hello!

I used featureCounts to assign my mapped reads to features, and the percentage of reads reported as Unassigned_Ambiguity varied widely across my files (~20-65%) with the command shown below.

featureCounts -T 8 -s 1 *.BAM -a combo_2.gtf   -g gene_id  -o counts.txt

I then added -O --largestOverlap to the command so that reads overlapping more than one feature would still be assigned, and 100% of reads were assigned across all files. My question: for downstream analysis, is it OK to use the count files that include reads overlapping multiple features, or is it more appropriate to use the count files that exclude them? (I will be using DESeq2 downstream.)

Also, is there a way to get the reads that weren't assigned in featureCounts into a separate bam file?

Any feedback/help would be great!

Thanks

RNA-Seq sequencing featureCounts gene DESeq2 • 2.0k views

ADD COMMENT • link updated 3.7 years ago by GenoMax 147k • written 3.7 years ago by kb_93 ▴ 10

If a read was not assigned, then it does not overlap an exon. I would therefore make a BED file from the GTF intervals classified as "exon", then take the complement of that file against the genome (bedtools complement). The result is a "non-exon" BED file that you can pass to samtools view with its -L option. That option outputs only reads that overlap the intervals in the provided BED file, which in your case means the non-exon-overlapping reads; I would guess these are the unassigned reads.
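
A rough sketch of those steps, in case it helps. The function names, the temporary file names and the feature-type argument are illustrative assumptions, not part of any tool. It assumes bedtools and samtools are on the PATH and that the BAM file is indexed. Note that -L keeps any read that overlaps a non-feature interval, so reads that only partly overlap a feature will be included too.

```shell
# Hypothetical helper: print GTF rows of one feature type as sorted BED3.
# GTF coordinates are 1-based inclusive, BED is 0-based half-open, hence start-1.
gtf_features_to_bed() {
  local gtf="$1" feature="${2:-exon}"
  awk -v f="$feature" 'BEGIN{FS=OFS="\t"} $0 !~ /^#/ && $3==f {print $1, $4-1, $5}' "$gtf" \
    | LC_ALL=C sort -k1,1 -k2,2n
}

# Hypothetical helper: write reads overlapping regions outside those features.
# Assumes bedtools and samtools are installed and the BAM is indexed.
non_feature_reads() {
  local gtf="$1" bam="$2" out="$3" feature="${4:-exon}"
  local tmp
  tmp=$(mktemp -d)
  # Chromosome sizes from the BAM index, sorted the same way as the BED file.
  samtools idxstats "$bam" \
    | awk 'BEGIN{FS=OFS="\t"} $1!="*" {print $1, $2}' \
    | LC_ALL=C sort -k1,1 > "$tmp/genome.txt"
  # Merge overlapping feature intervals, then take the complement.
  gtf_features_to_bed "$gtf" "$feature" | bedtools merge -i - > "$tmp/features.bed"
  bedtools complement -i "$tmp/features.bed" -g "$tmp/genome.txt" > "$tmp/non_features.bed"
  # Keep only alignments overlapping the complement intervals.
  samtools view -b -L "$tmp/non_features.bed" -o "$out" "$bam"
  rm -rf "$tmp"
}
```

Usage would be something like `non_feature_reads combo_2.gtf sample.bam sample.non_exon.bam exon`, where the BAM names are placeholders.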

ADD REPLY • link 3.7 years ago by ATpoint 85k

Thank you for the quick response! I'm working with small RNAs so the GTF files I have don't have exons. Would you be able to suggest another method?

Thanks

ADD REPLY • link 3.7 years ago by kb_93 ▴ 10

Also, is there a way to get the reads that weren't assigned in featureCounts into a separate bam file?

Not directly, but you can use the following option when running featureCounts to get the assignment result for each read, and then use that to extract the reads.

# Assignment results for each read

  -R <format>         Output detailed assignment results for each read or read-
                      pair. Results are saved to a file that is in one of the
                      following formats: CORE, SAM and BAM. See Users Guide for
                      more info about these formats.

  --Rpath <string>    Specify a directory to save the detailed assignment
                      results. If unspecified, the directory where counting
                      results are saved is used.
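
One possible way to go from that output to a BAM of unassigned reads is sketched below. It assumes that -R BAM writes a file named after the input with a .featureCounts.bam suffix and that each record carries an XS:Z tag holding the assignment status (for example Assigned or Unassigned_Ambiguity); please check both against the Users Guide for your version. The helper name and the file names are made up for illustration.

```shell
# Hypothetical helper: read SAM on stdin, keep header lines plus records whose
# XS:Z tag starts with "Unassigned" (assumed to be the status tag added by -R).
keep_unassigned_sam() {
  awk -F'\t' '
    /^@/ { print; next }                       # pass header lines through
    {
      for (i = 12; i <= NF; i++)               # optional tags start at field 12
        if ($i ~ /^XS:Z:Unassigned/) { print; next }
    }'
}

# Example wiring (assumes samtools is on the PATH; file names are placeholders):
# featureCounts -T 8 -s 1 -R BAM -a combo_2.gtf -g gene_id -o counts.txt sample.BAM
# samtools view -h sample.BAM.featureCounts.bam \
#   | keep_unassigned_sam \
#   | samtools view -b -o sample.unassigned.bam -
```

Changing the pattern to a specific status such as XS:Z:Unassigned_Ambiguity would restrict the output to that class of reads.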