Hello!
I used featureCounts to assign my mapped reads to features, and a varying percentage (~20-65%) of reads were reported as unassigned due to ambiguity across my files, using the command shown below.
featureCounts -T 8 -s 1 *.BAM -a combo_2.gtf -g gene_id -o counts.txt
I then added -O --largestOverlap to the command line to assign reads to overlapping features, and across all files 100% of reads were assigned. So my question is: for downstream analysis, is it OK to use the count files that include reads assigned to overlapping features, or is it more appropriate to use the count files where overlapping reads were left unassigned? (I will be using DESeq2 in downstream analysis.)
Also, is there a way to get the reads that weren't assigned in featureCounts into a separate bam file?
Any feedback/help would be great!
Thanks
If a read was not assigned, then it does not overlap exons. I would therefore make a BED file out of the intervals in the GTF file classified as "exon", then get the complement of this file with respect to the genome (bedtools complement). The result is a "non-exon" BED file which you can use with samtools view and its -L option. This option will only output reads that overlap with the provided BED file, so in your case the non-exon-overlapping reads, which I guess should be the unassigned reads.
Thank you for the quick response! I'm working with small RNAs so the GTF files I have don't have exons. Would you be able to suggest another method?
Thanks
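For reference, the BED-complement approach suggested above could be sketched roughly as follows. This is only a sketch under assumptions: the helper name gtf_features_to_bed and the file names features.bed, genome.txt, non_features.bed and sample.bam are hypothetical, and the feature type is a parameter so that a GTF without "exon" rows can use whatever value appears in its third column.

```shell
#!/usr/bin/env bash
# Sketch only: helper name and file names are hypothetical.

# Convert GTF rows of one feature type (default "exon") to sorted BED intervals.
# GTF coordinates are 1-based inclusive; BED is 0-based half-open,
# so the start is shifted by one and the end is kept.
gtf_features_to_bed() {
    local gtf="$1" feature="${2:-exon}"
    awk -v f="$feature" 'BEGIN { FS = OFS = "\t" }
        !/^#/ && $3 == f { print $1, $4 - 1, $5 }' "$gtf" |
        sort -k1,1 -k2,2n
}

# Usage (commented out; needs bedtools and samtools on PATH).
# genome.txt is assumed to be a two-column "chrom<TAB>size" file sorted
# in the same chromosome order as the BED file.
#   gtf_features_to_bed combo_2.gtf exon > features.bed
#   bedtools complement -i features.bed -g genome.txt > non_features.bed
#   samtools view -b -L non_features.bed sample.bam > sample.non_feature.bam
```

Note that samtools view -L keeps any read overlapping the given regions, so a read that only partly overlaps a feature would also be kept. The result approximates the unassigned reads rather than matching them exactly.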
Not directly, but you can use the following option when running featureCounts to get that information and then reads.
Thank you very much!
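The option referred to in the last answer is not shown in the thread. One possibility, stated here purely as an assumption, is featureCounts' -R BAM, which writes a per-read BAM file with the assignment status stored in an XS:Z tag. Under that assumption, a minimal sketch for pulling out the unassigned reads might look like this (the function name keep_unassigned and the sample file names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch only: assumes the per-read output of "featureCounts -R BAM",
# where each record carries an XS:Z:<status> tag.

# Read SAM text on stdin; keep header lines plus records whose XS:Z tag
# starts with "Unassigned" (e.g. Unassigned_NoFeatures, Unassigned_Ambiguity).
keep_unassigned() {
    awk -F'\t' '/^@/ { print; next }
        { for (i = 12; i <= NF; i++)
              if ($i ~ /^XS:Z:Unassigned/) { print; next } }'
}

# Usage (commented out; needs featureCounts and samtools on PATH).
# The per-read file is typically named <input>.featureCounts.bam.
#   featureCounts -T 8 -s 1 -R BAM -a combo_2.gtf -g gene_id -o counts.txt sample.bam
#   samtools view -h sample.bam.featureCounts.bam | keep_unassigned |
#       samtools view -b -o sample.unassigned.bam -
```

Matching on the tag prefix keeps every unassigned category. To keep only one category, the pattern could be narrowed to the full status string.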