Here are my bowtie2 and featureCounts commands:
( bowtie2 -x output/reference.fa.gz -1 cleaned_1.fastq.gz -2 cleaned_2.fastq.gz --threads 2 --seed 0 --met-file bowtie2/metrics.txt --no-unal --un-conc-gz bowtie2/unmapped_%.fastq.gz ) | ( samtools sort --threads 2 --reference output/reference.fa.gz -T tmp/samtools_sort > bowtie2/mapped.sorted.bam )
featureCounts -a output/reference.gff -o featurecounts/featurecounts.orfs.tsv -F GTF --tmpDir tmp -T 2 -g gene_id -t CDS bowtie2/mapped.sorted.bam
Here's my featureCounts summary:
Assigned 2506757
Unassigned_Unmapped 0
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 202251
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 189598
It reports no multi-mapped reads for any of my samples. However, the reference contains many different strains of the same species, so I would expect at least one read in the entire dataset to be multi-mapped, which makes me think I'm doing something incorrectly.
Is there an association between Unassigned_Ambiguity and Unassigned_MultiMapping?
A multi-mapping read is a read that maps to more than one location in the reference genome. There are multiple options for counting such reads.
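To check whether the BAM carries any multi-mapping evidence at all, something like the sketch below may help. As far as I know, featureCounts recognises multi-mapping reads by the NH tag, and bowtie2 does not write that tag (in its default mode it reports one alignment per read), which would explain a count of 0. The function name is made up for illustration, and treating equal AS and XS scores as a sign of an equally good second hit is my own rough heuristic, not something featureCounts does.

```sh
# Summarise multi-mapping evidence in SAM records read from stdin.
# Prints three tab-separated counts:
#   total records, records with NH > 1, records with AS equal to XS.
# count_multimap_evidence is a hypothetical helper name.
count_multimap_evidence() {
  awk -F'\t' '
    /^@/ { next }                        # skip SAM header lines
    {
      total++
      nh = ""; best = ""; second = ""
      for (i = 12; i <= NF; i++) {       # optional tags start at column 12
        if ($i ~ /^NH:i:/)      nh = substr($i, 6)
        else if ($i ~ /^AS:i:/) best = substr($i, 6)
        else if ($i ~ /^XS:i:/) second = substr($i, 6)
      }
      if (nh != "" && nh + 0 > 1) nh_multi++
      # Assumption: a tied best and second-best score hints at a repeat hit.
      if (best != "" && second != "" && best + 0 == second + 0) tied++
    }
    END { printf "%d\t%d\t%d\n", total, nh_multi, tied }
  '
}
# Usage (assumes samtools is on PATH):
#   samtools view bowtie2/mapped.sorted.bam | count_multimap_evidence
```

If the second column is 0 while the third is not, the reads are probably aligning to several strains but the BAM never says so in a form featureCounts looks at.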
When assigning reads to genes or exons, most reads can be successfully assigned without ambiguity. However if reads are to be assigned to transcripts, due to the high overlap between transcripts from the same gene, many reads will be found to overlap more than one transcript and therefore cannot be uniquely assigned.
https://bioconductor.org/packages/release/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf
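Unassigned_Ambiguity, on the other hand, concerns a single alignment that overlaps more than one gene_id, so it depends on the annotation rather than on how many places the read aligns. A rough way to see how often that can happen is to count CDS features that overlap a CDS from a different gene. This is only a sketch: the function name is hypothetical, it assumes GTF-style `gene_id "..."` attributes in column 9, it ignores strand, and it only compares each feature with the furthest-reaching earlier feature on the same contig.

```sh
# Count CDS features that overlap an earlier CDS from a different gene_id.
# Reads a GTF-style annotation from stdin; prints a single number.
# count_cross_gene_cds_overlaps is a hypothetical helper name.
count_cross_gene_cds_overlaps() {
  awk -F'\t' '$3 == "CDS"' |                   # keep CDS rows only
  sort -t "$(printf '\t')" -k1,1 -k4,4n |      # order by contig, then start
  awk -F'\t' '
    {
      gene = ""
      if (match($9, /gene_id "[^"]*"/))
        gene = substr($9, RSTART + 9, RLENGTH - 10)   # text between the quotes
      # Overlap: same contig, starts before the furthest end seen so far.
      if ($1 == contig && $4 + 0 <= max_end && gene != max_gene) overlaps++
      if ($1 != contig || $5 + 0 > max_end) {
        contig = $1; max_end = $5 + 0; max_gene = gene
      }
    }
    END { printf "%d\n", overlaps }
  '
}
# Usage:
#   count_cross_gene_cds_overlaps < output/reference.gff
```

A non-zero result means there are regions where a read can land on two genes at once, which is the situation featureCounts reports as Unassigned_Ambiguity unless overlapping assignment is allowed.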