I am reopening an old topic that I need to clarify.
From the BBSplit source: https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh
In the nf-core/rnaseq Nextflow pipeline, the parameter was set as follows:
ambiguous2=all
My understanding is that this keeps reads that map to both genomes.
https://github.com/nf-core/rnaseq/issues/1408
So I set ambiguous2=toss, but I get more reads than I did previously with "ambiguous2=all".
That's not what we should expect.
I should get fewer reads, because these reads are considered unmapped if you set ambiguous2=toss.
Am I misunderstanding something?
What would be the right setting to count only reads that map unambiguously to a single reference?
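To make my expectation concrete, here is a toy Python sketch of how I read the `ambiguous2` modes described in the bbsplit.sh help text. It is only a model of my understanding, not BBSplit's actual implementation; the function names, the reference names ("human", "mouse") and the example reads are all made up for illustration.

```python
from collections import Counter

def route_read(hits, mode):
    """Return the output bins for one read.

    hits: references the read maps to equally well (hypothetical input).
    mode: assumed ambiguous2 behaviour, as I read the bbsplit.sh help text.
    """
    if not hits:
        return ["unmapped"]
    if len(hits) == 1:
        return [hits[0]]                         # unambiguous: unaffected by ambiguous2
    if mode == "best":
        return [hits[0]]                         # keep the first best site only
    if mode == "toss":
        return ["unmapped"]                      # treat as unmapped
    if mode == "all":
        return list(hits)                        # one copy per reference
    if mode == "split":
        return ["AMBIGUOUS_" + ref for ref in hits]
    raise ValueError(f"unknown mode: {mode}")

def count_outputs(reads, mode):
    """Count how many reads end up in each output bin."""
    counts = Counter()
    for hits in reads:
        counts.update(route_read(hits, mode))
    return counts

# Five toy reads: three unambiguous, one ambiguous, one unmapped.
reads = [["human"], ["human"], ["mouse"], ["human", "mouse"], []]

counts_all = count_outputs(reads, "all")
counts_toss = count_outputs(reads, "toss")
print("all: ", dict(counts_all))
print("toss:", dict(counts_toss))
```

Under this model, the per-reference counts with `toss` can never exceed those with `all`, which is why getting more reads with `toss` surprises me.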
This is NOT the official repository of BBTools. SourceForge is the official repo: https://sourceforge.net/projects/bbmap/
What does this mean? Can you post the stats output from both runs?
These outputs are correct and don't help resolve the issue. I'm using part of the Nextflow rnaseq pipeline, then trying to extract the counts from the BAM (which is expected to contain contaminant reads) with Rsubread::featureCounts. I do that because I don't want to use the Salmon output, and the STAR count outputs were erased at some point. featureCounts can give different counts depending on the options, and it's hard to reproduce exactly how STAR counts, which makes things hard to compare. It's a side project that I didn't want to spend much time on, so it's a dead end anyway. Thank you for your comment.