bbsplit: How to configure the ambiguous/ambiguous2 parameters so that reads mapping to both the human and mouse references are treated as unmapped
25 days ago
ZheFrench ▴ 590

I'm reopening an old topic of mine that I still need to clarify.

From the bbsplit code: https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh

In the nf-core/rnaseq Nextflow pipeline, the parameter was set as follows:

ambiguous2=all

My understanding is that this keeps reads that map to both genomes, writing them to the output of each genome.

https://github.com/nf-core/rnaseq/issues/1408

So I set ambiguous2=toss, but I get more reads than I previously did with ambiguous2=all.
That's not what we should expect.
I should get fewer reads, because these reads are considered unmapped when ambiguous2=toss.
Am I misunderstanding something?
What would be the right setup to count only reads that map unambiguously to a single reference?
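To make the expected behaviour concrete, here is a toy Python model of how the ambiguous2 modes could route a read that maps equally well to more than one reference. It is based on the option descriptions in the bbsplit.sh help text as I understand them (best: use the first best site; toss: consider the read unmapped; all: write a copy to the output of each reference it maps to; split: write a copy to an AMBIGUOUS_ output per reference). The functions route_read and count_outputs, the bin names and the read data are all hypothetical; this is a sketch of the bookkeeping, not BBSplit's actual code.

```python
from collections import Counter

# Toy model only (hypothetical): each read is described by the list of
# references on which it has a top-scoring alignment, in the order the
# mapper would consider them. An empty list means the read did not map.


def route_read(refs, mode):
    """Return the output bins a read is written to under an ambiguous2 mode.

    Modelled on the bbsplit.sh help text (assumed semantics):
      best  -> use the first best site
      toss  -> consider the read unmapped
      all   -> one copy in the output of every reference it maps to
      split -> one copy in an AMBIGUOUS_<ref> output per reference
    """
    if not refs:
        return ["unmapped"]
    if len(set(refs)) == 1:
        # Ambiguous only within a single reference: that case is governed by
        # the separate ambiguous= setting, which this toy model does not cover.
        return [refs[0]]
    unique_refs = list(dict.fromkeys(refs))  # de-duplicate, keep order
    if mode == "best":
        return [refs[0]]
    if mode == "toss":
        return ["unmapped"]
    if mode == "all":
        return unique_refs
    if mode == "split":
        return ["AMBIGUOUS_" + r for r in unique_refs]
    raise ValueError(f"unknown ambiguous2 mode: {mode}")


def count_outputs(reads, mode):
    """Count how many reads land in each output bin for a given mode."""
    counts = Counter()
    for refs in reads:
        counts.update(route_read(refs, mode))
    return counts


# Hypothetical data: 3 human-only reads, 1 mouse-only read,
# 2 reads that map equally well to both genomes, 1 unmapped read.
reads = [["primary"]] * 3 + [["mm39"]] + [["primary", "mm39"]] * 2 + [[]]

for mode in ("all", "toss", "best", "split"):
    print(mode, dict(count_outputs(reads, mode)))
```

In this model, ambiguous2=all puts 5 reads in the primary bin (the 3 human-only reads plus the 2 cross-mapped ones), while ambiguous2=toss leaves 3 there and moves the 2 cross-mapped reads to unmapped. That is the decrease the question expects when switching from all to toss.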


> From the code of bbsplit: https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh

This is NOT the official BBTools repository. The official one is on SourceForge: https://sourceforge.net/projects/bbmap/

> So I set ambiguous2=toss, but I get more reads than I previously did with ambiguous2=all.

What does this mean exactly? Can you post the reference statistics for both runs? They can be written with:

refstats=<file>     Write statistics on how many reads were assigned to which reference to this file.
#name    %unambiguousReads  unambiguousMB  %ambiguousReads  ambiguousMB  unambiguousReads  ambiguousReads  assignedReads  assignedBases
primary  99.59388           5772.091869    0.10761          5.819571     76708282          82880           76708282       5772091869
mm39     0.01374            0.767658       0.10761          5.819571     10584             82880           93464          6587229
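As a quick sanity check on the refstats table above, a short Python sketch can parse it (one header line starting with #, then whitespace-separated rows) and compare assignedReads with unambiguousReads + ambiguousReads for each reference. The parse_refstats helper is hypothetical; the column names and numbers are copied from the output posted above.

```python
import io

# refstats output as posted in this thread (a single run).
REFSTATS = """\
#name    %unambiguousReads  unambiguousMB  %ambiguousReads  ambiguousMB  unambiguousReads  ambiguousReads  assignedReads  assignedBases
primary  99.59388           5772.091869    0.10761          5.819571     76708282          82880           76708282       5772091869
mm39     0.01374            0.767658       0.10761          5.819571     10584             82880           93464          6587229
"""


def parse_refstats(handle):
    """Parse a refstats table into {reference: {column: value}}.

    Hypothetical helper: assumes one header line starting with '#',
    followed by whitespace-separated data rows.
    """
    header = None
    rows = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            raise ValueError("data row found before the '#name' header")
        record = dict(zip(header, fields))
        name = record.pop("name")
        # Decimal values become floats, plain counts stay integers.
        rows[name] = {k: float(v) if "." in v else int(v) for k, v in record.items()}
    return rows


stats = parse_refstats(io.StringIO(REFSTATS))
for ref, s in stats.items():
    total = s["unambiguousReads"] + s["ambiguousReads"]
    print(f"{ref}: assignedReads={s['assignedReads']} "
          f"unambiguous+ambiguous={total} equal={s['assignedReads'] == total}")
```

In the posted numbers, assignedReads for mm39 equals unambiguousReads + ambiguousReads (10,584 + 82,880 = 93,464), whereas for primary it equals unambiguousReads alone (76,708,282).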

These outputs are correct, but they don't help resolve the overall problem. I'm using part of the nf-core/rnaseq Nextflow pipeline, and then I try to extract counts from the BAM (expected to include the contaminant reads) with Rsubread::featureCounts. I do that because I don't want to use the Salmon output, and the STAR count outputs were deleted at some point. featureCounts gives different counts depending on its options, and it's hard to reproduce exactly the way STAR counts, which makes the results hard to compare. It's a side project I didn't want to spend much time on, so it's a dead end anyway. Thank you for your comment.
