Question

BBSplit xenograft human-mouse: raw counts

5.5 years ago

GiV17 ▴ 50

Hi, I have a problem with BBSplit. I have xenograft mouse-human RNA-seq samples (paired-end FASTQ) and I planned to use BBSplit to remove the mouse contamination.

So I used this command line:

bbsplit.sh in1=reads1.fq in2=reads2.fq ref=human.fa,mouse.fa ambiguous2=toss basename=out_%.fq refstats=Statistics_%.txt

Then I remapped the human output FASTQ file to the human reference with STAR, and I would like to use featureCounts to obtain raw gene counts, but it doesn't work well.

Can you recommend a pipeline to follow for RNA-seq data after using BBSplit? Thanks so much for the reply.

RNA-Seq • 2.2k views

ADD COMMENT • link updated 5.5 years ago by Biostar 20 • written 5.5 years ago by GiV17 ▴ 50

"but it doesn't work well."

What does not work well? After you bin the reads they should be able to map to human/mouse genomes normally. Can you post what the refstats looked like?

There is also XenofilteR (https://github.com/PeeperLab/XenofilteR), but it sounds like your problem is not with the binning.

ADD REPLY • link 5.5 years ago by GenoMax 147k

RefStats of one of my samples:

name  %unambiguousReads  unambiguousMB  %ambiguousReads  ambiguousMB  unambiguousReads  ambiguousReads  assignedReads  assignedBases
HG38  90.69745           4525.569508    6.03784          301.545669   59646218          3970720         63616938       4827115177
mm10  2.89875            144.634382     6.03784          301.545669   1906330           3970720         1906330        144634382

So about 90% of the total reads map to HG38, right? Then I map this single FASTQ file to the GENCODE reference with STAR and obtain: Uniquely mapped reads % | 83.15%

Then I run featureCounts and obtain:

Total alignments : 77189115                                           
Successfully assigned alignments : 8748229 (11.3%)

This seems very low compared with the STAR mapping rate. Why?

If instead I skip BBSplit and map the original FASTQ files directly to the human reference, I obtain 75.6% uniquely mapped reads, and featureCounts assigns 23.1%.

How is this possible? How can I fix it? Which of the two analyses is better?

ADD REPLY • link updated 5.5 years ago by GenoMax 147k • written 5.5 years ago by GiV17 ▴ 50

Are you using the correct strand option (-s) when counting with featureCounts?

BBSplit (which uses bbmap.sh under the covers) is clearly able to assign 90% of your reads, so there is no reason why the split file should not align well directly.

ADD REPLY • link 5.5 years ago by GenoMax 147k

My original FASTQ files were prepared with a reverse-stranded kit. When I remove the mouse reads and get a single FASTQ file, could I lose this information?

ADD REPLY • link 5.5 years ago by GiV17 ▴ 50

FASTQ files themselves are not reverse stranded. The kit that was used for the prep captured the reverse strand. Did you try using the -s 2 option when counting reads? Have you examined your aligned files to ensure that reads are aligning properly (i.e. that there is no widespread alignment outside of exons, which could indicate DNA contamination in your prep)?
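
One way to see where the unassigned alignments end up: featureCounts writes a summary file next to its counts table (the output name plus `.summary`), with one row per category such as Assigned, Unassigned_NoFeatures, Unassigned_MultiMapping and Unassigned_Ambiguity. Below is a minimal sketch for turning that file into percentages, assuming it holds a single BAM column. The function name `summary_pct` and the file name `counts.txt.summary` are hypothetical.

```shell
# summary_pct FILE: print each non-zero featureCounts category as a
# percentage of all alignments. Assumes the summary has a header line
# ("Status<TAB>bam name") followed by "category<TAB>count" rows for one BAM.
summary_pct() {
  awk -F '\t' '
    NR > 1 { name[NR] = $1; count[NR] = $2; total += $2 }
    END {
      if (total == 0) exit 1
      for (i = 2; i <= NR; i++)
        if (count[i] > 0)
          printf "%s\t%.1f%%\n", name[i], 100 * count[i] / total
    }
  ' "$1"
}

# Example (hypothetical file name):
# summary_pct counts.txt.summary
```

A large Unassigned_NoFeatures share would be consistent with reads falling outside annotated exons or with counting on the wrong strand, whereas a large Unassigned_MultiMapping share would point to multi-mapping reads rather than strandedness.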

ADD REPLY • link 5.5 years ago by GenoMax 147k

Yes, that's right! However, I used -s 2 and still obtained a low count. I don't know whether there is DNA contamination... but what would that mean?

ADD REPLY • link 5.5 years ago by GiV17 ▴ 50

Have you examined your aligned files in IGV? Go to genes you know should be expressed and see what the alignments there look like. What happens if you use -s 0? Does the assignment % go up? Just to be sure: we are discussing all of this for the human part of your data, and the mouse part has been separated?
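
The -s comparison could be scripted roughly as follows. This is only a sketch: `count_by_strand` and the file names are hypothetical, and it relies only on the featureCounts options -s, -a and -o. Any extra arguments (for example -p for paired-end alignments) are passed through unchanged, so check the help text of your Subread version for what applies to your data.

```shell
# count_by_strand BAM GTF PREFIX [extra featureCounts options...]
# Runs featureCounts once per -s setting (0 = unstranded, 1 = stranded,
# 2 = reversely stranded) and writes PREFIX_s0.txt, PREFIX_s1.txt and
# PREFIX_s2.txt, each with its own .summary file.
count_by_strand() {
  bam=$1; gtf=$2; prefix=$3
  shift 3
  for s in 0 1 2; do
    featureCounts -s "$s" -a "$gtf" -o "${prefix}_s${s}.txt" "$@" "$bam" || return 1
  done
}

# Example (hypothetical file names):
# count_by_strand human_Aligned.sortedByCoord.out.bam gencode.gtf counts
# grep Assigned counts_s*.txt.summary
```

If the Assigned count is similarly low for all three settings, strandedness is probably not the cause and the alignments themselves deserve a closer look.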

ADD REPLY • link 5.5 years ago by GenoMax 147k