5.5 years ago
GiV17
Hi, I have a problem with BBSplit. I have xenograft mouse-human RNA-seq samples (paired FASTQ), and I thought of using BBSplit to remove the mouse contamination.
So I used this command line:
bbsplit.sh in1=reads1.fq in2=reads2.fq ref=human.fa,mouse.fa ambiguous2=toss basename=out_%.fq refstats=Statistics_%.txt
Then I remapped the output FASTQ file to the human reference with STAR, and I would like to use featureCounts to obtain the raw gene counts, but it doesn't work well.
Can you recommend a pipeline to follow for RNA-seq data after using BBSplit? Thanks so much for the reply.
What does not work well? After you bin the reads they should be able to map to human/mouse genomes normally. Can you post what the refstats looked like?
There is also XenofilteR (https://github.com/PeeperLab/XenofilteR) but it sounds like your problem is not with the binning.
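To make "map normally" concrete, here is a minimal sketch of the two steps after binning: align the human bin with STAR, then count with featureCounts. It is not a definitive pipeline. The file names (out_human_1.fq, out_human_2.fq, star_index_gencode, gencode.annotation.gtf) are hypothetical placeholders, the sketch assumes the human bin is available as two paired files rather than one interleaved file, and -s 2 assumes a reverse-stranded library. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
R1=${R1:-out_human_1.fq}                      # assumed: mate 1 of the human bin
R2=${R2:-out_human_2.fq}                      # assumed: mate 2 of the human bin
STAR_INDEX=${STAR_INDEX:-star_index_gencode}  # assumed: prebuilt STAR index
GTF=${GTF:-gencode.annotation.gtf}            # assumed: GTF matching the index
THREADS=${THREADS:-8}
PREFIX=${PREFIX:-human_}

# Print a command; execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Paired-end alignment, coordinate-sorted BAM output.
align() {
  run STAR --runThreadN "$THREADS" --genomeDir "$STAR_INDEX" \
      --readFilesIn "$R1" "$R2" \
      --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "$PREFIX"
}

# -p marks the input as paired-end; -s 2 assumes a reverse-stranded kit.
# Recent Subread releases also need --countReadPairs to count fragments.
count() {
  run featureCounts -T "$THREADS" -p -s 2 -a "$GTF" \
      -o "${PREFIX}counts.txt" "${PREFIX}Aligned.sortedByCoord.out.bam"
}

align
count
```

If the strandedness of the library is not known for certain, the -s value is the first thing to vary when the assigned fraction looks too low.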
RefStats of one of my samples:
It seems that 90% of the total reads map to hg38, is that OK? Then I use this single FASTQ file to map to the GENCODE reference using STAR and I obtain: Uniquely mapped reads % | 83.15%
Then I use featureCounts and I obtain:
It seems very low compared with the STAR mapping rate... why?
By contrast, if I don't use BBSplit and map the original FASTQ files directly to human, I obtain 75.6% uniquely mapped reads, and with featureCounts I obtain 23.1%.
How is this possible? How can I fix it? Which of the two analyses is better?
Are you using the correct strand option (-s) when counting with featureCounts? bbsplit is clearly able to assign 90% of your reads (it uses bbmap.sh under the covers), so there should be no reason why the split file should not align well directly.
My original FASTQ files were prepared with a reverse-stranded kit. When I remove the mouse reads from them and get a single FASTQ, could I lose this information?
FASTQ files themselves are not reverse stranded. The kit that was used for prep captured the reverse strand. Did you try using the -s 2 option when counting reads? Have you examined your aligned files to ensure that reads are aligning properly (there is no general alignment outside of exons, i.e. possibility of DNA contamination in your prep)?
Yes, that is so! However, I used -s 2 and I obtained a low count. I don't know if there is DNA contamination... but what would that mean?
Have you examined your aligned files in IGV? Go to genes you know should be there and see what the alignments there look like. What happens if you use -s 0? Does the assignment % go up? Just to be sure, we are discussing all this for the human part of your data? The mouse part has been separated?
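One way to answer the -s question directly is to count the same BAM with each strand setting and compare the assigned fraction. The following is a rough sketch under stated assumptions: the BAM and GTF names are hypothetical placeholders, and assigned_pct is an illustrative helper (not part of Subread) that reads the .summary file featureCounts writes next to its output.

```shell
#!/bin/sh
# Sketch only: BAM and GTF names are hypothetical placeholders.
BAM=${BAM:-human_Aligned.sortedByCoord.out.bam}
GTF=${GTF:-gencode.annotation.gtf}

# Percentage of reads in the "Assigned" row of a featureCounts .summary file
# (tab-separated: a "Status" header line, then one category per line).
assigned_pct() {
  awk -F'\t' 'NR > 1 { total += $2; if ($1 == "Assigned") a = $2 }
              END { if (total > 0) printf "%.2f\n", 100 * a / total; else print "NA" }' "$1"
}

# Only run the comparison when the tool and both input files are present.
if command -v featureCounts >/dev/null 2>&1 && [ -f "$BAM" ] && [ -f "$GTF" ]; then
  for s in 0 1 2; do
    featureCounts -p -s "$s" -a "$GTF" -o "counts_s${s}.txt" "$BAM" >/dev/null 2>&1
    echo "-s $s: $(assigned_pct "counts_s${s}.txt.summary")% assigned"
  done
fi
```

For a reverse-stranded library one would expect -s 2 to assign far more reads than -s 1. If -s 1 and -s 2 give similar values and -s 0 is highest, the data are behaving as unstranded. If all three stay low, the strand setting is probably not the cause, and looking at the alignments in IGV is the next step.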