Hello,
I have Illumina FASTQ files from RNA-seq, ATAC-seq, and WES experiments that originated from PDX samples. I am looking to filter contaminating mouse reads out of the human reads in these datasets.
I have used Xenome before but wanted to try BBSplit. Both tools are attractive because they work directly on the FASTQ files, so there is no need to align to both mouse and human and then filter the resulting BAMs with tools like ngs-disambiguate, XenofilteR, etc.
I successfully built a BBSplit index using the human and mouse genomes:
bbsplit.sh -Xmx40g build=1 path=/home/ryan/Reference/bbsplit_mm10_hg38 \
    ref_Mouse=/home/ryan/Reference/Mus_musculus/Ensembl/STAR_reference/Mus_musculus.GRCm38.dna.primary_assembly.fa \
    ref_Human=/home/ryan/Reference/Homo_sapiens/Ensembl/gencode_GRCH38/GRCh38.primary_assembly.genome.fa
I then ran bbsplit as such:
bbsplit.sh -Xmx40g path=/home/ryan/Reference/bbsplit_mm10_hg38/ build=1 \
    in=/home/ryan/NGS_Data/JCA108_S9_L004_R1_001.fastq.gz \
    in2=/home/ryan/NGS_Data/JCA108_S9_L004_R2_001.fastq.gz \
    refstats=/home/ryan/NGS_Data/test/JCA108_stats.txt \
    basename=/home/ryan/NGS_Data/test/JCA108_%_#.fq.gz
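For reference, the same run can also be written with an explicit thread count and ambiguity handling spelled out (a sketch only; `threads=`, `ambiguous=`, and `ambiguous2=` are standard BBSplit flags per the BBTools documentation, and the paths are the ones from the command above):

```shell
# Sketch of the same bbsplit run with explicit options.
# ambiguous=  : how to treat reads with multiple best hits within one genome
# ambiguous2= : how to treat reads that map to BOTH references (human and mouse);
#               "toss" discards them, which is a common choice for PDX cleanup.
bbsplit.sh -Xmx40g threads=8 \
    path=/home/ryan/Reference/bbsplit_mm10_hg38/ build=1 \
    in=/home/ryan/NGS_Data/JCA108_S9_L004_R1_001.fastq.gz \
    in2=/home/ryan/NGS_Data/JCA108_S9_L004_R2_001.fastq.gz \
    ambiguous=best ambiguous2=toss \
    refstats=/home/ryan/NGS_Data/test/JCA108_stats.txt \
    basename=/home/ryan/NGS_Data/test/JCA108_%_#.fq.gz
```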
I am running this on a Linux system with 48 GB of RAM and 8 threads, and the process is taking a long time (over 24 hours so far). Do I need a lot more RAM? The output file is growing, but very slowly!
Thanks!
ryan
Possibly. Since you are loading both the human and mouse genomes, memory could well be the bottleneck. If you have access to better hardware, I suggest moving the analysis there.
Thanks @GenoMax! Do you think something with at least 70 GB of RAM would do?
If the process is taking a while (but working) then you can just let it finish; runtime depends on the size of your input data. One way to speed it up would be more RAM and more cores, but that would mean starting over on a new machine.
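To check that the run really is progressing, a rough sketch is to compare read counts in the input against what has been written so far (this assumes the `%` placeholder in your `basename` expands to `Human` and `Mouse` from the `ref_` names used at build time; counting a gzip file that is still being written may warn about truncation but still gives a usable number):

```shell
# Rough progress check: FASTQ stores 4 lines per read, so divide by 4.
# Paths taken from the run above; adjust to your own layout.
in_reads=$(( $(zcat /home/ryan/NGS_Data/JCA108_S9_L004_R1_001.fastq.gz | wc -l) / 4 ))
out_reads=$(( $(zcat /home/ryan/NGS_Data/test/JCA108_Human_1.fq.gz \
                     /home/ryan/NGS_Data/test/JCA108_Mouse_1.fq.gz 2>/dev/null | wc -l) / 4 ))
echo "R1 reads in input:       $in_reads"
echo "R1 reads written so far: $out_reads"
```

Running this a few minutes apart tells you reads per minute, which in turn gives an estimate of the remaining time.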