Hello! I am working on assembling a transcriptome of the terminal ganglion of the cricket (Gryllus bimaculatus). We are working on a non-model species and thus did not remove ribosomal contamination during library prep. We want to remove ribosomal contamination bioinformatically, and I am mapping our sequences, which have been trimmed for quality control with fastqc and rcorrector, to the silva ribosomal database. I uploaded the fasta silva database files onto the HPC, concatenated them, converted the U's-->T's, and then bowtie2 indexed the resulting fasta file. Now, we are having a problem when we want to map (bowtie2-align) our sequences to the silva database. Have you ever come across an error like this before? If so, how did you go about solving it?
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S
#$ -pe smp 40
#$ -M x@xxx.edu -m be
export silva_db=/mnt/research/hhorch/term_RNAseq_analysis/silva_db/silva_db/silva_db
export r1=/mnt/research/hhorch/term_RNAseq_analysis/trimmed_reads_cor_val/fixed_1C_1_R1.cor_val_1.fq
export r2=/mnt/research/hhorch/term_RNAseq_analysis/trimmed_reads_cor_val/fixed_1C_1_R2.cor_val_2.fq
export summary=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/1C_1_rrna_summary.txt
export mapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/rrna_mapped_1C_1
export unmapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/rrna_unmapped_1C_1
export single_mapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/single_mapped_1C_1
export single_unmapped=/mnt/research/hhorch/term_RNAseq_analysis/rRNA/single_unmapped_1C_1
bowtie2 --very-sensitive-local --phred33 -x $silva_db -1 $r1 -2 $r2 --threads 40 --met-file $summary --al-conc-gz $mapped --un-conc-gz $unmapped --al-gz $single_mapped --un-gz $single_unmapped -S "$name"_out.sam
Where silva_db is the path to the bowtie2 indexed silva database, r1 and r2 are the input reads, and summary, mapped, unmapped, single_mapped, and single_unmapped are the output from the alignment.
I have tried many variations of this script with pretty much just troubleshooting the --very-sensitive-local
and --threads
parameters. For the first part, I have also tried --sensitive-local
, --fast-local
, and --very-fast-local
, with different combinations of threads and these moosehead commands:
qsub -l 10g=true -pe smp 16 silva_test9.sh
qsub -l gpu=1 -pe smp 16 silva_test10.sh
qsub -l 10g=true -pe smp 40 silva_test5.sh
The part in pink is where I tried different numbers of threads.
Within the script itself, I have also played with the number of --threads
. In addition to using cpu as a variable, I have just put the number of threads there (12, 16, 40, etc). The maximum amount of RAM we can use is 360gb.
Most of the time I have been aborting the job after a couple of hours and even a few days sometimes. Output files are created in the correct location, but there is nothing in them. The error message I have been receiving is "bowtie2 died with signal 9 (KILL)".
Thank you so much!
Meera
Your email address was redacted from the post above.
It seems that you run out of memory.
Hello! Yes, I think we have been running out of memory. Are you familiar with any other ways to map our sequences to this database?
I think it is a good way. Just ask how to increase the memory amount allowed to your job to your local bioinformaticians / cluster (HPC) managers. With the Slurm job manager (sbatch or SlurmEasy to launch scripts) this can be easily done at the beginning of the shell script you use to launch bowtie with the following lines:
Note that if the cluster is using SGE to manage jobs (qsub, etc), the above lines will not work, they are specific to Slurm. For SGE, please use the following:
Note that
h_vmem
is per core, so if you use multiple cores with--pe env 12
,h_vmem
will be multiplied by 12.Hi Gautier,
Thank you! I have been in contact with our HPC - expert and he has helped me up the RAM as much as possible. Unfortunately, it still has not been working. Would you recommend another alignment such as BWA?
Yes you can try using BWA for ungapped mapping. If you think that your alignments might have gaps, please use STAR (it could as you are analysing transcripts right?).