Hi all,
I have 112 paired-end read files from two animal species (56 files each). They cover four time points (control, 6h, 1d, 2d) with 7 replicates per time point.
I want to analyze gene expression in both species, but I do not have a reference genome, so I need to build a de novo transcriptome assembly. This is what I have done so far:
- QC with FastQC and MultiQC, using the following Slurm scripts:
#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=24:10:00
#SBATCH --mem=200G
#SBATCH --mail-user=
#SBATCH --mail-type=All
#SBATCH --output=fastqc.out
#SBATCH --error=fastqc.err
#SBATCH --partition=ceti
echo Node list: $SLURM_JOB_NODELIST
echo Number of tasks: $SLURM_NTASKS
module load fastqc-0.11.9-gcc-10.2.0-kdeze47
# This script runs FastQC on the raw paired-end RNA-seq reads
# Define PATH to fastq data directory
DATA_DIR=/users/habibm/taos-scratch/fastq
for file in "$DATA_DIR"/*R1.fastq.gz
do
    # Strip the directory, then only the trailing R1.fastq.gz,
    # so any separator before R1 (e.g. an underscore) stays in the prefix
    filename="${file##*/}"
    base="${filename%R1.fastq.gz}"
    echo "${base}"
    fastqc -t 8 \
        "$DATA_DIR/${base}R1.fastq.gz" \
        "$DATA_DIR/${base}R2.fastq.gz" \
        -o "$DATA_DIR"
done
exit 0
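When pairing R1 and R2 files by suffix stripping, as in the loop above, it is worth checking what the expansion actually returns: an expansion that also removes the underscore before R1 produces file names that do not exist. A minimal sketch, using a made-up file name purely for illustration:

```shell
# Hypothetical file name, used only to illustrate the two expansions.
filename="sampleA_6h_rep1_R1.fastq.gz"

# Removes the shortest suffix matching *_*.fastq.gz, i.e. everything from
# the last underscore onward, the underscore included.
base_no_underscore="${filename%*_*.fastq.gz}"

# Removes only the literal R1.fastq.gz, so the underscore is kept.
base_with_underscore="${filename%R1.fastq.gz}"

echo "${base_no_underscore}R1.fastq.gz"     # sampleA_6h_rep1R1.fastq.gz (no such file)
echo "${base_with_underscore}R1.fastq.gz"   # sampleA_6h_rep1_R1.fastq.gz
echo "${base_with_underscore}R2.fastq.gz"   # sampleA_6h_rep1_R2.fastq.gz
```

Running the expansions on one real file name from the data directory before submitting the job is a quick way to confirm that the R1/R2 names resolve to existing files.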
=====================
MultiQC
#!/bin/bash
#SBATCH --job-name=multiqc
#SBATCH --partition=ceti
#SBATCH --ntasks=1
#SBATCH --mem=200G
#SBATCH --mail-user=
#SBATCH --mail-type=ALL
#SBATCH --output=multiqc.out
#SBATCH --error=multiqc.err
echo Node list: $SLURM_JOB_NODELIST
echo Number of tasks: $SLURM_NTASKS
# Define paths to the FastQC reports and to the MultiQC output directory
DATA_DIR=/users/habibm/taos-scratch/QC
OUT_DIR=/users/habibm/taos-scratch/multiqc
module load miniconda3-4.7.12.1-intel-19.0.5-fdz2vxj
# -y skips the interactive confirmation, which a batch job cannot answer
conda create -y --name multi_test python=3.7
source activate multi_test
pip install multiqc
# Run multiqc to get a summary for all samples
cd "$DATA_DIR" || exit 1
multiqc . -o "$OUT_DIR"
exit 0
==========================
Now I need to filter and trim my samples, and this is where I am stuck. Some samples have overrepresented sequences (rRNA). I downloaded the SSU and LSU rRNA sequences for my species as FASTA files, and I now need to write a Slurm script for the trimming and rRNA filtering jobs. Any suggestions for a script that would work with my data?
Thanks,
Mohamed
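One possible shape for such a job is sketched below: trim with fastp, then align the trimmed pairs against the rRNA sequences with bowtie2 and keep only the pairs that do not align concordantly. This is a sketch under assumptions, not a tested pipeline for this dataset: the choice of fastp and bowtie2, the resource requests, the `_R1`/`_R2` naming, the output directory, and the file name `rRNA.fasta` (SSU and LSU concatenated, one file per species) are all hypothetical. Other tools such as SortMeRNA or BBDuk are commonly used for the same purpose.

```shell
#!/bin/bash
#SBATCH --job-name=rrna_filter
#SBATCH --partition=ceti
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --output=rrna_filter.out
#SBATCH --error=rrna_filter.err

# Assumptions (not from the original post): fastp and bowtie2 are on PATH,
# reads are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, and the SSU and
# LSU sequences of ONE species are concatenated in rRNA.fasta.
DATA_DIR="${DATA_DIR:-/users/habibm/taos-scratch/fastq}"
OUT_DIR="${OUT_DIR:-$DATA_DIR/filtered}"          # hypothetical output location
RRNA_FASTA="${RRNA_FASTA:-$DATA_DIR/rRNA.fasta}"  # hypothetical file name
THREADS="${SLURM_CPUS_PER_TASK:-8}"
DRY_RUN="${DRY_RUN:-0}"   # DRY_RUN=1 prints the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

process_sample() {
    local r1="$1"
    local r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    local base
    base="$(basename "${r1%_R1.fastq.gz}")"

    # 1) Adapter and quality trimming of the read pair
    run fastp -i "$r1" -I "$r2" \
        -o "$OUT_DIR/${base}_trim_R1.fastq.gz" \
        -O "$OUT_DIR/${base}_trim_R2.fastq.gz" \
        --detect_adapter_for_pe -w "$THREADS" \
        -h "$OUT_DIR/${base}_fastp.html" -j "$OUT_DIR/${base}_fastp.json"

    # 2) Keep pairs that do NOT align concordantly to the rRNA index;
    #    bowtie2 replaces % in the --un-conc-gz path with 1 or 2
    run bowtie2 -p "$THREADS" -x "$OUT_DIR/rRNA_index" \
        -1 "$OUT_DIR/${base}_trim_R1.fastq.gz" \
        -2 "$OUT_DIR/${base}_trim_R2.fastq.gz" \
        --un-conc-gz "$OUT_DIR/${base}_norRNA_R%.fastq.gz" \
        -S /dev/null
}

if [ -d "$DATA_DIR" ]; then
    mkdir -p "$OUT_DIR"
    # Build the rRNA index once, then process every read pair
    run bowtie2-build "$RRNA_FASTA" "$OUT_DIR/rRNA_index"
    for r1 in "$DATA_DIR"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue
        process_sample "$r1"
    done
else
    echo "Data directory $DATA_DIR not found" >&2
fi
```

Submitting with `DRY_RUN=1` first prints the commands for every sample without running them, which makes it easy to check the file names. Note that pairs in which only one mate hits rRNA also end up in the "not concordant" output, so whether this filter is strict enough depends on the data. With two species, the job would be run once per species with the matching rRNA file.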
A few questions/comments:
conda create -n multiqc_env -c bioconda multiqc ?
(Again, you'd run this command before running the script; in the script, you'd just source activate multiqc_env.)
I tried to install MultiQC through a conda environment, but it failed (I do not know why). So I installed it with pip instead, and I forgot to remove the conda env commands from the script.
The error message will tell you why. Working around that error with pip only invites more errors into your env. Fix the conda error and things will run more smoothly.
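The advice in this thread (create the environment once outside the job, then only activate it inside the script) could look roughly like the sketch below. The helper `env_exists` is a hypothetical name introduced here, and the sketch assumes the environment is called `multiqc_env` as in the command above and that the paths are the ones from the MultiQC script.

```shell
ENV_NAME="multiqc_env"
DATA_DIR=/users/habibm/taos-scratch/QC
OUT_DIR=/users/habibm/taos-scratch/multiqc

# Hypothetical helper: succeeds if a conda environment with this name exists.
# `conda env list` prints one environment per line, with the name in column 1.
env_exists() {
    conda env list | awk -v n="$1" '$1 == n { found = 1 } END { exit !found }'
}

if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; load the miniconda module first" >&2
elif env_exists "$ENV_NAME"; then
    # Inside the job: only activate the existing environment and run MultiQC
    source activate "$ENV_NAME"
    multiqc "$DATA_DIR" -o "$OUT_DIR"
else
    # The environment is created once, outside the job (e.g. on the login node)
    echo "Environment $ENV_NAME not found; create it once with:" >&2
    echo "  conda create -n $ENV_NAME -c bioconda multiqc" >&2
fi
```

Keeping the installation step out of the batch script means the job fails fast with a clear message when the environment is missing, rather than trying to reinstall it on every run.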