Entering edit mode
2.3 years ago
amy__
▴
190
Hello,
Having run fastp like this:
#!/bin/bash --login
#SBATCH -J AmyHouseman_fastp
#SBATCH -o %x.stdout.%J.%N
#SBATCH -e %x.stderr.%J.%N
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=xxxxx # Where to send mail
#SBATCH --array=1-33
#SBATCH --partition=compute
#SBATCH --account=scw1581
#SBATCH --mem-per-cpu=128GB
module purge
module load singularity
module load parallel
# Set bash error trapping to exit on first error.
set -eu
WDPATH=/scratch/$USER/$SLURM_ARRAY_JOB_ID
CONTAINER_FILE=fastp-v0.23.1.sif
MY_CONTAINER_PATH=/scratch/$USER/containers
CONTAINER=$MY_CONTAINER_PATH/$CONTAINER_FILE
EXOME_IDs_FILE=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_input/NG0921_sampleIDs
FORWARD_FASTQ=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_input/RawPEreads/{}1.fq.gz
REVERSE_FASTQ=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_input/RawPEreads/{}2.fq.gz
TRIMMED_FORWARD_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/TrimmedPEreads/{}1.trimmed.fq.gz
TRIMMED_REVERSE_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/TrimmedPEreads/{}2.trimmed.fq.gz
HTML_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/html_output/{}pair.html
JSON_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/json_output/{}pair.json
FAILED_FASTQ_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/failed_output/{}failed.fq.gz
UNPAIRED_FORWARD_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/unpairedreads_output/Unpaired_read1/{}1.unpaired1.fq.gz
UNPAIRED_REVERSE_OUTPUT=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_output1/unpairedreads_output/Unpaired_read2/{}2.unpaired2.fq.gz
ADAPTERS=/scratch/c.c21087028/Polyposis_Exome_Analysis_NG0921/fastp/All_fastp_input/NG0921_adapter.fa
if [ "$SLURM_ARRAY_TASK_ID" == "1" ]
then
mkdir -p $MY_CONTAINER_PATH
[ -f "$CONTAINER" ] || ssh cl1 wget -O $CONTAINER https://wotan.cardiff.ac.uk/containers/$CONTAINER_FILE
mkdir $WDPATH
fi
while [ ! -d $WDPATH ]
do
sleep 10
done
sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "singularity run $CONTAINER -i $FORWARD_FASTQ -I $REVERSE_FASTQ -o $TRIMMED_FORWARD_OUTPUT -O $TRIMMED_REVERSE_OUTPUT --html $HTML_OUTPUT --json $JSON_OUTPUT --failed_out $FAILED_FASTQ_OUTPUT --unpaired1 $UNPAIRED_FORWARD_OUTPUT --unpaired2 $UNPAIRED_REVERSE_OUTPUT --adapter_fasta $ADAPTERS"
Sometimes I get output trimmed files that are of not correct, such as:
How can so many reads pass the filtering yet so little be in the filtered ones?
It is bizarre and really annoying!
Could you elaborate a little more on what you find problematic? Upon my cursory reading it seems everything is fine.
With good data, more reads pass the filters and fewer reads will have errors.
I think the confusion stems from following:
Before filtering - 114 M reads (I guess that is total in data set pre-filtering)
After Filtering - 35 M reads (that I would think is either reads that remain after filtering or are removed during filtering (ambiguous)
Above numbers do not add us since the
Filtering result
(which I guess is final) showsreads passed filters - 114M reads (once again)
Hi,
Its more that when I then do
The result is only 21785824
and for the reverse
the result is just 21785824
After running trimgalore, I get
57393620
and
echo $(zcat NG0921001_EKDN210018941-1A_HN2MGDSX2_L2_2_val_2.fq.gz|wc -l)/4|bc
57393620
Which is more in keeping with the 114M
Strange! I will get the fastqc from trimgalore and see if the stats
If you are starting with 21,785,824 reads how are you getting 57,393,620 after trimming?
Sorry I was not clear.
The original input fastq file (no trimming done) for NG0921001_EKDN210018941-1A_HN2MGDSX2_L2_1.fq.gz is 57402212 reads.
When using the trimming tool fastp it ends up outputting only 21,785,824 reads as the trimmed reads fastq.
Then when I used the original fastq and trim galore, to compare its output to fastp I get 57,393,620 as the output after running trim galore.
So fastp is doing something strange, and outputting a very small number compared to when trim galore does its own trimming
The explanation is simply that the two tools are being instructed to do something else altogether.