4.6 years ago
zhangdengwei
Hi all,
I am using kneaddata to remove the contaminated reads belonging to the host, and below is my command:
nohup kneaddata -i ../01.fastp/dynamics_12/clean_SAMEA2580278_r1.fq.gz -i ../01.fastp/dynamics_12/clean_SAMEA2580278_r2.fq.gz -o ./SAMEA2580278 -db ~/database/Genome/1.Human/02.bowtie2.index/GRCh38 --bypass-trim -t 2 --remove-intermediate-output &
As described in its tutorial, it produced several files:
clean_SAMEA2579907_r1_kneaddata_GRCh38_bowtie2_paired_contam_1.fastq
clean_SAMEA2579907_r1_kneaddata_GRCh38_bowtie2_paired_contam_2.fastq
clean_SAMEA2579907_r1_kneaddata_GRCh38_bowtie2_unmatched_1_contam.fastq
clean_SAMEA2579907_r1_kneaddata_GRCh38_bowtie2_unmatched_2_contam.fastq
clean_SAMEA2579907_r1_kneaddata.log
clean_SAMEA2579907_r1_kneaddata_paired_1.fastq
clean_SAMEA2579907_r1_kneaddata_paired_2.fastq
clean_SAMEA2579907_r1_kneaddata_unmatched_1.fastq
clean_SAMEA2579907_r1_kneaddata_unmatched_2.fastq
However, the number of reads in the two paired files, clean_SAMEA2579907_r1_kneaddata_paired_1.fastq and clean_SAMEA2579907_r1_kneaddata_paired_2.fastq, differed significantly, although they should be identical. In contrast, the two files contain the same number of reads when I process another sample. I am certain that there are neither errors nor warnings, so what happened? Any suggestions would be greatly appreciated.
Cheers
How did you count the number of reads?
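For reference, one way to count reads that does not depend on how the read titles look is to count lines and divide by four. Below is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `count_fastq_reads` is a hypothetical helper name, not part of kneaddata.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Open gzip-compressed files transparently, based on the file extension.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # A line count that is not a multiple of 4 means the file is truncated
    # or is not in the simple four-line layout assumed here.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling it on both `*_paired_1.fastq` and `*_paired_2.fastq` and comparing the two numbers should show whether the pair files really are out of sync. Counting lines that start with `@` is less reliable, because a quality line can also begin with `@`.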
Thanks @ATpoint. I may have found the cause: it seems to be the title (header) line of each read. Here is an example of one read in the original FASTQ file,
If I remove the space within the title, like @ERR525690.10011001/1, and then run kneaddata, it works well. I suppose this might be a bug in kneaddata, although I have not reviewed its source code carefully.
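In case it helps anyone hitting the same issue, the title clean-up described above can be sketched in Python as follows. This is only an illustration of the workaround, assuming four-line FASTQ records; `normalize_fastq_headers` and its `replacement` parameter are hypothetical names, and whether spaces in titles are really what confuses kneaddata is my guess, not something confirmed.

```python
import re


def normalize_fastq_headers(lines, replacement=""):
    """Yield FASTQ lines with whitespace inside each title line replaced.

    Assumes exactly four lines per record, so the title is identified by
    position rather than by a leading '@' (quality lines can start with '@').
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Separate the line ending so it is not treated as whitespace.
            body = line.rstrip("\r\n")
            ending = line[len(body):]
            yield re.sub(r"\s+", replacement, body) + ending
        else:
            # Sequence, separator, and quality lines pass through unchanged.
            yield line
```

The function works on any iterable of lines, so it can be fed an open file handle and its output written with `writelines`. Passing `replacement="_"` keeps the parts of the title visually separated instead of gluing them together.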