I ran SortMeRNA with the following command:
sortmerna --ref ref_files --reads input.fq --aligned output_rRNA --other output_clean --log -a 12 -v --paired_in --fastx
After unmerging the "clean" output, I got "cleanFV" and "cleanRV". I assumed the FV and RV files would have the same number of lines since they are paired, but wc -l reports a different number of lines for each. However, FastQC shows the same number of reads for the two files, and the FV, RV and rRNA read counts add up to the total number of reads in my input file. Can anyone explain why I get different line counts but the same number of reads?
Hi, I don't think my data are corrupted, since my FV.fq.gz has 8078406 lines but my RV.fq.gz has 8058719 lines. They are quite different, and almost all of my samples are like that.
Also, why do they still have the same number of reads? Another thing I noticed is that the original FV and RV files (after Trimmomatic) that I merged to feed SortMeRNA also have different line counts. Again, they have the same number of reads in the FastQC reports.
Do not trust wc -l counts. Either run the repair.sh tool I posted above or use a Fastq file validation program like validateFiles from Jim Kent's utils to see where the problem is. Add execute permission (chmod u+x validateFiles) after you download.
Thank you for your suggestion, I am running repair.sh to figure it out.
I got an empty file for the singleton, so everything seems fine.
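For anyone who wants to cross-check the pairing without relying on wc -l, here is a minimal Python sketch that counts FASTQ records by parsing them and compares read IDs between the two files. It is not what repair.sh or validateFiles do internally, just an independent check. It assumes strict 4-line records (no wrapped sequence lines) and that mates share the same ID apart from an optional /1 or /2 suffix; the function names are made up for illustration. One thing it sidesteps: if wc -l is run directly on a .gz file, it counts newline bytes in the compressed stream, which says nothing about the number of reads.

```python
import gzip
import sys
from itertools import zip_longest


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def base_id(header):
    """Return the read ID from a header line, without a trailing /1 or /2."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_ids(path):
    """Yield one read ID per record, assuming strict 4-line FASTQ records."""
    with open_text(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                if not line.startswith("@"):
                    raise ValueError(f"{path}: line {i + 1} is not a FASTQ header")
                yield base_id(line)


def compare_pairs(forward, reverse):
    """Return (forward records, reverse records, positions where IDs differ)."""
    n_fwd = n_rev = mismatches = 0
    for f, r in zip_longest(read_ids(forward), read_ids(reverse)):
        if f is not None:
            n_fwd += 1
        if r is not None:
            n_rev += 1
        if f != r:
            mismatches += 1
    return n_fwd, n_rev, mismatches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): python check_pairs.py FV.fq.gz RV.fq.gz
    n_fwd, n_rev, mismatches = compare_pairs(sys.argv[1], sys.argv[2])
    print(f"forward: {n_fwd}  reverse: {n_rev}  ID mismatches: {mismatches}")
```

Equal record counts with zero ID mismatches would be consistent with the empty singleton file reported above. A file that lacks a final newline still gives the right record count here, whereas its line count from wc -l would be one short.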