Hello, I am trying to convert some publicly available .bam files to fastq format, using the picard tools function SamToFastq as follows:
$ java -Xmx4g -jar picard.jar SamToFastq NON_PF=true INPUT=input.bam F=input_1.fastq.gz F2=input_2.fastq.gz FU=unpaired_input.fastq.gz
The resulting fastq files have fewer reads than the original bam file. I am checking like this:
samtools view input.bam | wc -l
returns 62193989,
while zcat input_1.fastq.gz | wc -l
reports 103572960 (51786480)
Why are there fewer reads in the fastq file? I have also tried the UNPAIRED_FASTQ=File
option in picard tools, which reports zero reads.
I would appreciate it if someone could explain why this is happening. Either I am wrong to compare the numbers above directly, or something is going wrong during the conversion. What is the best way to check whether the .bam to .fastq conversion went well?
Do not use wc -l with samtools; use samtools view -c in.bam, which is much faster. Also be aware that a fastq file contains 4 lines per read, so a line count has to be divided by 4 to get a read count. For your specific problem, be sure to use INCLUDE_NON_PRIMARY_ALIGNMENTS=true, as Pierre suggested, to include non-primary alignments. After doing that, count again.
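To make the comparison concrete, here is a rough sketch of how the fastq side could be counted. The helper name count_fastq_reads is made up for illustration, and it assumes every record is exactly 4 lines (no wrapped sequence lines). With the numbers from the question, 103572960 lines / 4 = 25893240 reads in input_1.fastq.gz.

```shell
# Hypothetical helper: count reads in a gzipped fastq file.
# Assumes exactly 4 lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    lines=$(gzip -cd "$1" | wc -l)   # total number of lines in the file
    echo $((lines / 4))              # 4 lines per read
}

# Example use with the file names from the question:
#   count_fastq_reads input_1.fastq.gz
#   count_fastq_reads input_2.fastq.gz
#
# For the bam side, samtools can count records directly:
#   samtools view -c input.bam            # all alignment records
#   samtools view -c -F 0x900 input.bam   # skip secondary (0x100) and supplementary (0x800) records
```

If the conversion was run without INCLUDE_NON_PRIMARY_ALIGNMENTS=true, the second samtools count is probably the fairer one to compare against the sum of the fastq counts, since one read can have several alignment records in the bam file.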