I found two posts recommending HTSlib's htscmd for converting BAM to FASTQ, but both commands write the paired reads only as a single interleaved file. I used this tool to deinterleave them.
After deinterleaving the paired.fq from this command (Source), I got only 41k pairs:
htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz
After deinterleaving the output of this command (Source), I got 232.7k reads for one end and 214.4k reads for the other end:
htscmd bamshuf -uOn 128 aln_reads.bam tmp | htscmd bam2fq -a - | gzip > interleaved_reads.fq.gz
Is there a parameter in HTSlib htscmd that tells it to output deinterleaved reads directly? Or is there a better tool for deinterleaving?
My BAM file is from TopHat, and I would like to re-analyze these reads, after filtering, again with TopHat. Is it important to integrate the singletons back with the paired reads for re-analysis? It appears from here that TopHat does not pass singletons on to Cufflinks for FPKM, but since they mapped, I would have thought the TopHat/Cufflinks pipeline would make use of them. Do singletons tend to be splice-junction reads?
I ran into an issue with this approach too (even using VALIDATION_STRINGENCY=SILENT); see here: Picard error Illegal Mate State in converting BAM to Fastq. Let me know if you have a solution for this. Thanks.
Hi, @aniketd86. I tried your code, and the message 'Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 52323 unpaired mates' did disappear, but I think that is probably due to the parameter VALIDATION_STRINGENCY=SILENT. The unpaired reads were still discarded: output_up.fastq was empty, and the reads in output_pe1.fastq and output_pe2.fastq were the same as with my previous command (without the last two lines of yours).