Hi,
I am processing paired-end sequencing data. I want to write the reads that are unmapped to the reference genome, or mapped with soft/hard clips, to two FASTQ files (say R1.fq, R2.fq).
Is there an existing tool that does this? If not, I will have to write one myself.
But this only outputs the single reads, not the pair.
For paired-end data, if either read of a pair is unmapped or mapped with clips, I'd like to output both reads of that pair.
Use -f 5 then (flag bits 0x1, paired, plus 0x4, read unmapped). It will give you reads that are paired and unmapped. You can experiment with the flags using this tool.
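To see what a flag value like 5 actually selects, here is a small sketch of the bit test. It assumes `-f` keeps a record only when every listed bit is set (the samtools view convention); `passes_f` is a made-up helper name, not part of any tool.

```python
# SAM flag bits relevant here (values from the SAM format specification)
PAIRED = 0x1         # read is paired in sequencing
READ_UNMAPPED = 0x4  # the read itself is unmapped
MATE_UNMAPPED = 0x8  # the read's mate is unmapped


def passes_f(flag, required):
    """Hypothetical helper: True if every bit in `required` is set in `flag`."""
    return (flag & required) == required


# 5 == PAIRED | READ_UNMAPPED
print(passes_f(77, PAIRED | READ_UNMAPPED))  # 77 = 0x1 + 0x4 + 0x8 + 0x40
print(passes_f(99, PAIRED | READ_UNMAPPED))  # 99 = 0x1 + 0x2 + 0x20 + 0x40
```

Note that a mapped read whose mate is unmapped has 0x8 set but not 0x4, so it fails this test. Clipping is recorded in the CIGAR string rather than the flag, so no flag value selects clipped reads.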
You'll need two passes: in the first, build a hash of the names of reads that are unmapped or clipped; in the second, output every read whose name is in that hash. You can do that in Python with pysam, C with htslib, or Java with htsjdk. I would suggest Python, since it'll be quicker to write.
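A rough sketch of the two-pass idea with pysam, not a tested tool. The function names, the default output paths and the choice to hold the first mate in memory until its partner appears (so R1.fq and R2.fq stay in the same order on a coordinate-sorted BAM) are my own assumptions.

```python
SOFT_CLIP, HARD_CLIP = 4, 5  # BAM CIGAR operation codes for S and H


def is_clipped(cigartuples):
    """True if any CIGAR operation is a soft or hard clip."""
    return any(op in (SOFT_CLIP, HARD_CLIP) for op, _ in (cigartuples or []))


def fastq_record(name, seq, quals):
    """Format one FASTQ entry; quals is a sequence of Phred scores (or None)."""
    qual_str = "".join(chr(q + 33) for q in (quals if quals is not None else []))
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


def extract_pairs(bam_path, r1_path="R1.fq", r2_path="R2.fq"):
    import pysam  # third-party; imported here so the helpers above work without it

    # Pass 1: collect names of pairs where any record is unmapped or clipped.
    wanted = set()
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_unmapped or is_clipped(read.cigartuples):
                wanted.add(read.query_name)

    # Pass 2: write both mates of each selected pair from primary records only.
    pending = {}  # name -> (is_read1, fastq text) of the first mate seen
    with pysam.AlignmentFile(bam_path, "rb") as bam, \
            open(r1_path, "w") as r1, open(r2_path, "w") as r2:
        for read in bam.fetch(until_eof=True):
            if read.query_name not in wanted:
                continue
            if read.is_secondary or read.is_supplementary:
                continue
            # get_forward_* undo the reverse-complement of reverse-strand reads
            text = fastq_record(read.query_name,
                                read.get_forward_sequence(),
                                read.get_forward_qualities())
            held = pending.pop(read.query_name, None)
            if held is None:
                pending[read.query_name] = (read.is_read1, text)
                continue
            held_is_read1, held_text = held
            first, second = (held_text, text) if held_is_read1 else (text, held_text)
            r1.write(first)
            r2.write(second)
```

You would call it as `extract_pairs("aln.bam")`, where `aln.bam` stands in for your own file. One caveat: if a primary alignment is hard-clipped, the clipped bases are not stored in the BAM, so the FASTQ entry will be shorter than the original read.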