There was a discussion over a year and a half ago, Picard Exception: Illegal Mate State, which linked to https://github.com/jeff-k/resolvepairs. That page explains that the error occurs when more than one pair of reads shares the same query name. The proposed solution was to add a unique ID to the names of reads belonging to the same mate pair using the resolvepairs script, which failed for me on line 94. So I first tried filtering with samtools view -bq 4 accepted_hits.bam > filtered.bam, and then also added VALIDATION_STRINGENCY=LENIENT, but the error persisted. My command was:
/java -jar /SamToFastq.jar VALIDATION_STRINGENCY=LENIENT INPUT=filtered.bam FASTQ=read_1.fastq SECOND_END_FASTQ=read_2.fastq
Is resolvepairs the only solution? Has anyone else seen it fail with a syntax error on line 94? Thank you.
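To check whether repeated query names really are the cause, one option is to count how many records each query name has for each mate. Below is a minimal sketch, assuming the input is plain SAM text (for example the lines printed by samtools view filtered.bam) and that mates are identified only by the standard SAM FLAG bits 0x40 and 0x80. The function names are my own and are not part of Picard, samtools, or resolvepairs.

```python
from collections import defaultdict

# Standard SAM FLAG bits for the two ends of a pair.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def mate_record_counts(sam_lines):
    """Return {query_name: [first_mate_records, second_mate_records]}."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & FIRST_IN_PAIR:
            counts[name][0] += 1
        elif flag & SECOND_IN_PAIR:
            counts[name][1] += 1
    return dict(counts)


def names_with_repeated_mates(sam_lines):
    """Return sorted query names where either mate has more than one record."""
    counts = mate_record_counts(sam_lines)
    return sorted(name for name, (n1, n2) in counts.items() if n1 > 1 or n2 > 1)
```

If names_with_repeated_mates returns a non-empty list for the filtered file, those names are candidates for the reads that trigger the exception. This is only a diagnostic and does not reproduce Picard's own validation.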
Yes, it worked, thank you. It found 178.1k pairs and 33k singletons. I also tried a bam2fastx-based script I found here, but it found 211.3k pairs and 29k singletons. (However, running bam2fastx (from TopHat) on its own exits quickly with "Error couldn't retrieve both reads for pair HWI...", although the file was sorted by name.) I would appreciate clarification of the inconsistencies between the results of these approaches. The samtools flagstat output for this BAM file is:
My BAM file is from TopHat, and after filtering I would like to re-analyze these reads with TopHat again. Is it important to include the singletons alongside the paired reads in the re-analysis? It appears from here that TopHat does not pass singletons on to Cufflinks for FPKM estimation, but since they mapped, I would have expected the TopHat/Cufflinks pipeline to make use of them. Do singletons tend to be splice-junction reads?
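One way to get a tool-independent baseline for the pair and singleton counts is to tally them per unique query name. Note that samtools view -q filters records individually, so one mate can be dropped while its partner is kept, and that alone creates singletons. The sketch below assumes plain SAM text as input and counts a name as a pair if both mate flags (0x40 and 0x80) are seen at least once, and as a singleton if only one is seen. This is my own counting rule for comparison and is not necessarily how SamToFastq or bam2fastx count.

```python
# Standard SAM FLAG bits for the two ends of a pair.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def count_pairs_and_singletons(sam_lines):
    """Return (pairs, singletons), counted once per unique query name."""
    mates_seen = {}
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        seen = mates_seen.setdefault(name, set())
        if flag & FIRST_IN_PAIR:
            seen.add(1)
        elif flag & SECOND_IN_PAIR:
            seen.add(2)
    # Names with neither mate flag set (unpaired reads) fall in neither group.
    pairs = sum(1 for seen in mates_seen.values() if seen == {1, 2})
    singletons = sum(1 for seen in mates_seen.values() if len(seen) == 1)
    return pairs, singletons
```

Because repeated records for the same name are collapsed here, a tool that emits one pair per record rather than one per name would report larger pair counts on the same file.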
Hi @Dan D., I encountered the same problem as @trakhtenberg. I tried your method using bedtools. While the program was running, there were many warnings indicating that a read was missing its mate, e.g. "*WARNING: Query HWI-ST1061:191:C0LK6ACXX:5:1315:18250:10889 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping."
If each warning refers to one missing read, then the total number of warnings in my case (330000) was greater than the number of missing reads reported by picardtools/SamToFastq (271000). Did you see the same warning when using bedtools?