Hi,
I'm wondering what exactly Picard Tools "cleansam" and "fixmate" do to the data, and what the equivalents are in samtools.
I'm processing my sorted BAM files (the result of mapping FASTQ files, processed with samtools) so that I can proceed to SNP calling with GATK. The GATK best practices guideline recommends that unmapped SAM/BAM files be treated with Picard Tools "cleansam" and "fixmate" before duplicate removal. However, I have already sorted my BAM files with samtools and do not want to start all over with the unsorted files, so I'm wondering what I could run in samtools on my sorted BAM files to get the same result as Picard Tools "cleansam" and "fixmate".
Thanks.
As far as I know the fix mate command will check that two mates of a pair are actually in the file. Sometimes prior filtering will remove one mate of a pair, but it won't update the SAM flag which says both mates are present. Fix mate will run through the file and update the flag if it can't find the mate anymore.
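To make the idea concrete, here is a minimal Python sketch of the check described above: find reads whose flag still says "paired" although the mate is no longer in the file. It is only an illustration, not how Picard or samtools implement it. The function name `find_orphans` and the toy records are made up for this example, and it assumes the input holds primary alignments only (secondary and supplementary records would confuse the simple name count).

```python
from collections import Counter

PAIRED = 0x1  # SAM flag bit: read is part of a pair


def find_orphans(records):
    """Return sorted names of reads flagged as paired whose mate is missing.

    records: list of (qname, flag) tuples, primary alignments only
    (an assumption of this sketch).
    """
    # Count how many records share each read name; an intact pair has two.
    counts = Counter(qname for qname, _ in records)
    return sorted({q for q, flag in records if flag & PAIRED and counts[q] < 2})


# Hypothetical toy data: readA is an intact pair, readB lost its mate.
records = [
    ("readA", 99),
    ("readA", 147),
    ("readB", 163),
]
print(find_orphans(records))  # ['readB']
```

Here readB still carries the paired bit in its flag (163) even though its mate is gone, which is exactly the inconsistency a fix-mate step is meant to resolve.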
Samtools fixmate:
Fixmate checks the two mates of each pair in a name-sorted paired-end BAM and then updates the flags and insert sizes. IMHO that only makes sense if you have done some filtering on your BAM. For example, I typically filter my BAM files (for ChIP-seq, ATAC-seq, and similar assays) for properly paired reads and MAPQ. If filtering for MAPQ>30 removes the forward mate but not the reverse mate, the remaining read is no longer paired, even though its bitwise flag still marks it as such. Running fixmate will then flag this singleton as unpaired and remove the insert size field, which allows it to be removed afterwards with e.g. samtools view -f 2. In your case, since you aligned directly from FASTQ, I do not think it is necessary. Just be sure to exclude reads with MAPQ=0 in your subsequent SNV calling, as these multimappers are unreliable.
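The flag arithmetic behind this can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not samtools code: the helper names are invented for the example, and exactly which bits fixmate clears on a singleton is assumed here rather than taken from the samtools source. The bit values themselves come from the SAM specification.

```python
# SAM flag bits (values from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
MATE_UNMAPPED = 0x8
MATE_REVERSE = 0x20


def passes_required(flag, required):
    """Mimic the test behind `samtools view -f`: keep only if all required bits are set."""
    return (flag & required) == required


def clear_pair_bits(flag):
    """Assumed model of the fix: drop the pair-related bits from a singleton."""
    return flag & ~(PAIRED | PROPER_PAIR | MATE_UNMAPPED | MATE_REVERSE)


stale = 163                     # still claims "paired, properly paired"
fixed = clear_pair_bits(stale)  # pair-related bits removed

print(passes_required(stale, PROPER_PAIR))  # True: singleton slips through -f 2
print(passes_required(fixed, PROPER_PAIR))  # False: now dropped by -f 2
```

With the stale flag the singleton survives a filter for properly paired reads; once the pair bits are cleared it no longer does.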