After running Picard ValidateSamFile, I get warnings like the ones below for every read, saying that "NM" tags are missing:
WARNING: Read name SRR6251016.24364087_TGTTATGAGA, A record is missing a read group
WARNING: Record 1, Read name SRR6251016.24364087_TGTTATGAGA, NM tag (nucleotide differences) is missing
I am using BAM files produced by a STAR mapping pipeline, which contain "nM" tags as shown below. These serve a similar purpose to NM tags but are named differently.
SRR6251016.24364087_TGTTATGAGA 99 chr1 3043025 255 70M = 3043191 236 AGAAAATTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<<EEEEEEEEEEEEEEAAEEEE NH:i:1 HI:i:1 AS:i:136 nM:i:1
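SAM tag names are case-sensitive, so a validator looking for "NM" will not accept "nM" even though both are integer tags with the same TAG:TYPE:VALUE layout. The minimal Python sketch below parses the optional fields of the record above to make that concrete. It assumes the fields are tab-separated (as in a real SAM file; the copy above appears space-separated), and it replaces SEQ and QUAL with "*" for brevity. The function name parse_optional_tags is hypothetical.

```python
def parse_optional_tags(sam_line: str) -> dict:
    """Map each optional tag name (case-sensitive) to its (type, value) pair."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Columns 12 onwards (index 11+) are optional TAG:TYPE:VALUE fields.
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = (type_code, value)
    return tags


# The record from the question, with SEQ/QUAL shortened to "*" (assumption).
record = "\t".join([
    "SRR6251016.24364087_TGTTATGAGA", "99", "chr1", "3043025", "255", "70M",
    "=", "3043191", "236", "*", "*",
    "NH:i:1", "HI:i:1", "AS:i:136", "nM:i:1",
])

tags = parse_optional_tags(record)
print("NM" in tags)  # False: no upper-case NM tag is present
print("nM" in tags)  # True: STAR's lower-case tag is a different key
```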
Does anyone know how to make Picard recognise these tags (which have the same format)? I need to run Picard MarkDuplicates next in my analysis.
Why do you want Picard to recognize these tags? I think MarkDuplicates only compares the 5' ends of reads and does not consider the NM tag.
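To illustrate why NM is not needed for that comparison, here is a simplified Python sketch of the kind of 5' key a duplicate marker can compute from FLAG, POS and CIGAR alone. This is not Picard's actual code: the function name unclipped_five_prime is hypothetical, and it ignores mate information, library and read-quality tie-breaking, which a real tool also takes into account.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_five_prime(flag: int, pos: int, cigar: str) -> tuple:
    """Return (unclipped 5' reference coordinate, strand) for a mapped read."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    reverse = bool(flag & 0x10)  # 0x10 = read is reverse-complemented

    if not reverse:
        # Forward strand: 5' end is the leftmost base; add back leading clips.
        leading = 0
        for length, op in ops:
            if op in "SH":
                leading += length
            else:
                break
        return pos - leading, "+"

    # Reverse strand: 5' end is the rightmost aligned base plus trailing clips.
    ref_len = sum(length for length, op in ops if op in "MDN=X")
    trailing = 0
    for length, op in reversed(ops):
        if op in "SH":
            trailing += length
        else:
            break
    return pos + ref_len - 1 + trailing, "-"


# The read from the question: FLAG 99 has 0x10 unset, so it is forward-strand.
print(unclipped_five_prime(99, 3043025, "70M"))
```

Reads that share the same chromosome, 5' coordinate and strand (and, for pairs, the same key for the mate) would be treated as duplicate candidates; none of these inputs involve the NM or nM tag.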