Alternate Tag Types for Picard
5.1 years ago
Ben ▴ 30

After running Picard ValidateSamFile I get warnings like the ones below for every read, saying that the "NM" tag is missing.

WARNING: Read name SRR6251016.24364087_TGTTATGAGA, A record is missing a read group
WARNING: Record 1, Read name SRR6251016.24364087_TGTTATGAGA, NM tag (nucleotide differences) is missing

I am using bam files produced by a STAR mapping pipeline which have "nM" tags as shown below. These are identical in function to NM tags but are alternatively named.

SRR6251016.24364087_TGTTATGAGA  99  chr1    3043025 255 70M =   3043191 236 AGAAAATTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCC  EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<<EEEEEEEEEEEEEEAAEEEE  NH:i:1  HI:i:1  AS:i:136    nM:i:1

Does anyone know how to set Picard to recognise these tags (which are of the same format) as I need to run Picard MarkDuplicates next in my analysis?

picard RNA-Seq alignment • 1.2k views

Why do you want Picard to recognize these tags? I think MarkDuplicates only compares the 5' ends of reads, without considering the NM tag.

5.1 years ago

If STAR's nM were actually identical in function to the standard NM, one might ask why on earth they were making life hard for everyone by using a different tag name. But in fact the two are not identical:

nM : is the number of mismatches per (paired) alignment, not to be confused with NM, which is the number of mismatches in each mate.
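Since the two tags mean different things (and tag names are case-sensitive), a tool looking for NM will not pick up nM. As a rough sketch, here is one way to check which optional tags a record actually carries, using plain Python on the record from the question. The helper name `sam_tags` is made up for this example, and it only handles the simple `TAG:TYPE:VALUE` fields seen here.

```python
def sam_tags(line):
    """Return the optional fields of one SAM record as {TAG: value}."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are the mandatory ones
        if not field:
            continue  # tolerate stray empty columns
        tag, value_type, value = field.split(":", 2)
        # "i" marks an integer tag; everything else is kept as a string here
        tags[tag] = int(value) if value_type == "i" else value
    return tags


# The record from the question, rebuilt with tab separators.
record = "\t".join([
    "SRR6251016.24364087_TGTTATGAGA", "99", "chr1", "3043025", "255", "70M",
    "=", "3043191", "236",
    "AGAAAATTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCC",
    "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<<EEEEEEEEEEEEEEAAEEEE",
    "NH:i:1", "HI:i:1", "AS:i:136", "nM:i:1",
])

tags = sam_tags(record)
print("nM" in tags)  # True: STAR's per-pair mismatch count is present
print("NM" in tags)  # False: the per-mate NM tag is absent
```

For real BAM files you would read records through samtools or a BAM library rather than splitting text by hand; the point here is only that the lookup is by exact tag name.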

Look at STAR's --outSAMattributes option, which can be used to also output NM. One might ask the STAR developers why NM and other tags desired by Picard's “typical usage” validation are not in STAR's standard set of attributes…
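As an illustration of that suggestion, here is a minimal sketch that assembles a STAR command line requesting NM on top of the standard attributes (NH HI AS nM). The index directory, FASTQ names, and read group ID are placeholders, and `star_command` is a hypothetical helper, not part of STAR. The `--outSAMattrRGline` argument is included on the assumption that you also want to address the "missing a read group" warning from the question.

```python
import shlex


def star_command(genome_dir, fastqs, read_group_id, extra_attributes=("NM",)):
    """Build a STAR argument list that adds extra SAM attributes (sketch only)."""
    standard = ["NH", "HI", "AS", "nM"]  # STAR's standard attribute set
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastqs,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outSAMattributes", *standard, *extra_attributes,
        "--outSAMattrRGline", f"ID:{read_group_id}",
    ]


# Placeholder inputs; substitute your own index and reads.
cmd = star_command("star_index", ["reads_1.fastq", "reads_2.fastq"], "SRR6251016")
print(shlex.join(cmd))
```

To actually run the mapping, the list can be passed to `subprocess.run(cmd, check=True)`, assuming STAR is on your PATH. Check the STAR manual for your version before relying on the exact attribute list.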


Thank you! I didn't know that difference. Will try re-running my STAR mapping!
