4.1 years ago
im
I was given some MAF files (possibly not standard MAF, I'm not sure), and I was wondering whether anyone recognizes the protocol/format/standard used in the "Protein_Change" column, as I cannot find any documentation on it. I originally thought it might correspond to the amino acid change (which is what I need), but some rows in these files have the same Codon_Change (ignoring the position) yet different values in the Protein_Change column, so now I am confused. For example:
What I want to do is group the mutations by identical amino acid change, but I can't figure out a good way to do that.
The trinucleotide GCC can occur multiple times in a coding sequence, so "same Codon_Change (minus the location)" is meaningless: the location is the key component. The Protein_Change column seems to follow HGVS conventions (to an extent, as synonymous variants should ideally be notated with "=", like so: p.A22=), so it should be easy to handle.

For additional reference, the HGVS format is described on their website (https://varnomen.hgvs.org/ ) and in the paper (https://onlinelibrary.wiley.com/doi/full/10.1002/humu.22981 ). There are also Python packages to parse the format (https://hgvs.readthedocs.io/en/stable/index.html ).
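If the column does use HGVS-style one-letter substitutions, grouping rows by identical amino acid change could look roughly like the sketch below. It is a minimal, standard-library-only sketch under some assumptions: the file is tab-separated with "#" comment lines, the gene is in a "Hugo_Symbol" column (assumed, not confirmed by the files in question), and only simple substitutions such as p.G12D, p.R175* or synonymous p.A22A / p.A22= are normalized. Anything else (frameshifts, indels, splice variants) is kept as the raw string. The gene is part of the grouping key because the same protein change string in two genes is not the same amino acid change; if the files carry a transcript column, adding it to the key would be safer still, since positions are transcript-specific. For full HGVS parsing, the hgvs package linked above is the better tool.

```python
import csv
import re
from collections import defaultdict

# Simple one-letter protein substitutions: p.G12D, p.R175*, p.A22A, p.A22=
# Other HGVS forms (fs, del, ins, dup, splice...) are deliberately not parsed here.
SIMPLE_SUB = re.compile(r"^p\.([A-Z*])(\d+)([A-Z*]|=)$")


def read_maf_rows(lines):
    """Parse MAF text lines (tab-separated, '#' comment lines skipped) into dicts."""
    data_lines = (line for line in lines if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def normalize_protein_change(change):
    """Return a normalized key for a simple substitution, else the stripped raw string."""
    change = change.strip()
    match = SIMPLE_SUB.match(change)
    if not match:
        return change
    ref, pos, alt = match.groups()
    if alt == ref:
        alt = "="  # rewrite old-style synonymous "p.A22A" as HGVS-style "p.A22="
    return f"p.{ref}{pos}{alt}"


def group_by_protein_change(rows, gene_col="Hugo_Symbol", change_col="Protein_Change"):
    """Group rows by (gene, normalized protein change); rows without a change are skipped.

    gene_col is an assumed column name; adjust it to whatever the files actually use.
    """
    groups = defaultdict(list)
    for row in rows:
        change = (row.get(change_col) or "").strip()
        if not change:
            continue
        key = (row.get(gene_col, ""), normalize_protein_change(change))
        groups[key].append(row)
    return dict(groups)
```

Usage, with a hypothetical file name: `with open("example.maf", newline="") as fh: groups = group_by_protein_change(read_maf_rows(fh))`, then `len(groups[("KRAS", "p.G12D")])` gives the number of rows carrying that change.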