Hi everyone,
I'm a bioinformatics student currently working on a project to build a Python-based parser for mmCIF files, focused on analyzing enzyme structures like TPH1 and PAH.
My goals include:
- Parsing structural data efficiently (Biopython-free)
- Comparing enzymes based on resolution, method, and metadata
- Mapping disease-associated mutations to protein regions
- Theoretical binary modeling of mmCIF structure
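To make the first two goals concrete, here is a minimal sketch of what I mean by Biopython-free parsing and metadata comparison. It is an illustration, not the repository's actual code: the function names (`parse_simple_items`, `summarize`) are hypothetical, the two sample entries are made up (they are not real TPH1 or PAH records), and it assumes `#` lines separate categories, as in files distributed by the PDB. It only reads single-line `_category.item value` pairs and skips `loop_` tables and multi-line values.

```python
# Hypothetical sample entries; the values are invented for illustration.
SAMPLE_A = """\
data_EXAMPLE_A
#
_exptl.method 'X-RAY DIFFRACTION'
#
_refine.ls_d_res_high 1.90
#
"""

SAMPLE_B = """\
data_EXAMPLE_B
#
_exptl.method 'ELECTRON MICROSCOPY'
#
_em_3d_reconstruction.resolution 3.20
#
"""

# Items that commonly hold the resolution, tried in this order.
RESOLUTION_KEYS = ("_refine.ls_d_res_high", "_em_3d_reconstruction.resolution")


def parse_simple_items(text):
    """Collect single-line `_category.item value` pairs from mmCIF text.

    Assumption: loop_ tables and semicolon-delimited text are skipped.
    """
    items = {}
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            in_loop = False  # assumed category separator
            continue
        if line == "loop_":
            in_loop = True
            continue
        if in_loop or not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            key, value = parts
            items[key] = value.strip().strip("'\"")
    return items


def summarize(text):
    """Return the experimental method and resolution (or None) for one entry."""
    items = parse_simple_items(text)
    resolution = None
    for key in RESOLUTION_KEYS:
        value = items.get(key)
        if value not in (None, "?", "."):  # '?' and '.' mean unknown / not applicable
            resolution = float(value)
            break
    return {"method": items.get("_exptl.method"), "resolution": resolution}


entries = {"EXAMPLE_A": summarize(SAMPLE_A), "EXAMPLE_B": summarize(SAMPLE_B)}
with_resolution = {k: v for k, v in entries.items() if v["resolution"] is not None}
# A lower value in angstroms means a higher-resolution structure.
best = min(with_resolution, key=lambda k: with_resolution[k]["resolution"])
print(best, entries[best])
```

Keeping the parsed items in a plain dict keyed by the full item name makes it easy to add more metadata fields to the comparison later.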
I’ve structured the project to be clean, modular, and educational, with open-source code, a clear README, and diagrams. I'd like to expand it to integrate structure modeling and mutation visualization.
GitHub Repository:
https://github.com/ShravyaRS/mmCIF_Parser_Project
Looking for:
- Feedback on structure/function features
- Ideas for mutation prediction tools to integrate
- Suggestions to improve parser logic for more complex mmCIF structures
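As an example of the "more complex" cases I have in mind, here is a rough sketch of reading `loop_` tables whose values may be quoted. Again, this is an assumption-laden illustration rather than the project's code: `tokenize`, `rows_from`, and `parse_loops` are hypothetical names, the sample data is invented, and semicolon-delimited multi-line text fields are deliberately not handled.

```python
import re

# A quoted value ends only at a quote followed by whitespace or end of line,
# so a value such as "O5'" keeps its embedded apostrophe.
TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(line):
    """Split one data line into values, removing surrounding quotes."""
    return [m.group(m.lastindex) for m in TOKEN.finditer(line)]


def rows_from(tags, values):
    """Chunk a flat value list into one {tag: value} dict per row."""
    n = len(tags)
    return [dict(zip(tags, values[i:i + n]))
            for i in range(0, len(values) - n + 1, n)]


def parse_loops(text):
    """Return every loop_ table in the text as a list of row dicts.

    Assumption: semicolon-delimited multi-line values are not supported.
    """
    tables, tags, values = [], [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        starts_block = line == "loop_" or line.startswith(("#", "data_"))
        ends_rows = in_loop and bool(values) and line.startswith("_")
        if starts_block or ends_rows:
            if tags and values:
                tables.append(rows_from(tags, values))
            tags, values = [], []
            in_loop = line == "loop_"
            continue
        if not in_loop:
            continue
        if line.startswith("_") and not values:
            tags.append(line.split()[0])    # still reading column names
        else:
            values.extend(tokenize(line))   # rows may wrap across lines
    if tags and values:
        tables.append(rows_from(tags, values))
    return tables


# Invented sample data, used only to exercise quoting.
SAMPLE = """\
data_DEMO
#
loop_
_entity.id
_entity.type
_entity.pdbx_description
1 polymer 'Example enzyme chain'
2 water water
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
ATOM 1 N MET
ATOM 2 "O5'" A
#
"""

for table in parse_loops(SAMPLE):
    print(table)
```

Collecting values into a flat list and chunking by the number of tags means rows that wrap over several lines still come out correctly, which is one of the things a naive line-by-line split gets wrong.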
I'm open to feedback, collaboration, or even pointers to datasets or literature that can help deepen this work. Thanks in advance!
Thank you so much for the clarification!
You're right: I should have described it as building a parser "for mmCIF files" in general, and then applying it to enzyme data like TPH1 and PAH as a use case. My focus is on making a clean, beginner-friendly tool from scratch (without Biopython) to help students understand the format, and possibly extending it later with structural/functional insights.
I'll definitely check out your benchmark — it looks super helpful!
Thanks again for taking the time to reply.