I am building a DCA (direct coupling analysis) pipeline for my lab and am currently evaluating several DCA tools: EVcouplings, GREMLIN, and pydca. I want to find the false positive rate of the predicted chain1-chain2 contact map produced by each tool, but my problem is finding the "canonical" true positives for the control protein I'm using. I'm using the contacts listed in PDBsum, but they are very different from what any of the DCA tools give me. Is there a better place to find reference contact maps, am I misinterpreting the contact maps the tools produce, or am I overlooking something else?
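For reference, one way to score a predicted chain1-chain2 contact list against a set of reference contacts is to treat every inter-chain residue pair as a binary classification. Below is a minimal sketch in plain Python. It assumes contacts are given as `(i, j)` index pairs, with `i` indexing chain 1 and `j` indexing chain 2, and that the predicted and reference sets use the same 0-based residue numbering. The function names are hypothetical.

```python
from itertools import product


def confusion_counts(predicted, true, len1, len2):
    """Count TP, FP, FN, TN over every chain1 x chain2 residue pair.

    predicted, true: sets of (i, j) pairs, 0-based, same numbering in both
    (an assumption). Pairs outside range(len1) x range(len2) are ignored.
    """
    tp = fp = fn = tn = 0
    for pair in product(range(len1), range(len2)):
        is_pred = pair in predicted
        is_true = pair in true
        if is_pred and is_true:
            tp += 1
        elif is_pred:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(predicted, true, len1, len2):
    """Return (false positive rate, precision) for a predicted contact set."""
    tp, fp, fn, tn = confusion_counts(predicted, true, len1, len2)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return fpr, precision


# Toy example: two 3-residue chains, 9 possible inter-chain pairs.
print(rates({(0, 0), (2, 1)}, {(0, 0), (1, 2)}, 3, 3))
```

True contacts are a small fraction of all residue pairs, so the false positive rate tends to look small for almost any predictor. For that reason, precision of the top-ranked predictions is often reported alongside it when benchmarking contact predictors.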
Thank you for the tip about BioStructures. I have never used Julia, but I'm working on getting it set up now. Will the contact maps it calculates be for chain-chain contacts or for contacts within a chain? (I'm benchmarking with actin.)
At first I thought the discrepancy between the PDBsum contacts and the contact maps produced by the DCA tools was due to different sequences, because PDBsum uses a sequence 4 residues shorter than the one I was originally using. However, even after rerunning every tool with the same sequence, the discrepancy remains.
I haven't been using any MSA files; the inputs all seemed to ask for just a single sequence. From reading the documentation, it seems that an MSA of the actin family from Pfam would be the best input. Do I have that right? (Please forgive all the questions: I'm a student intern in a wet lab with no other bioinformaticians, and I've only worked with genomic data before, not proteomic data.)
It can be for either.
gets contacts within a chain, whereas
gets contacts between chains.
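The within-chain versus between-chain distinction comes down to which residue pairs are compared. Below is a minimal sketch in plain Python. It assumes each residue is represented by a single 3D coordinate (for example the Cβ atom, or Cα for glycine) and uses an 8 Å cutoff. That cutoff is a common convention in contact-prediction benchmarks, but it is not necessarily what BioStructures or PDBsum use. The minimum sequence separation for intra-chain pairs is also an assumed convention, and the function names are hypothetical.

```python
import math


def intra_chain_contacts(coords, cutoff=8.0, min_sep=6):
    """Pairs (i, j), i < j, within one chain whose points are closer than
    cutoff (in Å). Pairs with j - i < min_sep are skipped (assumed convention)."""
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def inter_chain_contacts(coords1, coords2, cutoff=8.0):
    """Pairs (i, j) with residue i of chain 1 and residue j of chain 2
    closer than cutoff (in Å). No sequence-separation filter applies here."""
    return {
        (i, j)
        for i, a in enumerate(coords1)
        for j, b in enumerate(coords2)
        if math.dist(a, b) < cutoff
    }


# Toy coordinates (hypothetical), one point per residue.
chain1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain2 = [(0.0, 5.0, 0.0), (30.0, 0.0, 0.0)]
print(inter_chain_contacts(chain1, chain2))
```

Comparing an inter-chain reference against an intra-chain prediction, or the reverse, would produce exactly the kind of mismatch described in the question, so it is worth confirming which of the two each tool reports.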
This explains the discrepancy, then. DCA methods are statistical and require an MSA to work, preferably a deep one (a few hundred sequences or more). Some of the methods you are using may generate an MSA as part of their pipeline, but if you are comparing methods, I would build your own MSA and use it as the input to all of them.
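Raw sequence count can overstate the depth of an alignment when many sequences are near-duplicates, so DCA methods commonly down-weight similar sequences. Below is a minimal sketch of the effective number of sequences (Neff). It assumes an already-aligned MSA given as equal-length strings and an 80% identity threshold. That threshold is a common default, but each tool's own setting should be checked. It also assumes identity is computed over all alignment columns, including gap-gap matches. Individual tools may compute identity differently. The function name is hypothetical.

```python
def effective_sequences(msa, identity_threshold=0.8):
    """Neff = sum over sequences of 1 / (number of sequences, itself included,
    whose fractional identity to it is >= identity_threshold).

    msa: list of aligned sequences (equal-length strings). Identity is the
    fraction of all columns with the same character (an assumption)."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    neff = 0.0
    for a in msa:
        neighbours = 0
        for b in msa:
            same = sum(x == y for x, y in zip(a, b))
            if same / length >= identity_threshold:
                neighbours += 1  # always >= 1, since a matches itself
        neff += 1.0 / neighbours
    return neff


# Two identical sequences count as one; the unrelated third counts fully.
print(effective_sequences(["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]))
```

Comparing this number with the raw sequence count gives a rough sense of how much redundancy the alignment carries before it is handed to each tool.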
Building a good MSA can be difficult and is a common problem in bioinformatics. If possible, I would try to find someone at your institution with relevant expertise to help.
Yes, the original plan was for me to get guidance on projects from the institution's bioinformatics core, but since my internship started remotely after all in-person research activities shut down due to COVID-19, that has gone out the window for the time being, and I'm muddling through as best I can. Thank you so much for your help; this has given me more guidance than a week and a half of deciphering documentation.
Makes sense. Not the easiest time to start a position for sure, but I'm sure you'll muddle through.