Issue with Converting Excel Data to JSON
10 weeks ago
Cake Day • 0

Hi everyone,

I am an undergraduate student trying to understand how Apta-MCTS works (https://pmc.ncbi.nlm.nih.gov/articles/PMC8232527/). As I understand it, I have to run preprocess.py first and then classifier.py for RNA aptamer classification.

Problem 1: I assumed that preprocess.py would generate files called train.json and test.json, which are required to run classifier.py, but preprocess.py does not seem to generate any output files.

Problem 2: To check that classifier.py works, I tried to convert the data from the Excel files referenced by the authors into .json files, using the template provided in their GitHub repository (https://github.com/leekh7411/Apta-MCTS).

I have two Excel files containing information about proteins and aptamers, and I need to structure the JSON output as follows:

{
    "targets": {
        "<protein_name>":{
            "model": {
                "method" : "Lee_and_Han_2019|Apta-MCTS",
                "score_function" : "<path of the weights of the pre-trained API classifer>",
                "k"      : "<number of top scored candidates>",
                "bp"     : "<length of candidate RNA-aptamer sequences>",
                "n_iter" : "<number of iterations for each base when method is Apta-MCTS>"
            },
            "protein": {
                "seq" : "<target protein sequence>"
            },
            "aptamer": {
                "name"      : [],
                "seq"       : []
            },
            "candidate-aptamer": {
                "score"    : [],
                "seq"      : [],
                "ss"       : [],
                "mfe"      : []
            },
            "protein-specificity": {
                "name" : "<list of name of proteins that do not want to bind>",
                "seq"  : "<list of sequence of proteins that do not want to bind>"
            }
        }
    },
    "n_jobs" : "<number of available cores for the multiprocessing tasks>"
}
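For illustration, here is a minimal sketch of how rows from the two sheets could be arranged into the layout above. It uses only the standard library and assumes the rows are already available as lists of dicts (for example via `pandas.read_excel(path).to_dict("records")`). The column names `protein_name`, `protein_seq`, `aptamer_name` and `aptamer_seq`, the helper `build_config`, and all sample values are hypothetical placeholders, not taken from the authors' files. It only shows the reshaping step and says nothing about which layout a given script in the repository reads.

```python
import json


def build_config(protein_rows, aptamer_rows, model_settings, n_jobs):
    """Arrange tabular rows into the nested template layout.

    protein_rows: dicts with hypothetical columns "protein_name", "protein_seq"
    aptamer_rows: dicts with hypothetical columns "protein_name",
                  "aptamer_name", "aptamer_seq"
    """
    targets = {}
    for row in protein_rows:
        targets[row["protein_name"]] = {
            "model": dict(model_settings),  # copy so targets do not share one dict
            "protein": {"seq": row["protein_seq"]},
            "aptamer": {"name": [], "seq": []},
            "candidate-aptamer": {"score": [], "seq": [], "ss": [], "mfe": []},
            "protein-specificity": {"name": [], "seq": []},
        }
    for row in aptamer_rows:
        entry = targets.get(row["protein_name"])
        if entry is None:
            continue  # aptamer refers to a protein missing from the protein sheet
        entry["aptamer"]["name"].append(row["aptamer_name"])
        entry["aptamer"]["seq"].append(row["aptamer_seq"])
    return {"targets": targets, "n_jobs": n_jobs}


# Made-up placeholder rows standing in for the two Excel sheets.
protein_rows = [{"protein_name": "P1", "protein_seq": "MKV"}]
aptamer_rows = [
    {"protein_name": "P1", "aptamer_name": "apt-1", "aptamer_seq": "GGAUCC"},
    {"protein_name": "P1", "aptamer_name": "apt-2", "aptamer_seq": "ACGUAC"},
]
# Placeholder settings; the value types (int vs. string) are an assumption.
model_settings = {
    "method": "Apta-MCTS",
    "score_function": "path/to/weights",
    "k": 5,
    "bp": 30,
    "n_iter": 100,
}

config = build_config(protein_rows, aptamer_rows, model_settings, n_jobs=4)
print(json.dumps(config, indent=4))
```

Writing the result to disk would then be a matter of `json.dump(config, fh, indent=4)` on an open file handle.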

However, the JSON I produced does not seem to match the format that classifier.py expects: the script fails with KeyError: 'protein-seq'.

Input: python3 classifier.py -dataset_dir=datasets/li2014 -tag=rf-iCTF-li2014 -min_trees=35 -max_trees=200 -n_jobs=20 -num_models=1000

Error:

dataset_dir=datasets/li2014 -tag=rf-iCTF-li2014 -min_trees=35 -max_trees=200 -n_jobs=20 -num_models=1000

Traceback (most recent call last):
  File "/home/cake13/Apta-MCTS/paper_version/classifier.py", line 131, in <module>
    fire.Fire(main)
  File "/home/cake13/ViennaRNA-2.7.0/vienna_env/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cake13/ViennaRNA-2.7.0/vienna_env/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/cake13/ViennaRNA-2.7.0/vienna_env/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cake13/Apta-MCTS/paper_version/classifier.py", line 119, in main
    trainset = load_benchmark_dataset(train_json_path)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cake13/Apta-MCTS/paper_version/preprocess.py", line 243, in load_benchmark_dataset
    pseqs  = d["protein-seq"]
             ~^^^^^^^^^^^^^^^
KeyError: 'protein-seq'
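The traceback shows load_benchmark_dataset looking up a key named "protein-seq", so one way to narrow this down is to list the keys a JSON file actually contains and compare them with the keys the loader asks for. The sketch below is a hedged, standard-library-only illustration: `describe_json` and `missing_keys` are hypothetical helper names, the sample data is made up, and "protein-seq" is the only required key known from the traceback (any others would have to be read from preprocess.py).

```python
import json


def describe_json(obj, max_depth=2, depth=0, prefix=""):
    """Return lines listing the keys of nested dicts, down to max_depth."""
    lines = []
    if isinstance(obj, dict) and depth < max_depth:
        for key, value in obj.items():
            kind = type(value).__name__
            size = f"[{len(value)}]" if isinstance(value, (list, dict)) else ""
            lines.append(f"{prefix}{key}: {kind}{size}")
            lines.extend(describe_json(value, max_depth, depth + 1, prefix + "  "))
    return lines


def missing_keys(obj, required):
    """Return the required top-level keys that are absent from a dict."""
    if not isinstance(obj, dict):
        return list(required)
    return [key for key in required if key not in obj]


# Made-up sample shaped like the template; for a real file one would use
# json.load() on an open handle to the JSON file that classifier.py reads.
sample = json.loads('{"targets": {"P1": {"protein": {"seq": "MKV"}}}, "n_jobs": 4}')

print("\n".join(describe_json(sample)))
print("missing:", missing_keys(sample, ["protein-seq"]))
```

If the required key is reported as missing at the top level, the file follows a different layout from the one the loader reads, which points to a format mismatch and not a typo inside the file.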

Questions:

  1. Could there be an issue with how I structured the JSON from Excel?
  2. Are there best practices for converting Excel data to JSON? Is this conversion a reasonable thing to do, or have I misunderstood what the JSON file should contain?
  3. Any suggestions for debugging where the JSON format might be incorrect?
  4. Do I need to create or obtain any additional files beyond what the authors provide in their GitHub repository (https://github.com/leekh7411/Apta-MCTS)?

Thanks in advance for any help!

python classifier json machine-learning • 308 views