I'm new to programming and I'm currently working on my thesis.
I'm working with multiple CSV files and a JSON file containing genes with amino acid changes involved in antibiotic resistance. The CSV files are formatted like this:
```
Gene_Aminoacids Filename
gyrA_S95T SRR9851427
tlyA_L11L SRR9851427
katG_R463L SRR9851427
```
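A minimal sketch of turning one of these files into (gene, filename) pairs. It assumes the columns are separated by whitespace (tabs or spaces; the post doesn't say which) and that the header line starts with `Gene_Aminoacids`. `parse_mutation_lines` is a hypothetical helper name.

```python
def parse_mutation_lines(lines):
    """Turn lines like 'gyrA_S95T SRR9851427' into (gene, filename) tuples.

    Assumes whitespace-separated columns and a 'Gene_Aminoacids Filename' header.
    """
    rows = []
    for line in lines:
        parts = line.split()  # splits on any run of tabs and/or spaces
        # Skip blank lines, short lines and the header line
        if len(parts) < 2 or parts[0] == "Gene_Aminoacids":
            continue
        rows.append((parts[0], parts[1]))
    return rows


sample = [
    "Gene_Aminoacids Filename",
    "gyrA_S95T SRR9851427",
    "tlyA_L11L SRR9851427",
    "katG_R463L SRR9851427",
]
print(parse_mutation_lines(sample))
# [('gyrA_S95T', 'SRR9851427'), ('tlyA_L11L', 'SRR9851427'), ('katG_R463L', 'SRR9851427')]
```

With a real file, an open file object can be passed directly, e.g. `with open(file_path) as f: rows = parse_mutation_lines(f)`.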
In the JSON file, the genes are the keys and the antibiotics they affect are the values.
Example (a small part of the JSON file):
```
"gyrA_A74S" : ["Quinolones"],
"gyrA_D89X" : ["Quinolones"],
"tlyA_C-83T" : ["Capreomycin"],
"katG_R104Q" : ["Isoniazid"],
"katG_S315I" : ["Isoniazid"],
"katG_S315N" : ["Isoniazid"],
```
etc.
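Because each gene maps to a list of antibiotics, `json.load` gives back an ordinary Python dict, so checking whether a gene is present is a single `in` test and the antibiotic is a normal key lookup. A sketch, assuming the full file is one JSON object wrapped in `{ }` (the excerpt above shows only some of its entries):

```python
import json

# A few entries from the excerpt, wrapped in braces so they form a valid JSON object
text = """{
    "gyrA_A74S": ["Quinolones"],
    "tlyA_C-83T": ["Capreomycin"],
    "katG_R104Q": ["Isoniazid"]
}"""

# For the real file: with open("tb_TEST.json") as f: resistance = json.load(f)
resistance = json.loads(text)

print("katG_R104Q" in resistance)  # True: the key exists
print(resistance["katG_R104Q"])    # ['Isoniazid']
print("gyrA_S95T" in resistance)   # False: not in this excerpt
```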
What I want is to find the genes (keys) from the JSON file that also appear in the CSV files. The new output should contain each gene found in both the JSON and the CSV files, its corresponding antibiotic (the value), and the filename.
Example of the desired output:
```
Gene_Aminoacids Antibiotic Filename
"katG_R104Q" : ["Isoniazid"], SRR9851427
```
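The matching itself can be a dictionary membership test: keep each CSV row whose gene is a key in the JSON dict and attach that key's value. A minimal sketch; `match_mutations`, `resistance` and `rows` are hypothetical names, and the inputs are small illustrative values in the formats shown above.

```python
def match_mutations(rows, resistance):
    """Return (gene, antibiotics, filename) for every row whose gene is a JSON key.

    rows: iterable of (gene, filename) tuples read from the CSV files
    resistance: dict loaded from the JSON file, gene -> list of antibiotics
    """
    matches = []
    for gene, filename in rows:
        if gene in resistance:  # constant-time dict lookup
            matches.append((gene, resistance[gene], filename))
    return matches


resistance = {"katG_R104Q": ["Isoniazid"], "gyrA_A74S": ["Quinolones"]}
rows = [("gyrA_S95T", "SRR9851427"), ("katG_R104Q", "SRR9851427")]

for gene, antibiotics, filename in match_mutations(rows, resistance):
    print(gene, ", ".join(antibiotics), filename)
# prints: katG_R104Q Isoniazid SRR9851427
```

Keeping the whole dict, rather than only `data.keys()`, is what makes the antibiotic available when writing the output.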
This is the code I have written so far. I have looked at similar questions, but their solutions didn't work on my data.
```python
import glob
import json


def retrive_rest_mutations(jsonfile):
    # Load the JSON file and return its keys (the gene/amino acid changes)
    with open(jsonfile) as data_file:
        data = json.load(data_file)
    return data.keys()


mutation_keys = retrive_rest_mutations("tb_TEST.json")

##Read & set path to folder containing a.a changes
path = "Replaced_P_G.ann.vcf"
samp = glob.glob(path + "/*_G.P.vcf_replaced.txt")

###Read text files
result = []


def read_text_file(file_path):
    with open(file_path, "r") as f:
        print(f.read())


##iterate through all files
def all_files():
    # samp is already a list of full paths from glob, so loop over it directly
    for file_path in samp:
        read_text_file(file_path)
        print("\n")


all_files()
```
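Putting the pieces together across all files: a sketch that loads the JSON once, reads every file matched by a glob pattern, and writes one tab-separated output file. It assumes whitespace-separated columns with a `Gene_Aminoacids` header line; `find_resistance_mutations` and the output name `matches.tsv` are hypothetical, and multiple antibiotics for one gene are joined with `;`.

```python
import csv
import glob
import json
import os


def find_resistance_mutations(json_path, folder, pattern, out_path):
    """Write Gene_Aminoacids, Antibiotic, Filename for genes present in both sources."""
    # Keep the whole dict (gene -> list of antibiotics), not just its keys
    with open(json_path) as f:
        resistance = json.load(f)

    matches = []
    # glob.glob returns full paths, so no manual path joining is needed per file
    for file_path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(file_path) as f:
            for line in f:
                parts = line.split()  # assumes whitespace-separated columns
                if len(parts) < 2 or parts[0] == "Gene_Aminoacids":
                    continue  # skip the header and blank/short lines
                gene, filename = parts[0], parts[1]
                if gene in resistance:
                    matches.append([gene, ";".join(resistance[gene]), filename])

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["Gene_Aminoacids", "Antibiotic", "Filename"])
        writer.writerows(matches)
    return matches


# Example call with the paths from the question:
# find_resistance_mutations("tb_TEST.json", "Replaced_P_G.ann.vcf",
#                           "*_G.P.vcf_replaced.txt", "matches.tsv")
```

Reading the files line by line inside one function avoids printing them and keeps every match in a single list, which is then written once at the end.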
The code may be indented wrongly because I copied it. I'm unsure how to do the matching between the JSON file and the multiple CSV files, and there might be a simple solution to my issue.
Does anyone have a suggestion, or know what I should look into, to get a new output containing Gene + Antibiotic + Filename?
Best regards