I have two Excel files containing data on schizophrenia.
File 1: Contains information about schizophrenia GWAS, including the CNV regions and associated genes they report. I built it by importing data from several databases.
File 2: Contains information about increases in gene expression, along with CNV regions, genes, and other fields.
My task is to find out whether the CNV regions and genes in File 1 are also present in File 2. Since there are about 3K entries, I want to automate the process. Is there a script I can write that reads both files and displays the entries that appear in both? A reference or pseudo-code would be enough to get me started, or at least give me an idea of how to proceed. If members can post code in Python or C++, that would be very helpful.
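One possible starting point, sketched in Python under a few assumptions: both sheets have been exported to CSV, and each has header columns named `CNV_region` and `Gene` (hypothetical names; rename `KEY_COLUMNS` to match your real headers). The idea is to read File 2 once into a set of (CNV region, gene) keys, then walk File 1 and keep every row whose key is in that set. Set lookups are fast on average, so 3K rows is trivial.

```python
import csv
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical column headers: rename to match the real headers in both files.
KEY_COLUMNS: Tuple[str, ...] = ("CNV_region", "Gene")


def make_key(row: Dict[str, str], key_columns: Tuple[str, ...]) -> Tuple[str, ...]:
    """Build a comparison key; strip spaces and upper-case so 'geneA ' matches 'GENEA'."""
    # A short row yields None for missing fields, so fall back to "".
    return tuple((row[col] or "").strip().upper() for col in key_columns)


def find_shared(lines1: Iterable[str], lines2: Iterable[str],
                key_columns: Tuple[str, ...] = KEY_COLUMNS) -> List[Dict[str, str]]:
    """Return the rows of file 1 whose key also occurs in file 2 (each key reported once)."""
    # One pass over file 2 builds a set, so each lookup below is O(1) on average.
    keys2: Set[Tuple[str, ...]] = {make_key(r, key_columns) for r in csv.DictReader(lines2)}
    shared: List[Dict[str, str]] = []
    seen: Set[Tuple[str, ...]] = set()
    for row in csv.DictReader(lines1):
        key = make_key(row, key_columns)
        if key in keys2 and key not in seen:
            seen.add(key)
            shared.append(row)  # keep the original, un-normalized row from file 1
    return shared


def report(path1: str, path2: str) -> None:
    """Print the shared entries of two CSV exports (file paths are up to you)."""
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        for row in find_shared(f1, f2):
            print(", ".join(row[col] for col in KEY_COLUMNS))
```

Calling `report("file1.csv", "file2.csv")` prints one line per shared entry. Because columns are looked up by header name, the two files can have their columns in a different order. If you would rather read the `.xlsx` files directly, pandas can do the same job with `pd.read_excel(...)` on each file followed by `df1.merge(df2, on=["CNV_region", "Gene"], how="inner")` (reading `.xlsx` needs the `openpyxl` package installed). Note that exact matching only catches regions written identically in both files; if the sources format coordinates differently, you would need to normalize them or compare them as intervals.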
Thanks,