In the context of developing a classification model to determine whether a given variant affects gene expression in a particular disease, I've extracted 1 kb of sequence upstream and downstream of each variant location. What features could I extract from these sequences for this specific task? Also, is it more relevant to compute biological features rather than statistical ones for this purpose? Any help would be much appreciated.
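As a rough illustration of the kind of features that can be computed directly from a flanking window, here is a minimal sketch. It assumes the upstream sequence, the ref/alt alleles and the downstream sequence are already available as plain strings (e.g. pulled from an hg38 FASTA). The features (GC content, CpG observed/expected ratio, normalized k-mer frequencies, and their alt-minus-ref differences) are generic sequence descriptors, not disease-specific ones, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Normalized counts of all 4**k DNA k-mers; windows containing non-ACGT are skipped."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def sequence_features(window: str, k: int = 3) -> dict:
    """Combine the descriptors above into one flat feature dict for a window."""
    feats = {"gc_content": gc_content(window), "cpg_obs_exp": cpg_obs_exp(window)}
    feats.update({f"kmer_{km}": v for km, v in kmer_frequencies(window, k).items()})
    return feats


def allele_delta(left: str, ref: str, alt: str, right: str, k: int = 3) -> dict:
    """Alt-minus-ref difference of every feature, with the variant at the window centre."""
    ref_f = sequence_features(left + ref + right, k)
    alt_f = sequence_features(left + alt + right, k)
    return {name: alt_f[name] - ref_f[name] for name in ref_f}
```

The delta features are one way to make the representation variant-aware: two variants with identical flanks but different alleles then get different feature vectors. Biologically informed features (for example, overlap with regulatory annotations or transcription factor motifs relevant to the disease tissue) would need external data and are not covered by this sketch.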
You could try creating a VCF file from your dataset and predicting the variant effects to see whether the mutations are deleterious or tolerated (see also the answer to "Allele frequency visualization").
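A minimal sketch of the first step, writing a bare-bones VCF, assuming you already have the chromosome, 1-based position, rsID, reference allele and alternate allele for each variant. The `Variant` type and `to_vcf` function are hypothetical names. QUAL, FILTER and INFO are left as `.`. The resulting file can then be passed to a variant effect predictor such as Ensembl VEP. SIFT scores, which VEP can report, label missense changes as "deleterious" or "tolerated".

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str  # must match the reference naming convention ("chr1" vs "1")
    pos: int    # 1-based position, as VCF requires
    rsid: str
    ref: str
    alt: str


# Minimal header: file format line plus the mandatory column header line.
VCF_HEADER_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf(variants: Iterable[Variant]) -> str:
    """Render variants as minimal VCF text; records are kept in input order."""
    lines = list(VCF_HEADER_LINES)
    for v in variants:
        # QUAL, FILTER and INFO are unknown here, so they are written as "."
        lines.append("\t".join([v.chrom, str(v.pos), v.rsid, v.ref, v.alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"
```

Usage, with a made-up record: `open("variants.vcf", "w").write(to_vcf([Variant("chr1", 100, "rs_hypothetical", "A", "G")]))`. Records are not sorted, so sort them first if a downstream tool (e.g. for indexing) requires coordinate order.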
Sorry, I don't think you read my description correctly: I'm trying to be disease-specific here. That said, I did use Ensembl VEP to obtain the hg38 positions of the rsIDs I'm interested in. Thanks for your thoughts.