I'm going to start working on a machine learning project on a bioinformatics problem (proteins, DNA, RNA, etc.), but I don't know how to get datasets for these components or how to use them. Is there anyone who can send me datasets and guide me how to exploit them?
can send me datasets and guide me how to exploit them
If you're looking for someone to do this for you, what part of the project will you be doing and why is this project important to you? It looks like you sold a possible idea with just hot-selling buzzwords and are now looking for real ideas that employ technologies underlying those buzzwords.
You are totally wrong. I'm not looking for someone to do this for me, I haven't sold a possible idea with just hot-selling buzzwords, and I'm not looking for real ideas that employ the technologies underlying those buzzwords. I just need help in the form of some datasets.
Can you briefly explain the scientific hypothesis that you're looking to examine using ML as a tool, and why ML is a good approach there? As in, what data do you think will go into the ML algorithm, what features do you imagine it would use, and what sort of decision would it make for you? Without that, it's just feeding nonsense to a black box and torturing it until it spews something that feels like it makes sense but no one has any idea what is happening or why.
I'm a bioinformatics student at university, and this project was assigned by the university: an app that addresses a biological problem using ML as a tool, with any kind of ML algorithm (classification, clustering, prediction, ...). The subject isn't defined, so we have to choose it ourselves (DNA, RNA, proteins, ...); that is why I asked for collections of datasets for the components I mentioned.
If you have any subjects in mind, you could suggest them to me.
OK, so this looks like you did not sell someone on buzzwords - they were sold on them elsewhere and are pushing half-baked ideas on you. Talk to the people who designed this project and ask them what they're trying to accomplish when they don't give you a problem or an end goal but just mention a tool. People who design projects have a specific task in mind.
The person who designed this project is the teacher; he gave us the freedom to choose the topic.
He's your teacher; go to him with your questions until you understand exactly what he wants you to learn. A good teacher has a specific learning goal in mind and welcomes questions aimed at learning.
One of the most important steps in any ML-related project is data preparation, which seems to be what you are asking about. I would start by thinking about what question the ML should answer, e.g. protein structure, mutations, expression signatures, etc. Once you know the question, find some "ground truth" data that answers it, and use that data to train your ML model.
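To make the "ground truth" idea a bit more concrete, here is a minimal sketch in Python. It assumes a toy question ("is this DNA sequence AT-rich or GC-rich?"); the sequences, the labels, and the helper names `kmer_counts` and `split_train_test` are all made up for illustration and do not come from any real dataset or library. It only shows labeled examples being turned into numeric features and split into a training set and a held-out set.

```python
from collections import Counter
from itertools import product
import random


def kmer_counts(seq, k=2, alphabet="ACGT"):
    """Return a fixed-length vector of overlapping k-mer counts for a sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing characters outside the alphabet are simply ignored.
    return [counts[kmer] for kmer in kmers]


def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical ground-truth data: (sequence, label) pairs invented for this sketch.
labeled = [
    ("ATATATTA", "AT-rich"),
    ("TTAATATA", "AT-rich"),
    ("GCGCGGCC", "GC-rich"),
    ("CCGGCGCG", "GC-rich"),
]

# Each example becomes (feature vector, label), which is what a model trains on.
features = [(kmer_counts(seq), label) for seq, label in labeled]
train, test = split_train_test(features)
```

In a real project the labeled pairs would come from a curated source chosen to match your question, and the features would depend on the biology, so treat this only as the general shape of the data preparation step.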
From reading the discussion in the comments above, it seems to me that the teacher wants students to find a bioinformatics question that can be addressed using machine learning and then have a go at tackling it. My suggestion would be to go over the topics discussed in class and get ideas from there. Alternatively look for ideas in textbooks or even the literature.
The process goes like this: first find a question you want to address, second collect relevant data and third identify relevant analysis methods and tools.
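The three steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the question, the sequences, and the labels are invented, and the method (a nearest-centroid classifier on GC content, with the hypothetical helpers `gc_content`, `fit_centroids` and `predict`) is picked for brevity, not because it is the right tool for any real task.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def fit_centroids(examples):
    """Compute the mean GC content per label from (sequence, label) pairs."""
    sums, counts = {}, {}
    for seq, label in examples:
        sums[label] = sums.get(label, 0.0) + gc_content(seq)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, seq):
    """Assign the label whose mean GC content is closest to this sequence's."""
    value = gc_content(seq)
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Step 1: the question (hypothetical): is a sequence AT-rich or GC-rich?

# Step 2: the data, here invented labeled sequences, with some held out.
train = [
    ("ATATATTA", "AT-rich"),
    ("TTAATATA", "AT-rich"),
    ("GCGCGGCC", "GC-rich"),
    ("CCGGCGCG", "GC-rich"),
]
held_out = [
    ("ATTTAATC", "AT-rich"),
    ("GGCCGCGA", "GC-rich"),
]

# Step 3: the method, fitted on the training data and checked on held-out data.
centroids = fit_centroids(train)
correct = sum(predict(centroids, seq) == label for seq, label in held_out)
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point is the order of decisions: the question determines which data and labels are relevant, and only then does the choice of method matter.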
As others have already pointed out, just throwing any dataset at you will not help.
I suggest you start by browsing the scientific literature to see what types of ML problems are commonly addressed in the biomedical domain. As a starter, think of e.g. text processing of scientific literature or patient dossiers, image classification of pathology slides, image segmentation in microscopy, variant calling in genomic sequencing data, etc.
Once you have an idea of what type of task your ML approach should solve, we might be able to help. Also mind that training an ML model from scratch is usually very time- and resource-demanding. Look e.g. at Hugging Face for pretrained models that you can refine with some additional training. On the same site, you will also find a diverse collection of datasets. Kaggle Datasets is another source of already well-annotated ML training data.
If you have some topics in mind, you can suggest them.
I said that the topic is not imposed; that is why I came here.
Anyway, thank you.
Are there some datasets for that?