Bioinformatics datasets
2
0
2.1 years ago

Hi,

I'm about to start a machine learning project on a bioinformatics problem (proteins, DNA, RNA, etc.), but I don't know how to get datasets for these molecules or how to use them. Is there anyone who can send me datasets and guide me how to exploit them?

Thanks.

dna datasets rna proteins • 2.2k views
1

can send me datasets and guide me how to exploit them

If you're looking for someone to do this for you, what part of the project will you be doing and why is this project important to you? It looks like you sold a possible idea with just hot-selling buzzwords and are now looking for real ideas that employ technologies underlying those buzzwords.

0

You are totally wrong. I'm not looking for someone to do this instead of me, I haven't sold a possible idea with just hot-selling buzzwords, and I'm not looking for real ideas that employ technologies underlying those buzzwords. I just need some help in the form of a few datasets.

4

Can you briefly explain the scientific hypothesis you want to examine using ML as a tool, and why ML is a good approach there? As in, what data do you think will go into the ML algorithm, what features do you imagine it would use, and what sort of decision would it make for you? Without that, it's just feeding nonsense to a black box and torturing it until it spews something that feels like it makes sense but no one has any idea what is happening or why.

0

I'm a bioinformatics student at university, and this project was assigned by the university: build an app that addresses a biological problem using ML as a tool, with any kind of ML algorithm (classification, clustering, prediction, ...). The subject isn't defined, so we have to choose it ourselves (DNA, RNA, proteins, ...). That is why I asked for collections of datasets for these components.

If you have any topics in mind, please suggest them to me.

2

OK, so it looks like you did not sell someone on buzzwords; they were sold on them elsewhere and are pushing half-baked ideas on you. Talk to the people who designed this project and ask what they are trying to accomplish by naming only a tool, without giving you a problem or an end goal. People who design projects have a specific task in mind.

0

The person who has designed this project is the teacher, he gave us the freedom to choose the topic.

If you have some topics, you can suggest them.

1

The person who has designed this project is the teacher, he gave us the freedom to choose the topic.

He's your teacher, go to him with your questions until you understand what exactly he wants you to learn. A good teacher has a specific learning goal in mind and welcomes questions aimed at learning.

0

As I said, the topic is not imposed; that is why I came here.

Anyway, thank you.

1

One of the most important steps in any ML-related project is data preparation, which seems to be what you are asking about. I would start by thinking about the question that ML should answer, e.g. protein structure, mutations, expression signatures, etc. Once you know the question, find some "ground truth" data that answers it, and use that data to train your ML model.
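To make "ground truth" a bit more concrete, here is a minimal sketch of what labeled training data and a first model could look like. Everything in it is an assumption for illustration: the toy sequences and the `gc_rich` / `at_rich` labels are invented, and the k-mer features with a nearest-centroid classifier are just one simple choice, not a recommendation for any particular biological question.

```python
from collections import Counter
from itertools import product

# Hypothetical toy "ground truth": (sequence, label) pairs invented for illustration.
LABELED = [
    ("GCGCGGCCGCGCGGCC", "gc_rich"),
    ("CCGGCGCGCCGGGCGC", "gc_rich"),
    ("GGCCGCGGCGCCGCGG", "gc_rich"),
    ("ATATTAATATTTAATA", "at_rich"),
    ("TTAATATAAATTATAT", "at_rich"),
    ("AATTATATTTAATAAT", "at_rich"),
]

K = 2  # k-mer length (an arbitrary choice for this sketch)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_features(seq, k=K):
    """Turn a sequence into a fixed-length vector of normalized k-mer frequencies."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[kmer] / total for kmer in KMERS]


def train_centroids(examples):
    """Average the feature vectors of each class (nearest-centroid classifier)."""
    sums, sizes = {}, Counter()
    for seq, label in examples:
        acc = sums.setdefault(label, [0.0] * len(KMERS))
        for i, value in enumerate(kmer_features(seq)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, seq):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = kmer_features(seq)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hold out one example per class so the model is scored on data it has not seen.
train = LABELED[:2] + LABELED[3:5]
test = [LABELED[2], LABELED[5]]

centroids = train_centroids(train)
accuracy = sum(predict(centroids, seq) == label for seq, label in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point is the shape of the workflow rather than the model: labeled examples, a feature representation, a train/test split, and a score on held-out data. For a real project you would replace the toy list with curated data for your question and would probably use an established library such as scikit-learn instead of hand-written code.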

0

Are there any datasets for that?

2
2.0 years ago

From reading the discussion in the comments above, it seems to me that the teacher wants students to find a bioinformatics question that can be addressed using machine learning and then have a go at tackling it. My suggestion would be to go over the topics discussed in class and get ideas from there. Alternatively, look for ideas in textbooks or even the literature.
The process goes like this: first, find a question you want to address; second, collect relevant data; and third, identify relevant analysis methods and tools.

0
2.0 years ago

As others have already pointed out, just throwing any dataset at you will not help.

I suggest you start by browsing the scientific literature to see what types of ML problems are commonly addressed in the biomedical domain. As a starter, think of e.g. text processing of scientific literature or patient dossiers, image classification of pathology slides, image segmentation in microscopy, variant calling in genomic sequencing data, etc.

Once you have an idea of what type of task your ML approach should solve, we might be able to help. Also mind that training an ML model from scratch is usually very demanding in time and resources. Look e.g. at Hugging Face for pretrained models that you can refine with some additional training. On the same site, you will also find a diverse collection of datasets. Kaggle Datasets is another source of already well-annotated ML training data.
