Chromosome Data -- Only genes for given chromosome, without introns and with annotated motifs, possible?
9.9 years ago
ldpubsec ▴ 80

Hello,

I am a beginner in bioinformatics and would like to ask whether it is possible to download individual chromosomes, for example all chromosomes of Homo sapiens, which would then (either already or after some processing) contain: (1) only coding regions, with their motifs annotated; or (2) the original chromosome data, but with motifs annotated within genes.

So for the first case (1), I would like a chromosome data file containing all of its protein sequences, with the motifs annotated for each protein, like "here starts protein [protein sequence], motif1 [sequence], motif2 [sequence], motif3 [sequence], start of another protein [sequence]", etc. For the second case (2), the file would contain all of the chromosome data, but with motifs annotated for proteins as in the first case.

Is this possible? Which tools or format would be best for this?

Thank you very much for your help, I appreciate it a lot.

Assembly gene sequence genome rna-seq • 4.0k views

If I understand correctly, you want the human genome sequence and its annotation. In Ensembl BioMart you can filter for the features you want to see in your annotation (coding sequences, for example).
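As a rough illustration of such a filtered request, BioMart also accepts queries written as XML. The sketch below only builds the query text and does not contact any server. The dataset, filter, and attribute names (`hsapiens_gene_ensembl`, `chromosome_name`, `biotype`, `ensembl_gene_id`, `ensembl_peptide_id`, `interpro`) are assumptions that should be checked against the names shown in the BioMart web interface, and `build_biomart_query` is a hypothetical helper, not part of any Ensembl library.

```python
from xml.etree import ElementTree as ET


def build_biomart_query(chromosome, attributes, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query for protein-coding genes on one chromosome.

    All dataset, filter, and attribute names are assumptions; verify them
    in the BioMart interface before use.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Restrict the results to one chromosome and to protein-coding genes.
    ET.SubElement(ds, "Filter", {"name": "chromosome_name", "value": str(chromosome)})
    ET.SubElement(ds, "Filter", {"name": "biotype", "value": "protein_coding"})
    # One Attribute element per requested output column.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


xml_query = build_biomart_query(
    "21", ["ensembl_gene_id", "ensembl_peptide_id", "interpro"]
)
print(xml_query)
```

The resulting text can be pasted into the XML query box of a BioMart instance or sent to its web service.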


What sort of motifs are you after? As NicoBxl suggested, you probably just want to use BioMart. This will let you get gene, transcript, and similar annotations along with their sequences. If you're after something else, such as annotated motifs within proteins, then you'd probably need to look into UniProt or a similar protein database.


You can get protein domains and motifs in BioMart too.


Can you get location information though (namely, motif X is at positions ABC of the protein)?


No, good point. You can just get the domain ID and the fact that it's in the transcript.


Then is Ensembl still the right route for what I need?


Not on its own, though you might incorporate Ensembl/BioMart into the eventual program(s) you write to do this.


So retrieving raw sequences with simple tags such as "protein start", "motif 1", ..., cannot be achieved with BioMart alone?

New situation: I have parsed the chromosome into its protein sequences and saved them as .txt files on my disk. I could now use a tool to add motif tags to them. What do you think?
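A minimal sketch of what such tagging could look like, assuming each motif can be written as a regular expression. The one pattern used here is the PROSITE N-glycosylation site `N-{P}-[ST]-{P}` translated to a regex; the `PATTERNS` table, the `find_motifs` helper, and the example sequence are made up for illustration. Real motif and domain prediction tools use profiles and HMMs and are considerably more involved than this.

```python
import re

# Hypothetical motif table: name -> regular expression.
# N-{P}-[ST]-{P} is the PROSITE N-glycosylation pattern written as a regex.
PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",
}


def find_motifs(sequence, patterns=PATTERNS):
    """Return (name, start, end, matched_text) tuples with 1-based inclusive
    protein coordinates. Overlapping matches of one pattern are not reported."""
    hits = []
    for name, regex in patterns.items():
        for m in re.finditer(regex, sequence):
            hits.append((name, m.start() + 1, m.end(), m.group()))
    # Order hits by start position so they read along the protein.
    return sorted(hits, key=lambda hit: hit[1])


hits = find_motifs("MKNGSAPNPTQ")  # made-up example sequence
print(hits)  # [('N-glycosylation', 3, 6, 'NGSA')]
```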


If you need the positions of the motifs, then no, since those aren't available from BioMart.

For simply predicting sites from sequences, sure, there's no reason that shouldn't work. An alternative (and likely faster) approach would be to find a different database that provides the motif locations in protein coordinates and then merge the two datasets with a custom program.
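The merge step could be sketched roughly as follows, assuming the sequences are already loaded as a mapping from protein ID to sequence and the motif locations are rows of (protein ID, motif name, start, end) in 1-based inclusive protein coordinates. The IDs, the data, the `annotate` helper, and the output layout are all hypothetical; a real database export will have its own columns and identifiers that need mapping first.

```python
from collections import defaultdict


def annotate(proteins, motif_hits):
    """Merge protein sequences with motif locations into tagged text lines.

    proteins:   dict of protein ID -> amino-acid sequence
    motif_hits: iterable of (protein_id, motif_name, start, end),
                1-based inclusive protein coordinates (an assumption)
    """
    by_protein = defaultdict(list)
    for pid, name, start, end in motif_hits:
        by_protein[pid].append((start, end, name))

    lines = []
    for pid, seq in proteins.items():
        lines.append(f">protein {pid} {seq}")
        # List the motifs in order of position along the protein.
        for start, end, name in sorted(by_protein.get(pid, [])):
            lines.append(f"  motif {name} {start}-{end} {seq[start - 1:end]}")
    return lines


# Made-up example data.
proteins = {"P1": "MKNGSAPNPTQ", "P2": "MAAA"}
motif_hits = [("P1", "N-glycosylation", 3, 6)]

annotated = annotate(proteins, motif_hits)
print("\n".join(annotated))
```

Keeping the coordinates next to the motif sequence makes it possible to map a motif back to the transcript or the chromosome later if needed.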


Good. Is there any difference (in quality or anything else, apart from the time it takes) between running a prediction tool on all the protein sequences I have and retrieving this information from a database?

Thank you very much.


Unless you've used these prediction tools before and are familiar with them, the odds are good that the databases are higher quality.


Thank you for the comments. What I need is to annotate proteins, and for those proteins to annotate their motifs. As a result, I would have a chromosome data file containing all of its protein sequences, with the motifs annotated for each protein. An annotation like "here starts protein, motif1, motif2, motif3, start of another protein", etc. is entirely sufficient for me. (I'll add this info to the original post.)
