store sequence in a local database
10.3 years ago
ravihansa82 ▴ 130

Dear friends, I have a set of about 4000 sequences in FASTA format, as shown below:

>ENSG00000127837;ENST00000248450;AAMP;sequence length: 431
GTGAGAACTGCCGCTCCTCAGGCCATGGGACAGGAGACGCTCACCCCTGGCCTCTGACTCCTGCTT

I need to store these sequences for further analysis. I tried to store them in MS Access, but I could not get each sequence into a single row, and I could not extract the sequence length (431 in the example above). When I import the FASTA file into Access, the whole "sequence length: 431" part comes in as one field, and the DNA sequence is not stored in one row but split across several rows in the first column.

I need to store each sequence as one record in the following format. Please give me your advice and suggestions.

gene ID            transcript ID      gene symbol    sequence length    sequence
ENSG00000127837    ENST00000248450    AAMP           431                GTGAGAACTGCC
windows sequence database • 4.9k views

Do you just want one file containing all your FASTA sequences, or do you want a real database (e.g. MySQL)?

In my opinion one file should be sufficient, since 4000 sequences are not that many.

Can you please tell us what your preferred programming language is, if you program at all?


Thanks for the reply, friend. Yes, I need one file at the moment rather than a real database. I work with Java, Perl and Python.


You need to show us a bit more about the import process and provide the SQL code or whatever you are using; I do not believe there is a 'FASTA importer' that comes with MS Access. As a general recommendation I would resort to open source solutions such as MySQL or PostgreSQL. These tools are much more widely accepted in the bioinformatics community, so we are better able to help with them.

Please consider whether your data needs to be stored in a database at all. Most likely, leaving small sequence data in FASTA files is sufficient for the purpose of any analysis. Storing sequence data in an Access database most likely provides no advantage you could utilize unless you have an extremely specific type of analysis pipeline in your company or department.


Thank you, friend.

I imported the sequences from the text file into Access as a delimited text file, with ";" as the delimiter. Having read your comment, I think I tried something that Access cannot do, so I will have to try another way, won't I?


Thanks, friend.

Yes, this is actually part of my study. I have these sequences in FASTA format and want to work with them further, which is why I wanted to store them in one file or database. My study will later extend to pattern matching on these sequences (they are used to find a set of cis-elements), so I want to be able to identify the gene name, transcript name and length easily. That is why I want to store the data in a particular format that is easy to look at and retrieve.

10.3 years ago
linus ▴ 360

I do not get why you want a database for so little data.

Here is an idea without any database, using a single file that contains all your information:

It is quite simple. You store all your data in a CSV file, with ";" as the field separator. An example would be:

gene ID;           transcript ID;     gene symbol;    sequence length;    sequence
ENSG00000127837;   ENST00000248450;   AAMP;           431;                GTGAGAACTGCC

Of course without the whitespace.

This format has two advantages. First, if your sequences are not that long, you can still do an easy lookup in Excel. Second, every proper programming language already has a CSV reader/writer implemented, giving you easy access to your data. For example, in Python there is: https://docs.python.org/2/library/csv.html
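To illustrate the second point, here is a minimal sketch of reading such a file back with Python's standard `csv` module. The column names follow the example above; the in-memory sample data and the `load_records` helper are only assumptions for illustration, and with a real file you would pass an opened file handle instead.

```python
import csv
import io

# Hypothetical sample content; for a real file use
# open("sequences.csv", newline="") instead of io.StringIO.
data = io.StringIO(
    "gene ID;transcript ID;gene symbol;sequence length;sequence\n"
    "ENSG00000127837;ENST00000248450;AAMP;431;GTGAGAACTGCC\n"
)

def load_records(handle):
    """Return a dict mapping each transcript ID to its full row."""
    reader = csv.DictReader(handle, delimiter=";")
    return {row["transcript ID"]: row for row in reader}

records = load_records(data)
row = records["ENST00000248450"]
print(row["gene symbol"], row["sequence length"])
```

Note that the `csv` module returns every field as a string, so the length has to be converted with `int()` before doing arithmetic on it.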

So basically you need to parse your multi-FASTA file, for example in Python, and then write the records into a single CSV file.

I hope my idea helps you. If you could describe your use cases in more detail, we can probably give more or better advice.
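A rough sketch of that conversion, using only the standard library, could look like the following. It assumes every header has exactly the four ";"-separated parts shown in the question (gene ID, transcript ID, gene symbol, "sequence length: N"); the function names and the shortened demo sequence are made up for illustration.

```python
import csv
import io

def parse_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def header_to_fields(header):
    """Split 'gene;transcript;symbol;sequence length: N' into four values."""
    gene_id, transcript_id, symbol, length_field = header.split(";")
    length = int(length_field.split(":")[1])  # int() tolerates the leading space
    return [gene_id, transcript_id, symbol, length]

def fasta_to_csv(fasta_handle, csv_handle):
    """Write one ';'-separated row per FASTA record."""
    writer = csv.writer(csv_handle, delimiter=";", lineterminator="\n")
    writer.writerow(["gene ID", "transcript ID", "gene symbol",
                     "sequence length", "sequence"])
    for header, seq in parse_fasta(fasta_handle):
        writer.writerow(header_to_fields(header) + [seq])

# Demo on an in-memory record with a shortened, wrapped sequence.
fasta = io.StringIO(
    ">ENSG00000127837;ENST00000248450;AAMP;sequence length: 431\n"
    "GTGAGAAC\n"
    "TGCCGCTC\n"
)
out = io.StringIO()
fasta_to_csv(fasta, out)
print(out.getvalue())
```

For real files, replace the two `io.StringIO` objects with `open("input.fasta")` and `open("output.csv", "w", newline="")`. A library such as Biopython could do the FASTA parsing instead, but for 4000 sequences the plain version above should be enough.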
