Dear friends, I have a set of about 4000 sequences in FASTA format, as shown in the following example:
>ENSG00000127837;ENST00000248450;AAMP;sequence length: 431
GTGAGAACTGCCGCTCCTCAGGCCATGGGACAGGAGACGCTCACCCCTGGCCTCTGACTCCTGCTT
I need to store these sequences for further analysis. I tried to store them in MS Access, but I could not put each sequence into one row, and I could not extract the sequence length (for example, 431 above). When I imported this FASTA file into Access, the whole "sequence length: 431" part came in as a single field, and the DNA sequence was not stored in one row; it was split across several rows under the first column.
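Part of the import problem is that the header line is a small record of its own: its fields are separated by ";", and the last field has a "sequence length:" label in front of the number. A minimal Python sketch, assuming every header has exactly the four fields of the example above (`parse_header` is a hypothetical helper name), that splits the header into separate values and extracts the length as an integer:

```python
import re


def parse_header(header):
    """Split a header such as
    '>ENSG00000127837;ENST00000248450;AAMP;sequence length: 431'
    into (gene_id, transcript_id, gene_symbol, length).

    Assumes exactly four ';'-separated fields, as in the example.
    """
    gene_id, transcript_id, symbol, length_field = header.lstrip(">").strip().split(";")
    # The last field looks like 'sequence length: 431'; keep only the number.
    match = re.search(r"\d+", length_field)
    if match is None:
        raise ValueError(f"no length found in header: {header!r}")
    return gene_id, transcript_id, symbol, int(match.group())


print(parse_header(">ENSG00000127837;ENST00000248450;AAMP;sequence length: 431"))
# ('ENSG00000127837', 'ENST00000248450', 'AAMP', 431)
```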
I need to store each sequence as one record, in the following format. Please give me your advice and suggestions.
gene ID | transcript ID | gene symbol | sequence length | sequence
ENSG00000127837 | ENST00000248450 | AAMP | 431 | GTGAGAACTGCC
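One way to get exactly this layout is to skip Access and convert the FASTA file into a single tab-separated file, which Access, a spreadsheet or a database can then import as-is. A minimal sketch, assuming every header has the four ';'-separated fields shown above; the function names and output column names are illustrative, not taken from any library:

```python
import csv


def fasta_records(lines):
    """Yield (header, sequence) pairs from FASTA lines.

    Wrapped sequence lines are joined, so each record becomes one string
    instead of being spread over several rows.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_rows(lines):
    """Yield [gene ID, transcript ID, gene symbol, length, sequence] per record."""
    for header, seq in fasta_records(lines):
        gene_id, transcript_id, symbol, length_field = header.split(";")
        # 'sequence length: 431' -> 431 (int() ignores the surrounding spaces)
        length = int(length_field.split(":")[-1])
        yield [gene_id, transcript_id, symbol, length, seq]


def fasta_to_tsv(fasta_path, tsv_path):
    """Write one tab-separated file with a header row and one row per record."""
    with open(fasta_path) as fasta, open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id", "transcript_id", "gene_symbol", "length", "sequence"])
        writer.writerows(fasta_rows(fasta))
```

For example, `fasta_to_tsv("sequences.fa", "sequences.tsv")` (hypothetical file names) would write one row per transcript. The length column is copied from the header; comparing it with `len(seq)` would be a cheap sanity check.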
Do you just want one file containing all your FASTA sequences, or do you want a real database (e.g. MySQL)?
In my opinion, one file should be sufficient, since 4000 sequences are not that many.
Can you please tell us your preferred programming language, if you program at all?
Thanks for the reply, friend. Yes, I need one file for the moment, rather than putting the sequences into a real database. I work with Java, Perl and Python.
You need to show us a bit more about the import process and provide the SQL code, or whatever else you are using; I do not believe MS Access comes with a FASTA importer. As a general recommendation, I would turn to open-source solutions such as MySQL or PostgreSQL. These tools are much more widely used in the bioinformatics community, which also makes it easier for us to help.
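If a real database is wanted later, the records fit in a single table. The answer above suggests MySQL or PostgreSQL; the sketch below swaps in SQLite through Python's built-in `sqlite3` module only because it needs no server. The table layout would carry over, although the MySQL and PostgreSQL Python drivers use a different placeholder style than `?`. The table name `transcripts` and its columns are assumptions chosen for illustration, and the example row is the one from the question:

```python
import sqlite3


def build_db(records, path=":memory:"):
    """Store (gene_id, transcript_id, gene_symbol, length, sequence) tuples
    in a table called 'transcripts' (a hypothetical name) and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               gene_id       TEXT NOT NULL,
               transcript_id TEXT PRIMARY KEY,
               gene_symbol   TEXT,
               length        INTEGER,
               sequence      TEXT NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO transcripts VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


conn = build_db([
    ("ENSG00000127837", "ENST00000248450", "AAMP", 431, "GTGAGAACTGCC"),
])
# Look up one transcript by its ID.
row = conn.execute(
    "SELECT gene_symbol, length FROM transcripts WHERE transcript_id = ?",
    ("ENST00000248450",),
).fetchone()
print(row)  # ('AAMP', 431)
```

Passing a file path such as `"transcripts.db"` (hypothetical) instead of `":memory:"` keeps the database on disk between runs.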
Please consider whether your data needs to be stored in a database at all. Most likely, leaving small sequence datasets in FASTA files is sufficient for the purpose of any analysis. Storing sequence data in an Access database most likely gives you no advantage you could make use of, unless your company or department has a very specific kind of analysis pipeline.
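To illustrate that the FASTA file alone can be enough: a few thousand transcript sequences normally fit comfortably in memory, so a dictionary keyed by transcript ID gives direct lookups without any database. A sketch, assuming the header layout from the question and that ">" only appears at the start of header lines, as the FASTA format requires; `load_fasta_index` is a hypothetical name:

```python
def load_fasta_index(text):
    """Map transcript ID -> (gene ID, gene symbol, length, sequence),
    parsed straight from FASTA text; no database involved."""
    index = {}
    # Each chunk after a '>' is one record: a header line plus sequence lines.
    for block in text.split(">")[1:]:
        header, *seq_lines = block.strip().splitlines()
        gene_id, transcript_id, symbol, length_field = header.split(";")
        length = int(length_field.split(":")[-1])  # 'sequence length: 431' -> 431
        sequence = "".join(line.strip() for line in seq_lines)
        index[transcript_id] = (gene_id, symbol, length, sequence)
    return index


example = """>ENSG00000127837;ENST00000248450;AAMP;sequence length: 431
GTGAGAACTGCCGCTCCTCAGGCCATGGGACAGGAGACGCTCACCCCTGGCCTCTGACTCCTGCTT
"""
index = load_fasta_index(example)
print(index["ENST00000248450"][:3])  # ('ENSG00000127837', 'AAMP', 431)
```

For a real file, the same function can be called on the result of `open(...).read()`.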
Thank you, friend.
I imported the set of sequences from a text file into Access, using the text-file import with ";" as the delimiter. Having read your comment, I think I was trying to do something Access can't do, so I have to try another way, don't I?
Thanks, friend.
Yes, this is actually part of my study. I have these sequences in FASTA format and want to work with them further. That is why I wanted to store the data in a particular form, as one file or a database. My study will later extend to pattern matching on these sequences (they are used to find sets of cis-elements), so I want to be able to identify the gene name, transcript name and length easily. That is why I want to store the data in a format that is easy to look at and retrieve from.
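For the later pattern-matching step, the records can be scanned directly with regular expressions, reporting the gene, transcript and length next to every hit. A minimal sketch using Python's `re` module; the input is assumed to be (gene ID, transcript ID, gene symbol, length, sequence) tuples, one per FASTA record, and the motif in the docstring is an arbitrary example, not a validated cis-element:

```python
import re


def find_motif(records, motif):
    """Report every (possibly overlapping) occurrence of a regex motif.

    records: iterable of (gene_id, transcript_id, gene_symbol, length, sequence)
    motif:   regular expression over A/C/G/T, e.g. 'TATA[AT]A' (example only)
    Yields (gene_id, transcript_id, gene_symbol, length, 0-based start, matched text).
    """
    # A zero-width lookahead lets overlapping hits be reported.
    pattern = re.compile(f"(?=({motif}))")
    for gene_id, transcript_id, symbol, length, seq in records:
        # Upper-case so that lowercase (soft-masked) bases also match.
        for m in pattern.finditer(seq.upper()):
            yield gene_id, transcript_id, symbol, length, m.start(), m.group(1)


records = [("ENSG00000127837", "ENST00000248450", "AAMP", 431,
            "GTGAGAACTGCCGCTCCTCAGGCCATGGGACAGGAGACGCTCACCCCTGGCCTCTGACTCCTGCTT")]
for hit in find_motif(records, "CCTC"):
    print(*hit, sep="\t")
```

Because every hit carries its identifiers and length, the output can itself be saved as a tab-separated file for later filtering.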