I will be working with the full CDS genome for a lot of species. I want to store some information about each transcript, along with other information about the sequences. What would be the best way to set up this database to make working with them efficient? Would SQL be the way to go?
If you work with Ensembl, you might want to have a look at pyGeno. It will store the sequences for you, along with all the annotation information from the GTF (IDs, names, biotypes, ...), in an SQL database, and it indexes them as well. You might have to create some datawraps yourself, but that shouldn't take more than a few minutes.
updated 2.7 years ago by Ram • written 9.9 years ago by moranr
Here is how you create a new datawrap. It basically comes down to putting all the URLs from which the files should be downloaded into one file and compressing it into a tar.gz archive. Let me know if you encounter any issues.
If you just want to store information about transcripts and be able to access it easily, then an SQL database would be a simple enough solution (at least if you're already familiar with SQL). Note that I wouldn't try to store the genome sequences themselves in one (you could, but you're better off using an indexed file).
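To make that split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and function names are my own assumptions for illustration, not something taken from pyGeno or Ensembl. The database holds only the annotation, plus the path to each species' FASTA file; the sequences themselves stay in that file, which you would index separately (for example with samtools faidx).

```python
import sqlite3


def build_db(path=":memory:"):
    """Create the (hypothetical) schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            fasta_path TEXT NOT NULL      -- indexed FASTA file kept outside the DB
        );
        CREATE TABLE IF NOT EXISTS transcript (
            transcript_id TEXT PRIMARY KEY,  -- also the record name in the FASTA
            species_id    INTEGER NOT NULL REFERENCES species(species_id),
            gene_name     TEXT,
            biotype       TEXT,
            cds_length    INTEGER
        );
        -- Index the columns you expect to filter on most often.
        CREATE INDEX IF NOT EXISTS idx_transcript_species_gene
            ON transcript (species_id, gene_name);
        """
    )
    return conn


def add_species(conn, name, fasta_path):
    """Insert one species and return its generated id."""
    cur = conn.execute(
        "INSERT INTO species (name, fasta_path) VALUES (?, ?)",
        (name, fasta_path),
    )
    conn.commit()
    return cur.lastrowid


def add_transcripts(conn, species_id, rows):
    """Bulk-insert (transcript_id, gene_name, biotype, cds_length) tuples."""
    conn.executemany(
        "INSERT INTO transcript VALUES (?, ?, ?, ?, ?)",
        [(tid, species_id, gene, biotype, length)
         for tid, gene, biotype, length in rows],
    )
    conn.commit()


def transcripts_for_gene(conn, species_name, gene_name):
    """Return (transcript_id, cds_length, fasta_path) for one gene in one species."""
    cur = conn.execute(
        """
        SELECT t.transcript_id, t.cds_length, s.fasta_path
        FROM transcript AS t
        JOIN species AS s ON s.species_id = t.species_id
        WHERE s.name = ? AND t.gene_name = ?
        ORDER BY t.transcript_id
        """,
        (species_name, gene_name),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Placeholder data, invented purely for the demo.
    conn = build_db()
    sid = add_species(conn, "species_one", "cds/species_one.fa")
    add_transcripts(conn, sid, [
        ("TX0001", "geneA", "protein_coding", 1200),
        ("TX0002", "geneA", "protein_coding", 900),
    ])
    for row in transcripts_for_gene(conn, "species_one", "geneA"):
        print(row)
```

Each query hands back the transcript ID together with the FASTA path, so you can pull the actual sequence from the indexed file only when you need it, and the database stays small even with many species.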
I'll look into this further, thank you.
Great, thank you very much.