Hi, I need your help. I have several FASTA files and I need to build a database of bacterial genomes (my working example is Klebsiella pneumoniae). I had planned a database with several tables (genomes, species, genus), but it was suggested that I use a single table instead:
create table myDatabase (
taxon varchar(100),
seqtype enum('dna','prot'),
seq longtext
);
I don't know how to populate this database with several (thousands of) genomes, because inserting the records one statement at a time would take a very long time, for example:
INSERT INTO myDatabase VALUES ('klebsiella', 'dna', 'AATU...');
Moderator Edit: Previous thread for context: construction of a database
Creating an SQL database with sequence data is not a great idea. In any case, you should be able to use the bulk import facilities of your database to load the records, as long as you have them in a tab- or comma-separated file. Google will help you if you search for "bulk import from file to SQL" and add your SQL provider (MySQL, SQLite, etc.) as another keyword in the search.
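To give an idea of what a bulk load looks like, here is a minimal sketch in Python using the built-in sqlite3 module. It assumes SQLite as the provider (chosen only because it ships with Python), a tab-separated input with exactly three columns (taxon, seqtype, seq), and a table called `sequences`; the function name `bulk_load_tsv` and the table name are made up for illustration. Other providers have their own bulk loaders (MySQL has LOAD DATA INFILE, for instance), so treat this as an outline rather than a recipe.

```python
import sqlite3


def bulk_load_tsv(conn, tsv_handle):
    """Insert every row of a tab-separated stream (taxon, seqtype, seq).

    All rows are sent through one executemany() call inside a single
    transaction, instead of one INSERT statement (and commit) per record.
    """
    # Hypothetical table name; the columns mirror the single-table layout
    # from the question, with a CHECK constraint standing in for the enum.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "taxon TEXT, "
        "seqtype TEXT CHECK (seqtype IN ('dna', 'prot')), "
        "seq TEXT)"
    )
    # Split lines by hand rather than with the csv module, whose default
    # field size limit is too small for whole-genome sequences.
    rows = (
        line.rstrip("\n").split("\t")
        for line in tsv_handle
        if line.strip()
    )
    with conn:  # commits once on success, rolls back on error
        cur = conn.executemany(
            "INSERT INTO sequences (taxon, seqtype, seq) VALUES (?, ?, ?)",
            rows,
        )
    return cur.rowcount
```

It could be called as `bulk_load_tsv(sqlite3.connect("genomes.db"), open("genomes.tsv"))`, where both file names are placeholders. The point is the single transaction: committing after every record is usually what makes row-by-row inserts slow.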
Thank you for your answer. Are you advising me to build the database with NoSQL instead?
I did not give such advice. I asked a question on how you're making this decision between SQL/NoSQL without understanding the architecture of either.
These are FASTA files, so they are not tab- or comma-separated files.
but they can be easily converted to tab delimited files...
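One way to do that conversion, as a rough sketch in plain Python with no external libraries. It assumes ordinary FASTA input (a ">" header line followed by one or more sequence lines) and writes one row per record with the three columns of the table from the question; the function names `fasta_records` and `fasta_to_tsv` are made up for illustration, and the taxon is passed in by the caller because it cannot be reliably read from the header.

```python
def fasta_records(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record are joined into a single string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_tsv(fasta_handle, tsv_handle, taxon, seqtype="dna"):
    """Write one tab-separated row (taxon, seqtype, seq) per FASTA record.

    Returns the number of records written. The FASTA header is dropped here
    to match the three-column table; add it as a fourth column if needed.
    """
    count = 0
    for _header, seq in fasta_records(fasta_handle):
        tsv_handle.write("\t".join((taxon, seqtype, seq)) + "\n")
        count += 1
    return count
```

For real work a dedicated parser such as Biopython's SeqIO would be the more robust choice; the sketch above is only meant to show that the format change itself is small.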