If, for example, I need all the sequencing runs for a particular organism and a particular platform (e.g. PacBio), can I search only SRA or only ENA?
While the annotated sections of the INSDC databases (i.e. DDBJ, ENA Sequence (formerly EMBL-Bank) and GenBank) are synchronised daily (with occasional longer delays for some larger submissions and for release processes), the read archives (i.e. ENA Read, NCBI SRA and DDBJ Sequence Read Archive (DRA)) are a little different because of the volume of data involved. As such, synchronisation can take longer, and some data may not be shared.
When working with data accessions, you can identify the primary source of the data from the accession prefix. The prefixes are:
'DR' for DDBJ Sequence Read Archive (DRA)
'EGA' for European Genome-phenome Archive (EGA)
'ER' for ENA Read
'SR' for NCBI Sequence Read Archive (SRA)
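The prefix rule above is easy to apply programmatically when you have a long list of accessions to sort by origin. A minimal sketch, covering only the four prefixes listed here; the function name `primary_archive` and the placeholder accessions in the usage line are hypothetical, and anything not matching these prefixes is simply reported as "unknown":

```python
# Accession prefix -> archive of original submission (prefixes as listed above).
PREFIX_TO_ARCHIVE = {
    "EGA": "European Genome-phenome Archive (EGA)",
    "DR": "DDBJ Sequence Read Archive (DRA)",
    "ER": "ENA Read",
    "SR": "NCBI Sequence Read Archive (SRA)",
}


def primary_archive(accession: str) -> str:
    """Return the primary source archive for an accession, based on its prefix."""
    acc = accession.strip().upper()
    # Check longer prefixes first so a three-letter prefix always wins over a two-letter one.
    for prefix in sorted(PREFIX_TO_ARCHIVE, key=len, reverse=True):
        if acc.startswith(prefix):
            return PREFIX_TO_ARCHIVE[prefix]
    # Not one of the read-archive prefixes covered here.
    return "unknown"
```

For example, `primary_archive("ERR0000001")` returns `"ENA Read"`, even if you found that run through NCBI.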
For more information about specific data synchronisation policies, or for queries about the synchronisation of specific datasets, I suggest you contact the database teams involved; see their websites for details.
Thanks for the input! Actually, only now do I see this question, "ENA/SRA updating frequency", which is very similar to mine. There seems to be no definitive answer, though!
NCBI, ENA and DDBJ share data periodically (I think every two weeks, but I am not sure). In some instances the data appear in only one database and not in the others.
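Because some runs may be visible in only one archive, a cautious answer to the original question is to search both SRA and ENA and combine the results. A minimal sketch, assuming you have already exported the matching run accessions from each archive as plain strings (the export step, the function name `compare_run_lists` and the example accessions are hypothetical). It relies on a mirrored run keeping the accession assigned by its submission archive, so the same run has the same identifier in both lists:

```python
def compare_run_lists(sra_runs, ena_runs):
    """Combine run accessions from SRA and ENA and show which archive(s) hold each."""
    sra, ena = set(sra_runs), set(ena_runs)
    return {
        "all": sorted(sra | ena),       # union: every run seen in either archive
        "both": sorted(sra & ena),      # runs mirrored in both archives
        "sra_only": sorted(sra - ena),  # not (yet) visible in ENA
        "ena_only": sorted(ena - sra),  # not (yet) visible in SRA
    }
```

For example, `compare_run_lists(["SRR1", "ERR2"], ["ERR2", "ERR3"])["all"]` gives `["ERR2", "ERR3", "SRR1"]`, while `sra_only` and `ena_only` flag the runs that a single-archive search would have missed.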
updated 2.8 years ago by Ram • written 10.1 years ago by Shyam
Perhaps ENA themselves could give a definite answer?