Hi everyone,
I've been setting up a data management system using Berkeley DB and its Python bindings (bsddb3) to run parallel tasks over ~80 million RNA sequences that have properties similar to the "Target RNA" schema below.
Target RNA
length, sequence, expression, mismatchPositions, etc.
Problem:
Ten parallel processes that access separate records ("rows") and write to the database run very slowly, and each job seems to run slower than the previous one, which perhaps indicates some sort of locking issue. For instance, to test updating the length of 10 million sequences, I run 10 separate jobs, each accessing a different range of records (0-1,000,000 for the first job, 1,000,001-2,000,000 for the second, and so on), like so:
# n is the job number; dbFN is the file location of the database
import bsddb3

db = bsddb3.db.DB()
db.set_cachesize(1, 0)  # 1 GB cache; must be set before open()
db.open(dbFN, None, bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)

# each job updates its own slice of 1,000,000 records
for i in range(n * 1000000, (n + 1) * 1000000):
    db[str(i)] = str(int(db[str(i)]) + 1)

db.close()
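For reference, each of the ten jobs runs this script as a separate process. A minimal sketch of the launch, using multiprocessing purely as a stand-in for however the jobs are actually submitted to the cluster:

# Stand-in launcher: on the real cluster each job is submitted
# separately; multiprocessing here is just for illustration.
from multiprocessing import Process

def run_job(n):
    # same update loop as above, parameterized by job number n
    import bsddb3
    d = bsddb3.db.DB()
    d.set_cachesize(1, 0)
    d.open(dbFN, None, bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)
    for i in range(n * 1000000, (n + 1) * 1000000):
        d[str(i)] = str(int(d[str(i)]) + 1)
    d.close()

jobs = [Process(target=run_job, args=(n,)) for n in range(10)]
for j in jobs:
    j.start()
for j in jobs:
    j.join()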
The runtimes (in seconds) for the jobs come out something like 30, 50, 60, 70, and so on. Running a single job to update all 10 million sequences takes ~120 s. The jobs seem to slow down the whole cluster even though the database is only 200 MB in total. Running 100 read-only jobs works fine. I'd like to be able to run 30+ jobs concurrently, so not being able to run even 10 worries me.
Questions:
- Each process is accessing different records, so why would the speed of the nth job depend on the previous jobs? Doesn't BDB lock on records rather than on entire databases? Would manually using transactions speed this up? (A sketch of the shared-environment approach follows this list.)
- In my previous post there was mention of using Redis to run 100 jobs in parallel, all accessing the same in-memory file. Is this something Berkeley DB is not capable of because its records aren't stored in RAM the way Redis's are?
- How can I tell which part of the system is causing the problem? Is it a BDB problem or a network/filesystem problem? I'm fairly ignorant about filesystems/networks and don't know how to troubleshoot this type of problem, so any tips would be appreciated. (I sketch a local-vs-NFS timing test after the setup list below.)
- Are there flags I can pass to allow easier concurrency, or DB parameters I can change, such as the page size, to make it more efficient?
- If the problem is due to my network/filesystem setup, how is your system set up to allow for this type of concurrency?
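To make the locking question concrete, here is a minimal sketch of what I understand the Concurrent Data Store (CDB) approach to look like: every job opens the database through a shared environment, which coordinates multiple-reader/single-writer locking between processes. This assumes the environment home directory (envHome, a placeholder) is on local disk and all jobs run on one node; as the EDIT below notes, a shared environment over NFS is not safe.

# Minimal sketch, assuming all jobs run on ONE node and envHome
# (a placeholder) is on LOCAL disk, not NFS.
import bsddb3
from bsddb3 import db

env = db.DBEnv()
env.set_cachesize(1, 0)  # 1 GB cache, shared by all processes
env.open(envHome, db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_CDB)

d = db.DB(env)
d.open(dbFN, None, db.DB_HASH, db.DB_CREATE)
for i in range(n * 1000000, (n + 1) * 1000000):
    d[str(i)] = str(int(d[str(i)]) + 1)  # CDB locks each put for us
d.close()
env.close()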
System Setup/Stats
- BDB 4.8 with bsddb3
- 5 nodes (with Gigabit Ethernet connections) sharing the database over NFS
- dd throughput: 60 MB/s
- cache: 1 GB for each DB connection
- each DB is a HASH type
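One test I can think of for separating BDB from the filesystem: copy the database to node-local disk and time the same update loop against an NFS copy and the local copy. If the local copy stays fast as jobs are added, NFS is the likely bottleneck. A rough sketch (the local path is hypothetical, and the test runs on copies so the real database is untouched):

# Rough sketch: time identical updates against an NFS copy and a
# node-local copy of the database; paths here are hypothetical.
import shutil, time
import bsddb3

def timed_update(path, start, stop):
    d = bsddb3.db.DB()
    d.set_cachesize(1, 0)
    d.open(path, None, bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)
    t0 = time.time()
    for i in range(start, stop):
        d[str(i)] = str(int(d[str(i)]) + 1)
    d.close()
    return time.time() - t0

nfsScratch = dbFN + '.scratch'             # copy on the same NFS mount
shutil.copy(dbFN, nfsScratch)
shutil.copy(dbFN, '/tmp/targetRNA.db')     # node-local copy
print('NFS:  ', timed_update(nfsScratch, 0, 1000000))
print('local:', timed_update('/tmp/targetRNA.db', 0, 1000000))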
If any more info is needed I'll update ASAP.
Thanks.
EDIT:
FYI, Berkeley DB does not support concurrent database access by multiple processes over NFS (as mentioned in the comments): http://www.oracle.com/technetwork/database/berkeleydb/db-faq-095848.html
Also, Tokyo/Kyoto Cabinet does not support simultaneous database access from multiple processes at all; Tokyo Tyrant/Kyoto Tycoon have to be used instead, although I'm not sure whether even those are safe over NFS.
My experience is that using NFS with multiple nodes accessing the same information is a very bad idea. I don't know the details of this particular problem, but if it's just reading and writing simple data, you might be better off using MySQL. If the problem is more about distributing load across nodes, I'd recommend Hadoop.
re: NFS, multiple nodes writing to the same file is bad news. I tried all kinds of locking schemes and never got it to work consistently, i.e., it would fail one in a million times. I finally gave up and just wrote to separate files and aggregated them later. Inelegant, but it works.
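A minimal sketch of that write-separately-then-aggregate pattern, reusing the bsddb3 setup from above (the per-job file names are hypothetical):

# Sketch: each job writes only its own local file; one pass merges
# them afterwards. File names are hypothetical.
import bsddb3

def run_job(n):
    out = bsddb3.db.DB()
    out.open('/tmp/job_%d.db' % n, None,
             bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)
    for i in range(n * 1000000, (n + 1) * 1000000):
        out[str(i)] = str(i)  # stand-in for the real update
    out.close()

def aggregate(num_jobs, mergedFN):
    merged = bsddb3.db.DB()
    merged.open(mergedFN, None, bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)
    for n in range(num_jobs):
        part = bsddb3.db.DB()
        part.open('/tmp/job_%d.db' % n, None,
                  bsddb3.db.DB_HASH, bsddb3.db.DB_RDONLY)
        for k in part.keys():
            merged[k] = part[k]
        part.close()
    merged.close()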
I think you will get some good answers on the BDB forum: http://forums.oracle.com/forums/forum.jspa?forumID=271
I'm copying the question there right now, but I figured I'd ask here as well because people might have experience with it.
@gawp Thanks for the info, everyone. Would a network interface be safe with NFS? Each of those (Tyrant, Tycoon, Redis) uses only one process/server per database, so I'd venture that it would be?
My recommendation to use Redis still stands.
@Aleksandr Levchuk: Redis looks amazing, but if I can find a solution that doesn't load the whole database into memory, that would be better. I'm currently trying out both Redis and Tyrant. PS: I've referenced your example of 100 processes using a single database thread multiple times since that post; thanks for the reply, it was helpful.
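For reference, here's a minimal sketch of that one-server-many-clients pattern with redis-py. The host name is hypothetical, and using INCR as the update is my own stand-in, not from the earlier post: since only the single redis-server process touches its data files, the NFS multi-writer problem never arises.

# Sketch: one redis-server owns the data on its local disk; each of
# the parallel jobs connects over TCP. Host name is hypothetical.
import redis

r = redis.Redis(host='node01', port=6379)
for i in range(n * 1000000, (n + 1) * 1000000):
    r.incr(str(i))  # atomic server-side increment; no file locking needed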