Database
Cluster Emulator (DCLUE)
K.
Kant, A. Sahoo and N. Jani
DCLUE is a
powerful simulator for studying the performance of clustered database systems.
It includes detailed simulation of an Oracle-9i like clustered
database running a TPC-C like workload.
The model implements the following features:
- Complete creation of the
database tables and their indices and their proper initialization. (Only
the required table columns [fields] are represented, but row sizes are
modeled correctly).
- Emulation of disk IO and buffer cache
at each node. Each table & its index are cached separately. Disk IO
emulates elevator scheduling, caching and other usual IO features.
- All TPC-C queries are
implemented in detail according to a query plan. Barring a few short-cuts,
each query should actually produce results as in a real database.
- Multiversion
concurrency control, where each update effectively creates a new
data version. This scheme obviates the need for read locks.
- Reasonably sophisticated
(write) locking mechanism. In particular, locks are acquired in two
phases: initial intention locking
or latching
followed by mass conversion of latches to locks. Locking granularity can
extend from page down to row level.
- Scheduling of message
sends, message receives and application processing are
implemented in a way to mimic processing by current operating systems
(simplified though).
- Thread switching is
implemented explicitly and takes into account the working set and whether
the working set fits in the processor cache. Processor caches are not
implemented; instead, their impact is accounted for in thread switching
overhead.
- Queuing and latency impact of
processor-memory data transfers are
implemented. In particular, latencies of address bus, data bus and memory
channels are accounted for in iteratively estimating the CPI (avg cycles per instruction) for the processor.
- Oracle 9i like cache fusion
architecture which allows pooling of buffer caches from all nodes. As a result, a lot of local buffer cache
misses can be satisfied from some remote buffer cache.
- Inter-process
communication (IPC) messages between nodes are implemented
explicitly and are used for a variety of purposes including locking,
directory query/update, data transfer, etc.
- Remote disk accesses use iSCSI protocol which itself requires a couple
of IPC messages. These are called “storage messages” and are carried on
separate TCP connections from regular IPC messages. They may also be
carried on a separate fabric.
- Logging of
all modified data to log disks before a transaction is allowed to commit.
Logging could be directed to one or more nodes.
Owing to its very detailed
nature, the model requires a rather fine-grain calibration, which could
be difficult, but can be done reasonably for a well studied workloads like
TPC-C, TCP/IP processing, iSCSI processing, etc. On the positive side, however, the model
isn't dependent on high level input parameters such as buffer cache hit ratio
or number of IPC messages per transaction – things that will be easily
invalidated by a change in configuration.
The model is developed using the OPNET simulation package and uses the
complete TCP/IP/Ethernet MAC provided by OPNET
for networking. DCLUE also supports a
rudimentary implementation of SCTP as the
transport protocol. For more information on DCLUE, please consult the
following papers:
- K. Kant and A. Sahoo, “Clustered
DBMS Scalability under Unified Ethernet Fabric”, Proc. of ICPP, May
2005, Oslo, Norway
- K. Kant, A. Sahoo
and N. Jani, “DCLUE:
A Distributed Cluster Emulator”, to appear in IEICE Transactions, Special Issue on Parallel/Distributed Computing
and Networking, 2006. Also appears in Proc. of 2005 Opnetwork, Washington DC, Aug 2005.
The main
use of DCLUE will be in a series of ongoing projects dealing with fabrics for future
data centers. In particular, DCLUE will be used to evaluation real
application impact of a variety of link level, network level and transport
level fabric features. DCLUE can be made available in source form on a request
basis.