docs

2018-12-08 20:57:39 +01:00 · 2018-12-08 20:57:39 +01:00 · 5cbaf71227
parent 35c5365125
commit 5cbaf71227
1 changed files with 100 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -4,28 +4,32 @@ available for C++17, will support C++ 11 again soon.  MIT licensed.

 [LMDB](http://www.lmdb.tech/doc/index.html) is an outrageously fast
 key/value store with semantics that make it highly interesting for many
-applications. Of specific note, besides speed, is the full support for
-transactions and read/write concurrency. LMDB is also famed for its
-robustness.. **when used correctly**.
+applications.  Of specific note, besides speed, is the full support for
+transactions and good read/write concurrency.  LMDB is also famed for its
+robustness..  **when used correctly**.

 The design of LMDB is elegant and simple, which aids both the performance
-and stability. The downside of this elegant design is a plethora of rules
-that need to be followed to not break things. In other words, LMDB delivers
-great things but only if you use it exactly right.
+and stability. The downside of this elegant design is a [nontrivial set of
+rules](http://www.lmdb.tech/doc/starting.html)
+that [need to be followed](http://www.lmdb.tech/doc/group__mdb.html) to not break things. In other words, LMDB delivers
+great things but only if you use it exactly right. This is by [conscious
+design](https://twitter.com/hyc_symas/status/1056168832606392320). 

 Among the things to keep in mind when using LMDB natively:

 * Never open a database file more than once anywhere in your process
 * Never open more than one transaction within a thread
-   * .. unless they are all Read Only and have MDB_NOTLS set
+   * .. unless they are all read-only and have MDB_NOTLS set
 * When opening a named database, no other threads may do that at the same time
 * Cursors within RO transactions need freeing, but cursors within RW
 transactions must not be freed. 

-Breaking these rules causes no errors, but does lead to silent data
-corruption, missing updates, or random crashes.
+Breaking these rules causes no immediate errors, but can lead to silent
+data corruption, missing updates, or random crashes. Again, this is not an
+actual bug in LMDB; it means that LMDB expects you to use it according to
+its exact rules. And who are we to disagree?

-This LMDB library aims to deliver the full LMDB performance while
+The `lmdb-safe` library aims to deliver the full LMDB performance while
 programmatically making sure the LMDB semantics are adhered to, with very
 limited overhead.
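As an illustration of what "programmatically making sure" can look like, here is a minimal sketch that enforces the "one transaction per thread" rule with a per-thread counter. `TxnGuard` and `t_openTxns` are hypothetical names, not part of `lmdb-safe`, and the sketch ignores the read-only `MDB_NOTLS` exception:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical: number of transactions currently open in this thread.
thread_local int t_openTxns = 0;

// Hypothetical RAII guard: creating a second one in the same thread throws,
// instead of letting two transactions silently coexist.
class TxnGuard {
public:
  TxnGuard() {
    if (t_openTxns > 0)
      throw std::runtime_error("a transaction is already open in this thread");
    ++t_openTxns;
  }
  ~TxnGuard() { --t_openTxns; }

  // Copies would decrement the counter twice, so forbid them.
  TxnGuard(const TxnGuard&) = delete;
  TxnGuard& operator=(const TxnGuard&) = delete;
};
```

The check costs one thread-local increment and decrement per transaction, which is the kind of limited overhead this approach aims for.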

@@ -33,9 +37,23 @@ Most common LMDB functionality is wrapped within this library but the native
 MDB handles are all available should you want to use functionality we did
 not (yet) cater for.

+# Status
+Very early. If you use this tiny library, be aware that things might change
+rapidly. To use it, add `lmdb-safe.cc` and `lmdb-safe.hh` to your project.
+
+# Philosophy
+This library tries neither to restrict your use of LMDB nor to make it
+slower, except on operations that should be rare. The native LMDB handles
+(Environment, DBI, Transactions & Cursors) are all available for your direct
+use if need be.
+
+When using `lmdb-safe`, errors "that should never happen" are turned into
+exceptions. An error that merely indicates that a key cannot be found is
+passed on as a regular LMDB error code.
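A minimal sketch of that error policy, assuming a hypothetical `checkRC` helper wrapped around each native call. The two constants mirror `MDB_SUCCESS` and `MDB_NOTFOUND` from `lmdb.h` so that the sketch compiles without LMDB installed:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Same values as MDB_SUCCESS and MDB_NOTFOUND in lmdb.h.
constexpr int kMdbSuccess  = 0;
constexpr int kMdbNotFound = -30798;

// Hypothetical helper: "key not found" goes back to the caller as a plain
// return code; every other failure becomes an exception.
int checkRC(int rc, const std::string& what) {
  if (rc == kMdbSuccess || rc == kMdbNotFound)
    return rc;
  throw std::runtime_error(what + " failed with LMDB error " + std::to_string(rc));
}
```

A caller can then branch on `checkRC(rc, "get") == kMdbNotFound` for the expected case, while unexpected failures cannot be silently ignored.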
+
 # Example
 The following example has no overhead compared to native LMDB, but already
-exhibits several ways in which lmdb-safe is easier and safer to use:
+exhibits several ways in which lmdb-safe automates LMDB constraints:
 ```
  auto env = getMDBEnv("./database", 0, 0600);
  auto dbi = env->openDB("example", MDB_CREATE);
@@ -43,11 +61,11 @@ exhibits several ways in which lmdb-safe is easier and safer to use:
 ```

 The first line requests an LMDB environment for a database hosted in
-`./database`. **Within LMDB, it is not allowed to open a database file more
-than once**, not even from other threads, not even when using a different LMDB
-handle. `getMDBEnv` keeps a registry of LMDB environments, keyed to the
-exact inode. If another part of your process requests access to the same
-inode, it will get the same environment. 
+`./database`.  **Within LMDB, it is not allowed to open a database file more
+than once**, not even from other threads, not even when using a different
+LMDB handle.  `getMDBEnv` keeps a registry of LMDB environments, keyed to
+the exact inode & flags.  If another part of your process requests access to
+the same inode, it will get the same environment. `MDBEnv` is threadsafe.
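The registry idea can be sketched as follows. This is not `lmdb-safe`'s actual `getMDBEnv`: `Env` and `getEnvSketch` are hypothetical stand-ins, and the sketch assumes the path already exists (as the `./database` directory must). It keys on device and inode via POSIX `stat()` and, as a design choice of this sketch, refuses a second request with different flags rather than opening the same file again:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an opened LMDB environment.
struct Env {
  std::string path;
  int flags;
};

// One Env per (device, inode), so the same file is never opened twice,
// even when reached through a different path.
std::shared_ptr<Env> getEnvSketch(const std::string& path, int flags) {
  static std::mutex mut;
  static std::map<std::pair<dev_t, ino_t>, std::weak_ptr<Env>> registry;

  struct stat sb;
  if (stat(path.c_str(), &sb) != 0)
    throw std::runtime_error("cannot stat " + path);

  std::lock_guard<std::mutex> lock(mut);  // serialize lookups across threads
  auto key = std::make_pair(sb.st_dev, sb.st_ino);
  if (auto existing = registry[key].lock()) {
    if (existing->flags != flags)
      throw std::runtime_error("already open with different flags: " + path);
    return existing;  // same inode: hand out the existing environment
  }
  auto env = std::make_shared<Env>(Env{path, flags});  // real code would open LMDB here
  registry[key] = env;
  return env;
}
```

Holding `std::weak_ptr`s means an environment is released once its last user drops it, after which the same inode may be opened again, possibly with other flags.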

 On the second line, a database is opened within our environment. The
 semantics of opening or creating a database within LMDB are tricky. With
@@ -75,7 +93,7 @@ transaction is aborted automatically. To commit or abort, use `commit()` or

 LMDB is so fast because it does not copy data unless it really needs to.
 Memory bandwidth is a huge determinant of performance on modern CPUs. This
-wrapper agrees and using modern C++, it is possible to seemlessly use
+wrapper agrees, and using modern C++ makes it possible to seamlessly use
 'views' on data without copying them. Using these techniques, the call to
 `txn.put()` sets the "lmdb" string to "great", without making additional
 copies. 
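To show what a non-copying "view" amounts to, here is a sketch of a hypothetical `toVal` helper that fills an `MDB_val`-shaped struct (`ValView`, a stand-in defined here) with a pointer into existing memory. It is an illustration, not the real `MDBVal`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Same shape as LMDB's MDB_val: a size plus a pointer, no ownership.
struct ValView {
  std::size_t mv_size;
  const void* mv_data;
};

// Strings (and string literals) become a view of their characters.
inline ValView toVal(std::string_view sv) {
  return ValView{sv.size(), sv.data()};
}

// Numbers become a view of their own bytes; nothing is copied.
template <typename T,
          typename = std::enable_if_t<std::is_arithmetic_v<T>>>
ValView toVal(const T& t) {
  return ValView{sizeof(T), &t};
}
```

The flip side of not copying is lifetime: the viewed string or number must outlive the view, so a view of a temporary is only safe within the same expression.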
@@ -87,3 +105,68 @@ disk.
 In the final line, we commit the transaction, after which it also becomes
 available for other threads and processes. 

+A slightly expanded version of this code can be found in
+[basic-example.cc](basic-example.cc).
+
+
+# Cursors, transactions
+This example shows how to use cursors and how to mix `lmdb-safe` with direct
+calls to the native LMDB C API.
+
+```
+  auto env = getMDBEnv("./database", 0, 0600);
+  auto dbi = env->openDB("huge", MDB_CREATE);
+  auto txn = env->getRWTransaction();
+
+  unsigned int limit=20000000;
+```
+
+This is the usual opening sequence.
+
+```
+  auto cursor=txn.getCursor(dbi);
+  MDB_val key, data;
+  int count=0;
+  cout<<"Counting records.. "; cout.flush();
+  while(!cursor.get(key, data, count ? MDB_NEXT : MDB_FIRST)) {
+    count++;
+  }
+  cout<<"Have "<<count<<"!"<<endl;
+```
+
+This shows how we create a cursor for the `huge` database and iterate
+over it to count the number of keys in there. We pass two native LMDB
+`MDB_val` structs to the cursor `get` function. These do not receive copies
+of the potentially millions of keys and values in the `huge` database; they
+only contain pointers to that data. Because of this, we can count 20 million
+records in under a second (!).
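The counting loop relies on `get()` returning 0 while records remain and a nonzero code once the cursor runs off the end. That idiom can be tried without LMDB using a mock cursor over a `std::map`; `MockCursor`, `OP_FIRST`/`OP_NEXT` and `kNotFound` are hypothetical stand-ins for the cursor class, `MDB_FIRST`/`MDB_NEXT` and `MDB_NOTFOUND`:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

enum CursorOp { OP_FIRST, OP_NEXT };  // stand-ins for MDB_FIRST / MDB_NEXT
constexpr int kNotFound = -30798;     // same value as MDB_NOTFOUND

struct ValView { std::size_t mv_size; const void* mv_data; };

using Table = std::map<std::string, std::string>;

// Mock cursor: returns 0 and points key/data at the current record,
// or kNotFound once there are no more records.
class MockCursor {
public:
  explicit MockCursor(const Table& t) : d_table(t), d_it(t.end()) {}

  int get(ValView& key, ValView& data, CursorOp op) {
    if (op == OP_FIRST)
      d_it = d_table.begin();
    else if (d_it != d_table.end())
      ++d_it;
    if (d_it == d_table.end())
      return kNotFound;
    key = ValView{d_it->first.size(), d_it->first.data()};  // pointer, no copy
    data = ValView{d_it->second.size(), d_it->second.data()};
    return 0;
  }

private:
  const Table& d_table;
  Table::const_iterator d_it;
};

// The counting idiom from above: FIRST on the first call, NEXT afterwards.
int countRecords(MockCursor& cursor) {
  ValView key{}, data{};
  int count = 0;
  while (!cursor.get(key, data, count ? OP_NEXT : OP_FIRST))
    count++;
  return count;
}
```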
+  
+```
+  cout<<"Clearing records.. "; cout.flush();
+  mdb_drop(txn, dbi, 0); // clear records
+  cout<<"Done!"<<endl;
+```
+
+Here we drop all keys from the database, which also happens nearly
+instantaneously. Note that we pass our `txn` (which is a class) to the
+native `mdb_drop` function, which we did not wrap. This is possible because
+`txn` converts to an `MDB_txn*` if needed.
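The conversion is a plain C++ implicit conversion operator. A minimal sketch, with hypothetical `raw_txn` and `native_drop` standing in for `MDB_txn` and `mdb_drop`:

```cpp
#include <cassert>

// Hypothetical stand-ins for MDB_txn and a native call like mdb_drop.
struct raw_txn { int drops = 0; };
int native_drop(raw_txn* txn) { ++txn->drops; return 0; }

// Wrapper class that converts implicitly to the raw handle it holds,
// so native functions accept it as-is.
class TxnWrapper {
public:
  explicit TxnWrapper(raw_txn* txn) : d_txn(txn) {}
  operator raw_txn*() const { return d_txn; }

private:
  raw_txn* d_txn;
};
```

An implicit conversion trades a little type safety for convenience: any native function taking the raw handle accepts the wrapper directly.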
+
+```
+  cout << "Adding "<<limit<<" values  .. "; cout.flush();
+  for(unsigned int n = 0 ; n < limit; ++n) {
+    txn.put(dbi, MDBVal(n), MDBVal(n));
+  }
+  cout <<"Done!"<<endl;
+  cout <<"Calling commit.. "; cout.flush();
+  txn.commit();
+  cout<<"Done!"<<endl;
+```
+
+Here we add 20 million values using the `MDBVal` wrapper, which converts our
+unsigned integer into an `MDB_val`. We then commit the `mdb_drop` and the 20
+million puts. All of this took less than 20 seconds.
+
+Had we created our database with the `MDB_INTEGERKEY` option and added the
+`MDB_APPEND` flag to `txn.put`, the whole process would have taken around 5
+seconds.
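The two flags matter together. `MDB_APPEND` assumes keys arrive already in database order, and LMDB reports `MDB_KEYEXIST` if they do not; `MDB_INTEGERKEY` makes LMDB compare keys as native integers. Without it, keys are compared bytewise, and on a little-endian machine ascending integers are not in bytewise order. The sketch below (helper names are hypothetical) demonstrates that mismatch without LMDB:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Bytes of a 32-bit unsigned integer in little-endian order, which is how
// it sits in memory on x86, and thus what an MDB_val pointing at it holds.
std::array<unsigned char, 4> leBytes(std::uint32_t n) {
  return {static_cast<unsigned char>(n), static_cast<unsigned char>(n >> 8),
          static_cast<unsigned char>(n >> 16), static_cast<unsigned char>(n >> 24)};
}

// Default key order for equal-length keys: lexicographic bytes, like memcmp.
bool bytewiseLess(std::uint32_t a, std::uint32_t b) {
  auto x = leBytes(a), y = leBytes(b);
  return std::memcmp(x.data(), y.data(), x.size()) < 0;
}

// Order used with MDB_INTEGERKEY: plain numeric comparison.
bool integerLess(std::uint32_t a, std::uint32_t b) { return a < b; }
```

When keys 0, 1, 2, ... are appended in numeric order, a bytewise-ordered database sees 256 arrive after 255 even though its bytes sort before 255's, so `MDB_APPEND` would fail at that point; with `MDB_INTEGERKEY` the numeric order and the database order agree.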