- The purpose of the database is to provide simple NoSQL-like storage.
- It should be lightweight and cross-platform.
- It should sync over various channels (containers).
- The containers should be dumb and content-agnostic, i.e. their content can be encrypted.
- Large file content should be handled separately to keep the database lean, and data should be loaded lazily.
- The sync is simple and inspired by CouchDB, so there will be a deterministic winner in conflict situations
- Works offline
- Blockchain to avoid data corruption
- No single point of failure
- Multiple sync nodes, therefore backup is always in sync
- Simple but robust conflict resolution with no user interaction required
The current working copy of the live database is located in the app's subfolder in `~/Application Support`, where the user has no direct access to it.
A database can store the transaction log and the assets in multiple containers. These can be file packages on the local device or somewhere on the network, but containers can also live in cloud services like Cloud Drive, Dropbox, etc.
Both the database and the container have to fit the same `contentType` if available. Each database instance has its own `instanceID`, which is used with its entries in the transaction log and assets. Think of it as a per-device or per-installation separation.
Important actions like data changes are propagated through
In order to get sync working and to have a strategy for conflicts, all records build a revision tree. Each `SeaRecord` state that is saved gets its own unique `_rev` property. Another property called `_revParent` holds the previous `_rev` value the change originated from. This way `_revParent` points to the parent node in the revision tree.
Deleted nodes have the property `_deleted = true`. (TODO: Or just missing
The current implementation is similar to the CouchDB conflict resolution strategy. Out of the box it identifies a winner by the following rules:
- Deleted branches are ignored
- Deeper branches win
- When comparing the `_rev` properties, the higher value (string comparison) wins
Other changes are silently ignored. In order to merge the data you will need to code a custom solution for yourself. Basically you will create a new version with the merged content on the winning branch and add deletions at all other branch ends.
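The three rules above can be sketched in a few lines. This is a language-neutral illustration in Python, not the library's code. It assumes the whole revision tree of one record is available as a dictionary keyed by `_rev`, and that "deleted branch" means a branch whose leaf carries `_deleted`. The function name `pick_winner` is hypothetical.

```python
def pick_winner(revisions):
    """Return the winning leaf `_rev`, or None if every branch is deleted.

    `revisions` maps each `_rev` to a dict holding `_revParent`
    (None for the root) and an optional `_deleted` flag.
    Assumption: the complete tree of one record is passed in.
    """
    # A leaf is a revision that no other revision names as its parent.
    parents = {node.get("_revParent") for node in revisions.values()}
    leaves = [rev for rev in revisions if rev not in parents]

    def depth(rev):
        # Count the nodes from this revision up to the root.
        count = 0
        while rev is not None:
            count += 1
            rev = revisions[rev].get("_revParent")
        return count

    # Rule 1: deleted branches are ignored.
    candidates = [rev for rev in leaves if not revisions[rev].get("_deleted")]
    if not candidates:
        return None
    # Rule 2: deeper branches win. Rule 3: the higher `_rev` string wins ties.
    return max(candidates, key=lambda rev: (depth(rev), rev))
```

Because the result depends only on the tree itself, every instance that has seen the same revisions picks the same winner without any coordination.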
The merge strategy you choose for your app is up to you. It can be, for example, one or a combination of the following:
- Present the versions to the user and let her choose
- Merge per property
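A per-property merge could look roughly like the following sketch. It is written in Python for brevity and is only one possible approach: it assumes you can load the common ancestor (`base`) of the two branches as a plain dictionary, and it treats properties starting with `_` as bookkeeping that stays with the winner. The function name `merge_per_property` is hypothetical.

```python
_MISSING = object()  # marks a property that is absent in a version


def merge_per_property(base, winner, loser):
    """Three-way merge of plain property dictionaries.

    The winner's values are kept, except for properties that only the
    losing branch changed relative to the common ancestor `base`.
    """
    merged = dict(winner)
    for key in set(base) | set(loser):
        if key.startswith("_"):
            continue  # keep the winner's `_rev`, `_revParent`, ...
        base_value = base.get(key, _MISSING)
        loser_value = loser.get(key, _MISSING)
        winner_value = winner.get(key, _MISSING)
        # Take the loser's change only if the winner left the property alone.
        if loser_value != base_value and winner_value == base_value:
            if loser_value is _MISSING:
                merged.pop(key, None)  # the loser removed the property
            else:
                merged[key] = loser_value
    return merged
```

The merged dictionary would then be saved as a new version on the winning branch, with deletions added at the other branch ends, as described above.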
A single data item which behaves like an `NSMutableDictionary`, i.e. values can be set like this:

    SeaRecord *rec = [[SeaRecord alloc] init];
    rec[@"name"] = @"John Doe";
    rec[@"age"] = @42;
    [database saveRecord:rec];
But for convenience dynamic properties can be defined as well. Example:
    @interface MyRecord : SeaRecord
    @property NSString *name;
    @property NSNumber *age;
    @end

    @implementation MyRecord
    @dynamic name; // These are important and required!!!
    @dynamic age;
    @end
A record can handle the following object types:
- `NSData` - Small binary data, see `SeaAsset` for large binaries
- `SeaAsset` - Files or other large binary data which should not stay in memory. More
- `SeaRecordReference` - Lightweight reference to another
It is possible to use a `SeaRecord` as a property value inside of another record. These relationships are very loose and do not have any cascading effects, except that referenced records will be saved before the parent record. You can inspect to have the actual record being loaded in the runtime record.
Warning: Avoid circular references!
A local file or data bound to a MIME type is used as the `SeaAsset` content. The actual data storage happens when it is used together with a `SeaRecord` and then saved to a `SeaDatabase`. It is very likely that the content is loaded lazily when `.data` is accessed. `.URL` might also be a remote URL from a `SeaContainer` if that is appropriate. Contents and file should never be changed, just create a new `SeaAsset` and set it to the

    SeaAsset *asset = [[SeaAsset alloc] initURL:url];
When persisted to the database, additional metadata will be stored, like
A `SeaAsset` is uniquely identified by the instance ID and the `index` that is also used for storing in the container.
A container can be requested to return all transactions that are newer than the current status. The status is a dictionary of `instanceID` keys and the number of the last known
A container can also observe the transactions and emit a change notification. The current database should register for those notifications and trigger a sync.
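The status-based query can be pictured with a small sketch. It is in Python and makes several assumptions that are not confirmed by the text: the status value is taken to be the index of the last transaction already known for that instance, and the container's log is modeled as an in-memory dictionary of lists. The function name `newer_transactions` is hypothetical.

```python
def newer_transactions(container_log, status):
    """Return, per instance, the transactions the database has not seen yet.

    `container_log` maps an instanceID to its list of transactions, where
    the list position is the transaction index (starting at 0).
    `status` maps an instanceID to the last known index (an assumption).
    """
    result = {}
    for instance_id, entries in container_log.items():
        # An unknown instance has no known index, so everything is new.
        last_known = status.get(instance_id, -1)
        newer = entries[last_known + 1:]
        if newer:
            result[instance_id] = newer
    return result
```

After applying the returned transactions, the database would raise its status for each instance to the highest index it has processed, so the next sync only asks for later entries.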
The containers should always be ready for dumb sync between each other. It has to be "dumb" because the content could be encrypted or shared.
The containers have an inner structure that is similar across all implementations:
- An info container holding the
- A transaction folder holding per instance logs (block chains) of all changes. The entries logically start at 0 and then increment by 1. The actual implementation, e.g. on the file system, is split up into multiple folders due to OS limitations.
- An asset folder holding per instance copies of binary data objects like images, etc. The numbering is the same as for transactions. Deduplication has to be implemented at the application level.
    <ContainerRoot>/
        info.json
        transactions/
            <Instance0>/
                1/         // The level of subfolders, each folder has max. 1000 entries
                    0      // Internally called `index`
                    1
                    2
                    ...
            <Instance1>/
                ...
        assets/
            <Instance0>/
                1/         // The level of subfolders, each folder has max. 1000 entries
                    0      // Internally called `index`
                    1
                    2
                    ...
            <Instance1>/
                ...
This is the most classic container. On macOS it is able to observe changes and notify the database to sync.
Like `SeaFileSystemContainer`, but adds file access synchronization to avoid conflicts, especially for containers shared via Cloud Drive. It can be used both on macOS and iOS. It uses `NSFilePresenter` to observe content changes and trigger a sync.
Additional safety for the synced data is achieved by blockchain-inspired writing of data. That means that each new block (see `SeaContainer` "transactions" for details) holds the checksum (SHA-2 / SHA-256) of the previous block. Assets are indirectly hashed by the metadata stored in the `SeaRecord`, which in turn holds a checksum of the file contents.
- Identification ID
- Block mode (1 byte)
- Size of data part in bytes, see the payload item at the end of this list (4 bytes)
- Index number (4 bytes)
- Timestamp, usually a Lamport timestamp (4 bytes)
- Previous checksum over complete previous block content including header (32 bytes)
- Checksum over data part (32 bytes)
- Payload / Data
- If encryption is used, the algorithm is
- Random IV (96 bit / 12 bytes)
- The additional data is the header of the current block. TODO
- The tag is sent along with the cipher data.
- The key is derived from the password through PBKDF2 with HMAC-SHA-256, using a random salt (64 bit / 8 bytes). More than 50,000 iterations are performed.
A separate HMAC for verification is no longer required, because the "tag" feature of the GCM algorithm already authenticates the data. This also saves the computation of a separate HMAC key. High PBKDF2 iteration counts are encouraged, given the hardware speed-ups predicted by Moore's law.
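The key derivation parameters listed above can be tried out with Python's standard library. This is only a sketch of the PBKDF2 step: the 32-byte key length, the iteration count of 100,000, and the function names are assumptions, and the actual GCM encryption is left out because it needs a third-party library.

```python
import hashlib
import os

SALT_BYTES = 8        # 64 bit random salt, as listed above
IV_BYTES = 12         # 96 bit random IV, as listed above
ITERATIONS = 100_000  # assumption: any value above 50,000
KEY_BYTES = 32        # assumption: a 256 bit key


def derive_key(password, salt=None, iterations=ITERATIONS):
    """Derive a key from a password with PBKDF2 and HMAC-SHA-256.

    Returns the key together with the salt, which has to be stored
    next to the cipher data so the key can be derived again later.
    """
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=KEY_BYTES
    )
    return key, salt


def new_iv():
    """Create a fresh random IV. It must never be reused with the same key."""
    return os.urandom(IV_BYTES)
```

Deriving the key again with the stored salt and the same password yields the same bytes, which is what allows another instance to decrypt the block.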
Any checksum used in the implementation is `SHA-256`, a member of the `SHA-2` family, which is well supported across platforms.
256 bit checksums seem to be sufficient. General protection against manipulation, for example through a proof-of-work, seems to be overkill for the current scenarios. If an attacker replaced the whole chain or added a new instance overriding entries, there would not be any good protection right now. This is a topic for an advanced implementation, once it becomes a requirement.
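How the chained checksums detect corruption can be shown with a short sketch in Python. The field order and sizes follow the block layout listed earlier, but several details are assumptions made for illustration: the 4-byte identification value `SEAB`, big-endian byte order, and 32 zero bytes as the previous checksum of the first block. Encryption is left out.

```python
import hashlib
import struct

# ID, block mode, data size, index, timestamp, previous checksum, data checksum
HEADER = struct.Struct(">4sBIII32s32s")
MAGIC = b"SEAB"  # hypothetical identification ID


def make_block(index, timestamp, payload, previous_block=None, mode=0):
    """Build one block: header followed by the payload."""
    if previous_block is None:
        previous_checksum = bytes(32)  # assumption for the first block
    else:
        # Checksum over the complete previous block, including its header.
        previous_checksum = hashlib.sha256(previous_block).digest()
    header = HEADER.pack(
        MAGIC, mode, len(payload), index, timestamp,
        previous_checksum, hashlib.sha256(payload).digest(),
    )
    return header + payload


def verify_chain(blocks):
    """Check indices, sizes and both checksums of every block in order."""
    previous_checksum = bytes(32)
    for expected_index, block in enumerate(blocks):
        if len(block) < HEADER.size:
            return False
        magic, _mode, size, index, _time, prev_sum, data_sum = HEADER.unpack_from(block)
        payload = block[HEADER.size:]
        if (magic != MAGIC or index != expected_index or size != len(payload)
                or prev_sum != previous_checksum
                or data_sum != hashlib.sha256(payload).digest()):
            return False
        previous_checksum = hashlib.sha256(block).digest()
    return True
```

Changing a single byte in any block breaks either its own data checksum or the previous checksum stored in the following block, so accidental corruption is detected. As noted above, this does not protect against an attacker who rewrites the whole chain.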
A Lamport timestamp is used instead of a regular timestamp to guarantee logical ordering.
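For readers unfamiliar with the concept, a Lamport clock is just a counter with two rules, sketched here in Python. The class and method names are illustrative and not taken from the project.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward."""

    def __init__(self, time=0):
        self.time = time

    def tick(self):
        """Advance the clock for a local event, e.g. saving a record."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Merge a timestamp seen during sync, then advance past it."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

A change made after seeing another instance's change always gets a higher timestamp, even if the wall clocks of the two devices disagree.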
This is a utility to store the data locally. For this implementation `SQLite` is used, but it is basically a simple key-value store.
Plays nicely together with `SeaDocumentController` and does most things required to set up the `database` property out of the box.
This controller can be used to conveniently feed tables etc. Just set the `database` and an optional `recordType` and the rest will behave as expected.
Sea implementation project.
A macOS tool named "SeaInspector" is available for download to inspect the blockchain structure and other content related info.
- Shared root secret that all blockchains build on?
- Authorize new instances
- Explore private / public key mechanisms for authorization
- Cryptree ideas for read/write access and revocation
- Change password without recoding the whole chain