Sea Framework

Sea.objc

SeaDatabase

  • The purpose of the database is to provide simple NoSQL-like storage.
  • It should be lightweight and cross-platform.
  • It should sync over various channels (containers).
  • The containers should be dumb and content-agnostic, i.e. their content can be encrypted.
  • Large file content should be handled separately to keep the database lean and data should be loaded lazily
  • The sync is simple and inspired by CouchDB, so there will be a deterministic winner in conflict situations
  • Works offline
  • Blockchain-inspired transaction log to detect data corruption
  • No single point of failure
  • Multiple sync nodes, therefore backup is always in sync
  • Simple but robust conflict resolution with no user interaction required

Implementation

The current working copy of the live database is located in the app's subfolder in ~/Application Support, where the user has no direct access to it.

A database can store the transaction log and the assets in multiple containers. These can be file packages on the local device or somewhere on the network. Containers can also live in cloud services like Cloud Drive, Dropbox etc.

The database and the container must have the same databaseID and, if available, the same contentType. Each database instance has its own instanceID, which is used for its entries in the transaction log and in the assets. Think of it as a per-device or per-installation separation.

Notifications

Important actions like data changes are propagated through NSNotificationCenter.

  • SeaRecordDidSaveNotification
  • SeaRecordDidChangeNotification
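
A minimal sketch of observing one of these notifications. It assumes that the notification name is a string spelled like the identifier above and that the saved record is passed as the notification's object; neither detail is confirmed by this document. The helper name ObserveRecordSaves is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Assumption: the framework exports this name as a constant. It is redefined
// here only so that the sketch compiles on its own.
static NSString * const kAssumedDidSaveName = @"SeaRecordDidSaveNotification";

// Registers a block observer and returns the token needed to unregister later.
static id<NSObject> ObserveRecordSaves(void (^handler)(id record)) {
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:kAssumedDidSaveName
                    object:nil
                     queue:nil   // nil queue: the block runs on the posting thread
                usingBlock:^(NSNotification *note) {
                    // Assumption: the saved record travels as note.object.
                    handler(note.object);
                }];
}
```

Keep the returned token and pass it to -[NSNotificationCenter removeObserver:] when the observer is no longer needed.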

SeaRevisionTree

To make sync work and to have a strategy for conflicts, the saved states of each record form a revision tree. Each SeaRecord state that is saved gets its own unique _rev property. Another property called _revParent holds the _rev value of the state the change originated from. This way _revParent points to the parent node in the revision tree.

Deleted nodes have the property _deleted = true. (TODO: Or just missing _rev?)
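
The tree can be walked with nothing more than these properties. The following sketch assumes each saved state is available as a plain dictionary keyed by its _rev and that a root state has no _revParent; the function names BranchEnds and Depth are hypothetical and not part of the framework.

```objc
#import <Foundation/Foundation.h>

// `states` maps each _rev to one saved state, a dictionary carrying the
// _rev, _revParent and _deleted properties described above.
// Returns the branch ends: states that no other state names as its parent.
static NSArray<NSDictionary *> *BranchEnds(NSDictionary<NSString *, NSDictionary *> *states) {
    NSMutableSet *parents = [NSMutableSet set];
    for (NSDictionary *state in states.allValues) {
        NSString *parent = state[@"_revParent"];
        if (parent) [parents addObject:parent];
    }
    NSMutableArray *ends = [NSMutableArray array];
    for (NSString *rev in states) {
        if (![parents containsObject:rev]) [ends addObject:states[rev]];
    }
    return ends;
}

// Number of ancestors reachable through _revParent (a root has depth 0).
// Assumes the states really form a tree, i.e. there are no cycles.
static NSUInteger Depth(NSDictionary *state, NSDictionary<NSString *, NSDictionary *> *states) {
    NSUInteger depth = 0;
    NSString *parent = state[@"_revParent"];
    while (parent != nil && states[parent] != nil) {
        depth++;
        parent = states[parent][@"_revParent"];
    }
    return depth;
}
```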

Conflict resolution

The current implementation is similar to the CouchDB conflict resolution strategy. Out of the box it identifies a winner by the following rules:

  1. Deleted branches are ignored
  2. Deeper branches win
  3. Comparing the _rev properties the higher value (string compare) wins

Changes on the other branches are silently ignored. To merge the data you need to implement a custom solution. Basically, you create a new version with the merged content on the winning branch and add deletions at the ends of all other branches.
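
The three rules can be sketched as a single pass over the branch ends. This is an illustration, not the framework's code: the function PickWinner and the dictionary keys rev, depth and deleted are assumptions made for the example.

```objc
#import <Foundation/Foundation.h>

// Each branch end is described by a hypothetical dictionary with the keys
// @"rev" (NSString), @"depth" (NSNumber, number of ancestors) and
// @"deleted" (NSNumber wrapping a BOOL).
static NSDictionary *PickWinner(NSArray<NSDictionary *> *branchEnds) {
    NSDictionary *winner = nil;
    for (NSDictionary *candidate in branchEnds) {
        if ([candidate[@"deleted"] boolValue]) continue;      // rule 1: ignore deleted branches
        if (winner == nil) { winner = candidate; continue; }
        NSInteger depth = [candidate[@"depth"] integerValue];
        NSInteger best = [winner[@"depth"] integerValue];
        NSString *rev = candidate[@"rev"];
        NSString *bestRev = winner[@"rev"];
        if (depth > best) {                                    // rule 2: deeper branch wins
            winner = candidate;
        } else if (depth == best &&
                   [rev compare:bestRev] == NSOrderedDescending) {
            winner = candidate;                                // rule 3: higher _rev string wins
        }
    }
    return winner; // nil when every branch is deleted
}
```

Because the result depends only on the set of branch ends and not on their order, every instance that has seen the same revisions picks the same winner.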

The merge strategy you choose for your app is up to you. It can be, for example, one or a combination of the following:

  1. Present the versions to the user and let her choose
  2. Merge per property
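
One possible shape of a per-property merge is a three-way comparison against the common ancestor. The sketch below works on plain dictionaries; the function name MergePerProperty and the rule that keys starting with an underscore are bookkeeping are assumptions, not framework behavior.

```objc
#import <Foundation/Foundation.h>

// `base` is the common ancestor, `winner` and `loser` are two branch ends.
// Starts from the winner and adopts every property that only the losing
// branch changed. Keys starting with "_" (such as _rev) are skipped.
static NSDictionary *MergePerProperty(NSDictionary *base,
                                      NSDictionary *winner,
                                      NSDictionary *loser) {
    NSMutableDictionary *merged = [winner mutableCopy];
    NSMutableSet *keys = [NSMutableSet setWithArray:base.allKeys];
    [keys addObjectsFromArray:loser.allKeys];
    for (NSString *key in keys) {
        if ([key hasPrefix:@"_"]) continue;
        id b = base[key], w = winner[key], l = loser[key];
        BOOL winnerChanged = !(b == w || [b isEqual:w]);
        BOOL loserChanged  = !(b == l || [b isEqual:l]);
        if (loserChanged && !winnerChanged) {
            // Only the losing branch touched this property: take its value,
            // or drop the key if the losing branch removed it.
            if (l) merged[key] = l; else [merged removeObjectForKey:key];
        }
    }
    return merged;
}
```

When both branches changed the same property, this sketch keeps the winner's value; presenting such properties to the user would combine it with strategy 1.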

SeaRecord

A single data item that behaves like an NSMutableDictionary, i.e. values can be set like this:

SeaRecord *rec = [[SeaRecord alloc] init];
rec[@"name"] = @"John Doe";
rec[@"age"] = @42;
[database saveRecord:rec];

But for convenience dynamic properties can be defined as well. Example:

@interface MyRecord : SeaRecord
@property NSString *name;
@property NSNumber *age;
@end

@implementation MyRecord
@dynamic name; // These are important and required!!!
@dynamic age;
@end

A record can handle the following object types:

  • NSString
  • NSNumber
  • NSDecimalNumber
  • NSDate
  • NSData - Small binary data, see SeaAsset for large binaries
  • NSDictionary
  • NSArray
  • SeaAsset - Files or other large binary data which should not stay in memory
  • SeaRecordReference - Lightweight reference to another SeaRecord entry

References / Relations

It is possible to use a SeaRecord as a property value inside another record. These relationships are very loose and do not have any cascading effects, except that referenced records are saved before the parent record. You can inspect to have the actual record being loaded in the runtime record.

Warning: Avoid circular references!

SeaAsset

A local file or data bound to a MIME type is used as the SeaAsset content. The actual data is stored once the asset is used together with a SeaRecord that is then saved to a SeaDatabase. It is very likely that the content is loaded lazily when .data is accessed. .URL might also be a remote URL from a SeaContainer if that is appropriate. The contents and the file should never be changed; instead, create a new SeaAsset and set it on the SeaRecord.

SeaAsset *asset = [[SeaAsset alloc] initURL:url];

When an asset is persisted to the database, additional metadata is stored, such as size, checksum and type. Each SeaAsset is uniquely identified by the instance ID and the index that is also used for storing it in the container.

SeaContainer

A container can be asked to return all transactions that are newer than the current status. The status is a dictionary that maps each instanceID to the last known index value for that instance.
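
As an illustration of how such a status can drive a sync, the sketch below compares what the database already knows with what a container offers and returns the index ranges still to be fetched. The function MissingRanges is hypothetical; only the shape of the status dictionary comes from the description above.

```objc
#import <Foundation/Foundation.h>

// Both arguments map instanceID (NSString) to the last known index (NSNumber).
// Returns instanceID -> NSValue-wrapped NSRange of the indexes still missing.
static NSDictionary<NSString *, NSValue *> *MissingRanges(
        NSDictionary<NSString *, NSNumber *> *known,
        NSDictionary<NSString *, NSNumber *> *available) {
    NSMutableDictionary<NSString *, NSValue *> *result = [NSMutableDictionary dictionary];
    [available enumerateKeysAndObjectsUsingBlock:^(NSString *instanceID, NSNumber *last, BOOL *stop) {
        NSNumber *have = known[instanceID];
        // An instance never seen before is fetched from index 0.
        NSUInteger first = have ? have.unsignedIntegerValue + 1 : 0;
        NSUInteger end = last.unsignedIntegerValue + 1;
        if (first < end) {
            result[instanceID] = [NSValue valueWithRange:NSMakeRange(first, end - first)];
        }
    }];
    return result;
}
```

For example, with a known status of {A: 4, B: 9} and a container offering {A: 7, B: 9, C: 1}, the result asks for indexes 5 to 7 of A and 0 to 1 of C, and nothing for B.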

A container can also observe the transactions and emit a change notification. The current database should register for those notifications and trigger a sync.

The containers should always be ready for dumb sync between each other. It has to be "dumb" because the content could be encrypted or shared.

Inner structure

The containers have an inner structure that is similar across all implementations:

  1. An info container holding the containerID etc.
  2. A transaction folder holding per-instance logs (block chains) of all changes. The entries logically start at 0 and then increment by 1. Due to OS limitations, the actual implementation, e.g. on the file system, splits the entries across multiple folders.
  3. An asset folder holding per-instance copies of binary data objects like images etc. The numbering is the same as for transactions. Deduplication has to be implemented at the application level.

Example:

<ContainerRoot>/
    info.json
    transactions/
        <Instance0>/
            1/           // The level of subfolders, each folder has max. 1000 entries
                0        // Internally called `index`
                1
                2
            ...
        <Instance1>/
        ...
    assets/
        <Instance0>/
            1/           // The level of subfolders, each folder has max. 1000 entries
                0        // Internally called `index`
                1
                2
            ...
        <Instance1>/
            ...

SeaFileSystemContainer

This is the most classic container. On macOS it is able to observe changes and notify the database to sync.

SeaDocumentContainer

Inherits from SeaFileSystemContainer but adds file access synchronization to avoid conflicts especially for containers shared via Cloud Drive. It can be used both on macOS and iOS. It uses presentedItemDidChange: from NSFilePresenter to observe content changes and trigger a sync.

SeaBlockChain

Additional safety for the synced data is achieved by writing the data in a blockchain-inspired way. That means that each new block (see SeaContainer "transactions" for details) holds the checksum (SHA-2 / SHA-256) of the previous block. Assets are hashed indirectly through the metadata stored in the SeaRecord, which in turn holds a checksum of the file contents.

Internal Format of a Block

  1. Identification ID SEA (3 bytes)
  2. Block mode (1 byte)
  3. Size of data part in bytes, see point 8 of this list (4 bytes)
  4. Index number (4 bytes)
  5. Timestamp, usually a Lamport timestamp (4 bytes)
  6. Previous checksum over complete previous block content including header (32 bytes)
  7. Checksum over data part (32 bytes)
  8. Payload / Data
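
The layout adds up to an 80-byte header followed by the payload. The sketch below builds and verifies blocks in that layout using CC_SHA256 from CommonCrypto. It makes two assumptions the list does not state: integer fields are written big-endian, and the first block of a chain carries 32 zero bytes as its previous checksum. MakeBlock and VerifyBlock are hypothetical names.

```objc
#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonDigest.h>

enum { kHeaderLength = 3 + 1 + 4 + 4 + 4 + 32 + 32 }; // 80 bytes

static NSData *SHA256(NSData *data) {
    unsigned char digest[CC_SHA256_DIGEST_LENGTH];
    CC_SHA256(data.bytes, (CC_LONG)data.length, digest);
    return [NSData dataWithBytes:digest length:sizeof digest];
}

// Checksum of the predecessor, or 32 zero bytes for the first block (assumption).
static NSData *PreviousChecksum(NSData *previous) {
    if (previous) return SHA256(previous);
    return [NSMutableData dataWithLength:CC_SHA256_DIGEST_LENGTH];
}

static void AppendUInt32(NSMutableData *out, uint32_t value) {
    uint32_t bigEndian = CFSwapInt32HostToBig(value); // assumption: big-endian fields
    [out appendBytes:&bigEndian length:sizeof bigEndian];
}

// Builds one block; `previous` is the complete previous block or nil.
static NSData *MakeBlock(uint8_t mode, uint32_t index, uint32_t timestamp,
                         NSData *previous, NSData *payload) {
    NSMutableData *block = [NSMutableData dataWithCapacity:kHeaderLength + payload.length];
    [block appendBytes:"SEA" length:3];               // 1. identification
    [block appendBytes:&mode length:1];               // 2. block mode
    AppendUInt32(block, (uint32_t)payload.length);    // 3. size of data part
    AppendUInt32(block, index);                       // 4. index number
    AppendUInt32(block, timestamp);                   // 5. timestamp
    [block appendData:PreviousChecksum(previous)];    // 6. checksum of previous block
    [block appendData:SHA256(payload)];               // 7. checksum of data part
    [block appendData:payload];                       // 8. payload
    return block;
}

// Checks both checksums of `block` against its payload and its predecessor.
static BOOL VerifyBlock(NSData *block, NSData *previous) {
    if (block.length < kHeaderLength) return NO;
    if (memcmp(block.bytes, "SEA", 3) != 0) return NO;
    uint32_t size = 0;
    [block getBytes:&size range:NSMakeRange(4, 4)];
    size = CFSwapInt32BigToHost(size);
    if (block.length != (NSUInteger)kHeaderLength + size) return NO;
    NSData *storedPrevious = [block subdataWithRange:NSMakeRange(16, 32)];
    NSData *storedData = [block subdataWithRange:NSMakeRange(48, 32)];
    NSData *payload = [block subdataWithRange:NSMakeRange(kHeaderLength, size)];
    return [storedPrevious isEqualToData:PreviousChecksum(previous)]
        && [storedData isEqualToData:SHA256(payload)];
}
```

Changing a single byte of an earlier block makes verification fail both for that block (data checksum) and for its successor (previous checksum), which is how corruption in the chain is detected.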

Discussion

32-bit has been preferred over 64-bit for the first implementation due to limited support for the latter in JavaScript. Through the "mode" field it will be possible to tweak the format if required, even on a per-block basis.

Crypto

  • If encryption is used, the algorithm is AES-256-GCM
  • Random IV (96 bit / 12 bytes)
  • The additional data is the header of the current block. TODO
  • The tag is sent along with the cipher data.
  • The key is derived from the password through PBKDF2 using a random salt (64 bit / 8 bytes) and HMAC-SHA-256. More than 50,000 iterations are performed.
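
The key derivation step can be sketched with CommonCrypto's CCKeyDerivationPBKDF and the salt and IV generation with SecRandomCopyBytes. The helper names and the UTF-8 encoding of the password are assumptions; the AES-256-GCM encryption step itself is left out of this sketch.

```objc
#import <Foundation/Foundation.h>
#import <CommonCrypto/CommonCrypto.h>
#import <Security/Security.h>

// Returns `count` cryptographically random bytes, or nil on failure.
// Use 8 bytes for the salt and 12 bytes for the IV, per the list above.
static NSData *RandomBytes(size_t count) {
    NSMutableData *data = [NSMutableData dataWithLength:count];
    int status = SecRandomCopyBytes(kSecRandomDefault, count, data.mutableBytes);
    return status == errSecSuccess ? data : nil;
}

// Derives a 32-byte (AES-256) key with PBKDF2 and HMAC-SHA-256.
// Assumption: the password is encoded as UTF-8 before derivation.
static NSData *DeriveKey(NSString *password, NSData *salt, unsigned rounds) {
    NSData *passwordBytes = [password dataUsingEncoding:NSUTF8StringEncoding];
    NSMutableData *key = [NSMutableData dataWithLength:32];
    int status = CCKeyDerivationPBKDF(kCCPBKDF2,
                                      passwordBytes.bytes, passwordBytes.length,
                                      salt.bytes, salt.length,
                                      kCCPRFHmacAlgSHA256, rounds,
                                      key.mutableBytes, key.length);
    return status == kCCSuccess ? key : nil;
}
```

The salt is not secret, but it has to be stored next to the cipher data so that the same key can be derived again for decryption.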

Discussion

A separate HMAC for verification is no longer required, thanks to the "tag" feature of the GCM algorithm. This also saves the computations for a separate HMAC key. High PBKDF2 iteration counts are encouraged in light of Moore's Law predictions.

Checksums

Every checksum used in the implementation is SHA-256, a member of the SHA-2 family, which is well supported across platforms.

Discussion

256-bit checksums seem to be sufficient. General protection against manipulation without a proof-of-work seems to be overkill for the current scenarios. If an attacker replaced the whole chain or added a new instance overriding entries, there would be no good protection right now. This is a topic for an advanced implementation, once it becomes a requirement.

SeaLamport

A Lamport timestamp is used instead of a regular timestamp to guarantee logical ordering.
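
A Lamport clock needs only two operations: increment on every local event, and jump past any timestamp received from elsewhere. The class below is a hypothetical minimal sketch, not the framework's SeaLamport implementation. It uses a 32-bit counter to match the 4-byte timestamp field of a block and is not thread-safe.

```objc
#import <Foundation/Foundation.h>

@interface LamportClockSketch : NSObject
@property (readonly) uint32_t time;
- (uint32_t)tick;                      // call for every local change
- (uint32_t)receive:(uint32_t)remote;  // call for every timestamp seen during sync
@end

@implementation LamportClockSketch
- (uint32_t)tick {
    return ++_time;
}
- (uint32_t)receive:(uint32_t)remote {
    // Move past both the local and the remote time, so that this event is
    // ordered after everything either side has seen so far.
    _time = MAX(_time, remote) + 1;
    return _time;
}
@end
```

If instance A ticks twice (1, 2) and instance B then receives the timestamp 2, B continues at 3, so B's next change is ordered after A's changes regardless of the devices' wall clocks.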

SeaStorage

This is a utility to store the data locally. This implementation uses SQLite, but it is basically a simple key-value store.

Sea.iOS

SeaUIDocument

Sea.macOS

SeaNSDocument

Plays nicely together with SeaDocumentController and does most things required to set up the database property out of the box.

SeaNSArrayController

This controller can be used to conveniently feed tables etc. Just set the database and an optional recordType and the rest will behave as expected.

Sea.JS

See separate Sea implementation project.

Appendix

Inspector

A macOS tool named "SeaInspector" is available for download to inspect the blockchain structure and other content related info.

Ideas

Manipulation Prevention / Crypto++

  • Shared root secret that all blockchains build on?
  • Authorize new instances
  • Explore private / public key mechanisms for authorization
  • Cryptree ideas for read/write access and revocation
  • Change password without recoding the whole chain

Related Projects