Implementing your Sync backend
Understanding changes
objects
Synchronized changes (received by the app in pullChanges
and sent to the backend in pushChanges
) are represented as an object with raw records. Those only use raw table and column names, and raw values (strings/numbers/booleans) — the same as in Schema.
Deleted objects are always only represented by their IDs.
Example:
{
projects: {
created: [
{ id: 'aaaa', name: 'Foo', is_favorite: true },
{ id: 'bbbb', name: 'Bar', is_favorite: false },
],
updated: [
{ id: 'ccc', name: 'Baz', is_favorite: true },
],
deleted: ['ddd'],
},
tasks: {
created: [],
updated: [
{ id: 'tttt', name: 'Buy eggs' },
],
deleted: [],
},
...
}
Again, notice the properties returned have the format defined in the Schema (e.g. is_favorite
, not isFavorite
).
Valid changes objects MUST conform to this shape:
Changes = {
[table_name: string]: {
created: RawRecord[],
updated: RawRecord[],
deleted: string[],
}
}
Implementing pull endpoint
Expected parameters:
{
lastPulledAt: Timestamp,
schemaVersion: int,
migration: null | { from: int, tables: string[], columns: { table: string, columns: string[] }[] }
}
Expected response:
{ changes: Changes, timestamp: Timestamp }
- The pull endpoint SHOULD take parameters and return a response matching the shape specified above.
This shape MAY be different if negotiated with the frontend (however, frontend-side
pullChanges()
MUST conform to this) - The pull endpoint MUST return all record changes in all collections since
lastPulledAt
, specifically:- all records that were created on the server since
lastPulledAt
- all records that were updated on the server since
lastPulledAt
- IDs of all records that were deleted on the server since
lastPulledAt
- record IDs MUST NOT be duplicated
- all records that were created on the server since
- If
lastPulledAt
is null or 0, you MUST return all accessible records (first sync) - The timestamp returned by the server MUST be a value that, if passed again to
pullChanges()
aslastPulledAt
, will return all changes that happened since this moment. - The pull endpoint MUST provide a consistent view of changes since
lastPulledAt
- You should perform all queries synchronously or in a write lock to ensure that returned changes are consistent
- You should also mark the current server time synchronously with the queries
- This is to ensure that no changes are made to the database while you're fetching changes (otherwise some records would never be returned in a pull query)
- If it's absolutely not possible to do so, and you have to query each collection separately, be sure to return a
lastPulledAt
timestamp marked BEFORE querying starts. You still risk inconsistent responses (that may break app's consistency assumptions), but the next pull will fetch whatever changes occured during previous pull. - An alternative solution is to check for the newest change before and after all queries are made, and if there's been a change during the pull, return an error code, or retry.
- If
migration
is not null, you MUST include records needed to get a consistent view after a local database migration- Specifically, you MUST include all records in tables that were added to the local database between the last user sync and
schemaVersion
- For all columns that were added to the local app database between the last sync and
schemaVersion
, you MUST include all records for which the added column has a value other than the default value (0
,''
,false
, ornull
depending on column type and nullability) - You can determine what schema changes were made to the local app in two ways:
- You can compare
migration.from
(local schema version at the time of the last sync) andschemaVersion
(current local schema version). This requires you to negotiate with the frontend what schema changes are made at which schema versions, but gives you more control - Or you can ignore
migration.from
and only look atmigration.tables
(which indicates which tables were added to the local database since the last sync) andmigration.columns
(which indicates which columns were added to the local database to which tables since last sync). - If you use
migration.tables
andmigration.columns
, you MUST whitelist values a client can request. Take care not to leak any internal fields to the client.
- You can compare
- Specifically, you MUST include all records in tables that were added to the local database between the last user sync and
- Returned raw records MUST match your app's Schema
- Returned raw records MUST NOT not contain special
_status
,_changed
fields. - Returned raw records MAY contain fields (columns) that are not yet present in the local app (at
schemaVersion
-- but added in a later version). They will be safely ignored. - Returned raw records MUST NOT contain arbitrary column names, as they may be unsafe (e.g.
__proto__
orconstructor
). You should whitelist acceptable column names. - Returned record IDs MUST only contain safe characters
- Default WatermelonDB IDs conform to
/^[a-zA-Z0-9]{16}$/
_-.
are also allowed if you override default ID generator, but'"\/$
are unsafe
- Default WatermelonDB IDs conform to
- Changes SHOULD NOT contain collections that are not yet present in the local app (at
schemaVersion
). They will, however, be safely ignored.- NOTE: This is true for WatermelonDB v0.17 and above. If you support clients using earlier versions, you MUST NOT return collections not known by them.
- Changes MUST NOT contain collections with arbitrary names, as they may be unsafe. You should whitelist acceptable collection names.
Implementing push endpoint
- The push endpoint MUST apply local changes (passed as a
changes
object) to the database. Specifically:- create new records as specified by the changes object
- update existing records as specified by the changes object
- delete records by the specified IDs
- If the
changes
object contains a new record with an ID that already exists, you MUST update it, and MUST NOT return an error code.- (This happens if previous push succeeded on the backend, but not on frontend)
- If the
changes
object contains an update to a record that does not exist, then:- If you can determine that this record no longer exists because it was deleted, you SHOULD return an error code (to force frontend to pull the information about this deleted ID)
- Otherwise, you MUST create it, and MUST NOT return an error code. (This scenario should not happen, but in case of frontend or backend bugs, it would keep sync from ever succeeding.)
- If the
changes
object contains a record to delete that doesn't exist, you MUST ignore it and MUST NOT return an error code- (This may happen if previous push succeeded on the backend, but not on frontend, or if another user deleted this record in between user's pull and push calls)
- If the
changes
object contains a record that has been modified on the server afterlastPulledAt
, you MUST abort push and return an error code- This scenario means that there's a conflict, and record was updated remotely between user's pull and push calls. Returning an error forces frontend to call pull endpoint again to resolve the conflict
- If application of all local changes succeeds, the endpoint MUST return a success status code.
- The push endpoint MUST be fully transactional. If there is an error, all local changes MUST be reverted on the server, and en error code MUST be returned.
- You MUST ignore
_status
and_changed
fields contained in records inchanges
object - You SHOULD validate data passed to the endpoint. In particular, collection and column names ought to be whitelisted, as well as ID format — and of course any application-specific invariants, such as permissions to access and modify records
- You SHOULD sanitize record fields passed to the endpoint. If there's something slightly wrong with the contents (but not shape) of the data (e.g.
user.role
should beowner
,admin
, ormember
, but user sent empty string orabcdef
), you SHOULD NOT send an error code. Instead, prefer to "fix" errors (sanitize to correct format).- Rationale: Synchronization should be reliable, and should not fail other than transiently, or for serious programming errors. Otherwise, the user will have a permanently unsyncable app, and may have to log out/delete it and lose unsynced data. You don't want a bug 5 versions ago to create a persistently failing sync.
- You SHOULD delete all descendants of deleted records
- Frontend should ask the push endpoint to do so as well, but if it's buggy, you may end up with permanent orphans
Tips on implementing server-side changes tracking
If you're wondering how to actually implement consistent pulling of all changes since the last pull, or how to detect that a record being pushed by the user changed after lastPulledAt
, here's what we recommend:
- Add a
last_modified
field to all your server database tables, and bump it toNOW()
every time you create or update a record. - This way, when you want to get all changes since
lastPulledAt
, you query records whoselast_modified > lastPulledAt
. - The timestamp should be at least millisecond resolution, and you should add (for extra safety) a MySQL/PostgreSQL procedure that will ensure
last_modified
uniqueness and monotonicity- Specificaly, check that there is no record with a
last_modified
equal to or greater thanNOW()
, and if there is, increment the new timestamp by 1 (or however much you need to ensure it's the greatest number) - An example of this for PostgreSQL can be found in Kinto
- This protects against weird edge cases - such as records being lost due to server clock time changes (NTP time sync, leap seconds, etc.)
- Specificaly, check that there is no record with a
- Of course, remember to ignore
last_modified
from the user if you do it this way. - An alternative to using timestamps is to use an auto-incrementing counter sequence, but you must ensure that this sequence is consistent across all collections. You also leak to users the amount of traffic to your sync server (number of changes in the sequence)
- To distinguish between
created
andupdated
records, you can also store server-sideserver_created_at
timestamp (if it's greater thanlast_pulled_at
supplied to sync, then record is to becreated
on client, if less than — client already has it and it is to beupdated
on client). Note that this timestamp must be consistent with last_modified — and you must not use client-createdcreated_at
field, since you can never trust local timestamps.- Alternatively, you can send all non-deleted records as all
updated
and Watermelon will do the right thing in 99% of cases (you will be slightly less protected against weird edge cases — treatment of locally deleted records is different). If you do this, passsendCreatedAsUpdated: true
tosynchronize()
to supress warnings about records to be updated not existing locally.
- Alternatively, you can send all non-deleted records as all
- You do need to implement a mechanism to track when records were deleted on the server, otherwise you wouldn't know to push them
- One possible implementation is to not fully delete records, but mark them as DELETED=true
- Or, you can have a
deleted_xxx
table with just the record ID and timestamp (consistent with last_modified) - Or, you can treat it the same way as "revoked permissions"
- If you have a collaborative app with any sort of permissions, you also need to track granting and revoking of permissions the same way as changes to records
- If permission to access records has been granted, the pull endpoint must add those records to
created
- If permission to access records has been revoked, the pull endpoint must add those records to
deleted
- Remember to also return all descendants of a record in those cases
- If permission to access records has been granted, the pull endpoint must add those records to
Existing Backend Implementations
Note that those are not maintained by WatermelonDB, and we make no endorsements about quality of these projects:
Did you make one? Please contribute a link!