Schemata
To ensure high level of integrity and validity during write operations to the data storage and exchange of Kafka messages, we use JSON Schemas and AsyncAPI accordingly.
JSON Schema
JSON schemas are the fundamental building block of the whole DIVA architecture. In Diva 2 we have implemented a granular structured hierarchical schema system that has saved us a lot of unnecessary complexity and hundreds lines of code compared to DIVA 1.
Every existing piece of data in DIVA is well defined in a schema. DIVA never accepts data that is not explicitly defined in a schema. This in turn means that schemas are the first place in the system to be tweaked when developing new functionality. Schema-drive development engages us to first clearly define the data model and then start with implementation. However, the most important benefit comes from the fact that thanks to the JSON Schema tools we were able to implement fully automated validation of the data in our services.
You will find all the schemas in the core/schemata/json-schema/
directory.
The directory is structured like follows:
artifacts
- common reusable definitionsresource
- Resource and all subresources schemasasset
- Asset and all subassets schemasreview
- Review schemahistory
- History schemauser
- User schema
Json schema determine our data model. Since the attributes of different resource types can vary significantly, we define other sub-resources such as PDF, CSV, PNG, etc. for the resources and assets. The type of sub-resource is generally determined by the format or type of data source. E.g. MongoDB, Hadoop or an API could be a sub-resource. The entity abstracts common attributes that are "inherited" by all other entities.
Bottom-to-top schemas
Practically, we define the schemas using the bottom-to-top approach. This means that the sub-resources do not extend the Resource schema, rather they are included in the Resource.
With the bottom-to-top approach, we include the subordinated schemas in the master schema according to certain conditions. Basically, for example, the resource schema contains all the schemas definition of the sub-resources. The advantages of this way of constructing the schemas are especially evident in automated validation.
We will take a Resource example to demonstrate the approach:
{
...
"allOf": [
{
"properties": {
"resourceType": {
...
}
}
},
{
"$ref": "/entity" // extend Entity schema
},
// include sub-resource schema, if resourceType corresponds to this sub-resource
{
"if": {
"properties": {
"resourceType": {
"const": "urbanPulseSensor"
}
}
},
"then": {
"$ref": "/resourceUrbanPulseSensor"
}
},
]
}
This schema tells us that if we have an entity of type Resource and this Resource has type urbanPulseSensor
, we need to validate
against a composition of Entity, Resource and UrbanPulseSensor schemas.
Defining a schema
The biggest disadvantage of schemas - they must be written. The important thing to note here is that each attribute has a strict and detailed definition. And as mentioned before, every possible piece of data must be described in a schema, whether it is a resource, user, etc. We will demonstrate the process of schema creation using a new resource type that should be added to Diva, e.g., Excel file resource.
First, we need to find the right place for the new scheme. Since we know that this will be a new file Resource, we follow the path below:
schemata
└───json-schema
│ └───resources
│ │ resource.json
│ │ resourceFile.json
│ │ resourceFileApplicationPdf.json
│ │ ...
│ │ resourceFileExcel.json # our new definition for file Resource
Next create a resourceFileExcel.json
file and write down all required definitions here. You can use the existing schemas for other types as a guideline.
caution
Please define the schemas in detail and as strictly as possible! Always provide attributes with a title and description. Wherever it makes sense, define concrete type, format, value range etc. and give a couple of examples.
Since Excel is a file and a subtype of File Resource, we need to include the Excel Resource in resourceFile.json
according to the bottom-to-top approach:
// resourceFile.json
{
...
"if": {
"properties": { "mimeType": { "const": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" } }
},
"then": {
"$ref": "/resourceFileExcel"
}
},
}
This is the general procedure. There is no one ultimate pattern for schema creation for all possible resources in the world. Schema composition is a logical process and should be derived from the real world.
Validation
There are many different tools in the JSON Schema ecosystem. For validating the data against a schema we use ajv library for Node.js. We recommend the use of similar tools for your programming language.
Generally all write operations to the main MongoDB storage must pass validation first.
So all management services must always implement the validation. The management services like Resource or User Management
fetch the required schema on a start time from the Schema Registry service. Important to note that the services have only
to fetch the root schema (e.g. resource.json
or user.json
) thanks to the bottom-to-top approach and automated schema resolution
in ajv. The services with read-only access to the entities can implement the validation, but do not have to.