Software Development

Our design and development journey to Redis cluster support in OpenCTI

Jul 30, 2023 8 min read

Unusually, this article will not focus on new threat intelligence features of the OpenCTI platform but on how we adapted the product to support a Redis Cluster environment (from OCTI 5.7.X). At the beginning of the OpenCTI journey, we used some Redis features that are unfortunately not compatible with the clustered version. This article explains what we changed to implement Redis cluster support while keeping compatibility with single-node deployments.


OpenCTI Redis usage

To better understand the challenge, let’s start by explaining why we used Redis in the first place in OpenCTI. As shown below, the Redis database is a central piece of the platform’s technology stack.

OpenCTI technical stack

Shared sessions

Because OpenCTI supports clustered deployment of the platform, user sessions are stored in Redis. We chose Redis rather than ElasticSearch/OpenSearch because it offers features like auto-timeout, which is essential for handling sessions, and because keeping sessions in memory gives better performance when the same object is fetched multiple times in a short amount of time.

Cluster management

The recommendation for optimal performance and high availability is to deploy multiple OpenCTI platform instances as a cluster and load balance workers and jobs across those instances. Each instance registers itself in specific keys to share information in real time about:

  • Quorum, versions and alive signals.
  • Background processes launched and locked.
  • Any other useful information to keep the cluster consistent.

Cluster status

Locks

OpenCTI relies on ElasticSearch/OpenSearch to store the knowledge (both entities and relationships). We made this choice because this database system has the best coverage of our use cases for data volume, ingestion performance, full-text search and aggregation/correlation. However, this database is not transactional. To prevent concurrency issues, we implemented a fast locking system (thanks to Redlock) that uses Redis under the hood. By design, locks have a short expiration time, so Redis fits perfectly for this kind of usage (the same mechanism as session auto-timeout).

Creation workflow with locks and resolution

Ingestion observability

In OpenCTI, the ingestion mechanism of the different connectors can be monitored directly in the user interface. To handle this use case, the connectors declare “works” (generally one STIX bundle), and each work is made up of a number of operations (generally STIX bundle chunks). When a worker has processed a chunk, the number of processed operations is incremented.

Counting operations within a work

This permanent monitoring can be IO-intensive so Redis is used to maintain the count of all processed operations. At the end of the work, the final numbers are stored in ElasticSearch/OpenSearch.

Publish-subscribe for users and caching

Redis is used as a pub/sub system to notify the users when something changes in the platform (creation, modification and deletion) but also to notify the OpenCTI internal cache that some pieces of data need to be refreshed (invalidation).

Stream(s)

Finally, Redis is also used to store the threat intelligence stream, which records every knowledge operation within the platform (creation, modification and deletion). Multiple built-in OpenCTI features rely on this stream to make decisions and react in real time (notifications, synchronization, rule engine, etc.).

Stream usage within OpenCTI

Where we started and the limitations

Given all the features described above, the initial design targeted a single-node Redis deployment. To support OpenCTI use cases, we mainly relied on three important Redis features that directly affected our path to supporting cluster mode.

Multiple logical databases

We decided to use Redis’s multiple logical databases to split the different kinds of keys we need to store. At the beginning, it seemed a good idea to separate concerns by storing the main data in database 0 and the tracking/monitoring data in database 1. It was not a bad choice in itself, but this feature is not available in cluster mode.

Pattern key scanning

In OpenCTI, some APIs (and user interface screens) rely directly on functions that list keys stored in the Redis database. One example is the list of sessions in the settings section, which allows an OpenCTI administrator to view (and kill) them.

To do that, we used the scan command to get all the keys matching session:*. This is not a bad way to do it but, in cluster mode, it is necessary to iterate over all nodes and start a scan on each one of them. It seems doable, but it makes the source code logic differ significantly between single-node and cluster modes.
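
To make the extra logic concrete, here is a minimal TypeScript sketch of a pattern scan that has to visit every node. `ScanNode`, `scanNode` and `scanAllNodes` are hypothetical names used for illustration, not OpenCTI code. The `scan` signature mirrors the cursor-based SCAN command, and with ioredis the list of nodes could come from `cluster.nodes("master")`.

```typescript
// Hypothetical minimal view of a node that supports cursor-based SCAN.
interface ScanNode {
  scan(cursor: string, matchToken: "MATCH", pattern: string): Promise<[string, string[]]>;
}

// Full SCAN loop on one node: keep calling until the cursor comes back to "0".
async function scanNode(node: ScanNode, pattern: string): Promise<string[]> {
  const keys: string[] = [];
  let cursor = "0";
  do {
    const [next, batch] = await node.scan(cursor, "MATCH", pattern);
    for (const key of batch) keys.push(key);
    cursor = next;
  } while (cursor !== "0");
  return keys;
}

// In cluster mode each master only holds part of the keyspace,
// so the loop is repeated on every node and the results are merged.
async function scanAllNodes(nodes: ScanNode[], pattern: string): Promise<string[]> {
  const perNode = await Promise.all(nodes.map((node) => scanNode(node, pattern)));
  const unique = new Set<string>(); // SCAN may return the same key more than once
  for (const keys of perNode) {
    for (const key of keys) unique.add(key);
  }
  return Array.from(unique);
}
```

In single-node mode only `scanNode` is needed, which is exactly the divergence between the two code paths described above.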

Get multiple keys in one call

Redis offers the powerful mget command, which fetches multiple keys in a single query. However, it cannot always be used within a Redis cluster: the command only works if all the keys belong to the same hash slot, and therefore live on the same node. To achieve this, you generally need a strategy, depending on the usage, to co-locate some keys by forcing the hash computation (with hash tags).
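
As an illustration of how the hash computation decides where a key lives, the sketch below reimplements the slot calculation described in the Redis Cluster specification: CRC16 of the key modulo 16384, where only the text between the first `{` and the next `}` is hashed when that text is non-empty. The function names (`crc16`, `hashedPart`, `keySlot`, `sameSlot`) and the example keys are ours and purely illustrative. In practice the server and client library compute this for you.

```typescript
const SLOT_COUNT = 16384; // number of hash slots in a Redis Cluster

// CRC16 (XMODEM variant: polynomial 0x1021, initial value 0).
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) !== 0 ? (crc << 1) ^ 0x1021 : crc << 1;
      crc &= 0xffff; // keep the value on 16 bits
    }
  }
  return crc;
}

// Hash tag rule: hash only what sits between the first "{" and the next "}",
// unless there is no such pair or the content is empty.
function hashedPart(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}

function keySlot(key: string): number {
  return crc16(new TextEncoder().encode(hashedPart(key))) % SLOT_COUNT;
}

// A multi-key command such as mget is only accepted when all keys share a slot.
function sameSlot(keys: string[]): boolean {
  const slots = keys.map(keySlot);
  return slots.every((slot) => slot === slots[0]);
}

// Both keys hash only "user42", so they always share a slot.
console.log(sameSlot(["{user42}:profile", "{user42}:settings"])); // true
```

The trade-off is that every key sharing a tag lands on a single node, so hash tags are best reserved for small groups of keys that really need to be read together.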

Redis client with ioredis

As the OpenCTI platform is developed in NodeJS, we decided to use the ioredis client. At the beginning, we simply used the default client builder, which is not exactly the same as the one needed for a Redis cluster. Depending on the Redis deployment, a different client may be needed. One challenge for us was also to find a way to support both single-node Redis and Redis cluster without developing two different clients.

What have we done?

To support Redis cluster, we had no choice but to go through all the limitations and change our design approach and our code. Let’s review the different usages and the impacts.

Redis client

On the client side, we decided to find a way to use the same approach whether we target a single node or a cluster. To do so, we focused on features available in both modes. Depending on the configuration declared for the platform, we instantiate the right client and then simply use it without any mode-specific code.
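
A minimal sketch of this idea follows, under stated assumptions. The configuration shape (`RedisConf`), the `KeyValueClient` interface and the injected `ClientBuilders` are hypothetical names chosen for this example and do not reflect the actual OpenCTI configuration or code. The constructors are injected so the snippet stays independent of any client library.

```typescript
// Hypothetical configuration shape; the field names are illustrative only.
type RedisNode = { host: string; port: number };
type RedisConf =
  | { mode: "single"; host: string; port: number }
  | { mode: "cluster"; nodes: RedisNode[] };

// Command surface the rest of the code relies on.
// Only commands that behave the same in both modes belong here.
interface KeyValueClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

// The real constructors are injected by the caller.
interface ClientBuilders {
  single(host: string, port: number): KeyValueClient;
  cluster(nodes: RedisNode[]): KeyValueClient;
}

// The mode is decided once, from configuration. Callers then use the
// returned client without knowing which kind they were given.
function createRedisClient(conf: RedisConf, builders: ClientBuilders): KeyValueClient {
  if (conf.mode === "cluster") {
    return builders.cluster(conf.nodes);
  }
  return builders.single(conf.host, conf.port);
}
```

With ioredis, the two builders would typically wrap `new Redis({ host, port })` and `new Redis.Cluster(nodes)`.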

Observability, pub-sub and stream

We did nothing on that part as the current implementation was fully supported by Redis cluster.

Multiple logical databases

We simply stopped using this feature. All instantiated clients now connect to database 0, as data sharding is handled directly by the cluster.

Locks

Lock management was a bit specific because of the redlock library. This library leverages the mget command and some scripting to acquire locks as quickly as possible. That is great, but it unfortunately prevents sharding the locks across the cluster.

Because locks are created and fetched very often in a short amount of time, this is a use case where forcing co-location makes sense. So we prefixed all lock keys with {lock} to force Redis to co-locate them, which lets us keep using the redlock library without any issue.

Sessions and cluster management

For this part, we decided to change how OpenCTI lists keys in order to avoid the scan command. We created a new approach that uses a Redis SET to maintain the list of keys linked to the same need. This approach was possible because sessions and cluster information represent a limited amount of data.

With this new design, it is now possible to shard all the keys for sessions and cluster information, and use the SET to get the list of information when needed. Because of the shards, all keys in the SET must be fetched independently with a simple get command.
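
The pattern can be sketched as follows. This is a simplified illustration, not the OpenCTI implementation: the `IndexClient` interface is a hypothetical subset of Redis commands, `sessions:index` is an invented key name, and pruning index entries whose key no longer exists (for example after a timeout) is our assumption about how stale entries could be handled.

```typescript
// Hypothetical subset of Redis commands used by this sketch.
interface IndexClient {
  set(key: string, value: string): Promise<unknown>;
  get(key: string): Promise<string | null>;
  del(key: string): Promise<unknown>;
  sadd(setKey: string, member: string): Promise<unknown>;
  srem(setKey: string, member: string): Promise<unknown>;
  smembers(setKey: string): Promise<string[]>;
}

const SESSION_INDEX = "sessions:index"; // illustrative name of the SET holding session keys

async function saveSession(client: IndexClient, id: string, data: string): Promise<void> {
  const key = `session:${id}`;
  await client.set(key, data);
  await client.sadd(SESSION_INDEX, key); // register the key so it can be listed without scan
}

async function killSession(client: IndexClient, id: string): Promise<void> {
  const key = `session:${id}`;
  await client.del(key);
  await client.srem(SESSION_INDEX, key);
}

async function listSessions(client: IndexClient): Promise<{ key: string; data: string }[]> {
  const sessions: { key: string; data: string }[] = [];
  const keys = await client.smembers(SESSION_INDEX);
  for (const key of keys) {
    // Keys may live on different nodes, so each one is fetched with its own get.
    const data = await client.get(key);
    if (data === null) {
      await client.srem(SESSION_INDEX, key); // assumption: drop entries whose key is gone
    } else {
      sessions.push({ key, data });
    }
  }
  return sessions;
}
```

Listing now costs one call on the SET plus one get per member, which stays cheap only because the number of sessions and cluster entries is small.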

Conclusion

We hope this article helped you understand how Redis is used within the OpenCTI platform. We also wrote it to provide every developer thinking about moving to Redis Cluster with relevant information and lessons learned. If you have some knowledge of Redis and want to help us use it better, don’t hesitate to join the community Slack channel!
