Why did we choose NATS.io

Yaniv Ben Hemo
3 min read · Jul 26, 2021

The story behind our choice of NATS.io.

To understand our choice, we need to start by understanding our needs, defining KPIs, and then comparing the possible solutions: their deployment options, data-protection capabilities, scaling, support, community, and, no less important, the people behind the projects.

At its core, NATS is about publishing and subscribing to messages. NATS has a very small footprint compared to Kafka. Kafka is more mature than NATS and performs very well with massive data streams. NATS offers a subset of Kafka’s features, as it is focused on a narrower set of use cases. NATS is designed for scenarios where high performance and low latency are critical but losing some data is acceptable (we solved this one, as described below).

Last but not least, NATS is easy to operate — deploy over Kubernetes, monitor with Grafana, encrypt traffic, and much more.

Credit: https://nats.io/

We use NATS, specifically STAN (since replaced by JetStream), NATS’s older streaming service, to move data between the different engine services our platform offers to our users.

*Bear in mind that the following might not be needed when using JetStream.*

Here is what we had to take into consideration and develop ourselves, because it doesn’t exist out of the box in STAN:

  • NATS message size

NATS has a message size limitation enforced by the server and communicated to the client during connection setup. Currently, the default limit is 1MB. For us, or for any other solution or use case like ours, this means that the data-receiver service must be able to split the data into fragments no larger than 1MB.
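The splitting step can be sketched as follows. This is an illustrative example, not the Memphis implementation; the function and constant names are our own, and only the 1MB default limit comes from NATS.

```python
# Illustrative sketch: split a payload into fragments that fit under
# NATS's default 1 MB payload limit. Names here are hypothetical.
MAX_FRAGMENT_SIZE = 1024 * 1024  # NATS's default max_payload (1 MB)

def split_into_fragments(payload: bytes, max_size: int = MAX_FRAGMENT_SIZE):
    """Return a list of chunks, each no larger than max_size bytes."""
    return [payload[i:i + max_size] for i in range(0, len(payload), max_size)]

# A 2.5 MB payload becomes three fragments (1 MB + 1 MB + 0.5 MB)
payload = b"x" * (2 * 1024 * 1024 + 512 * 1024)
fragments = split_into_fragments(payload)
```

Each fragment would then be published as an ordinary message, with the receiver reassembling them in order.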

  • Data Routing

When fragments of data travel between different STAN channels/microservices, a dedicated mechanism is required to orchestrate and route the fragments from one stage to another with dynamic path management: a specific message should pass through stages 1, 3, and 5, while other messages, belonging to a different source/pipeline, should pass through 2, 3, 5, and 6.
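One simple way to express such dynamic path management is a per-pipeline routing table that resolves the next hop for each fragment. This is a hypothetical sketch; the pipeline and stage names are made up for illustration and are not part of STAN or our actual codebase.

```python
# Hypothetical routing table: each pipeline maps to an ordered list of
# stage channels, mirroring the 1,3,5 vs. 2,3,5,6 example above.
from typing import Optional

PIPELINE_PATHS = {
    "pipeline-a": ["stage-1", "stage-3", "stage-5"],
    "pipeline-b": ["stage-2", "stage-3", "stage-5", "stage-6"],
}

def next_stage(pipeline: str, current: Optional[str]) -> Optional[str]:
    """Return the next stage channel for a fragment, or None when the
    fragment has completed its path."""
    path = PIPELINE_PATHS[pipeline]
    if current is None:          # fragment has not entered the pipeline yet
        return path[0]
    idx = path.index(current)
    return path[idx + 1] if idx + 1 < len(path) else None
```

After each stage finishes processing a fragment, it would look up its next hop and publish the fragment to that channel, keeping the routing logic out of the stages themselves.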

  • Data loss protection

Working in a cloud-native environment, with stateless microservices and the ability to scale in and out, brings some challenges. Imagine the following scenario: a stage that resides in a pod, ingesting and processing fragments of data, fails for some reason while processing a fragment. Because it’s a pod managed by Kubernetes, it will probably restart — but what happens to the fragment it was working on? It is probably lost…

We built an atomic mechanism that uses a cache database to preserve in-process fragments, so they can be restored after a failure.
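The pattern can be sketched as "persist, process, then delete": write the fragment to the cache before processing, and remove it only after processing succeeds, so anything still in the cache after a restart is known to need replay. This is a minimal illustration with a plain dict standing in for the cache database (e.g. Redis); the class and method names are hypothetical, not the Memphis implementation.

```python
# Minimal sketch of the protect-then-ack pattern described above.
# A dict stands in for an external cache database; names are illustrative.
class FragmentGuard:
    def __init__(self):
        self.cache = {}  # would be an external cache DB in practice

    def process(self, fragment_id: str, fragment: bytes, handler) -> None:
        self.cache[fragment_id] = fragment   # persist BEFORE processing
        handler(fragment)                    # may raise if the pod dies
        del self.cache[fragment_id]          # delete only after success

    def pending(self) -> dict:
        """Fragments left in the cache, i.e. to replay after a restart."""
        return dict(self.cache)
```

If `handler` raises (the pod crashes mid-processing), the fragment stays in the cache, and a restarted pod can pick it up from `pending()` instead of losing it.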

NATS’ new streaming service — JetStream.

JetStream was created to solve the problems identified with streaming technology today — complexity, fragility, and a lack of scalability. Some technologies address these better than others, but no current streaming technology is genuinely multi-tenant, horizontally scalable, and supports multiple deployment models. No technology we are aware of can scale from edge to cloud under the same security context while having complete deployment observability for operations.

Credit: NATS docs

Besides great technology and architecture, NATS is led by software heroes, with a great community and roadmap. That is as essential as the technical aspects themselves.

I hope our use case helps you make a better decision, and just as NATS supports us, we would love to support others on their journey with NATS.


Yaniv Ben Hemo

A developer, technologist, and entrepreneur. Co-Founder and CEO at Memphis.dev. Trying to make developers' lives a bit easier.