NATS – Server Clustering

Overview:

In the previous article on NATS, we discussed the basics of setting up NATS and its features.

NATS is a very lightweight, high-performance messaging server, which makes it a good choice for solving service discovery and load balancing in a modern microservices architecture. But there are a couple of issues. A single server has its limits: for high-volume messaging, we need multiple NATS servers to scale horizontally. Also, if the NATS server goes down for some reason, our entire application might become inaccessible because service discovery fails with it. Let's see how we can address these issues using clustering!

Service Discovery:

In a distributed system with multiple services, one service often needs to talk to another. For example, order-service might want to check with payment-service to fulfill an order. To do that, order-service needs to know the DNS name or IP address of payment-service in the network to send the request! In a modern architecture with cloud auto scaling, we should not depend on IP addresses! There are various ways to solve this issue. NATS solves it with channels (called subjects in NATS terminology). That is, NATS acts like a bridge over which the services pass messages to each other: order-service publishes a message to a channel, and payment-service listens on that channel and responds.
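To make the idea concrete, here is a minimal in-memory sketch in plain Java. It does not use NATS at all: `SubjectBus`, its methods, and the `payment.check` name are hypothetical and exist only for illustration. The point is that the requester addresses a named channel (a subject, in NATS terms), never a host name or IP address.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory stand-in for a message broker; this is not the NATS API.
class SubjectBus {
    private final Map<String, Function<String, String>> responders = new ConcurrentHashMap<>();

    // A service registers interest in a subject instead of exposing an address.
    void subscribe(String subject, Function<String, String> responder) {
        responders.put(subject, responder);
    }

    // The caller only needs the subject name to reach whoever is listening.
    String request(String subject, String payload) {
        Function<String, String> responder = responders.get(subject);
        if (responder == null) {
            throw new IllegalStateException("No responder on subject: " + subject);
        }
        return responder.apply(payload);
    }

    public static void main(String[] args) {
        SubjectBus bus = new SubjectBus();
        // payment-service listens on a subject
        bus.subscribe("payment.check", orderId -> "payment ok for " + orderId);
        // order-service sends a request without knowing where payment-service runs
        System.out.println(bus.request("payment.check", "order-42"));
    }
}
```

If payment-service moves to another machine or scales out, order-service does not change, because the subject name stays the same.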

Even though the above design works perfectly fine, what happens when NATS goes down for some reason? It becomes a single point of failure, and our entire application goes down with it!

This is where server clustering comes into the picture! Like the basic NATS setup, the cluster setup is also very easy!

Clustering:

We can run multiple NATS instances together as a single cluster. These NATS instances use a gossip protocol to pass along information about the other servers connected to the cluster. So when we connect to one of the servers in the NATS cluster with a client library, we immediately learn about the entire cluster. When a server goes down, our client automatically reconnects to another server in the cluster. Because of this behavior, we can scale the NATS instances out or in horizontally based on demand. As long as 1 server is still up and running in the cluster, our application will still work fine.
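The reconnect behavior can be pictured with a small plain-Java sketch. It is a simulation, not the NATS client's actual algorithm: `FailoverClient`, `connect`, and the set of reachable servers are hypothetical, and the real library adds details such as a wait between reconnect attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical simulation of client-side failover; not the NATS client's real logic.
class FailoverClient {
    private final List<String> knownServers;

    FailoverClient(List<String> knownServers) {
        this.knownServers = new ArrayList<>(knownServers);
    }

    // Try each known server in order and return the first reachable one, if any.
    Optional<String> connect(Set<String> reachable) {
        return knownServers.stream().filter(reachable::contains).findFirst();
    }

    public static void main(String[] args) {
        FailoverClient client =
                new FailoverClient(List.of("nats1:4222", "nats2:4222", "nats3:4222"));
        // All servers up: the client uses the first one it knows about.
        System.out.println(client.connect(Set.of("nats1:4222", "nats2:4222", "nats3:4222")));
        // nats1 is down: the client falls back to another cluster member.
        System.out.println(client.connect(Set.of("nats2:4222", "nats3:4222")));
        // Nothing reachable: there is no server left to fail over to.
        System.out.println(client.connect(Set.of()));
    }
}
```

The sketch also shows the limit of the guarantee: the client keeps working only while at least one of the servers it knows about is reachable.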

Set up:

Seed Server:

To form a cluster, a server needs to know the location of at least one other server. One or more servers act as seed servers.

For example, say I create 2 NATS instances. On their own, they are just 2 independent servers running in the network; neither knows about the other.

Seed servers are nothing special and need no special configuration. They simply act as a starting point for servers that want to join the cluster. For example, I have Server 1. I bring in Server 2 and configure it so that Server 1 is its seed server. Servers 1 and 2 now form a cluster.

I can bring in as many servers as I want by configuring any server in the cluster as the seed server. The seed server shares the cluster information with the new server.
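The joining process can be modeled in a few lines of plain Java. This is only a toy model under simplifying assumptions (membership is updated instantly and nothing fails); `SeedJoinDemo` and its methods are hypothetical names, and real NATS servers exchange this information over their route connections.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of cluster membership; not how nats-server is implemented.
class SeedJoinDemo {
    // server name -> names of the servers it knows about (including itself)
    private final Map<String, Set<String>> knownBy = new HashMap<>();

    // Start a standalone server that knows only itself.
    void start(String server) {
        Set<String> self = new TreeSet<>();
        self.add(server);
        knownBy.put(server, self);
    }

    // Start a new server pointing at a seed: the seed shares its view of the
    // cluster, and every existing member learns about the newcomer.
    void join(String server, String seed) {
        Set<String> seedView = knownBy.get(seed);
        if (seedView == null) {
            throw new IllegalArgumentException("Unknown seed server: " + seed);
        }
        Set<String> members = new TreeSet<>(seedView);
        members.add(server);
        for (String member : members) {
            knownBy.put(member, new TreeSet<>(members));
        }
    }

    Set<String> view(String server) {
        return knownBy.get(server);
    }

    public static void main(String[] args) {
        SeedJoinDemo cluster = new SeedJoinDemo();
        cluster.start("nats1");
        cluster.join("nats2", "nats1");
        cluster.join("nats3", "nats2"); // any member can act as the seed
        System.out.println(cluster.view("nats1")); // [nats1, nats2, nats3]
    }
}
```

Note that nats3 joined through nats2, yet nats1 still ends up knowing about it, which is why no particular server has to be the seed.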

Docker-compose:

Typically we would not run multiple NATS instances on a single machine. For learning purposes, we use Docker containers to form the cluster as shown here.
We run 3 containers to simulate a 3-node cluster.

version: "3"
services:
  # nats1 is the seed server: it only opens the cluster port (4248)
  nats1:
    image: nats:alpine
    ports:
      - 4222:4222
    command: "-cluster nats://0.0.0.0:4248"
  # nats2 and nats3 join the cluster through a route to nats1
  nats2:
    image: nats:alpine
    ports:
      - 5222:4222
    depends_on:
      - nats1
    command: "-cluster nats://0.0.0.0:4248 -routes nats://nats1:4248"
  nats3:
    image: nats:alpine
    ports:
      - 6222:4222
    depends_on:
      - nats1
    command: "-cluster nats://0.0.0.0:4248 -routes nats://nats1:4248"

Connection Listener:

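NATS provides a ConnectionListener interface that we can implement to receive connection events, including the cluster's server information during the initial connection.

import io.nats.client.Connection;
import io.nats.client.ConnectionListener;

public class NatsConnectionListener implements ConnectionListener {

    // invoked by the client for each connection event; prints the event and the known servers
    @Override
    public void connectionEvent(Connection connection, Events event) {
        System.out.println(
                event.toString() + " : " + connection.getServers()
        );
    }

}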

Subscriber:

import io.nats.client.Connection;
import io.nats.client.Dispatcher;
import io.nats.client.Nats;
import io.nats.client.Options;
import java.nio.charset.StandardCharsets;

// no server is specified, so the client uses the default nats://localhost:4222
Options options = new Options.Builder()
                .connectionListener(new NatsConnectionListener())
                .build();
Connection nats = Nats.connect(options);

// message dispatcher
Dispatcher dispatcher = nats.createDispatcher(msg -> {});

// subscribers with queue group
dispatcher.subscribe("vinsguru", "grp1", (msg) -> {
    System.out.println("Received 1 : " + new String(msg.getData(), StandardCharsets.UTF_8));
    nats.publish(msg.getReplyTo(), "Hello from subscriber 1 of grp1".getBytes(StandardCharsets.UTF_8));
});

dispatcher.subscribe("vinsguru", "grp1", (msg) -> {
    System.out.println("Received 2 : " + new String(msg.getData(), StandardCharsets.UTF_8));
    nats.publish(msg.getReplyTo(), "Hello from subscriber 2 of grp1".getBytes(StandardCharsets.UTF_8));
});

Output:

Our connection listener shows the information for all the servers in the cluster.

nats: discovered servers : [nats://localhost:4222, 192.168.112.2:4222, 192.168.112.4:4222, 192.168.112.3:4222]
nats: connection opened : [nats://localhost:4222, 192.168.112.2:4222, 192.168.112.4:4222, 192.168.112.3:4222]

Publisher:

Let's connect to one of the instances in the cluster and publish a message every second. Here I am sending 'Hi' and expecting a response from the subscriber.

Connection nats = Nats.connect();

for (int i = 0; i < 1000; i++) {
    // request() returns a CompletableFuture<Message> that completes when a reply arrives
    nats.request("vinsguru", "Hi".getBytes(StandardCharsets.UTF_8))
            .thenApply(Message::getData)
            .thenApply(data -> new String(data, StandardCharsets.UTF_8))
            .thenAccept(System.out::println);
    Thread.sleep(1000);
}

Output:

Hello from subscriber 2 of grp1
Hello from subscriber 1 of grp1
Hello from subscriber 1 of grp1
...
...
...
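Each request is answered by only one of the two subscribers because both joined the queue group grp1, and NATS delivers a queue-group message to a single, randomly chosen member of the group. The plain-Java sketch below imitates that behavior; `QueueGroup` and its methods are hypothetical and not part of the NATS client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Hypothetical imitation of queue-group delivery; this is not the NATS API.
class QueueGroup {
    private final List<Function<String, String>> members = new ArrayList<>();
    private final Random random = new Random();

    // Add a subscriber (message handler) to the group.
    void join(Function<String, String> handler) {
        members.add(handler);
    }

    // Deliver the message to exactly one randomly chosen member and return its reply.
    String deliver(String message) {
        if (members.isEmpty()) {
            throw new IllegalStateException("Queue group has no members");
        }
        return members.get(random.nextInt(members.size())).apply(message);
    }

    public static void main(String[] args) {
        QueueGroup grp1 = new QueueGroup();
        grp1.join(msg -> "Hello from subscriber 1 of grp1");
        grp1.join(msg -> "Hello from subscriber 2 of grp1");
        // Each "Hi" gets one reply, from whichever member was picked.
        for (int i = 0; i < 5; i++) {
            System.out.println(grp1.deliver("Hi"));
        }
    }
}
```

Because the choice is random, the order of replies differs from run to run.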

High Availability:

Everything seems to work fine so far. Now let's bring one of the servers down. If you are using docker-compose, issue the command below.

# to bring nats1 down

docker-compose stop nats1

We see some console errors from the nats2 and nats3 instances. They are trying to contact nats1, which causes the errors. But it does NOT mean the cluster is down.

The subscriber shows this output via our ConnectionListener. Our client library knows that one of the servers in the cluster is missing.

nats: connection disconnected : [nats://localhost:4222, 192.168.112.2:4222, 192.168.112.4:4222, 192.168.112.3:4222]
nats: discovered servers : [nats://localhost:4222, 192.168.112.3:4222, 192.168.112.4:4222]
nats: connection reconnected : [nats://localhost:4222, 192.168.112.3:4222, 192.168.112.4:4222]
nats: subscriptions re-established : [nats://localhost:4222, 192.168.112.3:4222, 192.168.112.4:4222]

I continue to see the output from the publisher as shown here.

Hello from subscriber 2 of grp1
Hello from subscriber 1 of grp1
...
...
Hello from subscriber 1 of grp1
Hello from subscriber 2 of grp1

Let's bring one more server (nats2) down. The nats3 instance continues to operate.

Both the publisher and the subscriber are able to pass messages as usual, as long as 1 server is up and running in the cluster.

Summary:

We were able to successfully set up NATS clustering with a simple docker-compose file and demonstrate how it behaves. Setting up NATS is very easy, and it is also highly resilient: it works just fine as long as 1 server is still up and running in the cluster. So there is no single point of failure.

Happy coding 🙂
