Couchbase XDCR Replication – Step by Step – Best Practices

Fatih Gençali May 19, 2021 1 Comment

What is Couchbase

Couchbase Server is an open source, distributed, JSON document database. It exposes a scale-out, key-value store with managed cache for sub-millisecond data operations, purpose-built indexers for efficient queries and a powerful query engine for executing SQL-like queries. For mobile and Internet of Things environments Couchbase also runs natively on-device and manages synchronization to the server.

Why Couchbase?

Couchbase Server is specialized to provide low-latency data management for large-scale interactive web, mobile, and IoT applications. Common requirements that Couchbase Server was designed to satisfy include:

Unified Programming Interface
Query
Search
Mobile and IoT
Analytics
Core database engine
Scale-out architecture
Memory-first architecture
Big data and SQL integrations
Full-stack security
Container and Cloud deployments
High Availability

Many databases are able to satisfy one or more of these requirements but require tradeoffs when running in production with internet-scale, mission critical applications. For example, one solution might deliver data model flexibility but might lack the ability to add or remove nodes without an impact on up-time or performance. Another solution might demonstrate good write scalability without being able to index or and change the data model on the fly. Couchbase Server is designed to deliver a productive developer and administration experience while also providing performance at scale, whether in the cloud, in a container, on-premise or on an edge device.

Nosql Performance Benchmark

New Benchmark Comparing MongoDB, DataStax and Couchbase Server Demonstrates Couchbase as the Most Scalable, Best Performing NoSQL Database.^[3]

Node based benchmark .^[1]

According to CAP Theorem Couchbase .

Cap Theorem^[4]

Couchbase is on CP and AP diagram.

Couchbase CP and AP diagram detail.

What is XDCR?

Cross Data Center Replication (XDCR) replicates data between clusters: this provides protection against data-center failure, and also provides high-performance data-access for globally distributed, mission-critical applications.

XDCR replicates data from a specific bucket on the source cluster to a specific bucket on the target cluster. Data from the source bucket is pushed to the target bucket by means of an XDCR agent, running on the source cluster, using the Database Change Protocol. Any bucket (Couchbase or Ephemeral) on any cluster can be specified as a source or a target for one or more XDCR definitions.

A complete architectural description of XDCR is provided in Cross Data Center Replication (XDCR). You may wish to familiarize yourself with the information provided there, before performing the routines provided in this section.

Xdcr Basic Structure;

Pre-requirements ;

Confirm that your cluster is properly sized and is able to handle new XDCR streams. For example, XDCR needs 1-2 additional CPU cores per stream and in some cases it will require more RAM and network resources as well. If a cluster is not properly sized for the existing workload plus the new XDCR streams, XDCR can compete for server resources and have a negative impact on overall performances.
Couchbase Server uses TCP/IP port 8091to exchange cluster configuration information. If you are communicating with a destination cluster over a dedicated connection or the Internet, you should ensure that all the nodes in the destination and source clusters can communicate with each other over ports 8091 and 8092.

Ports Listed by Communication Path

XDCR (cluster-to-cluster)

Version 1 (CAPI)
- Unencrypted: 8091, 8092
Version 2 (XMEM)
- Unencrypted: 8091, 8092, 11210
- Encrypted: 11207, 18091, 18092

Couchbase stores data both on disk and in RAM. The default behavior is to write the document to disk at some arbitrary time (usually quickly) after storing in RAM. This leaves a short window where node failure can result in loss of data.

In any case, after writing to RAM, the document will eventually be written to disk. Couchbase keeps a disk write queue which you can check on the metrics report page in the management console. Now, CB does synchronize writes across the cluster, and I believe a write will be synchronized across a cluster before Couchbase will acknowledge that the write happened (e.g. before the write method returns to the caller).

If you have more documents than available RAM, only the most-frequently accessed documents will be stored in RAM for quick retrieval, with all others being “evicted” to disk.

Advice;

When bucket size reduced from 200 gb to 10 gb in source, replication got faster enough. In other words, if bucket size is high and altough all data is in ram, I have seen that the replication had 10 seconds gap.

Source and target have to same linux setting and same resources . This is just advice.

Prod bucket resident must be %100. Because replication speed is important.

Bucket replication best settings ;

XDCR Source Nozzles per Node: 2 --> 8

XDCR Target Nozzles per Node: 2 --> 24

(Nozzles=Channel=parallel , as  cpu core)

XDCR Checkpoint Interval (sn): 1800 --> 60

Control frequency is low, but not as much as waiting in the queue. The higher this value, the longer it takes for XDCR queues to grow.

XDCR Batch Count: 500 --> 2000

It is beneficial to increase by 2.3 times. It also sends so many data groups at the same time.

XDCR Batch Size (kB): 2048 --> 8192

It is beneficial to increase by 2.3 times. At the same time, it sends such a large amount of data.

XDCR Failure Retry Interval: 10 --> 10

It is used for retry attempts in network errors.

XDCR Optimistic Replication Threshold: 256 --> 1024 --> 256 --> 128 

Increasing or decreasing this value appropriately can speed up replication, collect data above 1 mb and send it in bulk. But collection can be a waste of time and waiting in the queue.

This is the compressed document size in bytes. 0 - 2097152 Bytes (20MB). Default is 256 Bytes. XDCR retrieves metadata for documents larger than this size at once before copying the uncompressed document to a destination set. This option improves XDCR latency.

XDCR Statistics Collection Interval (ms): 1000 --> 1000

XDCR Logging Level: info --> info

Advice;

I recommend that source and target be same setting and have same resources.

These are bucket setting , cluster setting , cpu , memory , disk quality etc.

Xdcr replication is just data replication . Before replication , you must create bucket metadata.

If you want , you create user , index ,view ,event etc.

As additional information;

You can make xdcr replication on community version.

You can make xdcr replication on enterprise version. This needs additional license. If you dont use standby as a prod, it is not high fee.

Couchbase’s other connectors for XDCR; Elasticsearch, Hadoop, Kafka, Spark, Talend, SQL (ODBC / JDBC)

Couchbase management can be done via WEB UI, REST API and CLI. In particular, the web user interface is very simple and straightforward to use. You can make many operational transactions and queries through the user interface.

Replication Summary;

Stby=Xdcr=Target=Remote same term.

A different name xdcr cluster is established with the same features.

The buckets with the same name with the same features are created in the xdcr cluster.

In Prod, add remote server and xdcr information are entered in the xdcr tab.

Prod in xdcr tab with add remote cluster;

Cluster Name= Xdcr couchbase name

IP/Hostname= Xdcr ip / hostname

Username=Xdcr Admin username

Password=Xdcr Admin user password




Prod in xdcr tab with add bucket replication;

Replicate From Bucket = Bucket name in the prod

Remote Cluster = Added Xdcr name

Remote Bucket = Bucket name added in Xdcr

Memory settings for Xdcr cluster settings are given according to the server memory value.

Should be free size for server memory.

Xdcr needs additional memory in prod cluster.

Multiple couchbase bucket replication is possible.