How to launch a large deployment
The Charmed OpenSearch operator can be deployed at scale to support large deployments. This guide explains how to launch a large deployment of OpenSearch using Juju.
OpenSearch node roles
When deploying OpenSearch at scale, it is important to understand the roles that nodes can assume in a cluster.
Amongst the multiple roles supported by OpenSearch, two are especially crucial for a successful cluster formation:
cluster_manager: assigned to nodes responsible for handling cluster-wide operations such as creating and deleting indices, managing shards, and rebalancing data across the cluster. Every cluster has a single cluster_manager node elected as the master node among the cluster_manager-eligible nodes.
data: assigned to nodes which store data and perform data-related operations such as indexing and searching. Data nodes hold the shards that contain the indexed data. Data nodes can also be configured to perform ingest and transform operations. In Charmed OpenSearch, data nodes can optionally be further classified into tiers, to allow for defining index lifecycle management policies: data.hot, data.warm, and data.cold.
There are also other roles that nodes can take on in an OpenSearch cluster, such as ingest nodes and coordinating nodes.
Roles in Charmed OpenSearch are applied at the application level; in other words, all nodes in an application are assigned the same set of roles defined for that application.
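Once a deployment is running, the roles actually assigned to each node can be verified through OpenSearch's standard _cat/nodes API. The unit address and admin password below are placeholders to be replaced with values from your own deployment:

```shell
# List each node's name and assigned roles via the _cat/nodes API.
# -k skips certificate verification (the examples here use self-signed
# certificates); replace the address and password with your own values.
curl -sk -u admin:<admin-password> \
  "https://<opensearch-unit-ip>:9200/_cat/nodes?v&h=name,node.roles"
```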
Set roles
Roles can either be set by the user or automatically generated by the charm.
Auto-generated roles
When no roles are set via the roles config option of the opensearch application, the charm automatically assigns the following roles to all nodes:
["data", "ingest", "ml", "cluster_manager"]
User set roles
There are currently two ways for users to set roles in an application: at deploy time, or via a config change. Note that a role change will effectively trigger a rolling restart of the OpenSearch application.
To set roles at deploy time, run
juju deploy opensearch -n 3 --config roles="cluster_manager,data,ml"
To set roles later on through a config change, run
juju config opensearch roles="cluster_manager,data,ml"
Note: We currently do not allow the removal of either the cluster_manager or data roles.
Deploy a large OpenSearch cluster
The OpenSearch charm manages large deployments and diversity in the topology of its nodes through Juju integrations.
The cluster will consist of multiple integrated Juju applications, each configured with a mix of the cluster_manager and data roles for its nodes.
Deploy the clusters
Caution: Charmed OpenSearch supports performance profiles, and RAM consumption differs according to the profile chosen:
production: consumes 50% of the available RAM, up to 32G
staging: consumes 25% of the available RAM, up to 32G
testing: consumes 1G of RAM
The configuration defaults to production, but the examples below use testing, as they assume the deployment happens on a single LXD cluster.
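Since profile is a regular charm config option, it can also, assuming your channel supports it, be changed on an existing application rather than only at deploy time:

```shell
# Switch an existing application to the staging profile.
# Note: a profile change adjusts memory settings and may trigger a
# restart of the OpenSearch service on each unit.
juju config opensearch profile="staging"
```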
- First, deploy the orchestrator app:

juju deploy -n 3 \
    opensearch main \
    --config cluster_name="app" \
    --channel 2/edge \
    --config profile="testing"

As a reminder, since we did not set any roles for this application, the operator will assign each node the cluster_manager, coordinating_only, data, ingest, and ml roles.

- (Optional, but recommended) Next, deploy a failover application with cluster_manager nodes to ensure high availability and fault tolerance. The failover app will take over the orchestration of the fleet in the event that the main app fails or gets removed. Thus, it is important that this application has the cluster_manager role as part of its roles to ensure the continuity of the cluster.

juju deploy -n 3 \
    opensearch failover \
    --config cluster_name="app" \
    --config init_hold="true" \
    --config roles="cluster_manager" \
    --channel 2/edge \
    --config profile="testing"

The failover nodes are not required for a basic deployment of OpenSearch. They are, however, highly recommended for production deployments to ensure high availability and fault tolerance.

Note 1: It is imperative that the cluster_name config values match between applications in a large deployment. A cluster_name mismatch will prevent two applications from forming a cluster.

Note 2: It is imperative that only the main orchestrator app keeps the init_hold config option set to false (the default). The non-main orchestrator apps should set the value to true to prevent the application from starting before being integrated with the main app.

- After deploying the nodes of the main app and additional cluster_manager nodes on the failover app, we will deploy a new app with data.hot node roles:

juju deploy -n 3 \
    opensearch data-hot \
    --config cluster_name="app" \
    --config roles="data.hot" \
    --config init_hold="true" \
    --channel 2/edge \
    --config profile="testing"

- We also need to deploy a TLS operator to enable TLS encryption for the cluster. We will deploy the self-signed-certificates charm to provide self-signed certificates for the cluster:

juju deploy self-signed-certificates
- We can now track the progress of the deployment by running:
juju status --watch 1s
Once the deployment is complete, you should see the following output:
Model  Controller   Cloud/Region         Version  SLA          Timestamp
dev    development  localhost/localhost  3.5.3    unsupported  06:01:06Z

App                       Version  Status   Scale  Charm                     Channel        Rev  Exposed  Message
data-hot                           blocked      3  opensearch                2/edge         159  no       Cannot start. Waiting for peer cluster relation...
failover                           blocked      3  opensearch                2/edge         159  no       Cannot start. Waiting for peer cluster relation...
main                               blocked      3  opensearch                2/edge         159  no       Missing TLS relation with this cluster.
self-signed-certificates           active       1  self-signed-certificates  latest/stable  155  no

Unit                         Workload  Agent  Machine  Public address  Ports  Message
data-hot/0                   active    idle   6        10.214.176.165
data-hot/1*                  active    idle   7        10.214.176.7
data-hot/2                   active    idle   8        10.214.176.161
failover/0*                  active    idle   3        10.214.176.194
failover/1                   active    idle   4        10.214.176.152
failover/2                   active    idle   5        10.214.176.221
main/0                       blocked   idle   0        10.214.176.231         Missing TLS relation with this cluster.
main/1                       blocked   idle   1        10.214.176.57          Missing TLS relation with this cluster.
main/2*                      blocked   idle   2        10.214.176.140         Missing TLS relation with this cluster.
self-signed-certificates/0*  active    idle   9        10.214.176.201

Machine  State    Address         Inst id        Base           AZ  Message
0        started  10.214.176.231  juju-d6b263-0  [email protected]      Running
1        started  10.214.176.57   juju-d6b263-1  [email protected]      Running
2        started  10.214.176.140  juju-d6b263-2  [email protected]      Running
3        started  10.214.176.194  juju-d6b263-3  [email protected]      Running
4        started  10.214.176.152  juju-d6b263-4  [email protected]      Running
5        started  10.214.176.221  juju-d6b263-5  [email protected]      Running
6        started  10.214.176.165  juju-d6b263-6  [email protected]      Running
7        started  10.214.176.7    juju-d6b263-7  [email protected]      Running
8        started  10.214.176.161  juju-d6b263-8  [email protected]      Running
9        started  10.214.176.201  juju-d6b263-9  [email protected]      Running
Add the required relations
Configure TLS encryption
The Charmed OpenSearch operator does not function without TLS enabled. To enable TLS, integrate the self-signed-certificates application with all OpenSearch applications:
juju integrate self-signed-certificates main
juju integrate self-signed-certificates failover
juju integrate self-signed-certificates data-hot
Once the integrations are established, the self-signed-certificates charm will provide the required certificates to the OpenSearch clusters.
Once TLS is fully configured in the main app, the latter will start immediately, while the other apps keep waiting for the admin certificates to be shared with them by the main orchestrator.
When the main app is ready, juju status will show something similar to the sample output below:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 06:03:49Z
App Version Status Scale Charm Channel Rev Exposed Message
data-hot blocked 3 opensearch 2/edge 159 no Cannot start. Waiting for peer cluster relation...
failover blocked 3 opensearch 2/edge 159 no Cannot start. Waiting for peer cluster relation...
main active 3 opensearch 2/edge 159 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
data-hot/0 active idle 6 10.214.176.165
data-hot/1* active idle 7 10.214.176.7
data-hot/2 active idle 8 10.214.176.161
failover/0* active idle 3 10.214.176.194
failover/1 active idle 4 10.214.176.152
failover/2 active idle 5 10.214.176.221
main/0 active idle 0 10.214.176.231 9200/tcp
main/1 active idle 1 10.214.176.57 9200/tcp
main/2* active idle 2 10.214.176.140 9200/tcp
self-signed-certificates/0* active idle 9 10.214.176.201
Machine State Address Inst id Base AZ Message
0 started 10.214.176.231 juju-d6b263-0 [email protected] Running
1 started 10.214.176.57 juju-d6b263-1 [email protected] Running
2 started 10.214.176.140 juju-d6b263-2 [email protected] Running
3 started 10.214.176.194 juju-d6b263-3 [email protected] Running
4 started 10.214.176.152 juju-d6b263-4 [email protected] Running
5 started 10.214.176.221 juju-d6b263-5 [email protected] Running
6 started 10.214.176.165 juju-d6b263-6 [email protected] Running
7 started 10.214.176.7 juju-d6b263-7 [email protected] Running
8 started 10.214.176.161 juju-d6b263-8 [email protected] Running
9 started 10.214.176.201 juju-d6b263-9 [email protected] Running
Form the OpenSearch cluster
Now, in order to form the large OpenSearch cluster (constituted of all three previous opensearch apps), integrate the main app with the failover and data-hot apps:
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
juju integrate main:peer-cluster-orchestrator data-hot:peer-cluster
juju integrate failover:peer-cluster-orchestrator data-hot:peer-cluster
Once the relations are added, the main application will orchestrate the formation of the OpenSearch cluster, which will start the rest of the nodes in the cluster.
You can track the progress of the cluster formation by running:
juju status --watch 1s
Once the cluster is formed and all nodes are up and ready, juju status will show something similar to the sample output below:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 06:11:18Z
App Version Status Scale Charm Channel Rev Exposed Message
data-hot active 3 opensearch 2/edge 159 no
failover active 3 opensearch 2/edge 159 no
main active 3 opensearch 2/edge 159 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
data-hot/0 active idle 6 10.214.176.165 9200/tcp
data-hot/1* active idle 7 10.214.176.7 9200/tcp
data-hot/2 active idle 8 10.214.176.161 9200/tcp
failover/0* active idle 3 10.214.176.194 9200/tcp
failover/1 active idle 4 10.214.176.152 9200/tcp
failover/2 active idle 5 10.214.176.221 9200/tcp
main/0 active idle 0 10.214.176.231 9200/tcp
main/1 active idle 1 10.214.176.57 9200/tcp
main/2* active idle 2 10.214.176.140 9200/tcp
self-signed-certificates/0* active idle 9 10.214.176.201
Machine State Address Inst id Base AZ Message
0 started 10.214.176.231 juju-d6b263-0 [email protected] Running
1 started 10.214.176.57 juju-d6b263-1 [email protected] Running
2 started 10.214.176.140 juju-d6b263-2 [email protected] Running
3 started 10.214.176.194 juju-d6b263-3 [email protected] Running
4 started 10.214.176.152 juju-d6b263-4 [email protected] Running
5 started 10.214.176.221 juju-d6b263-5 [email protected] Running
6 started 10.214.176.165 juju-d6b263-6 [email protected] Running
7 started 10.214.176.7 juju-d6b263-7 [email protected] Running
8 started 10.214.176.161 juju-d6b263-8 [email protected] Running
9 started 10.214.176.201 juju-d6b263-9 [email protected] Running
Caution: The cluster will not come online if no data nodes are available. Ensure the data nodes are deployed and ready before forming the cluster.
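To confirm that the three apps have indeed formed a single cluster, you can query the _cluster/health endpoint on any unit. The address and password below are placeholders; depending on your channel, the admin password can typically be retrieved with the charm's get-password action:

```shell
# Retrieve the admin credentials from the main orchestrator
# (action name assumed from the charm's published actions).
juju run main/leader get-password

# Query cluster health on any unit. A fully formed 9-node deployment
# should report "number_of_nodes": 9 and, once all shards are
# assigned, a "status" of "green".
curl -sk -u admin:<admin-password> \
  "https://<opensearch-unit-ip>:9200/_cluster/health?pretty"
```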
Reminder 1: In order to form a large deployment out of multiple Juju apps, all applications must have the same cluster_name config value, or not set it at all, in which case it will be auto-generated by the main orchestrator and inherited by the other members.
Reminder 2: init_hold must be set to true for any subsequent (non-main orchestrator) application. Otherwise, the application may start and never be able to join the rest of the deployment fleet.