    Improve CTDB private IPs handling (#11643) · 6bd31c65
    Andrew Walker authored
    This PR makes several critical changes to how the ctdb nodes
    file (private IPs) are managed in middleware. Some types of
    issues that are addressed by this PR are:
    
    * Inconsistencies in nodes configuration file on different
      cluster nodes
    * Duplicate nodes entries being added via ctdb.shared.volume.create
      (e.g. same node twice with different IPs)
    * Lack of clear mapping between ctdb nodes and gluster peers
    
    Originally there was no easy way to map entries in the ctdb nodes
    file back to the originating gluster peer. This PR changes the
    nodes file entries from the form
    ```
    <ipaddress>
    <ipaddress>
    <ipaddress>
    ```
    
    to
    ```
    <ipaddress>#{<peer information>}
    <ipaddress>#{<peer information>}
    ```
    
    The above required a change to the nodes file parsing function in
    ctdbd, which means this PR may not be backported to stable/bluefin.
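For illustration, the annotated entry format could be parsed along these lines. This is a hypothetical Python sketch (ctdbd itself does this in C), and it assumes the peer information between the braces is JSON-encoded; the field names are assumptions, not the actual implementation:

```python
import json

def parse_nodes_entry(line: str) -> dict:
    """Parse a ctdb nodes file entry of the form <ipaddress>#{<peer information>}.

    Entries without a '#' separator are treated as legacy, address-only lines.
    """
    addr, sep, meta = line.strip().partition("#")
    entry = {"address": addr, "peer": None}
    if sep:
        # Assumed: peer information is a JSON object, e.g. {"uuid": "..."}
        entry["peer"] = json.loads(meta)
    return entry
```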
    
    Currently only the peer UUID is stored in the file, but in the
    future this can be expanded to include additional information.
    Storing the gluster node UUID in the ctdb nodes file allows us to
    more tightly couple the gluster TSP configuration with our CTDB
    nodes configuration (e.g. by including the peer UUID in the
    ctdb.private.ips.query return).
    
    Since this change requires that the backend maintain the mapping
    between nodes and peers, the ctdb private IPs API was changed to
    require the caller to provide a gluster peer UUID for each nodes entry.
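For illustration, a caller-supplied mapping and the enriched query result might look like the following. The field names here are assumptions for the sketch, not the actual middleware schema:

```python
# Hypothetical payload for creating a ctdb private IP entry: the caller
# must now supply the gluster peer UUID alongside the private address.
create_payload = {
    "ip": "10.20.0.1",
    "node_uuid": "8e2f4c7a-9b1d-4e3a-a5c6-0f9d8b7a6c5e",
}

# Hypothetical entry returned by ctdb.private.ips.query after this change,
# now carrying the peer UUID recovered from the nodes file.
query_entry = {
    "id": 0,
    "address": create_payload["ip"],
    "enabled": True,
    "node_uuid": create_payload["node_uuid"],
}
```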
    
    The ctdb.shared.volume.create method originally would automatically
    append nodes file entries for missing gluster peers based on DNS
    lookup results, but over time this has proven less than reliable
    (users may have DNS misconfigurations that result in duplicate
    nodes file entries or in the wrong interfaces being used).
    
    This PR allows the caller of ctdb.shared.volume.create to optionally
    include a list of private address + gluster peer UUID mappings
    to be added (if necessary) to the nodes file. The method will now
    explicitly fail in the following situations:
    
    * gluster peers that are not present in the resulting nodes file
    * peer UUIDs in the payload that do not exist
    * entries in the nodes file that do not map back to a gluster peer UUID
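The cross-checks above can be sketched roughly as follows. This is an illustrative sketch only, with assumed function and field names, not the middleware's actual validation code:

```python
def validate_nodes_mapping(gluster_peers, nodes_entries):
    """Cross-check gluster peers against ctdb nodes file entries.

    gluster_peers: dict mapping peer UUID -> hostname
    nodes_entries: list of dicts with 'address' and 'node_uuid' keys

    Returns a list of human-readable validation errors; an empty list
    means the configuration is consistent.
    """
    errors = []
    mapped_uuids = {entry["node_uuid"] for entry in nodes_entries}

    # Every gluster peer must end up with a nodes file entry.
    for uuid, hostname in gluster_peers.items():
        if uuid not in mapped_uuids:
            errors.append(f"peer {hostname} ({uuid}) missing from nodes file")

    # Every UUID in the payload must be a real gluster peer, and every
    # nodes file entry must map back to a gluster peer UUID.
    for entry in nodes_entries:
        if entry["node_uuid"] not in gluster_peers:
            errors.append(
                f"entry {entry['address']} references unknown peer "
                f"{entry['node_uuid']}"
            )
    return errors
```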
    
    This has a fairly wide-ranging impact on how our backend APIs are
    used. For instance, if a new gluster peer is added to an existing
    TSP, then a private IP for the nodes file must also be supplied.
    
    A new cluster management service is also being added via this
    pull request to enforce consistency in how SCALE clusters are
    configured (and to ensure that proper validation takes place to
    prevent issue reports about drastically misconfigured clusters).
    It also provides a stub of an API for adding new cluster nodes
    (expansion).
    
    The end-goal of this API is to force the caller to provide us with
    a payload containing gluster peer, brick, and private address (nodes
    file) information for each proposed cluster node.
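A per-node payload for that expansion API might take a shape like the one below. The key names and values are purely hypothetical, based only on the description above (gluster peer, brick, and private address per proposed node):

```python
# Hypothetical per-node payload for the cluster expansion API.
# All key names are assumptions for illustration.
new_node = {
    "peer": {"hostname": "cluster-node-4"},
    "brick": {"path": "/cluster/brick1"},
    "private_address": "10.20.0.4",
}
```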