Changing the number of nodes

Choose which "node" you mean

The first thing to do is to determine which 'nodes' you need to add or remove. The term is ambiguous; it can mean two different things:

  • TokenD core nodes
  • Kubernetes (k8s) worker nodes

If you know which kind you need, skip directly to the guide on increasing/decreasing that type of nodes. If you are not sure, here is a short explanation of each.

TokenD core nodes are just docker containers running inside the pods. You do not pay directly for these nodes. They are the backbone of the whole TokenD system. The more of them are running, the more resources you need and the more failing nodes the network can tolerate. However, if there are too many nodes, reaching consensus takes too much time, and your users will experience lags and even timeouts. In a private network it is OK to run 4 core nodes for the best balance of stability, availability, performance and price. For development purposes it is OK to run a network with 1 core node.

k8s worker nodes are the machines your containers run on. These are the nodes you pay Amazon (or any other cloud provider) for. If there is not enough computing power or memory, the system may behave strangely: some pods may become unhealthy, which is certainly bad. However, if there are too many worker nodes, you will underutilize their capabilities.
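
If you want a rough picture of how loaded your workers are, standard kubectl commands can help; this is just a sketch (kubectl top nodes only works if a metrics add-on such as metrics-server is installed):

  # List worker nodes and their status
  kubectl get nodes
  # Show requested/allocated resources and the pods scheduled on a node
  kubectl describe node <node-name>
  # Live CPU/memory usage (requires metrics-server)
  kubectl top nodes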

In short:

  • If you need to increase the number of unhealthy nodes your network can tolerate, increase the number of TokenD core nodes;
  • If you need to make consensus faster or simply want to decrease the load, decrease the number of TokenD core nodes;
  • If you need to reduce costs, decrease the number of k8s worker nodes;
  • If you feel your network lacks resources, increase the number of k8s worker nodes.

Increasing the number of TokenD core nodes

If you want to tolerate n faulty nodes, your network must have at least 3n+1 nodes in total. In a private network running in a secure and stable environment it is highly unlikely that any node will fail, and almost impossible that more than 1 node will fail. That is why we recommend running 4 nodes in production (3 * 1 + 1).
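
A couple of example values of this formula:

  # To tolerate n faulty cores you need 3n + 1 cores in total:
  # n = 1  ->  3*1 + 1 = 4 cores (recommended production setup)
  # n = 2  ->  3*2 + 1 = 7 cores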

Increasing the number of core nodes is neither a trivial nor an automated task. You will have to grow the network in small steps, adding one node at a time.

Open the <your_project>.env.yaml file and find the core section:

  core:
    # ...
    # Add one core here:
    nodes:
      # Generate a new name for the core and name its history bucket accordingly
    - historybucket: tomtit-core-fervent-germain
      name: core-fervent-germain
      # You have to generate this one
      seed: SAVDMOCIHA3KHIQT3XW7ET84GKI4HUT7ASGZOLMXWE4TPW33MECWCZ6I
      # Uncomment the following line if you have fewer than 4 nodes
      # unsafe: true
    - historybucket: tomtit-core-lucid-mcclintock
      name: core-lucid-mcclintock
      seed: SAJ66EAOG7MMQQNW3262MU5NLLJ22MHG5M2CMVGGD4CVRDHT32ZGCKIL
      # Uncomment the following line if you have fewer than 4 nodes
      # unsafe: true
    # Increment this number
    nodescount: 2
    # ...
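
When picking a name for the new core, it can help to list the core pods that already exist so you do not reuse a name; a small sketch, assuming your namespace is <your_namespace>:

  kubectl get pods -n <your_namespace> | grep core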

Find the horizon section:

  horizon:
    # ...
    nodes:
    # Add a new horizon watching your new core
    - core: core-fervent-germain
      name: horizon-xenodochial-goldstine
      persistentvolumesize: ""
    - core: core-lucid-mcclintock
      name: horizon-friendly-hawking
      persistentvolumesize: ""
    # ...

When the <your_project>.env.yaml file is ready, do the following:

  1. Regenerate <your_project>.k8s.yaml file using tokend-cli
  2. Run kubectl apply -f <your_project>.k8s.yaml
  3. Restart the cores that were already in your system, so they pick up the config updates. To do this, run kubectl get pods -n <your_namespace>, then kubectl delete pods <cores_which_already_were_running> -n <your_namespace>, and wait until they are up and running again.
  4. Check the state of the newly deployed core. To do this, log in to the core's container (kubectl exec -it <core-pod-name> -n <your_namespace> -- /bin/bash) and get the core node's info (curl localhost:8080/info). Find the "state" field in the response and wait until it becomes "Synced!" (see the one-liner after this list). This takes approximately 1h-10h, depending on the size of the history. For a very long history it may take much longer.
  5. When your node finishes catching up (gets into the "Synced!" state), you are ready to add one more node; repeat this whole section as many times as you need.
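
If you prefer not to open an interactive shell in step 4, the same check can be done in one command from outside the pod; a sketch, assuming the pod and namespace placeholders above and a pretty-printed JSON response:

  # Prints the core's state; repeat until it reports "Synced!"
  kubectl exec <core-pod-name> -n <your_namespace> -- curl -s localhost:8080/info | grep '"state"'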

If you really feel the need to increase the number of nodes, please contact us and we'll find a solution for you.

Decreasing the number of TokenD core nodes

Notice: Remember that it is quite difficult to increase the number back.

Open the <your_project>.env.yaml file and find the core section:

  core:
    # ...
    # Remove (or comment out) as many cores as you wish, but remember the names of the removed ones
    nodes:
    # - historybucket: tomtit-core-fervent-germain
    #   name: core-fervent-germain
    #   seed: SAVDMOCIHA3KHIQT3XW7ET35GKI4HUT7ASGZOLMXWE4TPW33MECWCZ6I
    - historybucket: tomtit-core-lucid-mcclintock
      name: core-lucid-mcclintock
      seed: SAJ66EAOG7MMQQNW3262MU5NLLJ22MHG5M2CMVGGD4CVRDHT32ZGCKIL
      # Uncomment the following line if you have fewer than 4 nodes
      unsafe: true
    # Decrease this number to the desired value
    nodescount: 1
    # ...

Find the horizon section:

  horizon:
    # ...
    nodes:
    # Remove those whose "core" field refers to one of the deleted cores
    # - core: core-fervent-germain
    #   name: horizon-xenodochial-goldstine
    #   persistentvolumesize: ""
    - core: core-lucid-mcclintock
      name: horizon-friendly-hawking
      persistentvolumesize: ""
    # ...

When the <your_project>.env.yaml file is ready, do the following:

  1. Regenerate <your_project>.k8s.yaml file using tokend-cli
  2. Run kubectl apply -f <your_project>.k8s.yaml
  3. Restart the cores that must stay, so they pick up the config updates. To do this, run kubectl get pods -n <your_namespace>, then kubectl delete pods <cores_which_must_stay> -n <your_namespace>, and wait until they are up and running again.
  4. Run kubectl get deployments -n <your_namespace>
  5. Remove the deployments named after the cores and horizons you've deleted from the env.yaml file, as shown in the sketch below
  6. You are done shrinking your network.
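
Steps 4-5 might look like this, assuming you removed core-fervent-germain and its horizon, and that the deployments carry the same names as in the env.yaml file:

  kubectl get deployments -n <your_namespace>
  # Delete the deployments that belonged to the removed core and horizon
  kubectl delete deployment core-fervent-germain horizon-xenodochial-goldstine -n <your_namespace>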

Note: You may also want to delete the S3 buckets containing the history of the removed cores.
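
If you do decide to clean up the history, a bucket can be removed with the AWS CLI; a sketch using the historybucket name of the removed core from the example above (--force deletes the bucket's contents first):

  aws s3 rb s3://tomtit-core-fervent-germain --force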

Increasing the number of k8s worker nodes

There are three simple steps:

  1. Open the instance group config: kops edit ig nodes --state=s3://<state-bucket>
  2. Increase minSize and maxSize (see the sketch after this list)
  3. Apply changes: kops update cluster --yes --state=s3://<state-bucket>
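
The instance group spec you edit in step 2 contains the minSize and maxSize fields; a trimmed sketch of what it might look like (the values are illustrative):

  # Opened by: kops edit ig nodes --state=s3://<state-bucket>
  spec:
    # ...
    maxSize: 4
    minSize: 4
    # ...

After kops update cluster finishes, kubectl get nodes should eventually show the new workers in the Ready state.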

You are done!

Decreasing the number of k8s worker nodes

The algorithm for removing a node from the cluster is a common one, but please read the whole section before applying it; there is something special about a cluster running a TokenD network.

Base algorithm:

  1. Drain the node you want to remove: kubectl drain <node-name>. You might have to ignore daemonsets and delete local data: kubectl drain <node-name> --ignore-daemonsets --delete-local-data
  2. Edit the instance group to decrease the number of nodes: kops edit ig nodes --state=s3://<state-bucket>, then decrement maxSize and minSize
  3. Update cluster: kops update cluster --yes --state=s3://<state-bucket>

For more information about the base algorithm, see: https://stackoverflow.com/a/54220808

TokenD specifics

When you drain a k8s worker node, you forbid the k8s master from scheduling pods there. The master begins to migrate all running pods from that node to other workers, which means every migrating pod is stopped and then started on another worker node.

The pods that dislike restarting the most are the ones running TokenD core containers. You can inspect a worker node before draining it (kubectl describe node <node-name>) and pick the one with the fewest core pods running on it. If any core was migrated, just give it time (up to half an hour) to get healthy again. A standard deployment containing 4 TokenD cores can tolerate 1 unhealthy core, so if you migrate 1 core at a time, the network will keep working.
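
A quick way to see which pods (and in particular which cores) run on which worker is the wide pod listing; a sketch, adjust the namespace as needed:

  # The NODE column shows which worker each pod is scheduled on
  kubectl get pods -n <your_namespace> -o wide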
