How we organize nodes (and the Netdata Agents running on them) across different rooms should reflect our architectural decisions, because a room is a logical container with its own user members and notification rules. If we are monitoring a large infrastructure, we need to apply these rules consistently, and one way to achieve that is automation. The Netdata Cloud Terraform Provider lets you automate this by provisioning all the cloud resources and giving you the credentials to spin up the Netdata Agents. In this article, we will concentrate on how, in practice, we can organize and assign nodes across different rooms in two scenarios. In each of them I'm using a non-production installation of the Netdata Agents:
- Each Netdata Agent is connected directly to Netdata Cloud. This architecture is most likely to be provisioned across geographically separated single nodes. In this example the Netdata Agents are provisioned with Docker Compose and each agent has its own room. The Terraform code looks like this:
```hcl
terraform {
  required_providers {
    netdata = {
      source = "netdata/netdata"
    }
  }
  required_version = ">= 1.4.0"
}

provider "netdata" {}

resource "netdata_space" "test" {
  name        = "TestingSpace"
  description = "Created by Terraform"
}

resource "netdata_room" "room1" {
  space_id    = netdata_space.test.id
  name        = "TestingRoom1"
  description = "Created by Terraform"
}

resource "netdata_room" "room2" {
  space_id    = netdata_space.test.id
  name        = "TestingRoom2"
  description = "Created by Terraform"
}

resource "terraform_data" "install_agent" {
  provisioner "local-exec" {
    command = "docker-compose up -d"
    environment = {
      NETDATA_CLAIM_TOKEN  = netdata_space.test.claim_token
      NETDATA_CLAIM_ROOMS1 = netdata_room.room1.id
      NETDATA_CLAIM_ROOMS2 = netdata_room.room2.id
    }
  }
  provisioner "local-exec" {
    when    = destroy
    command = "docker-compose down"
  }
}
```
and the `docker-compose.yaml`:

```yaml
services:
  netdata:
    image: netdata/netdata:stable
    container_name: netdata1
    restart: unless-stopped
    hostname: "netdata1"
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN}
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS=${NETDATA_CLAIM_ROOMS1}
  netdata-child:
    image: netdata/netdata:stable
    container_name: netdata2
    restart: unless-stopped
    hostname: "netdata2"
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN}
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS=${NETDATA_CLAIM_ROOMS2}
```
Each Netdata Agent is claimed into its own room; the Room IDs are created right after the new space, and the claim token is bound to the space.
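If you want to reuse these values outside of Terraform (for example, to claim agents provisioned by another tool), a minimal sketch could expose them as outputs. The output names below are illustrative assumptions, not part of the original code:

```hcl
# Illustrative sketch: expose the claim token and the room IDs created above.
# Output names are assumptions for this example.
output "claim_token" {
  value     = netdata_space.test.claim_token
  sensitive = true
}

output "room_ids" {
  value = {
    room1 = netdata_room.room1.id
    room2 = netdata_room.room2.id
  }
}
```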
- In this scenario we use streaming replication: the Netdata Child Agents stream to a Netdata Parent Agent, which is then connected to the cloud. It is a much more robust approach, with all the benefits described here. With this approach, by default all Netdata Child Agents associated with a Netdata Parent Agent end up in the same room; to place them in different rooms, you can use the following automation:
```hcl
terraform {
  required_providers {
    netdata = {
      source = "netdata/netdata"
    }
  }
  required_version = ">= 1.4.0"
}

provider "netdata" {}

resource "netdata_space" "test" {
  name        = "TestingSpace"
  description = "Created by Terraform"
}

resource "netdata_room" "room1" {
  space_id    = netdata_space.test.id
  name        = "TestingRoom1"
  description = "Created by Terraform"
}

resource "netdata_room" "room2" {
  space_id    = netdata_space.test.id
  name        = "TestingRoom2"
  description = "Created by Terraform"
}

resource "netdata_node_room_member" "room1" {
  room_id    = netdata_room.room1.id
  space_id   = netdata_space.test.id
  node_names = ["netdata-parent"]

  depends_on = [terraform_data.install_agent]
}

resource "netdata_node_room_member" "room2" {
  room_id    = netdata_room.room2.id
  space_id   = netdata_space.test.id
  node_names = ["netdata-child"]

  depends_on = [terraform_data.install_agent]
}

resource "terraform_data" "install_agent" {
  provisioner "local-exec" {
    command = "docker-compose up -d && sleep 5"
    environment = {
      NETDATA_CLAIM_TOKEN = netdata_space.test.claim_token
    }
  }
  provisioner "local-exec" {
    when    = destroy
    command = "docker-compose down"
  }
}
```
and the `docker-compose.yaml`:

```yaml
services:
  netdata:
    image: netdata/netdata:stable
    container_name: netdata-parent
    restart: unless-stopped
    hostname: "netdata-parent"
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./parent-stream.conf:/etc/netdata/stream.conf
    environment:
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN}
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
  netdata-child:
    image: netdata/netdata:stable
    container_name: netdata-child
    restart: unless-stopped
    hostname: "netdata-child"
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./child-stream.conf:/etc/netdata/stream.conf
```
`parent-stream.conf`:

```ini
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```
`child-stream.conf`:

```ini
[stream]
    enabled = yes
    destination = netdata-parent:19999
    api key = 11111111-2222-3333-4444-555555555555
```
Here we match the room members by node name, which in this case is the hostname. With the `netdata_node_room_member` resource you can change room membership for nodes that have already been connected to the cloud.
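As a minimal sketch of that, the snippet below adds an extra room and places the already-connected child node into it as well; the room name `TestingRoom3` is an illustrative assumption, not part of the setup above:

```hcl
# Illustrative sketch: an additional room, plus membership for a node that is
# already connected to the cloud. "TestingRoom3" is an assumed name.
resource "netdata_room" "room3" {
  space_id    = netdata_space.test.id
  name        = "TestingRoom3"
  description = "Created by Terraform"
}

resource "netdata_node_room_member" "room3" {
  room_id    = netdata_room.room3.id
  space_id   = netdata_space.test.id
  node_names = ["netdata-child"]
}
```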
As you have just seen, with only a few lines of code we can spin up monitoring, and we can go even further by automating user membership and notification integrations.