The only agent that thinks for itself

Autonomous Monitoring with self-learning AI built-in, operating independently across your entire stack.

Unlimited Metrics & Logs
Machine learning & MCP
5% CPU, 150MB RAM
3GB disk, >1 year retention
800+ integrations, zero config
Dashboards, alerts out of the box
> Discover Netdata Agents
Centralized metrics streaming and storage

Aggregate metrics from multiple agents into centralized Parent nodes for unified monitoring across your infrastructure.

Stream from unlimited agents
Long-term data retention
High availability clustering
Data replication & backup
Scalable architecture
Enterprise-grade security
> Learn about Parents
Fully managed cloud platform

Access your monitoring data from anywhere with our SaaS platform. No infrastructure to manage, automatic updates, and global availability.

Zero infrastructure management
99.9% uptime SLA
Global data centers
Automatic updates & patches
Enterprise SSO & RBAC
SOC2 & ISO certified
> Explore Netdata Cloud
Deploy Netdata Cloud in your infrastructure

Run the full Netdata Cloud platform on-premises for complete data sovereignty and compliance with your security policies.

Complete data sovereignty
Air-gapped deployment
Custom compliance controls
Private network integration
Dedicated support team
Kubernetes & Docker support
> Learn about Cloud On-Premises
Powerful, intuitive monitoring interface

Modern, responsive UI built for real-time troubleshooting with customizable dashboards and advanced visualization capabilities.

Real-time chart updates
Customizable dashboards
Dark & light themes
Advanced filtering & search
Responsive on all devices
Collaboration features
> Explore Netdata UI
Monitor on the go

Native iOS and Android apps bring full monitoring capabilities to your mobile device with real-time alerts and notifications.

iOS & Android apps
Push notifications
Touch-optimized interface
Offline data access
Biometric authentication
Widget support
> Download apps

Best energy efficiency

True real-time per-second

100% automated zero config

Centralized observability

Multi-year retention

High availability built-in

Zero maintenance

Always up-to-date

Enterprise security

Complete data control

Air-gap ready

Compliance certified

Millisecond responsiveness

Infinite zoom & pan

Works on any device

Native performance

Instant alerts

Monitor anywhere

80% Faster Incident Resolution
AI-powered troubleshooting from detection, to root cause and blast radius identification, to reporting.
True Real-Time and Simple, even at Scale
Linearly and infinitely scalable full-stack observability that can be deployed even mid-crisis.
90% Cost Reduction, Full Fidelity
Instead of centralizing the data, Netdata distributes the code, eliminating pipelines and complexity.
Control Without Surrender
SOC 2 Type 2 certified with every metric kept on your infrastructure.
Integrations

800+ collectors and notification channels, auto-discovered and ready out of the box.

800+ data collectors
Auto-discovery & zero config
Cloud, infra, app protocols
Notifications out of the box
> Explore integrations
Real Results
46% Cost Reduction

Reduced monitoring costs by 46% while cutting staff overhead by 67%.

— Leonardo Antunez, Codyas

Zero Pipeline

No data shipping. No central storage costs. Query at the edge.

From Our Users
"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

No Query Language

Point-and-click troubleshooting. No PromQL, no LogQL, no learning curve.

Enterprise Ready
67% Less Staff, 46% Cost Cut

Enterprise efficiency without enterprise complexity—real ROI from day one.

— Leonardo Antunez, Codyas

SOC 2 Type 2 Certified

Zero data egress. Only metadata reaches the cloud. Your metrics stay on your infrastructure.

Full Coverage
800+ Collectors

Auto-discovered and configured. No manual setup required.

Any Notification Channel

Slack, PagerDuty, Teams, email, webhooks—all built-in.

Built for the People Who Get Paged
Because 3am alerts deserve instant answers, not hour-long hunts.
Every Industry Has Rules. We Master Them.
See how healthcare, finance, and government teams cut monitoring costs 90% while staying audit-ready.
Monitor Any Technology. Configure Nothing.
Install the agent. It already knows your stack.
From Our Users
"A Rare Unicorn"

Netdata gives more than you invest in it. A rare unicorn that obeys the Pareto rule.

— Eduard Porquet Mateu, TMB Barcelona

99% Downtime Reduction

Reduced website downtime by 99% and cloud bill by 30% using Netdata alerts.

— Falkland Islands Government

Real Savings
30% Cloud Cost Reduction

Optimized resource allocation based on Netdata alerts cut cloud spending by 30%.

— Falkland Islands Government

46% Cost Cut

Reduced monitoring staff by 67% while cutting operational costs by 46%.

— Codyas

Real Coverage
"Plugin for Everything"

Netdata has agent capacity or a plugin for everything, including Windows and Kubernetes.

— Eduard Porquet Mateu, TMB Barcelona

"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

Real Speed
Troubleshooting in 30 Seconds

From 2-3 minutes to 30 seconds—instant visibility into any node issue.

— Matthew Artist, Nodecraft

20% Downtime Reduction

20% less downtime and 40% budget optimization from out-of-the-box monitoring.

— Simon Beginn, LANCOM Systems

Pay per Node. Unlimited Everything Else.

One price per node. Unlimited metrics, logs, users, and retention. No per-GB surprises.

Free tier—forever
No metric limits or caps
Retention you control
Cancel anytime
> See pricing plans
What's Your Monitoring Really Costing You?

Most teams overpay by 40-60%. Let's find out why.

Expose hidden metric charges
Calculate tool consolidation
Customers report 30-67% savings
Results in under 60 seconds
> See what you're really paying
Your Infrastructure Is Unique. Let's Talk.

Because monitoring 10 nodes is different from monitoring 10,000.

On-prem & air-gapped deployment
Volume pricing & agreements
Architecture review for your scale
Compliance & security support
> Start a conversation
Monitoring That Sells Itself

Deploy in minutes. Impress clients in hours. Earn recurring revenue for years.

30-second live demos close deals
Zero config = zero support burden
Competitive margins & deal protection
Response in 48 hours
> Apply to partner
Per-Second Metrics at Homelab Prices

Same engine, same dashboards, same ML. Just priced for tinkerers.

Community: Free forever · 5 nodes · non-commercial
Homelab: $90/yr · unlimited nodes · fair usage
> Start monitoring your lab—free
$1,000 Per Referral. Unlimited Referrals.

Your colleagues get 10% off. You get 10% commission. Everyone wins.

10% of subscriptions, up to $1,000 each
Track earnings inside Netdata Cloud
PayPal/Venmo payouts in 3-4 weeks
No caps, no complexity
> Get your referral link
Cost Proof
40% Budget Optimization

"Netdata's significant positive impact" — LANCOM Systems

Calculate Your Savings

Compare vs Datadog, Grafana, Dynatrace

Savings Proof
46% Cost Reduction

"Cut costs by 46%, staff by 67%" — Codyas

30% Cloud Bill Savings

"Reduced cloud bill by 30%" — Falkland Islands Gov

Enterprise Proof
"Better Than Combined Alternatives"

"Better observability with Netdata than combining other tools." — TMB Barcelona

Real Engineers, <24h Response

DPA, SLAs, on-prem, volume pricing

Why Partners Win
Demo Live Infrastructure

One command, 30 seconds, real data—no sandbox needed

Zero Tickets, High Margins

Auto-config + per-node pricing = predictable profit

Homelab Ready
"Absolutely Incredible"

"We tested every monitoring system under the sun." — Benjamin Gabler, CEO Rocket.Net

76k+ GitHub Stars

3rd most starred monitoring project

Worth Recommending
Product That Delivers

Customers report 40-67% cost cuts, 99% downtime reduction

Zero Risk to Your Rep

Free tier lets them try before they buy

Never Fight Fires Alone

Docs, community, and expert help—pick your path to resolution.

Learn.netdata.cloud docs
Discord, Forums, GitHub
Premium support available
> Get answers now
60 Seconds to First Dashboard

One command to install. Zero config. 850+ integrations documented.

Linux, Windows, K8s, Docker
Auto-discovers your stack
> Read our documentation
See Netdata in Action

Watch real-time monitoring in action—demos, tutorials, and engineering deep dives.

Product demos and walkthroughs
Real infrastructure, not staged
> Start with the 3-minute tour
Level Up Your Monitoring
Real problems. Real solutions. 112+ guides from basic monitoring to AI observability.
76,000+ Engineers Strong
615+ contributors. 1.5M daily downloads. One mission: simplify observability.
Per-Second. 90% Cheaper. Data Stays Home.
Side-by-side comparisons: costs, real-time granularity, and data sovereignty for every major tool.

See why teams switch from Datadog, Prometheus, Grafana, and more.

> Browse all comparisons
Edge-Native Observability, Born Open Source
Per-second visibility, ML on every metric, and data that never leaves your infrastructure.
Founded in 2016
615+ contributors worldwide
Remote-first, engineering-driven
Open source first
> Read our story
Promises We Publish—and Prove
12 principles backed by open code, independent validation, and measurable outcomes.
Open source, peer-reviewed
Zero config, instant value
Data sovereignty by design
Aligned pricing, no surprises
> See all 12 principles
Edge-Native, AI-Ready, 100% Open
76k+ stars. Full ML, AI, and automation—GPLv3+, not premium add-ons.
76,000+ GitHub stars
GPLv3+ licensed forever
ML on every metric, included
Zero vendor lock-in
> Explore our open source
Build Real-Time Observability for the World
Remote-first team shipping per-second monitoring with ML on every metric.
Remote-first, fully distributed
Open source (76k+ stars)
Challenging technical problems
Your code on millions of systems
> See open roles
Talk to a Netdata Human in <24 Hours
Sales, partnerships, press, or professional services—real engineers, fast answers.
Discuss your observability needs
Pricing and volume discounts
Partnership opportunities
Media and press inquiries
> Book a conversation
Your Data. Your Rules.
On-prem data, cloud control plane, transparent terms.
Trust & Scale
76,000+ GitHub Stars

One of the most popular open-source monitoring projects

SOC 2 Type 2 Certified

Enterprise-grade security and compliance

Data Sovereignty

Your metrics stay on your infrastructure

Validated
University of Amsterdam

"Most energy-efficient monitoring solution" — ICSOC 2023, peer-reviewed

ADASTEC (Autonomous Driving)

"Doesn't miss alerts—mission-critical trust for safety software"

Community Stats
615+ Contributors

Global community improving monitoring for everyone

1.5M+ Downloads/Day

Trusted by teams worldwide

GPLv3+ Licensed

Free forever, fully open source agent

Why Join?
Remote-First

Work from anywhere, async-friendly culture

Impact at Scale

Your work helps millions of systems

Compliance
SOC 2 Type 2

Audited security controls

GDPR Ready

Data stays on your infrastructure

Blog

Monitoring Netdata Restarts: A Journey to a Reliable and High-Performance Solution

How we built a scalable, cost-efficient solution to capture every restart and diagnose critical issues in real time.
by Netdata Team · March 6, 2025

For a tool like Netdata, monitoring crashes and abnormal events extends far beyond bug fixing—it’s essential for identifying edge cases, preventing regressions, and delivering the most dependable observability experience possible. With millions of daily downloads, each event provides a vital signal for maintaining the integrity of our systems.

The Challenge with Traditional Solutions

Over the years, we’ve evaluated many monitoring tools, each with significant limitations:

| Tool | Strengths | Limitations |
|---|---|---|
| Sentry | Comprehensive error tracking features; detailed stack traces | Per-event pricing model becomes prohibitive at scale; forces sampling, which reduces visibility into critical issues; compromises complete error capture for cost control |
| GCP BigQuery & similar | Powerful query capabilities; flexible data processing; high scalability potential | Complex reporting setup and maintenance; significant costs at high event volumes; requires specialized technical expertise |
| Other solutions | Various specialized features; some open-source flexibility | Either too inflexible for custom requirements or prohibitively expensive at full-capture scale; often force a compromise between detail and cost |

We consistently encountered these core challenges:

  • Customization Difficulties: Off-the-shelf tools treat annotations as special cases, often missing the flexibility required for deep analysis.
  • High Costs: A per-event pricing model can become prohibitively expensive at scale.
  • Effortful Reporting: Generating nuanced, detailed reports required a lot of manual intervention and often didn’t capture the full complexity of our system’s behavior.

Identifying the Requirements

For Netdata, we required a solution offering:

  • Comprehensive Ingestion: A system capturing every event without sampling, preserving all critical details.
  • Complete Customization: A flexible data structure that supports multidimensional analysis across all fields and parameters.
  • Economic Scalability: A framework that expands without triggering prohibitive operational expenses.
  • Superior Performance: Capacity to handle tens of thousands of events per second while maintaining optimal processing speed.

Our Solution: Simple, Powerful, and Efficient

After reviewing traditional options, we discovered an innovative approach that leverages existing infrastructure. Our solution transforms event monitoring through systemd’s journal:

| Requirement | Implementation | Benefit |
|---|---|---|
| Zero sampling | Complete event capture | Every single agent event is preserved, providing full visibility into system behavior |
| Flexible data structure | systemd journal field mapping | Supports multidimensional analysis across all fields and parameters |
| Cost-effective scaling | Utilization of existing infrastructure | Eliminates licensing costs while maintaining high performance |
| Exceptional performance | Lightweight processing pipeline | Efficiently handles up to 20,000 events per second per instance, with horizontal scaling capability |

Core Mechanism: Agent Status Tracking

The implementation required minimal development effort because we realized we already had all the necessary parts!

Each Netdata Agent records its operational status to disk, documenting whether it exited normally or crashed—and if crashed, capturing detailed diagnostic information. Upon restart, the Agent evaluates this status file to determine reporting requirements. When anonymous telemetry is enabled or a crash is detected, the Agent transmits this status to our agent-events backend. An intelligent deduplication system prevents redundant reporting, ensuring each unique event is logged only once per day per Agent.

Backend Processing Pipeline

The agent-events backend performs the crucial transformation:

First, a lightweight Go-based web server receives JSON payloads via HTTP POST and sends them to stdout. We selected this implementation for its exceptional performance characteristics after evaluating several options (including Python and Node.js alternatives).

The JSON payload of a status event:
{
  "message": "Netdata was last crashed while starting, because of a fatal error",
  "cause": "fatal on start",
  "@timestamp": "datetime",
  "version": 8,
  "version_saved": 8,
  "agent": {
    "id": "UUID",
    "ephemeral_id": "UUID",
    "version": "v2.2.6",
    "uptime": 20,
    "ND_node_id": null,
    "ND_claim_id": null,
    "ND_restarts": 55,
    "ND_profile": [
      "parent"
    ],
    "ND_status": "initializing",
    "ND_exit_reason": [
      "fatal"
    ],
    "ND_install_type": "custom"
  },
  "host": {
    "architecture": "x86_64",
    "virtualization": "none",
    "container": "none",
    "uptime": 1227119,
    "boot": {
      "id": "04077a68-8296-4abf-bd77-f20527bb5cde"
    },
    "memory": {
      "total": 134722347008,
      "free": 51673772032
    },
    "disk": {
      "db": {
        "total": 3936551493632,
        "free": 1999957258240,
        "inodes_total": 244170752,
        "inodes_free": 239307909,
        "read_only": false
      }
    }
  },
  "os": {
    "name": "Manjaro Linux",
    "version": "25.0.0",
    "family": "manjaro",
    "platform": "arch"
  },
  "fatal": {
    "line": 0,
    "filename": "",
    "function": "",
    "message": "Cannot create unique machine id file '/var/lib/netdata/registry/netdata.public.unique.id'",
    "errno": "13, Permission denied",
    "thread": "",
    "stack_trace": "stack_trace_formatter+0x196\nlog_field_strdupz+0x237\nnd_logger_log_fields.constprop.0+0xc4\nnd_logger.constprop.0+0x597\nnetdata_logger_fatal+0x18d\nregistry_get_this_machine_guid.part.0+0x283\nregistry_get_this_machine_guid.constprop.0+0x20\nnetdata_main+0x264c\nmain+0x2d __libc_init_first+0x8a\n__libc_start_main+0x85 _start+0x21"
  },
  "dedup": [
    {
      "@timestamp": "2025-03-01T12:23:43.61Z",
      "hash": 15037880939850199034
    }
  ]
}

Our processing pipeline uses this simple sequence:

web-server | log2journal json | systemd-cat-native

Both log2journal and systemd-cat-native are included with Netdata, making this solution immediately available to all users.

Data Transformation and Storage

The pipeline transforms complex nested JSON structures into flattened journal entries. Regardless of field count or nesting depth, log2journal expands every data point into a discrete journal field, preserving complete information while enabling powerful querying. To keep system logs clean, we isolate this process in a dedicated systemd unit with its own log namespace, providing operational separation and preserving analytical power.

The mapping of the JSON fields into systemd journald entries:
| Original JSON Key | Journald Key Generated | Reason for Tracking in Netdata |
|---|---|---|
| message | MESSAGE | A high-level description of the reason the agent restarted. |
| cause | AE_CAUSE | A shorter code indicating the reason the agent restarted. One of: fatal and exit, exit on system shutdown, exit to update, exit and updated, exit instructed, abnormal power off, out of memory, disk read-only, disk full, disk almost full, fatal on start, killed hard on start, fatal on exit, killed hard on shutdown, killed hard on update, killed hard on exit, killed fatal, killed hard. |
| @timestamp | AE__TIMESTAMP | Captures the exact time the status file was saved. |
| version | AE_VERSION | The schema version of the posted status file. |
| version_saved | AE_VERSION_SAVED | The schema version of the saved status file. |
| agent.id | AE_AGENT_ID | The MACHINE_GUID of the agent. |
| agent.ephemeral_id | AE_AGENT_EPHEMERAL_ID | The unique INVOCATION_ID of the agent. |
| agent.version | AE_AGENT_VERSION | The version of the agent. |
| agent.uptime | AE_AGENT_UPTIME | How long the agent had been running when the status file was generated. |
| agent.ND_node_id | AE_AGENT_NODE_ID | The Netdata Cloud node ID of the agent. |
| agent.ND_claim_id | AE_AGENT_CLAIM_ID | The Netdata Cloud claim ID of the agent. |
| agent.ND_restarts | AE_AGENT_RESTARTS | The number of restarts the agent has had so far; a strong indication of crash loops. |
| agent.ND_profile | AE_AGENT_ND_PROFILE_{n} | The configuration profile of the agent. |
| agent.ND_status | AE_STATUS | The operational status of the agent (initializing, running, exiting, exited). |
| agent.ND_exit_reason | AE_AGENT_ND_EXIT_REASON_{n} | The reason for the agent's exit, one or more of: signal-bus-error, signal-segmentation-fault, signal-floating-point-exception, signal-illegal-instruction, out-of-memory, already-running, fatal, api-quit, cmd-exit, signal-quit, signal-terminate, signal-interrupt, service-stop, system-shutdown, update. |
| agent.ND_install_type | AE_AGENT_INSTALL_TYPE | The installation method (e.g., custom, binpkg); different installs may behave differently under error conditions. |
| host.architecture | AE_HOST_ARCHITECTURE | The system architecture, useful for reproducing environment-specific issues. |
| host.virtualization | AE_HOST_VIRTUALIZATION | Whether the system runs under virtualization; such environments can have unique resource or timing constraints affecting stability. |
| host.container | AE_HOST_CONTAINER | Whether the host is containerized, which is critical when diagnosing errors in containerized deployments. |
| host.uptime | AE_HOST_UPTIME | The host's overall uptime, allowing correlation between system stability and agent crashes. |
| host.boot.id | AE_HOST_BOOT_ID | A unique identifier for the current boot session, helpful for correlating events across system reboots. |
| host.memory.total | AE_HOST_MEMORY_TOTAL | The total available memory; important for diagnosing whether resource exhaustion contributed to the fatal error. |
| host.memory.free | AE_HOST_MEMORY_FREE | Available memory at crash time, which can highlight potential memory pressure issues. |
| host.disk.db.total | AE_HOST_DISK_DB_TOTAL | Total disk space allocated for database/log storage; issues here might affect logging during fatal errors. |
| host.disk.db.free | AE_HOST_DISK_DB_FREE | Available disk space; low disk space may hinder proper logging and recovery following a fatal error. |
| host.disk.db.inodes_total | AE_HOST_DISK_DB_INODES_TOTAL | Total inodes available, useful for diagnosing filesystem constraints that could contribute to system errors. |
| host.disk.db.inodes_free | AE_HOST_DISK_DB_INODES_FREE | The number of free inodes; running out of inodes can cause filesystem errors that affect Netdata's operation. |
| host.disk.db.read_only | AE_HOST_DISK_DB_READ_ONLY | Flags whether the disk is mounted read-only, which may prevent Netdata from writing necessary logs or recovering from errors. |
| os.type | AE_OS_TYPE | The operating system type; critical for understanding the context in which the error occurred. |
| os.kernel | AE_OS_KERNEL | The kernel version, which can be cross-referenced with known issues in specific kernel releases. |
| os.name | AE_OS_NAME | The operating system distribution, which helps narrow down environment-specific issues. |
| os.version | AE_OS_VERSION | The OS version, essential for linking the fatal error to recent system updates or known bugs. |
| os.family | AE_OS_FAMILY | Groups the OS into a family (e.g., linux), aiding high-level analysis across similar systems. |
| os.platform | AE_OS_PLATFORM | Platform details (often rewritten to include the OS family), key for identifying platform-specific compatibility issues. |
| fatal.line | CODE_LINE | The exact line number in the source code where the fatal error occurred; vital for tracking down the faulty code. |
| fatal.filename | CODE_FILE | The source file in which the fatal error was triggered, which speeds up code review. |
| fatal.function | CODE_FUNC | The function where the fatal error occurred, providing context about the code path. |
| fatal.message | AE_FATAL_MESSAGE | The detailed error message; essential for understanding what went wrong (later merged into the main message field). |
| fatal.errno | AE_FATAL_ERRNO | The error number associated with the failure, which helps map the error to known system error codes. |
| fatal.thread | AE_FATAL_THREAD | The thread in which the error occurred; important in multi-threaded scenarios for isolating concurrency issues. |
| fatal.stack_trace | AE_FATAL_STACK_TRACE | The stack trace at the point of failure; a critical asset for root cause analysis of the chain of calls leading to the error. |
| dedup[].@timestamp | AE_DEDUP_{n}__TIMESTAMP | The timestamp of each deduplication entry; used to filter out duplicate crash events and correlate error clusters over time. |
| dedup[].hash | AE_DEDUP_{n}_HASH | A unique hash for each deduplication entry, which helps recognize recurring error signatures and prevent redundant alerts. |

Implementation and Results

The entire implementation process took only a few days, though we encountered several technical obstacles:

  • Robust Stack Trace Capture: Obtaining reliable stack traces across various crash scenarios required multiple iterations of our signal handling mechanisms to ensure complete diagnostic information.
  • Schema Optimization: Identifying the precise fields necessary for comprehensive root cause analysis involved careful refinement of the JSON schema sent to the agent-events backend.

Once these challenges were resolved, the approach produced remarkable results. The Netdata Logs UI became an exceptional analytical environment, allowing us to filter, correlate, and investigate any combination of events through powerful journald queries, and giving us unprecedented visibility into our codebase's behavior across diverse environments and configurations.


By implementing this system, we identified and resolved dozens of critical issues, dramatically improving Netdata’s reliability. The solution proved highly scalable—each backend instance processes approximately 20,000 events per second, with the ability to deploy additional instances as needed. From a resource perspective, the system requires only modest computing resources and storage capacity to maintain appropriate retention periods.

Conclusion

Our monitoring journey taught us that effective systems must balance customization, cost, and performance.

By combining a lightweight HTTP server with log2journal and systemd’s journald, we created a solution that captures every event without sampling or compromise.

This straightforward approach provides powerful insights while eliminating per-event costs, proving that native tools and thoughtful engineering can outperform expensive specialized solutions.

The system has already helped us identify and resolve dozens of critical issues, significantly improving Netdata’s reliability for our users worldwide.

As we continue to grow, this foundation will evolve with us, providing the visibility we need to maintain the highest standards of software quality.

FAQ

Q1: Do you collect IPs?

No, we don’t need IPs, and we don’t collect any.

Q2: Why do you collect normal restarts?

We collect normal restarts only when anonymous telemetry is enabled (it is on by default, and users can opt out). Even then, we collect at most one event per agent per day, across all kinds of restarts and crashes.

We collect normal restarts to understand the spread of an issue. We provide packages for a lot of Linux distributions, Linux kernel versions, CPU architectures, library versions, etc. For many combinations there are just a handful of users.

When we have just a few events reported, it is important to know if all of them are crashing, a few are crashing, or just a tiny percentage is crashing. It completely changes the context of the investigation needed.

So, the normal restarts act like a baseline for us to understand the severity of issues.

Q3: Why do you need the unique UUID of the agent?

UUIDs allow us to correlate events over time. How many events do we have for this agent? Is it always crashing with the same event? Was it working a few days ago? Did we fix the issue for this particular agent after troubleshooting?

Using these UUIDs, we can find regressions and confirm our patches actually fix the problems for our users.

Keep in mind that we don’t have the technical ability to correlate UUIDs to anonymous open-source users. So, these UUIDs are just random numbers for us.

Q4: Do you collect crash reports when anonymous telemetry is disabled?

No. We only collect crash reports when anonymous telemetry is enabled.

Q5: Does telemetry affect my system’s performance?

No. Telemetry runs only momentarily, when the agent starts, stops, or crashes. Otherwise, the agent is unaffected.

Q6: Is any of my telemetry data shared with third parties or used for purposes beyond crash analysis?

All telemetry data is used exclusively for internal analysis to diagnose issues, track regressions, and improve Netdata’s reliability. We don’t share raw data or any personally identifiable information with third parties.

In some cases, these reports can be used to steer our focus. A great example is the breakdown per operating system that allows us to focus on providing binary packages for the distributions that are used more than others.

Q7: How long do you store the telemetry data?

We keep this data for 3 months, to see how different Netdata releases affect the stability and reliability of the agent over time.

Q8: Can I opt out of telemetry entirely, and what are the implications of doing so?

Yes, you can opt out entirely. If you do, you won't contribute data that helps identify and resolve Netdata issues, and you may miss out on some of the benefits of our improved diagnostics and performance enhancements.