Manage Diagnostics Without Restarting

Eduard Mishkurov

3 days ago


Runtime logging control is especially important for production services.

Detailed debug logs are most useful when something has already gone wrong. However, keeping them enabled all the time is usually not acceptable.

Verbose logs create noise. They increase file size. They can slow down hot code paths. In some cases, they may also expose details that should not be written to logs during normal operation.

The simplest approach is to change the configuration and restart the service. But in a real production system, this is not always acceptable.

The problem may appear rarely. It may happen only for one client. It may require load. It may also exist only inside an already running process.

A restart can destroy the exact state you need to investigate.

This is why the logme library supports a separate idea: logging can be controlled while the application is running.

Runtime Control

In logme, logging control is exposed through the control API.

This is not a separate “second configuration”. It is a command interface built on top of the existing logme model: channels, levels, backends, trace points, subsystems, and other library objects.

For example, you can enable a detailed level for a specific channel:

level --channel raw debug

You can also enable trace points:

trace enable Raw:*:*

You can temporarily add a backend, change flags, enable a channel, or disable a channel.

The important point is that the control API does not work with an abstract global logging level. It works with the real logme model.

This matters in large applications.

In a big service, it is rarely useful to enable all debug logs everywhere. Usually, you need diagnostics only for one part of the system: a network protocol, SSL, storage, one subsystem, or one trace point.

This level of granularity is exactly what production diagnostics need.
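To make the difference from a single global level concrete, here is a minimal sketch of a per-channel level table. It is not how logme stores its state internally. The ChannelLevels class, the Level enum, and the channel names are illustrative assumptions used only to show the idea of a default threshold with per-channel overrides.

```cpp
#include <map>
#include <string>

// Hypothetical severity scale for this sketch (not logme's level type).
enum class Level { Debug = 0, Info = 1, Warn = 2, Error = 3 };

// Hypothetical table: one default threshold plus per-channel overrides.
class ChannelLevels
{
public:
  explicit ChannelLevels(Level defaultLevel) : Default(defaultLevel) {}

  // Override the threshold for a single channel only.
  void Set(const std::string& channel, Level level)
  {
    Overrides[channel] = level;
  }

  // A message is written when its level reaches the channel's threshold.
  bool ShouldLog(const std::string& channel, Level messageLevel) const
  {
    const auto it = Overrides.find(channel);
    const Level threshold = (it != Overrides.end()) ? it->second : Default;
    return static_cast<int>(messageLevel) >= static_cast<int>(threshold);
  }

private:
  Level Default;
  std::map<std::string, Level> Overrides;
};
```

With a default of Info, calling Set("raw", Level::Debug) makes debug messages visible for the raw channel only. Every other channel keeps the quiet default.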

The control API can be used in several ways:

directly from application code;
through the control server;
through logmectl;
through logmeweb.

In an interactive scenario, the application is already running. You connect to it with a control client and change logging behavior without restarting the process.

Why This Is Better Than Reloading Configuration

Reloading configuration often looks like a simple solution. But it has one important downside.

A configuration file usually describes the desired state of the application as a whole. If you change it only for diagnostics, you must keep track of what is temporary and what is permanent.

You also need to make sure that the next service start will not inherit a temporary debug setting by accident.

A control command works better for temporary actions:

level --channel raw debug

It is easy to apply. It is also easy to undo:

level --channel raw info

This approach is especially useful when you need to enable additional diagnostics quickly in a running service without mixing permanent configuration with temporary investigation steps.

Startup Control Through Environment Variables

There is another common scenario.

Sometimes you need to change logging not in an already running process, but right at startup. This is useful for tests, CI, Docker containers, systemd units, and Windows service wrappers.

For this case, logme supports environment control.

The application can explicitly allow control commands to be read from environment variables:

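Logme::EnvironmentControlOptions options;

// Restrict environment commands to the safe subset.
options.Policy = Logme::ControlPolicy::Safe();
// Keep processing the remaining commands if one of them fails.
options.ErrorMode = Logme::ENV_CONTROL_CONTINUE_ON_ERROR;

// Explicit opt-in: without this call the variables are ignored.
Logme::Instance->ApplyEnvironmentControl(options);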

After that, the application can be started like this:

LOGME_CONTROL="level --channel raw debug; trace enable Raw:*:*" ./service

One environment variable can contain several commands separated by ;.

This is convenient because a diagnostic startup often requires more than one action. For example, you may need to enable a channel, change its level, and enable trace points:

LOGME_CONTROL="channel --enable raw; level --channel raw debug; trace enable Raw:*:*"
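The splitting step can be pictured with a small sketch. This is not logme's parser. SplitControlCommands and Trim are hypothetical helpers, and the sketch assumes that ; never appears inside a command, so quoting and escaping are not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: removes whitespace from both ends of a string.
static std::string Trim(const std::string& s)
{
  const char* ws = " \t\r\n";
  const auto first = s.find_first_not_of(ws);
  if (first == std::string::npos)
    return std::string();
  const auto last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: splits "cmd1; cmd2" into {"cmd1", "cmd2"}.
// Empty pieces (for example after a trailing ';') are skipped.
std::vector<std::string> SplitControlCommands(const std::string& value)
{
  std::vector<std::string> commands;
  std::string::size_type start = 0;
  while (start <= value.size())
  {
    auto end = value.find(';', start);
    if (end == std::string::npos)
      end = value.size();
    const std::string piece = Trim(value.substr(start, end - start));
    if (!piece.empty())
      commands.push_back(piece);
    start = end + 1;
  }
  return commands;
}
```

Applied to the value above, the sketch yields three commands in their original order, which matters because a channel has to be enabled before changing its level is useful.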

You can also use several variables:

LOGME_CONTROL_1="channel --enable raw"
LOGME_CONTROL_2="level --channel raw debug; trace enable Raw:*:*"

logme does not read environment variables automatically.

If the application does not call ApplyEnvironmentControl(), variables such as LOGME_CONTROL and LOGME_CONTROL_N have no effect.
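As a rough illustration of what an explicit opt-in read could look like, the sketch below collects values from LOGME_CONTROL and the numbered variables. It is not the logme implementation. CollectControlValues, EnvLookup, and ProcessEnv are hypothetical names, and the scan order (unnumbered variable first, then 1 up to a fixed limit, skipping gaps) is an assumption made for this example.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <vector>

// Hypothetical lookup type, so the source of variables can be swapped in tests.
using EnvLookup = std::function<const char*(const std::string&)>;

// Lookup backed by the real process environment.
const char* ProcessEnv(const std::string& name)
{
  return std::getenv(name.c_str());
}

// Hypothetical helper: gathers control values in a fixed order.
// Assumption: LOGME_CONTROL comes first, then LOGME_CONTROL_1..maxNumbered.
std::vector<std::string> CollectControlValues(
  const EnvLookup& lookup
  , int maxNumbered = 16
)
{
  std::vector<std::string> values;

  if (const char* v = lookup("LOGME_CONTROL"))
    values.emplace_back(v);

  for (int i = 1; i <= maxNumbered; ++i)
  {
    const std::string name = "LOGME_CONTROL_" + std::to_string(i);
    if (const char* v = lookup(name))
      values.emplace_back(v);
  }

  return values;
}
```

Nothing happens unless the application calls CollectControlValues(ProcessEnv) itself. That mirrors the opt-in idea: the environment is read only when the code explicitly asks for it.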

This is an important security decision.

The environment is an external source of data. In some deployments, it may be controlled by someone who should not have full diagnostic control over the application.

Because of that, environment control in logme is an opt-in mechanism. The application decides whether it trusts the environment for this particular run.

Why Control Policies Are Needed

If the control API has full access, it can do many things.

For example, it can add a file backend, change routing, read logs, or call extension commands. This is acceptable for a trusted local API. But it may be too broad for environment control or for a network control server.

This is why logme has control policies.

A policy defines which control commands are allowed in a specific scenario.

For example, a safe policy is suitable for normal startup overrides:

options.Policy = Logme::ControlPolicy::Safe();

In this mode, the user can change levels, change flags, enable trace points, and manage channels and subsystems.

However, dangerous actions, such as adding an arbitrary file backend, are forbidden.

This makes environment control safer for production use.

For example, you can allow an engineer to enable debug logging for one channel:

LOGME_CONTROL="level --channel raw debug"

At the same time, you can prevent the environment from redirecting logs to an arbitrary file or executing an extended application command.
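One simple way to think about such a policy is an allow-list checked before a command runs. The sketch below is only a mental model, not logme's ControlPolicy. CommandPolicy and MakeSafeLikePolicy are hypothetical, and the list of allowed verbs is an assumption based on the commands shown in this article.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical policy type: an allow-list of command verbs.
struct CommandPolicy
{
  std::set<std::string> AllowedVerbs;

  // Returns true when the first word of the command is in the allow-list.
  bool Allows(const std::string& command) const
  {
    std::istringstream in(command);
    std::string verb;
    if (!(in >> verb))
      return false; // An empty command is never allowed.
    return AllowedVerbs.count(verb) != 0;
  }
};

// Illustrative "safe-like" policy. The verb list is an assumption.
CommandPolicy MakeSafeLikePolicy()
{
  CommandPolicy policy;
  policy.AllowedVerbs = {"level", "trace", "channel"};
  return policy;
}
```

With this model, "level --channel raw debug" passes the check, while any command whose verb is not listed is rejected before it can touch backends or routing.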

For deeper diagnostics, an application can use a diagnostic policy.

A diagnostic policy may allow temporary memory backend scenarios, such as RingBufferBackend, while still blocking actions that are dangerous for a particular deployment.

If control commands come from a fully trusted source, the application can use a full policy. This behavior is compatible with the older control API model.

Policies Are Not Only for Environment Control

Environment control is only one source of control commands.

The same policy model can also be applied to the control server.

For example, an application can start the control server with a restricted policy:

Logme::ControlConfig config;

config.Enable = true;
config.Port = 3131;
config.Interface = Logme::CONTROL_INTERFACE_LOOPBACK;

Logme::Instance->StartControlServer(
  config
  , Logme::ControlPolicy::Safe()
);

This control server can still manage logging while the process is running. However, it does not have to provide full access to every control command.

This is useful for long-running services.

A control endpoint may need to exist all the time, but its capabilities should be limited.

The policy can also be changed later from code:

Logme::Instance->SetControlServerPolicy(
  Logme::ControlPolicy::Diagnostic()
);

For example, an application can use a safe policy by default. Then it can switch to a diagnostic policy only in special builds, in a test environment, or after additional authorization at the application level.

What Happens When a Command Fails

Environment control runs during startup. Because of that, the application must decide what to do when a command fails.

For example, a command may refer to a channel that does not exist. A command may also be blocked by the selected policy.

EnvironmentControlOptions includes an error handling mode for this situation.

In soft mode, the error is logged, but the remaining commands continue to run. This is useful for diagnostic launches where one failed command should not prevent the whole application from starting.

In strict mode, processing stops on the first error. This is useful for tests and CI, where an invalid diagnostic configuration must be visible immediately.
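The difference between the two modes comes down to what the processing loop does after a failure. Here is a minimal sketch of that loop. ErrorMode, RunResult, and RunCommands are hypothetical names for this example and do not correspond to logme identifiers. The execute callback stands in for whatever actually runs a command.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical modes: soft keeps going, strict stops at the first failure.
enum class ErrorMode { ContinueOnError, StopOnFirstError };

// Hypothetical summary of one processing pass.
struct RunResult
{
  int Succeeded = 0;
  int Failed = 0;
  bool Stopped = false; // True when strict mode aborted the pass.
};

// Runs commands in order. 'execute' returns false when a command fails
// (unknown channel, blocked by policy, and so on).
RunResult RunCommands(
  const std::vector<std::string>& commands
  , const std::function<bool(const std::string&)>& execute
  , ErrorMode mode
)
{
  RunResult result;
  for (const auto& command : commands)
  {
    if (execute(command))
    {
      ++result.Succeeded;
      continue;
    }

    ++result.Failed;
    if (mode == ErrorMode::StopOnFirstError)
    {
      result.Stopped = true;
      break;
    }
  }
  return result;
}
```

For three commands where the second one fails, the soft mode ends with two successes and one failure, while the strict mode stops after the failure and never runs the third command.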

All environment control errors are written through normal internal logme logging via CHINT.

This is important. Problems with control commands do not disappear silently.

If a command was blocked by policy or failed during execution, it should be visible in the logs.

Why This Is Not Just LOG_LEVEL

Many logging libraries allow the log level to be configured through the environment.

For example:

LOG_LEVEL=debug

This is useful, but limited.

In a small application, this may be enough. In a large service, it usually is not.

The problem is that “debug everywhere” is too broad.

You often need diagnostics for one specific part of the system. Today it may be the raw channel. Tomorrow it may be http2. Later it may be a specific subsystem or a trace point around suspicious code.

This is why logme does not introduce a separate environment settings language.

Environment control uses the same commands as the main control API:

LOGME_CONTROL="level --channel raw debug; trace enable Raw:*:*"

This makes the mechanism much more powerful.

At the same time, it does not create a second parallel configuration system. All commands go through one control layer and can be restricted by the same policy model.

Practical Use Cases

Runtime logging control is useful in several common scenarios.

In production, you can keep a normal quiet logging level by default. When a problem appears, you can quickly enable detailed diagnostics for one channel without restarting the service.

In tests, you can run the same binary with different diagnostic scenarios without changing configuration files.

In containers, you can pass temporary logging overrides through environment variables.

In long-running services, you can keep a control server enabled, but restrict it with a safe or diagnostic policy.

The main idea is that logging becomes operationally manageable.

The application is no longer limited to “write a file” or “enable debug”. It gets a dedicated operational interface for diagnostics.

The developer can also decide in advance which actions are allowed through that interface.

Why This Matters for Long-Running C and C++ Services

Runtime logging control is especially valuable for long-running C and C++ services.

In these systems, the cost of restarting a process can be high. The cost of permanent debug logging can also be high. And the cost of losing diagnostic state can be even higher.

A rare production problem may disappear after a restart. A race condition may need a specific runtime state. A protocol issue may happen only for one client connection. A performance problem may appear only after hours of load.

In all these cases, logging without restart is not just a convenience. It is a practical diagnostic requirement.

logme solves this problem with three related mechanisms:

Control API             runtime command interface
Environment control     startup-time control commands
Control policies        safety model for allowed commands

Together, they provide flexibility without giving up safety and predictability.

Conclusion

Runtime logging control in logme makes production diagnostics much more practical.

The control API allows an application to change channels, levels, trace points, backends, and other logging objects while the process is running.

Environment control allows startup-time overrides for tests, CI, containers, service wrappers, and controlled production launches.

Control policies define which commands are allowed in each scenario.

Together, these features allow you to enable the diagnostics you need, exactly where you need them, without restarting the service and without turning logging into an unsafe global switch.

The result is a logging system that is not only a file writer, but also an operational diagnostic interface for real production services.