Internal Metrics
This section provides reference descriptions about the internal metrics exported from Redpanda’s /metrics
endpoint.
Use /public_metrics for your primary dashboards for system health. Use /metrics for detailed analysis and debugging. |
In a live system, Redpanda metrics are exported only for features that are in use. For example, a metric for consumer groups is not exported when no groups are registered. To see the available internal metrics in your system, query the
bash |
Internal metrics
Most internal metrics are useful for debugging. The following subset of internal metrics can be useful to monitor system health.
vectorized_cloud_storage_read_bytes
Number of bytes Redpanda has read from cloud storage. This is tracked on a per-topic and per-partition basis, and increments each time a cloud storage read operation successfully completes.
vectorized_cluster_partition_last_stable_offset
Last stable offset.
If this is the last record received by the cluster, then the cluster is up-to-date and ready for maintenance.
vectorized_cluster_partition_schema_id_validation_records_failed
Number of records that failed schema ID validation.
vectorized_io_queue_delay
Total delay time in the queue.
Can indicate latency caused by disk operations in seconds.
vectorized_io_queue_queue_length
Number of requests in the queue.
Can indicate latency caused by disk operations.
vectorized_kafka_quotas_balancer_runs
Number of times the throughput quota balancer has executed.
Type: counter
vectorized_kafka_quotas_quota_effective
Current effective quota for the quota balancer, in bytes per second.
Type: counter
vectorized_kafka_quotas_traffic_intake
Total amount of Kafka traffic (in bytes) taken in from clients for processing that was considered by the throttler.
Type: counter
vectorized_kafka_quotas_traffic_egress
Total amount of Kafka traffic (in bytes) published to clients that was considered by the throttler.
Type: counter
vectorized_kafka_rpc_active_connections
Number of currently active Kafka RPC connections, or clients.
vectorized_kafka_rpc_connects
Number of accepted Kafka RPC connections.
Compare to the value at a previous time to derive the rate of accepted connections.
vectorized_kafka_rpc_produce_bad_create_time
An incrementing counter for the number of times a producer created a message with a timestamp skewed from the broker’s date and time. This metric is related to the following properties:
-
log_message_timestamp_alert_before_ms
: Increment this gauge when thecreate_timestamp
on a message is too far in the past as compared to the broker’s time. -
log_message_timestamp_alert_after_ms
: Increment this gauge when thecreate_timestamp
on a message is too far in the future as compared to the broker’s time.
vectorized_kafka_rpc_received_bytes
Number of bytes received from Kafka RPC clients in valid requests.
Compare to the value at a previous time to derive the throughput in Kafka layer in bytes/sec received.
vectorized_kafka_rpc_requests_completed
Number of successful Kafka RPC requests.
Compare to the value at a previous time to derive the messages per second per shard.
vectorized_ntp_archiver_compacted_replaced_bytes
Number of bytes removed from cloud storage by compaction operations. This is tracked on a per-topic and per-partition basis.
This metric resets every time partition leadership changes. It tracks whether or not compaction is performing operations on cloud storage.
The namespace
label supports the following options for this metric:
-
kafka
- User topics -
kafka_internal
- Internal Kafka topic, such as consumer groups -
redpanda
- Redpanda-only internal data
Labels:
-
namespace=("kafka" | "kafka_internal" | "redpanda")
-
topic
-
partition
vectorized_ntp_archiver_pending
The difference between the last committed offset and the last offset uploaded to Tiered Storage for each partition. A value of zero for this metric indicates that all data for a partition is uploaded to Tiered Storage.
This metric is impacted by the cloud_storage_segment_max_upload_interval_sec
tunable property. If this interval is set to 5 minutes, the archiver will upload committed segments to Tiered Storage every 5 minutes or less. If this metric continues growing for longer than the configured interval, it can indicate a potential network issue with the upload path for that partition.
The namespace
label supports the following options for this metric:
-
kafka
- User topics -
kafka_internal
- Internal Kafka topic, such as consumer groups -
redpanda
- Redpanda-only internal data
Labels:
-
namespace=("kafka" | "kafka_internal" | "redpanda")
-
topic
-
partition
vectorized_raft_leadership_changes
Number of leadership changes.
High value can indicate nodes failing and causing leadership changes.
vectorized_reactor_utilization
Redpanda process utilization.
Shows the true utilization of the CPU by a Redpanda process.
vectorized_storage_log_compaction_removed_bytes
Number of bytes removed from local storage by compaction operations. This is tracked on a per-topic and per-partition basis. It tracks whether compaction is performing operations on local storage.
The namespace
label supports the following options for this metric:
-
kafka
- User topics -
kafka_internal
- Internal Kafka topic, such as consumer groups -
redpanda
- Redpanda-only internal data
Labels:
-
namespace=("kafka" | "kafka_internal" | "redpanda")
-
topic
-
partition