feat(typha): add new integration #2545
base: master
Hey guys. Would anyone be able to help me with the failed checks?
Created a Jira card for Docs Team editorial review.
I'm not too sure about those. Someone from the Agent Integrations team should be able to help you out.
@cmotta2016 Are you working with someone from the Datadog team already? If not, can you please share your email address so we can communicate on a separate thread.
Hello @dd-dominic, we are contacting our partner (Apoena) to request a review of this PR with Datadog.
This check monitors [Typha][1] to collect Prometheus metrics through the Datadog Agent.

## Enabling Prometheus Metrics
Suggested change:
- ## Enabling Prometheus Metrics
+ ## Enabling Prometheus metrics
TYPHA_PROMETHEUSMETRICSPORT: "9093"
```

See the official [documentation][2] for more information.
Suggested change:
- See the official [documentation][2] for more information.
+ See [Configuring Typha][2] for more information.
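
Once the metrics port is set, the endpoint can be spot-checked before the Agent check is configured. The following is a minimal sketch, assuming Typha serves the Prometheus text format at `http://localhost:9093/metrics`. The port comes from the setting above, but the host and path are assumptions. `parse_metric_names` and `fetch_typha_metric_names` are hypothetical helpers written for this illustration and are not part of any Datadog or Calico library.

```python
import urllib.request


def parse_metric_names(text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "<name>{labels} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_typha_metric_names(url="http://localhost:9093/metrics", timeout=5):
    """Fetch the endpoint and return its metric names (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))
```

Calling `fetch_typha_metric_names()` on a host where Typha is reachable should return a non-empty set. An empty set or a connection error suggests the port setting has not taken effect.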
### Validation

[Run the Agent's status subcommand][5] and look for `typha` under the Checks section.
Suggested change:
- [Run the Agent's status subcommand][5] and look for `typha` under the Checks section.
+ Run the Agent's [status subcommand][5] and look for `typha` under the Checks section.

For containerized environment:
Suggested change:
- For containerized environment:
+ For a containerized environment, use the following command:
options:
- name: prometheus_url
required: true
description: The metric endpoint of your typha instance.
Suggested change:
- description: The metric endpoint of your typha instance.
+ description: The metric endpoint of your Typha instance.
typha.breadcrumb.size.count,count,,,,Number of KVs recorded in each breadcrumb.,0,typha,,,
typha.breadcrumb.size.sum,count,,,,Number of KVs recorded in each breadcrumb.,0,typha,,,
typha.client.latency.secs.count,count,,,,Per-client latency. I.e. how far behind the current state is each client.,0,typha,,,
typha.client.latency.secs.sum,count,,,,Per-client latency. I.e. how far behind the current state is each client.,0,typha,,,
Suggested change:
- typha.client.latency.secs.sum,count,,,,Per-client latency. I.e. how far behind the current state is each client.,0,typha,,,
+ typha.client.latency.secs.sum,count,,,,Per-client latency. That is, how far behind the current state is each client.,0,typha,,,
typha.client.write.latency.secs.sum,count,,,,Per-client write. How long each write call is taking.,0,typha,,,
typha.connections.accepted,count,,,,Total number of connections accepted over time.,0,typha,typha.connections.accepted,,
typha.connections.active,gauge,,,,Number of open client connections (including connections that have not completed the handshake).,0,typha,typha.connections.active,,
typha.connections.streaming,gauge,,,,Number of client connections that are actively streaming (i.e. connections that successfully completed the handshake).,0,typha,typha.connections.streaming,,
Suggested change:
- typha.connections.streaming,gauge,,,,Number of client connections that are actively streaming (i.e. connections that successfully completed the handshake).,0,typha,typha.connections.streaming,,
+ typha.connections.streaming,gauge,,,,Number of client connections that are actively streaming (that is, connections that successfully completed the handshake).,0,typha,typha.connections.streaming,,
typha.next.breadcrumb.latency.secs.sum,count,,,,Time to retrieve next breadcrumb when already behind.,0,typha,,,
typha.ping.latency.count,count,,,,Round-trip ping/pong latency to client. Typha's protocol includes a regular ping/pong keepalive to verify that the connection is still up.,0,typha,,,
typha.ping.latency.sum,count,,,,Round-trip ping/pong latency to client. Typha's protocol includes a regular ping/pong keepalive to verify that the connection is still up.,0,typha,,,
typha.updates.skipped,count,,,,Total number of updates skipped because the datastore change was not relevant. (For example an update to a Kubernetes Pod field that Calico does not read.),0,typha,typha.updates.skipped,,
Suggested change:
- typha.updates.skipped,count,,,,Total number of updates skipped because the datastore change was not relevant. (For example an update to a Kubernetes Pod field that Calico does not read.),0,typha,typha.updates.skipped,,
+ typha.updates.skipped,count,,,,Total number of updates skipped because the datastore change was not relevant. (For example, an update to a Kubernetes Pod field that Calico does not read.),0,typha,typha.updates.skipped,,
typha.snapshots.reused,count,,,,The number of binary snapshots that Typha was able to reuse for multiple clients thus reducing CPU usage.,0,typha,typha.snapshots.reused,,
typha.snapshot.raw.bytes,gauge,,,,The size of the most recent binary snapshot in bytes pre-compression.,0,typha,typha.snapshot.raw.bytes,,
typha.snapshots.compressed.bytes,gauge,,,,The size of the most recent binary snapshot in bytes post-compression.,0,typha,typha.snapshots.compressed.bytes,,
typha.breadcrumb.block,count,,,,Count of the number of times Typha got the next Breadcrumb after blocking.,0,typha,typha.breadcrumb.block.,,
Suggested change:
- typha.breadcrumb.block,count,,,,Count of the number of times Typha got the next Breadcrumb after blocking.,0,typha,typha.breadcrumb.block.,,
+ typha.breadcrumb.block,count,,,,Count of the number of times Typha got the next breadcrumb after blocking.,0,typha,typha.breadcrumb.block.,,
typha.breadcrumb.non.block,count,,,,Count of the number of times Typha got the next Breadcrumb without blocking.,0,typha,typha.breadcrumb.non.block,,
Suggested change:
- typha.breadcrumb.non.block,count,,,,Count of the number of times Typha got the next Breadcrumb without blocking.,0,typha,typha.breadcrumb.non.block,,
+ typha.breadcrumb.non.block,count,,,,Count of the number of times Typha got the next breadcrumb without blocking.,0,typha,typha.breadcrumb.non.block,,
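
Several of the reworded descriptions in this review add a comma ("That is, how far behind ...", "(that is, connections ...)", "(For example, an update ...)"). If these lines are read as plain comma-separated rows, an unquoted comma inside a description splits it into two fields. The following is a rough sketch of a width check, assuming standard CSV parsing and assuming every row should keep the 11 fields that the rows quoted here split into. `field_counts` and `rows_with_unexpected_width` are hypothetical helpers, not part of the Datadog tooling.

```python
import csv
import io


def field_counts(csv_text):
    """Map each metric name (first field) to the number of fields in its row."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: len(row) for row in reader if row}


def rows_with_unexpected_width(csv_text, expected):
    """Return the metric names whose rows do not have `expected` fields."""
    counts = field_counts(csv_text)
    return sorted(name for name, width in counts.items() if width != expected)
```

Under these assumptions, wrapping a description that contains a comma in double quotes keeps the row at its original width.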
What does this PR do?
Creates a new integration called Typha.

Motivation
We need to monitor the entire Calico stack, but there is no integration for the Typha component. So we wrote this custom Agent check to collect Typha metrics and send them to Datadog.

Review checklist
no-changelog label attached

Additional Notes
Anything else we should know when reviewing?