Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2] Consolidate storage selection into jaeger_storage extension #6225

Open
yurishkuro opened this issue Nov 19, 2024 · 4 comments
Open

[v2] Consolidate storage selection into jaeger_storage extension #6225

yurishkuro opened this issue Nov 19, 2024 · 4 comments

Comments

@yurishkuro
Copy link
Member

I think I made a small mistake in the design of the v2 storage configuration. The jaeger_storage extension defines configurations for all storage backends, but does not actually takes a position which ones are which. This leads to two problems:

  1. The other components need to make this choice and be in sync. Specifically jaeger_storage_exporter and jaeger_query both require TraceStorage name which they use to retrieve a storage factory from jaeger_storage
  2. The primary/archive storage distinction work [WIP] Remove dependency on ArchiveFactory #6156 revealed that storage factory needs the is_archive input parameter to customize the backend implementation specifically for archive storage needs. However, in the current setup jaeger_storage does not know this distinction, so in [WIP] Remove dependency on ArchiveFactory #6156 a new config flag is being proposed, which is technically redundant.

Proposal

Move selection of specific storage implementations into jaeger_storage extension as well. Right now the config might look like:

  jaeger_query:
    storage:
      traces: some_storage
      metrics: some_metrics_storage

  jaeger_storage:
    backends:
      some_storage:
        memory:
    metric_backends:
      some_metrics_storage:
        prometheus:
          endpoint: http://prometheus:9090

We can add a new section that would explicitly assign the roles to different backends:

  jaeger_storage:
    roles:
      traces: some_storage
      traces_archive: some_other_storage
      metrics: some_metrics_storage
    backends:
      ...

And we would remove the corresponding entries from jaeger_query and exporter. Currently query/exporter use the GetFactory(name) function from the extension, instead they would be asking for specific role like GetTracesStorageFactory, GetTraceArchiveStorageFactory, etc.

Benefits:

  • query/exporter do not need to be in sync on the name of the storage backend
  • jaeger_storage will know to which role a particular backend is assigned, so when instantiating a factory it can pass additional flags like is_archive as needed
@mahadzaryab1
Copy link
Collaborator

@yurishkuro This proposal sounds good to me. I can get started on the implementation. For the first PR, I'm thinking of adding the roles field to the configuration along with its validation. Do you think that's a good place to start?

@yurishkuro
Copy link
Member Author

I actually tried doing this the other day, but realized a couple of limitations of such refactoring:

  • in v1 it was possible to run collector with multiple output storage types, while jaeger-query only allowed a single storage (plus archive). In v2 it is currently possible to do by defining the jaeger exporter twice, pointing to different storage names. If we move the resolution of name to role into the storage extension, how would we be able to support multi-writes? Would we have to make GetStorage functions return an array? Then the query extension will get ambiguous results.
  • A bit more abstractly, this brings more coupling within storage extension between a backend and a role. That's just a trade-off. Technically we already have roles in the storage extension - metrics vs. trace storage, but beyond that things are a bit more dynamic, e.g. the sampling store API is determined at runtime, not via strong type coupling.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro Yeah, I ran into the same issue while trying to do this as well. Do you have any thoughts on how we should proceed with the given limitations?

@yurishkuro
Copy link
Member Author

Need to think more about it, maybe give up. It's a breaking change to config, plus the multi-storage issue above, so the trade-off is not an obvious benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

2 participants