Throughput benchmarks #424
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
I think I'm going to take this next; we need to test the performance of what we have now. Using the integration test with something like pprof to scrape for statistics seems appealing.
Running on top of the integration tests seems like a good start! One thing to think about beyond that is how performance would differ under different load patterns (lots of streams against the same type, lots of streams against an opaque type, lots of streams with different types, etc.) and how the kinds of updates that are happening affect the stream (what % of streams are getting updates, the ADS case of multiple resources being sent over the same stream, etc.).
Excellent, @dougfort and I might iterate on a design doc to see what we can come up with. We'll see what we can capture; there's a gold mine of data here.
@snowp if we intend to use the integration tests for what we want, is it safe to assume we don't intend to do any long-term storage of this throughput data? I was iterating on the design with a few co-workers and we were curious what you think is a good fit for this situation:
We initially sketched out a mix of 1 and 2, but we don't want the scope to creep into something unnecessary for the data we want to collect. Here's a quick proposal we drafted up. It just sketches out the concepts of each component of the benchmark; we can go back and add technical artifacts later, along with a code example of the producer we talk about. I think we want a system that will at least let us run pprof so we can see what's going on at runtime. The Go benchmarking framework is limited, as we've come to learn. If we do end up going with a system that complex, it might even be worth separating it from the integration test entirely and having some sort of standalone test under
Want to note that eventually we should use this benchmark to test code changes too: #451
Just FYI, this PR enables profiling for CPU, blocking (block), mutex contention, goroutines, and memory usage. It currently runs off the integration test, but to test throughput I want to build a test client that puts the server through its paces. Once that's done I can profile that code and collect runtime data so we can have some throughput numbers.
Hi @snowp, @alecholmez, I am new to go-control-plane and trying to gauge the scope of this issue. I see that Alec has a PR up for benchmarking integration tests. What can I do to help complete this issue? I was thinking maybe I can add benchmarking to As for #424 (comment), what did you guys have in mind exactly? I am looking at Do let me know how I can contribute!
@snowp any input here? Hiromi has found that the Go benchmarks aren't particularly accurate for measuring throughput here. Did you have in mind a separate framework that we might need to build out to measure this? I guess we're just looking for some clarification.
My original thought was to have a system where we simulate continuous updates to the cache and understand how long it takes for these updates to make it to clients under an increasing rate of change. Benchmarking small pieces of the code might be beneficial as microbenchmarks that can be optimized independently, but in order to understand the actual throughput of the system we probably need something end to end.
Add benchmarks that provide a baseline for performance. An example of this would be a producer that generates new snapshots at a rapid pace, so we can measure how long it takes for these changes to get sent to the client.
This should at the very least cover the new delta code (as it's the more performant protocol), but could also target sotw.
This would help us better understand the impact of larger changes (like #413) and the cost of per resource computation in delta.