Reduce fmt.Sprintf allocations in query encoding #2919

kgeckhart · 2024-12-04T17:20:45Z

This PR replaces some calls to fmt.Sprintf with string concatenation where the values being joined are already strings. This should reduce the memory required to construct queries with complex objects or large arrays. Arrays that are not flat should see a further improvement because the prefix is now computed once up front instead of with an fmt.Sprintf call on every element.
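As a rough illustration of the two changes described above, here is a minimal sketch. The names arraySketch, newArraySketch, nextKey, and objectKey are hypothetical and are not the SDK's types. It assumes keys have the form prefix.member.index, joins the string parts once by concatenation, and keeps fmt.Sprintf only for the per-element integer index, as the PR does.

```go
package main

import "fmt"

// arraySketch is a hypothetical stand-in for a non-flat array encoder.
// The string portion of the key is joined once, at construction time.
type arraySketch struct {
	prefix string // "<parent>.<member>", computed once
	size   int    // number of elements emitted so far
}

// newArraySketch joins the two string parts with plain concatenation,
// so no fmt.Sprintf call is needed for the prefix.
func newArraySketch(parent, memberName string) *arraySketch {
	return &arraySketch{prefix: parent + "." + memberName}
}

// nextKey returns the key for the next element. Only the integer index
// is formatted per element; the prefix is reused as-is.
func (a *arraySketch) nextKey() string {
	a.size++
	return fmt.Sprintf("%s.%d", a.prefix, a.size)
}

// objectKey is a hypothetical helper that shows the object key case:
// both operands are strings, so concatenation is enough.
func objectKey(prefix, name string) string {
	if prefix == "" {
		return name
	}
	return prefix + "." + name
}

func main() {
	a := newArraySketch("Filter", "member")
	for i := 0; i < 3; i++ {
		fmt.Println(a.nextKey())
	}
	fmt.Println(objectKey("Filter", "Name"))
}
```

The point of the design is that the work that depends only on the array (joining the parent prefix and member name) happens once per array rather than once per element.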

Included are some benchmarks that I used to decide that the map and array cases are fine to leave on fmt.Sprintf. Dropping fmt.Sprintf there is faster but does not reduce allocations, and for arrays with a size between 100 and 499 it actually results in more allocations.

String formatting benchmark results

Benchmark_sprintf_strings-11            26738427                44.62 ns/op            8 B/op          1 allocs/op
Benchmark_concat_strings-11             1000000000               0.5688 ns/op          0 B/op          0 allocs/op

Int formatting benchmark results

Benchmark_int_formatting/array_-_sprintf_with_1_size-11                 25135780                48.53 ns/op            5 B/op          1 allocs/op
Benchmark_int_formatting/array_-_sprintf_with_10_size-11                23695940                50.23 ns/op            8 B/op          1 allocs/op
Benchmark_int_formatting/array_-_sprintf_with_100_size-11               23888797                51.02 ns/op            8 B/op          1 allocs/op
Benchmark_int_formatting/array_-_sprintf_with_250_size-11               23499748                50.84 ns/op            8 B/op          1 allocs/op
Benchmark_int_formatting/array_-_sprintf_with_500_size-11               21207704                56.39 ns/op           16 B/op          2 allocs/op
Benchmark_int_formatting/array_-_sprintf_with_1000_size-11              21082281                56.70 ns/op           16 B/op          2 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_1_size-11          61038285                18.95 ns/op            5 B/op          1 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_10_size-11         65347315                17.87 ns/op            8 B/op          1 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_100_size-11        38897104                30.29 ns/op           16 B/op          2 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_250_size-11        39651289                29.95 ns/op           16 B/op          2 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_500_size-11        40254046                29.64 ns/op           16 B/op          2 allocs/op
Benchmark_int_formatting/array_-_concat_strconv_with_1000_size-11       40166244                29.21 ns/op           16 B/op          2 allocs/op

Map key formatting benchmark results

Benchmark_int_formatting/map_-_sprintf_with_1_size-11               9843427               122.3 ns/op            32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_sprintf_with_10_size-11              9815258               121.7 ns/op            32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_sprintf_with_100_size-11             9635235               124.1 ns/op            32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_sprintf_with_250_size-11             9713184               124.4 ns/op            32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_sprintf_with_500_size-11             8893167               136.0 ns/op            32 B/op          4 allocs/op
Benchmark_int_formatting/map_-_sprintf_with_1000_size-11            8691573               136.6 ns/op            32 B/op          4 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_1_size-11        22724402                51.69 ns/op           32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_10_size-11       23549650                51.40 ns/op           32 B/op          2 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_100_size-11      18886756                62.81 ns/op           32 B/op          3 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_250_size-11      18890992                62.93 ns/op           32 B/op          3 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_500_size-11      18890448                62.75 ns/op           32 B/op          3 allocs/op
Benchmark_int_formatting/map_-_concat_strconv_with_1000_size-11     19919188                59.89 ns/op           32 B/op          3 allocs/op
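
The benchmark source is not shown in this thread. A minimal sketch of how the string comparison could be set up is below, under a few assumptions: joinSprintf, joinConcat, run, and sink are hypothetical names, the inputs are arbitrary, and testing.Benchmark is used so the program runs without go test. Results are written to a package-level variable so the compiler cannot discard the work being measured.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps each result reachable so the measured call is not optimized away.
var sink string

// joinSprintf builds "prefix.name" with fmt.Sprintf.
func joinSprintf(prefix, name string) string {
	return fmt.Sprintf("%s.%s", prefix, name)
}

// joinConcat builds the same key with plain string concatenation.
func joinConcat(prefix, name string) string {
	return prefix + "." + name
}

// run benchmarks one join function and returns the collected result.
func run(join func(string, string) string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = join("Prefix", "member")
		}
	})
}

func main() {
	s := run(joinSprintf)
	c := run(joinConcat)
	fmt.Println("sprintf:", s.String(), s.MemString())
	fmt.Println("concat: ", c.String(), c.MemString())
}
```

Absolute numbers from a sketch like this depend on the machine and Go version, so they are only useful for comparing the two variants against each other.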

Madrigal · 2024-12-16T17:36:51Z

Thanks for your contribution. Just letting you know this is still on our radar, but we've been a bit busy.

For context, was there a particular scenario where this was causing a performance issue for you?

kgeckhart · 2024-12-16T19:37:40Z

Thanks for the response. I work on an OSS CloudWatch exporter, https://github.com/prometheus-community/yet-another-cloudwatch-exporter/. When running at high volume we see some rather spiky memory usage, and I was trying to reduce it a bit. In the list of top allocators from pprof after running for some time, fmt.Sprintf was a lot higher than I expected it to be.

Almost all of the fmt.Sprintf allocations appear to stem from two functions in the query encoding path when serializing the large GetMetricDataInput payload.

lucix-aws · 2025-01-06T18:12:46Z

aws/protocol/query/array.go

 	// Lists can't have flat members
-	return newValue(a.values, fmt.Sprintf("%s.%d", prefix, a.size), false)
+	return newValue(a.values, fmt.Sprintf("%s.%d", a.prefix, a.size), false)


Why leave Sprintf here?

kgeckhart · 2025-01-06

The extra int formatting in this case causes an increase in allocations when the size of the array being serialized is between 100 and 499. There's a nice drop in CPU, but a lot of the CPU utilization we see is driven by GC, so I chose to leave this one alone. I'll edit the full benchmark results into the PR description.

lucix-aws · 2025-01-07T17:46:04Z

Can we see some benchmarks that actually use the query encoder for some structure (ideally something copied from, say, EC2)? This seems fine on paper, but all the benchmarks demonstrate is the difference between these operations in isolation.

Madrigal · 2025-01-08T18:15:44Z

We recently added instructions on how to generate a changelog entry for PRs. In the past, we had to do this on behalf of contributors, slowing down the review process. Can you go through this document and generate a changelog entry for this PR?

kgeckhart added 2 commits December 4, 2024 11:52

Pre-compute prefix when array is not flat

4cd0fa3

Switch to string concat for object keys

5baebeb

kgeckhart requested a review from a team as a code owner December 4, 2024 17:20

Merge branch 'main' into kgeckhart/reduce-sprintf-overhead-in-encoding

3509624

lucix-aws reviewed Jan 6, 2025
