Replay using kafka #1262

Open
bb-prasunasarana opened this issue Jan 6, 2025 · 1 comment
Labels
bug Something isn't working


bb-prasunasarana commented Jan 6, 2025

Hi,

We used GoReplay to capture and replay traffic through Kafka for our system, which handles a throughput of 1.6 lakh RPM (160,000 requests per minute, roughly 2,700 RPS).

We captured traffic over a 2-hour run and pushed 2 crore (20 million) messages to Kafka during that period. However, when we replay the captured traffic, we see a throughput of only ~300 RPS, far below the captured rate of 2,700 RPS.
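For readers less familiar with the lakh/crore units, the figures above can be checked with a few lines of arithmetic. This is only a sketch built from the numbers quoted in the report (1.6 lakh RPM, 2 crore messages, ~300 RPS on replay); the constant and function names are made up for illustration.

```python
# Unit conversions for the figures quoted in the report.
LAKH = 100_000
CRORE = 10_000_000


def rpm_to_rps(rpm: float) -> float:
    """Convert requests per minute to requests per second."""
    return rpm / 60


captured_rps = rpm_to_rps(1.6 * LAKH)   # capture rate in RPS
total_messages = 2 * CRORE              # messages pushed to Kafka

# How long the capture would take at the quoted rate, and how long a
# full replay would take at the observed ~300 RPS.
capture_hours = total_messages / captured_rps / 3600
replay_hours = total_messages / 300 / 3600

print(f"captured rate : {captured_rps:.0f} RPS")
print(f"capture time  : {capture_hours:.2f} h")
print(f"replay at 300 : {replay_hours:.1f} h")
```

At the observed rate, replaying the full capture would take most of a day instead of about two hours, which is why the gap matters here.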

./gor --input-kafka-host broker-ip --input-kafka-topic topic-name --input-kafka-offset 0 --output-http prod_url --stats --output-http-stats --verbose 1
[DEBUG][elapsed 472.957µs]:
output_http:latest,mean,max,count,count/second,gcount
[DEBUG][elapsed 68.516µs]:
output_http:0,0,0,0,0,5
2025/01/04 23:30:13 [PPID 14882 and PID 15550] Version:2.0.0
[DEBUG][elapsed 5.000066334s]:
output_http:0,0,50,1427,285,99
[DEBUG][elapsed 5.000299959s]:
output_http:33,0,53,1390,278,12
[DEBUG][elapsed 1.387815704s]:
output_http:0,1,55,1404,280,112
[DEBUG][elapsed 5.000939106s]:
output_http:50,0,50,1085,217,118
[DEBUG][elapsed 5.000809986s]:
output_http:13,1,43,1043,208,42
[DEBUG][elapsed 5.000092741s]:
output_http:16,1,60,1404,280,136
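To put a single number on the log above, the `output_http` stats lines can be averaged with a small script. This is a rough sketch: it relies only on the column names printed in the header line and makes no assumption about their units; the helper name `parse_stats` is hypothetical.

```python
# The output_http lines copied from the log above (debug/elapsed lines omitted).
STATS = """\
output_http:latest,mean,max,count,count/second,gcount
output_http:0,0,0,0,0,5
output_http:0,0,50,1427,285,99
output_http:33,0,53,1390,278,12
output_http:0,1,55,1404,280,112
output_http:50,0,50,1085,217,118
output_http:13,1,43,1043,208,42
output_http:16,1,60,1404,280,136
"""


def parse_stats(text: str) -> list[dict[str, int]]:
    """Turn 'output_http:a,b,c,...' lines into dicts keyed by the header row."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("output_http:"):
            continue  # skip anything that is not a stats line
        fields = line.split(":", 1)[1].split(",")
        if header is None:
            header = fields  # first stats line carries the column names
            continue
        rows.append(dict(zip(header, map(int, fields))))
    return rows


rows = parse_stats(STATS)
active = [r for r in rows if r["count"] > 0]  # drop the empty startup sample
avg_rps = sum(r["count/second"] for r in active) / len(active)

print(f"samples with traffic: {len(active)}")
print(f"average count/second: {avg_rps:.0f}")
```

Averaged over the samples that carried traffic, the replay runs at roughly a tenth of the capture rate.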


dosubot bot commented Jan 6, 2025

Hey @bb-prasunasarana 👋, I'm here to help you out with your bug or answer any questions you might have. Let's work together to solve this!

The issue of replay throughput using Kafka with GoReplay being significantly lower than the captured rate is a known challenge. Here are some steps you can take to improve the throughput:

  1. Check Bandwidth: Ensure that the network bandwidth is sufficient to handle the replay traffic. If the network adapter is overwhelmed, it can cause the output-http queue to fill up, leading to reduced throughput.

  2. Adjust Output Workers: Consider adjusting the number of HTTP output workers. You can use dynamic worker scaling by setting --output-http-workers=-1, or calculate the number of workers from the formula below, where the denominator is the number of responses a single worker completes per second (1 / average target response time in seconds):

    output-workers = (Average number of requests per second) / (Average responses per second per worker)
    
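  3. System Tuning: Adjust system limits on the source server for optimal performance. Here are some recommended settings (note that net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so that line applies only to older kernels):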

    net.ipv4.tcp_max_tw_buckets = 65536
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 0
    net.ipv4.tcp_max_syn_backlog = 131072
    net.ipv4.tcp_syn_retries = 3
    net.ipv4.tcp_synack_retries = 3
    net.ipv4.tcp_retries1 = 3
    net.ipv4.tcp_retries2 = 8
    net.ipv4.tcp_rmem = 16384 174760 349520
    net.ipv4.tcp_wmem = 16384 131072 262144
    net.ipv4.tcp_mem = 262144 524288 1048576
    net.ipv4.tcp_max_orphans = 65536
    net.ipv4.tcp_fin_timeout = 10
    net.ipv4.tcp_low_latency = 1
    net.ipv4.tcp_syncookies = 0
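The worker formula in step 2 can be read as Little's law: the number of requests in flight equals the arrival rate multiplied by the time each request takes. The sketch below follows that reading; it is an interpretation, not GoReplay's own sizing logic, and the 50 ms response time is a hypothetical value chosen for illustration.

```python
import math


def workers_needed(target_rps: int, avg_response_ms: int) -> int:
    """Workers required so that target_rps requests can be in flight
    when each one takes avg_response_ms to complete."""
    return math.ceil(target_rps * avg_response_ms / 1000)


def max_rps(workers: int, avg_response_ms: int) -> float:
    """Upper bound on throughput for a fixed pool of blocking workers."""
    return workers * 1000 / avg_response_ms


# Hypothetical 50 ms average response time from the target.
print(workers_needed(2700, 50))  # workers for the captured rate
print(workers_needed(300, 50))   # workers that would explain ~300 RPS
print(max_rps(10, 50))           # ceiling for a pool of 10 workers
```

Under this assumption, sustaining the captured rate needs on the order of a hundred or more concurrent workers, so a small fixed pool would cap throughput well below 2,700 RPS.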

By addressing these potential bottlenecks and tuning the system, you should be able to improve the replay throughput [1].
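Before changing any kernel settings, it may help to see which of the suggested values differ from what the machine currently uses. The sketch below compares a few of the listed keys against `/proc/sys`; it assumes a Linux host, only reads values, and the helper names are hypothetical. Keys that the running kernel does not expose are reported as missing.

```python
from pathlib import Path

# A few of the suggested settings, in sysctl.conf syntax.
RECOMMENDED = """\
net.ipv4.tcp_max_tw_buckets = 65536
net.ipv4.tcp_max_syn_backlog = 131072
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_rmem = 16384 174760 349520
"""


def parse_sysctl(text: str) -> dict[str, list[str]]:
    """Parse 'key = value ...' lines into {key: [value, ...]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.split()
    return settings


def current_value(key: str, root: Path = Path("/proc/sys")):
    """Read the live value of a sysctl key, or None if it is unavailable."""
    path = root.joinpath(*key.split("."))
    try:
        return path.read_text().split()
    except OSError:
        return None  # key not present on this kernel, or not a Linux host


for key, wanted in parse_sysctl(RECOMMENDED).items():
    have = current_value(key)
    if have is None:
        status = "missing"
    elif have == wanted:
        status = "ok"
    else:
        status = f"differs (current: {' '.join(have)})"
    print(f"{key}: {status}")
```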


@dosubot dosubot bot added the bug Something isn't working label Jan 6, 2025