fix(runtime) - Mitigate the receipt size limit bug #12633

jancionear · 2024-12-17T02:51:14Z

There is a bug (#12606) which allows to create receipts which are slightly larger than max_receipt_size.

Let's mitigate the issue by making the runtime able to handle receipts that are above max_receipt_size.
First the size limit check is relaxed to check only new receipts. Then the receipt forwarding logic is modified to pretend that all receipts have congestion_size below max_receipt_size.
This mostly matches the code that was added in 2.3.1 and 2.4.0, but this PR contains one more change that is needed for bandwidth scheduler. Before bandwidth scheduler we were able to forward receipts that are slightly above the max_receipt_size because the big outgoing limit was 4.5 MiB and try_forward pretended that all receipts are at most 4 MiB. With bandwidth scheduler this changes, bandwidth scheduler expects all receipts to be at most max_receipt_size, otherwise there's no guarantee that the receipt will ever be forwarded. To deal with this there's the change which pretends that all receipts are at most max_receipt_size when generating bandwidth requests.
This is just a mitigation, a proper fix will be done in the future.

One more advantage of this patch is that it should allow us to reduce max_receipt_size without crazy protocol upgrade logic. We could reduce it to e.g 2MB and the scheduler would still be able to handle 4MB receipts created in the previous protocol version.

I added some of tests which try to create a receipt above max size and check that everything still works fine.

As usual the test_transaction_limit_for_local_congestion test broke when I added more methods to the test-contract-rs contract. For some reason this test is very sensitive to contract size and the congestion level rapidly drops when size of the contract increases. To deal with this I added a separate small contract that is used exclusively by these congestion control tests, that should fix the issue once and for all.

The change is divided into commits for easier review.

… bandwidth requests

…act for it

codecov · 2024-12-17T04:23:08Z

Codecov Report

Attention: Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.61%. Comparing base (88ec6c8) to head (aae20a6).
Report is 13 commits behind head on master.

Files with missing lines	Patch %	Lines
runtime/runtime/src/congestion_control.rs	90.47%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #12633       +/-   ##
===========================================
+ Coverage    1.66%   70.61%   +68.94%     
===========================================
  Files         669      847      +178     
  Lines      119942   173400    +53458     
  Branches   119942   173400    +53458     
===========================================
+ Hits         2001   122448   +120447     
+ Misses     117838    45825    -72013     
- Partials      103     5127     +5024

Flag	Coverage Δ
backward-compatibility	`0.16% <0.00%> (-0.01%)`	⬇️
db-migration	`0.16% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.36% <0.00%> (-0.01%)`	⬇️
linux	`69.19% <83.78%> (+67.52%)`	⬆️
linux-nightly	`70.20% <94.59%> (?)`
pytests	`1.66% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.47% <0.00%> (-0.01%)`	⬇️
unittests	`70.44% <94.59%> (?)`
upgradability	`0.20% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wacban

LGTM

wacban · 2024-12-23T11:36:43Z

runtime/runtime/src/verifier.rs

+    /// NewReceipt validation. Tolerates some receipts that wouldn't pass new validation. It has to
+    /// be less strict because:
+    /// 1) Older receipts might have been created before new validation rules.
+    /// 2) There is a bug which allows to create receipts that are above the size limit. Runtime has


I thought it's not possible to create them above the size limit but that the receipt may increase in size later?

Here I used "create" in a broader sense, I wanted to say that it's possible for a large receipt to occur.
And when a max size value is returned, the created receipt is larger than max_receipt_size, so I'd say that counts as "create"

wacban · 2024-12-23T11:42:46Z

runtime/runtime/src/verifier.rs

+    /// 2) There is a bug which allows to create receipts that are above the size limit. Runtime has
+    ///    to handle them gracefully until the receipt size limit bug is fixed.
+    ///    See https://github.com/near/nearcore/issues/12606 for details.
+    OldReceipt,


I can see that the delayed and incoming receipts are validated but the buffered receipts are not, as far as I can see. Is it an omission?

Is the validation for old receipts actually necessary or is it just an extra protection? Those receipts are a result of on chain computation so I would assume we trust them here already.

I think validating old receipts is just a sanity check. When an old receipt fails validation the runtime panics, so we don't expect this validation to ever fail. I think it's fine to not validate buffered receipts.

wacban · 2024-12-23T11:43:46Z

runtime/runtime/src/congestion_control.rs

+        if size > max_receipt_size {
+            if size > max_receipt_size {


this condition got duplicated

wacban · 2024-12-23T11:45:23Z

runtime/runtime/src/congestion_control.rs

+            if size > max_receipt_size {
+                tracing::warn!(
+                    target: "runtime",
+                    "try_forward observed a receipt with size exceeding the size limit! receipt_id: {} size: {} size_limit: {}",


nit: use structured logging - set the values as fields rather than in the formatted message

tracing::warn!( receipt_id=?.. size, max_size, "try_forward observed a receipt with size exceeding the size limit" );

I find these structured logs a bit harder to read than manually formatted messages, but I guess that's the proper way to do it. Changed to structured.

wacban · 2024-12-23T11:48:57Z

integration-tests/src/test_loop/tests/max_receipt_size.rs

+
+// A function call will generate a new receipt. Size of this receipt will be equal to
+// `max_receipt_size`, it'll pass validation, but then `output_data_receivers` will be modified and
+// the receipt's size will go above max_receipt_size. The receipt should be rejected, but currently


Can you assert that this condition is actually triggered?

Added an assert that there really was a receipt with size above max_receipt_size.

wacban · 2024-12-23T11:50:11Z

runtime/near-test-contracts/test-contract-rs/src/lib.rs

+/// Returns a value of size "value_size".
+/// Accepts json args, e.g {"value_size": 1000}


just out of curiosity, why not pass it as simple integer instead of the json?

Json is easy to use and it's readable. I find it easier to think about than manually serializing/deserializing integers. And it's easier to add more arguments to a function.

wacban · 2024-12-23T11:53:02Z

runtime/near-test-contracts/congestion-control-test-contract/src/lib.rs

+#[allow(unused)]
+extern "C" {


Is all of it needed?

Is there any way we can reuse the existing one or refactor it to avoid copy pasting?

Yeah there's a lot of new functions. I tried to use the existing ones, but I want to do very specific things and the existing ones don't really do what I want to do.
It'd be nice if there was an easy way to create a per-test contract, maybe inside the test code itself. One giant contract gets a bit chaotic :/

It'd be nice if there was an easy way to create a per-test contract, maybe inside the test code itself. One giant contract gets a bit chaotic :/

So there are some programmatically constructed contracts in near-test-contracts but trust me the experience maintaining those is quite a bit worse as you end up having to hand-roll WebAssembly, pretty much.

If we wanted to make it conversible from Rust I think it would still be possible technically, but has very significant drawbacks. There's this saying about maintainability along the lines of "code is written once, read many times", but this quote gets cuts off before it gets to "and compiled a further thousand times more." Unfortunately compiling and linking up a huge number of Rust binaries would be a huge hit for our development experience.

While I don't necessarily object to having a couple additional contract crates, I think in general we should probably aim to have as few of these as possible and try to achieve a good maintainability/readability within the confines of a crate (by e.g. splitting the code up into modules – this also naturally enables reuse of the duplicate code at hand.)

nagisa

No blocking comments from me but see inline for my general thoughts.

nagisa · 2025-01-08T13:11:10Z

runtime/near-test-contracts/congestion-control-test-contract/src/lib.rs

+#[allow(unused)]
+extern "C" {


It'd be nice if there was an easy way to create a per-test contract, maybe inside the test code itself. One giant contract gets a bit chaotic :/

So there are some programmatically constructed contracts in near-test-contracts but trust me the experience maintaining those is quite a bit worse as you end up having to hand-roll WebAssembly, pretty much.

If we wanted to make it conversible from Rust I think it would still be possible technically, but has very significant drawbacks. There's this saying about maintainability along the lines of "code is written once, read many times", but this quote gets cuts off before it gets to "and compiled a further thousand times more." Unfortunately compiling and linking up a huge number of Rust binaries would be a huge hit for our development experience.

While I don't necessarily object to having a couple additional contract crates, I think in general we should probably aim to have as few of these as possible and try to achieve a good maintainability/readability within the confines of a crate (by e.g. splitting the code up into modules – this also naturally enables reuse of the duplicate code at hand.)

nagisa · 2025-01-08T13:22:58Z

runtime/runtime/src/lib.rs

@@ -1933,6 +1936,7 @@ impl Runtime {
                &processing_state.apply_state.config.wasm_config.limit_config,
                receipt,
                protocol_version,
+                ValidateReceiptMode::OldReceipt,


Hm, maybe it would be better to name this variant something along the lines of StoredReceipt or something like that? My intuition is that for delayed receipts "old" is perhaps still applicable, sure, but incoming receipts don't feel like they would qualify for the "old" qualifier.

At the same time I'm trying to wrap around if it is really alright to not validate the incoming receipts for their size? I guess not doing so is the reason why this PR doesn't actually fix the associated issue fully?

Hm, maybe it would be better to name this variant something along the lines of StoredReceipt or something like that? My intuition is that for delayed receipts "old" is perhaps still applicable, sure, but incoming receipts don't feel like they would qualify for the "old" qualifier.

Stored doesn't really fit for incoming receipts. Maybe "ExistingReceipt"? The distinction is between receipts that were already created and validated and totally new ones

At the same time I'm trying to wrap around if it is really alright to not validate the incoming receipts for their size? I guess not doing so is the reason why this PR doesn't actually fix the associated issue fully?

I think it's fine because all incoming receipts were created by runtime, which we trust fully. We can trust that the runtime doesn't create receipts that are too big, invalid, etc.
In this case it turned out that it's possible to bypass runtime's size check, and because of that the runtime produced invalid receipts. Fixing that would require adding proper max_receipt_size checks when applying an action receipt. This is a bit complicated, so for now we have a mitigation.

nagisa · 2025-01-08T13:25:07Z

runtime/runtime/src/congestion_control.rs

+        // See https://github.com/near/nearcore/issues/12606
+        let max_receipt_size = apply_state.config.wasm_config.limit_config.max_receipt_size;
+        if size > max_receipt_size {
+            tracing::warn!(


How frequently might this happen in production? If this warning is straightforward to trigger, then this probably shouldn't be a warn: we will inevitably get validators asking if something is wrong with their node and how to fix this warning.

I don't really expect this to happen with normal traffic. Someone would have to deliberately try to trigger it, or be doing some testing with max size receipts.

But tbh I don't see a lot of value in this warning, even if it got triggered, the only effect is that validators will get upset, we won't really react to it in any way. So maybe it's better to downgrade it to a debug-level log 👍

jancionear added 7 commits December 17, 2024 01:13

Relax receipt size limit validation

5270c47

Forward receipts above max_receipt_size

838b2df

Pretend that receipt size is at most max_receipt_size when generating…

8770203

… bandwidth requests

Test max_receipt_size with promise_return

012f5e4

Test returning large value

0f5d97b

Test yield/resume

ec572a2

Fix test_transaction_limit_for_local_congestion, use a separate contr…

37000a2

…act for it

jancionear requested review from nagisa and wacban December 17, 2024 02:51

jancionear requested a review from a team as a code owner December 17, 2024 02:51

jancionear added 2 commits December 17, 2024 03:04

Merge branch 'master' into jan_receipt_size_bug_mitigation

fc84695

Fix merge conflict

9adc521

wacban approved these changes Dec 23, 2024

View reviewed changes

jancionear added 5 commits January 4, 2025 05:20

Deduplicate the if

a8172a6

structured logging

083cae3

Assert that an oversized receipt is actually produced

0c476fc

Merge branch 'master' into jan_receipt_size_bug_mitigation

ff8c7b1

Fix merge conflict

48765df

jancionear requested a review from wacban January 5, 2025 14:12

nagisa approved these changes Jan 8, 2025

View reviewed changes

jancionear added 2 commits January 8, 2025 13:41

Rename OldReceipt to ExistingReceipt

c3aa372

Downgrade warning about oversized receipt to a debug log

aae20a6

jancionear added this pull request to the merge queue Jan 8, 2025

Merged via the queue into master with commit 83bf747 Jan 8, 2025
28 checks passed

jancionear deleted the jan_receipt_size_bug_mitigation branch January 8, 2025 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(runtime) - Mitigate the receipt size limit bug #12633

fix(runtime) - Mitigate the receipt size limit bug #12633

jancionear commented Dec 17, 2024 •

edited

Loading

codecov bot commented Dec 17, 2024 •

edited

Loading

wacban left a comment

wacban Dec 23, 2024

jancionear Jan 5, 2025

wacban Dec 23, 2024

jancionear Jan 5, 2025 •

edited

Loading

wacban Dec 23, 2024

wacban Dec 23, 2024

jancionear Jan 5, 2025

wacban Dec 23, 2024

jancionear Jan 5, 2025

wacban Dec 23, 2024

jancionear Jan 5, 2025

wacban Dec 23, 2024

jancionear Jan 5, 2025

nagisa Jan 8, 2025 •

edited

Loading

nagisa left a comment

nagisa Jan 8, 2025 •

edited

Loading

nagisa Jan 8, 2025

jancionear Jan 8, 2025

jancionear Jan 8, 2025

nagisa Jan 8, 2025

jancionear Jan 8, 2025

		/// Returns a value of size "value_size".
		/// Accepts json args, e.g {"value_size": 1000}

		#[allow(unused)]
		extern "C" {

		#[allow(unused)]
		extern "C" {

fix(runtime) - Mitigate the receipt size limit bug #12633

fix(runtime) - Mitigate the receipt size limit bug #12633

Conversation

jancionear commented Dec 17, 2024 • edited Loading

codecov bot commented Dec 17, 2024 • edited Loading

Codecov Report

wacban left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Jan 5, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nagisa Jan 8, 2025 • edited Loading

Choose a reason for hiding this comment

nagisa left a comment

Choose a reason for hiding this comment

nagisa Jan 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear commented Dec 17, 2024 •

edited

Loading

codecov bot commented Dec 17, 2024 •

edited

Loading

jancionear Jan 5, 2025 •

edited

Loading

nagisa Jan 8, 2025 •

edited

Loading

nagisa Jan 8, 2025 •

edited

Loading