Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<regex>: Fix integer overflow in _Buf and implement geometric buffer expansion #5175

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

muellerj2
Copy link
Contributor

Fixes an undetected integer overflow in _Buf::_Insert(_Elem) when increasing the buffer size.

It's unlikely anyone ever ran into this bug in practice, since the overflow could only happen following about $2^{28}$ reallocations, but if one were to wait long enough, _Buf would write to and read from unallocated memory.

Since fixing the overflow bug meant rewriting the size calculations anyway, I also quickly added three lines to implement geometric expansion of the buffer, ensuring that inserting a new character runs in amortized constant rather than linear time. The selected growth factor is 1.5, same as vector's. (The geometric expansion kicks in when more than 48 characters are inserted into a character buffer. I think most practically used regular expressions don't even get close to adding 48 characters to one of these buffers.)

Finally, this PR makes _Buf throw a regex_error with error code error_space on allocation failure, which I think is the more appropriate exception when running out of memory while parsing a regular expression and building the corresponding NFA. But feel free to change this to bad_alloc etc. if you prefer these exceptions instead.

No tests added since they would require about 8 GB of virtual memory and run for several minutes, but here is a small x64 test program to see that overflow is handled correctly now:

#include <cassert>
#include <regex>
#include <string>

using namespace std;

int main() {
    {
        string s(static_cast<size_t>(static_cast<unsigned int>(-1)), 'a');
        regex regex(s);
    }

    {
        string s(static_cast<size_t>(static_cast<unsigned int>(-1)) + 1, 'a');
        try {
            regex regex(s);
            assert(false);
        } catch (const regex_error& ex) {
            assert(ex.code() == regex_constants::error_space);
        }
    }
    return 0;
}

@muellerj2 muellerj2 requested a review from a team as a code owner December 9, 2024 13:58
@@ -1257,9 +1257,9 @@ struct _Buf { // character buffer
return _Chrs;
}

void _Insert(_Elem _Ch) { // append _Ch
void _Insert2(_Elem _Ch) { // append _Ch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MSVC STL prefers _v2 suffix over 2. A plain 2 seemingly introduces some semantic implication, although it should not in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind either way, but the _v2 suffix is not the style that has been used in <regex> for new versions up till now. See Casey's #131 and all the other occurrences of functions ending in 1 or 2 in this file (although I have to admit I'm not following this precedent strictly as well because I find it weird to label the second version of a function by appending 1). On the other hand, there are no functions ending in _vxxx.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got examples of both, e.g. _Stodx_v3 and _Orphan_all_unlocked_v3. I'm fine with either.

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Dec 9, 2024
@StephanTLavavej StephanTLavavej self-assigned this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Initial Review
Development

Successfully merging this pull request may close these issues.

3 participants