
Improve API #18

Open · jethrogb opened this issue Aug 6, 2016 · 9 comments


@jethrogb (Contributor) commented Aug 6, 2016

I would like to build a function

fn read_headers<'a, 'b, R: BufRead>(
    stream: &mut R,
    buf: &'b mut Vec<u8>,
    headers: &'a mut [Header<'b>],
) -> Result<Request<'a, 'b>, E>

that reads as much from stream into buf as necessary to get a Complete return from Request::parse. This turns out not to be trivial, and it requires a lot of extra allocations and work.

Here's what I came up with:

fn read_headers<'a, 'b, R: BufRead>(clnt: &mut R, buf: &'b mut Vec<u8>, headers: &'a mut [Header<'b>]) -> Result<Request<'a, 'b>, String> {
    // The inner function owns its own buffer and allocates fresh temporary
    // headers on every iteration, because Request::new ties the borrow of
    // both to the Request for its entire lifetime.
    fn extend_and_parse<R: BufRead>(clnt: &mut R, headers: &mut [Header]) -> Result<Vec<u8>, String> {
        let mut buf = Vec::<u8>::new();
        let len = headers.len();
        loop {
            let buf_orig_len = buf.len();
            let additional_len = {
                let additional = try!(clnt.fill_buf().map_err(|e| e.to_string()));
                buf.extend_from_slice(additional);
                additional.len()
            };
            // A brand-new header array every iteration: the previous one is
            // still borrowed by the failed parse and can't be reused.
            let mut headers = Vec::with_capacity(len);
            headers.resize(len, httparse::EMPTY_HEADER);
            let mut req = Request::new(&mut headers);
            match req.parse(&buf) {
                Ok(httparse::Status::Complete(n)) => {
                    // Only consume up to the end of the request head.
                    clnt.consume(n - buf_orig_len);
                    break;
                }
                Ok(httparse::Status::Partial) => {
                    clnt.consume(additional_len);
                }
                Err(e) => return Err(format!("HTTP parse error {:?}", e)),
            };
        }
        Ok(buf)
    }

    let result = extend_and_parse(clnt, headers);
    result.map(move |nb| {
        ::core::mem::replace(buf, nb);
        // Parse a second time into the caller's headers, so the returned
        // Request borrows the caller-provided buffer and header slice.
        let mut req = Request::new(headers);
        req.parse(buf);
        req
    })
}

The main issues are having to allocate a new array of temporary headers on every iteration, and having to parse the successful result twice.

I think this is partially Rust's fault, but also partially httparse's, for having a not-so-great API. For example, the lifetime of everything is fixed upon Request creation, so a parse failure doesn't release the borrow of the input buffer.
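To illustrate the borrow problem (a minimal sketch; read_more is a hypothetical helper that appends more bytes to the buffer):

let mut buf: Vec<u8> = Vec::new();
let mut headers = [httparse::EMPTY_HEADER; 16];
let mut req = httparse::Request::new(&mut headers);
// Suppose the first read gave us an incomplete request head:
if let Ok(httparse::Status::Partial) = req.parse(&buf) {
    // ERROR: `buf` can't be borrowed mutably here, because the failed
    // parse did not release the immutable borrow taken by `req.parse`.
    read_more(&mut buf);
    req.parse(&buf);
}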

@seanmonstar (Owner)

I agree that the lifetime situation feels difficult. If I could fix anything, I'd fix the feeling of trying to use httparse.

As for your specific example, I don't believe allocating a Vec of headers is necessary, unless you need to decide at runtime how many is the maximum allowed.

@jethrogb (Contributor, Author) commented Aug 9, 2016

> As for your specific example, I don't believe allocating a Vec of headers is necessary, unless you need to decide at runtime how many is the maximum allowed.

You need a temporary header array that's as big as the input slice headers. You can't size that statically until we get integer generics.
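Concretely, today the scratch array has to be sized at runtime; a const-generic signature (hypothetical, sketched below) would be needed to put it on the stack:

// Today: heap-allocate the scratch headers, sized from the input slice.
let mut tmp = vec![httparse::EMPTY_HEADER; headers.len()];

// Hypothetical, once integer generics exist: size it statically instead.
fn read_headers<'a, 'b, R: BufRead, const N: usize>(
    stream: &mut R,
    buf: &'b mut Vec<u8>,
    headers: &'a mut [Header<'b>; N],
) -> Result<Request<'a, 'b>, String> {
    let mut tmp = [httparse::EMPTY_HEADER; N]; // stack-allocated scratch space
    // ...
    unimplemented!()
}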

@seanmonstar (Owner)

Sure. Who provides the headers slice? I'm actually quite curious about this design, because I haven't encountered it in either sync hyper or async hyper, and I haven't seen others use it that way in other mini HTTP implementations.

For reference, this is how hyper 0.9 would keep reading into a buffer synchronously and try to parse again: https://github.com/hyperium/hyper/blob/0.9.x/src/http/h1.rs#L869-L899
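Paraphrased, that loop looks roughly like this (a sketch, not hyper's actual code; read_into stands in for its buffer-filling helper):

const MAX_HEADERS: usize = 100; // a fixed compile-time maximum

loop {
    try!(read_into(stream, &mut buf)); // hypothetical "append more bytes" helper
    // A fresh, stack-allocated header array on every attempt:
    let mut headers = [httparse::EMPTY_HEADER; MAX_HEADERS];
    let mut req = httparse::Request::new(&mut headers);
    match req.parse(&buf) {
        // Interpret `req` into your own types here, before `headers`
        // goes out of scope:
        Ok(httparse::Status::Complete(_)) => break,
        Ok(httparse::Status::Partial) => continue,
        Err(e) => return Err(format!("{:?}", e)),
    }
}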

@jethrogb (Contributor, Author) commented Aug 10, 2016

> I'm actually quite curious about this design, because I haven't encountered it in either sync hyper or async hyper, and I haven't seen others use it that way in other mini HTTP implementations.

> For reference, this is how hyper 0.9 would keep reading into a buffer synchronously and try to parse again: https://github.com/hyperium/hyper/blob/0.9.x/src/http/h1.rs#L869-L899

This looks like it basically runs the same loop as mine above, except with a fixed maximum number of headers, which means you're avoiding exactly the issues I'm running into. I'd like the caller to specify the maximum headers, and to avoid large additional allocations (on either the heap or the stack) during parsing. I'd also like to keep the httparse Request representation; in hyper you're interpreting the Request directly before returning.

@seanmonstar (Owner)

Been thinking about this: what if headers were not an array of slices, but an array of offsets? It'd remove the lifetimes, but require you to slice the buffer yourself with the offsets to get the name and value.
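A sketch of what that might look like (hypothetical types, not the current API):

// Hypothetical lifetime-free header: byte offsets into the buffer instead of slices.
struct HeaderIndices {
    name: (usize, usize),  // start and end of the header name within the buffer
    value: (usize, usize), // start and end of the header value within the buffer
}

// The caller slices the buffer themselves to recover the actual bytes:
fn header_name<'b>(buf: &'b [u8], h: &HeaderIndices) -> &'b [u8] {
    &buf[h.name.0..h.name.1]
}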

@seanmonstar (Owner)

Another option is to make the parse function update the lifetime of the Request. This might require an unsafe transmute, which would require that parse ensure all of the fields of Request, including the headers array, are reset, to prevent any dangling references.
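Something along these lines, perhaps (a hypothetical sketch, assuming it lived inside httparse where Request's fields can be reset):

use core::mem;
use httparse::{Request, EMPTY_HEADER};

// Hypothetical: consume the Request, wipe every field that borrows from the
// old buffer, then transmute only the buffer lifetime. This is sound only
// because nothing still points into the old buffer after the reset.
fn rebind<'h, 'b, 'b2>(mut req: Request<'h, 'b>) -> Request<'h, 'b2> {
    req.method = None;
    req.path = None;
    req.version = None;
    for h in req.headers.iter_mut() {
        *h = EMPTY_HEADER;
    }
    unsafe { mem::transmute(req) }
}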

@jethrogb (Contributor, Author)

What about merging new and parse into one associated function?

fn parse(headers: &'h mut [Header<'b>], buf: &'b [u8]) -> Result<(usize, Request<'h, 'b>)>

You'd need to change the Status enum as well...
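One hypothetical shape for the changed Status (not the current API):

// Complete carries the Request, so a Partial result borrows neither the
// headers slice nor the buffer, releasing both for the next attempt.
enum Status<'h, 'b> {
    Complete(usize, Request<'h, 'b>),
    Partial,
}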

@seanmonstar (Owner)

That could be done, but it still ties the lifetime of the Header slice to the buf, and it makes the Status trickier, as you said.

I think a reset() method can be added to Request that will empty all the fields, reset the Header slice, and update its lifetime to allow a new buffer to be used.
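Usage could then look something like this (hypothetical, since reset() doesn't exist; read_more stands in for refilling the buffer):

let mut headers = [httparse::EMPTY_HEADER; 16];
let mut req = httparse::Request::new(&mut headers);
loop {
    match req.parse(&buf) {
        Ok(httparse::Status::Complete(_)) => break,
        Ok(httparse::Status::Partial) => {
            // Hypothetical: reset() empties the fields and releases the
            // borrow of `buf`, so the buffer can be grown again.
            req = req.reset();
            read_more(&mut buf);
        }
        Err(e) => return Err(format!("{:?}", e)),
    }
}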

@adriangb commented Nov 7, 2022

I wrote a little Python wrapper for this library. From the perspective of writing a Python wrapper with PyO3, the lifetime on Header is problematic.

If you're using the flow of "try to parse; if incomplete, tell the Python caller to come back with more data", you have to re-allocate the headers every time, because you can't persist anything containing a Header (including the buffer) on the Python side due to the lifetime (PyO3 can't hand off lifetimes to Python).

The lifetime requirement on the data itself is not a problem; PyO3 lets you access the underlying memory without copying.
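To make the constraint concrete, the state one would want to persist across calls can't even be declared (a sketch; ParserState is hypothetical and deliberately doesn't compile):

use pyo3::prelude::*;

// Does not compile: #[pyclass] types cannot carry lifetime parameters, so
// nothing holding a httparse::Header<'b> can live on the Python side between
// "come back with more data" round-trips.
#[pyclass]
struct ParserState<'b> {
    headers: Vec<httparse::Header<'b>>,
}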
