BorshSchema vs custom serialisation #211

mina86 · 2023-08-31T13:12:31Z

Say I’d like to use varints in borsh, or have a custom SmallVec type that is encoded with an 8-bit length rather than a 32-bit length.

This is easy enough to do by implementing BorshSerialize and BorshDeserialize by hand. However, BorshSchema then becomes an issue. A varint could be modelled as a nested enum with 256 variants. Similarly, SmallVec could be modelled as an enum with 256 variants, each being an array. That’s hardly a clean solution, though.
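For concreteness, here is a minimal sketch of what hand-written serialisation for the SmallVec case might look like, using only the standard library. SmallVec here is a hypothetical type holding bytes, and its methods only roughly mirror the shape of borsh's serialize / deserialize_reader trait methods instead of implementing the real traits. The point it illustrates is that the 1-byte length prefix is easy to write and read, but nothing in the schema describes it.

```rust
use std::io::{self, Read, Write};

/// Hypothetical vector type whose length is encoded as a single byte
/// instead of borsh's usual 4-byte little-endian `u32`.
#[derive(Debug, PartialEq)]
pub struct SmallVec(pub Vec<u8>);

impl SmallVec {
    /// Writes a 1-byte length followed by the elements.
    pub fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        let len = u8::try_from(self.0.len()).map_err(|_| {
            io::Error::new(io::ErrorKind::InvalidInput, "SmallVec holds at most 255 elements")
        })?;
        writer.write_all(&[len])?;
        writer.write_all(&self.0)
    }

    /// Reads the 1-byte length, then exactly that many elements.
    pub fn deserialize_reader<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut len = [0u8; 1];
        reader.read_exact(&mut len)?;
        let mut elements = vec![0u8; usize::from(len[0])];
        reader.read_exact(&mut elements)?;
        Ok(SmallVec(elements))
    }
}
```

Serialising `SmallVec(vec![10, 20, 30])` yields `[3, 10, 20, 30]`: one length byte instead of four.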

Do you guys have any thoughts on that?

frol · 2023-08-31T13:36:12Z

I would avoid expanding the scope of borsh spec with varint/smallvec specializations. I would treat these types as application-specific ones and leave app developers to optimize their custom types on their end.

mina86 · 2023-08-31T13:58:27Z

So my question is: how do I implement BorshSchema for such a type? There’s no Definition for an application-specific encoding. The options seem to be:

1. Accept that there’s not going to be an impl BorshSchema.
2. Add a Declaration for the type but don’t provide a Definition for it.
3. Do extremely hacky stuff with Definition::Enum.

Perhaps it would make sense to have a Definition::AppSpecific variant with at least a rudimentary description of the format (e.g. the minimum and maximum encoded length). For a varint, for example, this would mean a definition "VarInt<u32>" → Definition::AppSpecific(1..5).
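A sketch of the proposed variant, assuming unsigned LEB128 as the varint encoding (the comment does not name one). Definition below is a hypothetical, trimmed-down stand-in, not borsh's actual schema type, and the range is written as inclusive (1..=5) so that the maximum of 5 bytes is explicit: each byte carries 7 payload bits, so a 32-bit value needs at most ceil(32 / 7) = 5 bytes.

```rust
use std::ops::RangeInclusive;

/// Hypothetical schema definition showing only the proposed variant:
/// the minimum and maximum number of bytes an app-specific encoding occupies.
#[derive(Debug, Clone, PartialEq)]
pub enum Definition {
    AppSpecific(RangeInclusive<u32>),
}

/// Encoded-size bounds of an unsigned LEB128 varint for a `bits`-wide integer:
/// at least 1 byte, at most ceil(bits / 7) bytes.
pub fn varint_size_bounds(bits: u32) -> RangeInclusive<u32> {
    1..=(bits + 6) / 7
}

/// Definition that would be registered for a declaration such as "VarInt<u32>".
pub fn varint_definition(bits: u32) -> Definition {
    Definition::AppSpecific(varint_size_bounds(bits))
}
```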

I think this may also relate to #181. Perhaps it would make sense to extend Sequence and Enum by adding length_size and tag_size fields respectively? Today’s encodings would then be described as Sequence { length_size: 4, elements: ... } and Enum { tag_size: 1, variants: ... }. This would allow expressing SmallVec and enums with a different tag representation.
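The proposed fields could be sketched as follows. The Definition enum is hypothetical and models only the two variants discussed; read_prefix (also hypothetical) shows how a decoder would interpret a little-endian length prefix whose width is taken from length_size, so that 4 matches borsh’s standard u32 length and 1 describes a SmallVec-style 8-bit length.

```rust
use std::io::{self, Read};

/// Hypothetical schema fragment: the width of the sequence length prefix
/// and of the enum tag is stated explicitly.
#[derive(Debug, Clone, PartialEq)]
pub enum Definition {
    Sequence { length_size: u8, elements: String },
    /// `variants` holds (variant name, declaration) pairs.
    Enum { tag_size: u8, variants: Vec<(String, String)> },
}

/// Reads a little-endian unsigned prefix that is `size` bytes wide (1..=8).
pub fn read_prefix<R: Read>(reader: &mut R, size: u8) -> io::Result<u64> {
    if !(1..=8).contains(&size) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "prefix width must be 1..=8 bytes",
        ));
    }
    // Fill only the low `size` bytes; the remaining high bytes stay zero.
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf[..usize::from(size)])?;
    Ok(u64::from_le_bytes(buf))
}
```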

dj8yfo · 2023-09-03T13:05:27Z

A vector of varints, Vec<VarInt>, can first be serialized as Vec<u8> and then presented to borsh as such, if the compression that varints provide is needed.
The total number of VarInts will be lost, but the total number of bytes will not. So with respect to the schema, it will look like Sequence { elements: "u8".to_string() }.
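A minimal sketch of this approach, assuming the varints are unsigned LEB128-encoded u32 values (an assumption; the comment does not name an encoding). The function names are hypothetical; the framing follows borsh’s Vec<u8> layout of a 4-byte little-endian length followed by the bytes, which is why the prefix counts bytes rather than varints.

```rust
/// Appends `value` to `out` as an unsigned LEB128 varint:
/// 7 bits per byte, high bit set on every byte except the last.
pub fn write_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Packs all varints into one byte buffer and frames it like a borsh `Vec<u8>`:
/// 4-byte little-endian byte count, then the bytes. The number of varints
/// is not recorded anywhere.
pub fn serialize_as_bytes(values: &[u32]) -> Vec<u8> {
    let mut payload = Vec::new();
    for &v in values {
        write_varint(v, &mut payload);
    }
    let len = u32::try_from(payload.len()).expect("payload exceeds u32::MAX bytes");
    let mut out = len.to_le_bytes().to_vec();
    out.extend_from_slice(&payload);
    out
}
```

For example, `[1, 300]` becomes `[3, 0, 0, 0, 0x01, 0xAC, 0x02]`: the prefix says 3 bytes, not 2 values.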

It's about the same with Rust's String at the moment. A String is essentially a Vec<VarInt>: it's serialized as Vec<u8>, with the total number of characters lost in serialized form, and it has a "string" Declaration and an empty Definition (the second option in the comment above).

Similarly to String, one can define a type VarintsVec(Vec<VarInt>), serialize and deserialize its contents as Vec<u8>, with error checking during deserialization (on the lengths of the varints encountered), and implement BorshSchema with a special "varint_vector" Declaration and an empty Definition.
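Deserialisation with error checking for such a wrapper might look like the sketch below. It assumes unsigned LEB128 u32 varints; VarintsVec::from_bytes is a hypothetical helper that takes the payload after borsh has read it as Vec<u8>. It rejects truncated and overflowing varints but, for brevity, does not reject non-canonical (overlong) encodings.

```rust
use std::io;

/// Hypothetical wrapper: on the wire it is a borsh `Vec<u8>`, but the bytes
/// must form a sequence of valid unsigned LEB128 `u32` varints.
#[derive(Debug, PartialEq)]
pub struct VarintsVec(pub Vec<u32>);

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

impl VarintsVec {
    /// Decodes a payload that has already been read as `Vec<u8>`.
    pub fn from_bytes(bytes: &[u8]) -> io::Result<Self> {
        let mut values = Vec::new();
        let mut iter = bytes.iter();
        while let Some(&first) = iter.next() {
            let mut value = u32::from(first & 0x7f);
            let mut byte = first;
            let mut shift: u32 = 7;
            // Keep reading while the continuation (high) bit is set.
            while byte & 0x80 != 0 {
                byte = *iter.next().ok_or_else(|| invalid("truncated varint"))?;
                // The 5th byte may carry only the top 4 bits of a u32 and must end the varint.
                if shift == 28 && byte > 0x0f {
                    return Err(invalid("varint overflows u32"));
                }
                value |= u32::from(byte & 0x7f) << shift;
                shift += 7;
            }
            values.push(value);
        }
        Ok(VarintsVec(values))
    }
}
```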

Since an 8-bit length allows at most 255 elements, a SmallVec will on average be about 127 bytes long (taking the minimal nonzero size of a type to be 1 byte, as defined in #209). Adding a header_size field to Definition::Sequence just to save 3 bytes of header on an average payload of ~120 bytes doesn't seem like a big gain compared to simply using Vec.
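Taking these figures at face value (a 4-byte u32 length prefix replaced by a 1-byte one, on a payload of about 127 bytes), the relative saving works out to roughly:

$$
\frac{4 - 1}{4 + 127} = \frac{3}{131} \approx 2.3\%
$$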

mina86 · 2023-09-03T14:54:31Z

> It's about the same with rust's String at the moment. A String is essentially a Vec<VarInt>. It's serialized as Vec<u8> with info about total characters lost in serialized form, and having a string Declaration for itself and empty Definition.

That’s not quite the same, though. In the String case, I can deserialise Vec<u8> and then convert it to a String with no additional allocation. With Vec<VarInt>, I’d have to first deserialise Vec<u8> and then allocate a new (say) Vec<VarInt<u32>>.
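The allocation-free String conversion can be checked directly: String::from_utf8 takes ownership of the Vec<u8> and, on success, reuses its heap buffer (its documentation states that it does not copy the vector). Output whose elements differ in size from u8 cannot live in that buffer. In the sketch below, both helper names are hypothetical, and widening each byte to u32 merely stands in for real varint decoding.

```rust
use std::string::FromUtf8Error;

/// Validates the UTF-8 and takes over the vector's existing heap buffer.
pub fn bytes_into_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Stand-in for varint decoding: each output element is 4 bytes wide,
/// so `collect` has to allocate a new vector for the results.
pub fn bytes_into_u32s(bytes: &[u8]) -> Vec<u32> {
    bytes.iter().map(|&b| u32::from(b)).collect()
}
```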

However, this is a bit beside the point. Of course, I can always write a serialisation that can be described by BorshSchema. The question is what to do when the serialisation I’m using cannot be described by BorshSchema.

frol added the question label (Further information is requested) Aug 31, 2023

dj8yfo mentioned this issue Sep 3, 2023

feat: Introduce max_serialized_size function #209

Merged

This was referenced Sep 22, 2023

feat!: add length_width to schema::Definition::Sequence #228

Closed

feat!: add length_width to schema::Definition::Sequence #229

Merged

dj8yfo closed this as completed in #229 Sep 23, 2023
