Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for new data types #2274

Open
joto opened this issue Dec 9, 2024 · 2 comments
Open

Proposal for new data types #2274

joto opened this issue Dec 9, 2024 · 2 comments

Comments

@joto
Copy link
Collaborator

joto commented Dec 9, 2024

Several issues have come up related to data types that can be used in an define_table() command and how data is converted from Lua to osm2pgsql. In all these cases it is possible to use the generic sql_type mechanism of osm2pgsql to reach the intended goal, but it is a bit awkward, always needs explaining and exposes the user to possible errors which are hard to understand (cryptic error messages from COPY). It would be nice, if we can say: "Yes, osm2pgsql supports these common constructs out of the box".

Geography

Osm2pgsql has always supported the geometry datatype with its subtypes like Point, Linestring, etc. and the setting of the SRID. But it doesn't natively support geography data types with their variants. It is easy enough to work around this using the sql_type setting, but still a bit awkward. This has been discussed here.

Proposal: Create new datatypes geography, geography-point, geography-linestring, and so on. Default projection would be 4326. Generate an error if the projection is not valid for a geography data type.

Variant A: Also add geometry-point as alias for point geometry and so on, giving us a consistent naming scheme.

Implementation: Is trivial, just the new types need to be recognized and the different default for the projection. There is no need to write special WKB or so.

Identity

Unique IDs on tables are often useful. Sometimes they need to be generated. How to do this is documented in the manual. But this is a bit awkward and we always get questions on this and the use of the serial type.

Proposal: Create new types id2, id4, and id8 (with aliases smallid, id, and bigid) that create integer identity columns of the specified size with GENERATED ALWAYS AS IDENTITY. It will also set create_only to true and not_null, i.e. osm2pgsql will not try to fill this column.

Variant A: Use identity instead of id in the type names. Not so easy to confuse with int.

Variant B: Also automatically generate a unique index for all ID columns. Might be too "magic", and not every use case needs one.

Variant C: Add a sequence option, which allows setting the sequence name instead of creating a default one. For special use cases (say using the same ID space for several tables), users can do a CREATE SEQUENCE before running osm2pgsql and then refer to that.

Implementation: Only the new types need to be recognized and a few SQL templates extended. A bit more to do if a sequence can be set, but still easily done.

Arrays

PostgreSQL can store arrays of any type. We use this in osm2pgsql to store the list of member node IDs of a way, for instance. Currently you need to define them in Lua as sql_type = 'int8[]' etc. and then build the context as text yourself: nodes = '{' .. table.concat(object.nodes, ',') .. '}'. See also the question in discussions forum on this.

Proposal: Add new data types int-array and its variants for the different integer types and add an automatic conversion from Lua tables with integers. The user doesn't have to write the conversion any more and we can generate better error message if the data is invalid for the type.

Variant A: Allow scalar values in conversion, will result in a single-element array.

Variant B: Also add this for some other scalar types. The only ones that make sense are probably real and text.

Variant C: Use the PostgreSQL syntax int[] and its variants instead.

Variant D: Make this a general feature that works for all data types, i.e. adding [] makes any type into an array.

Implementation: Add the new data types and new conversion functions. Depends on the variant how much work this will be.

@lonvia
Copy link
Collaborator

lonvia commented Jan 6, 2025

My 2 cents:

Geography: Implementation is easy but the type definition looks somewhat hacky to me. I'd leave that be for now, as it is rather niche.

** Identity **: Shouldn't that be rather a sub-function of ids, i.e. it is another ID column next to the one based on OSM IDs. Maybe we need to rethink here how rows are identified.

** Arrays**: automatic array building from Lua tables sounds useful. I'd go for variant D and error out on types, where Posgresql doesn't support arrays.

@joto
Copy link
Collaborator Author

joto commented Jan 6, 2025

Regarding the ids column: I don't think this is something that has to do with ids, because columns defined with ids are the ones that osm2pgsql uses for its own needs (when updating the data). They must be tied to something osm2pgsql knows, i.e. the OSM object type/id or a tile x/y for generalized data. Creating unique ids is not useful for osm2pgsql, it is only used downstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants