You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Several issues have come up related to data types that can be used in an define_table() command and how data is converted from Lua to osm2pgsql. In all these cases it is possible to use the generic sql_type mechanism of osm2pgsql to reach the intended goal, but it is a bit awkward, always needs explaining and exposes the user to possible errors which are hard to understand (cryptic error messages from COPY). It would be nice, if we can say: "Yes, osm2pgsql supports these common constructs out of the box".
Geography
Osm2pgsql has always supported the geometry datatype with its subtypes like Point, Linestring, etc. and the setting of the SRID. But it doesn't natively support geography data types with their variants. It is easy enough to work around this using the sql_type setting, but still a bit awkward. This has been discussed here.
Proposal: Create new datatypes geography, geography-point, geography-linestring, and so on. Default projection would be 4326. Generate an error if the projection is not valid for a geography data type.
Variant A: Also add geometry-point as alias for point geometry and so on, giving us a consistent naming scheme.
Implementation: Is trivial, just the new types need to be recognized and the different default for the projection. There is no need to write special WKB or so.
Identity
Unique IDs on tables are often useful. Sometimes they need to be generated. How to do this is documented in the manual. But this is a bit awkward and we always get questions on this and the use of the serial type.
Proposal: Create new types id2, id4, and id8 (with aliases smallid, id, and bigid) that create integer identity columns of the specified size with GENERATED ALWAYS AS IDENTITY. It will also set create_only to true and not_null, i.e. osm2pgsql will not try to fill this column.
Variant A: Use identity instead of id in the type names. Not so easy to confuse with int.
Variant B: Also automatically generate a unique index for all ID columns. Might be too "magic", and not every use case needs one.
Variant C: Add a sequence option, which allows setting the sequence name instead of creating a default one. For special use cases (say using the same ID space for several tables), users can do a CREATE SEQUENCE before running osm2pgsql and then refer to that.
Implementation: Only the new types need to be recognized and a few SQL templates extended. A bit more to do if a sequence can be set, but still easily done.
Arrays
PostgreSQL can store arrays of any type. We use this in osm2pgsql to store the list of member node IDs of a way, for instance. Currently you need to define them in Lua as sql_type = 'int8[]' etc. and then build the context as text yourself: nodes = '{' .. table.concat(object.nodes, ',') .. '}'. See also the question in discussions forum on this.
Proposal: Add new data types int-array and its variants for the different integer types and add an automatic conversion from Lua tables with integers. The user doesn't have to write the conversion any more and we can generate better error message if the data is invalid for the type.
Variant A: Allow scalar values in conversion, will result in a single-element array.
Variant B: Also add this for some other scalar types. The only ones that make sense are probably real and text.
Variant C: Use the PostgreSQL syntax int[] and its variants instead.
Variant D: Make this a general feature that works for all data types, i.e. adding [] makes any type into an array.
Implementation: Add the new data types and new conversion functions. Depends on the variant how much work this will be.
The text was updated successfully, but these errors were encountered:
Geography: Implementation is easy but the type definition looks somewhat hacky to me. I'd leave that be for now, as it is rather niche.
** Identity **: Shouldn't that be rather a sub-function of ids, i.e. it is another ID column next to the one based on OSM IDs. Maybe we need to rethink here how rows are identified.
** Arrays**: automatic array building from Lua tables sounds useful. I'd go for variant D and error out on types, where Posgresql doesn't support arrays.
Regarding the ids column: I don't think this is something that has to do with ids, because columns defined with ids are the ones that osm2pgsql uses for its own needs (when updating the data). They must be tied to something osm2pgsql knows, i.e. the OSM object type/id or a tile x/y for generalized data. Creating unique ids is not useful for osm2pgsql, it is only used downstream.
Several issues have come up related to data types that can be used in an
define_table()
command and how data is converted from Lua to osm2pgsql. In all these cases it is possible to use the genericsql_type
mechanism of osm2pgsql to reach the intended goal, but it is a bit awkward, always needs explaining and exposes the user to possible errors which are hard to understand (cryptic error messages fromCOPY
). It would be nice, if we can say: "Yes, osm2pgsql supports these common constructs out of the box".Geography
Osm2pgsql has always supported the geometry datatype with its subtypes like Point, Linestring, etc. and the setting of the SRID. But it doesn't natively support geography data types with their variants. It is easy enough to work around this using the
sql_type
setting, but still a bit awkward. This has been discussed here.Proposal: Create new datatypes
geography
,geography-point
,geography-linestring
, and so on. Default projection would be 4326. Generate an error if the projection is not valid for a geography data type.Variant A: Also add
geometry-point
as alias forpoint
geometry and so on, giving us a consistent naming scheme.Implementation: Is trivial, just the new types need to be recognized and the different default for the projection. There is no need to write special WKB or so.
Identity
Unique IDs on tables are often useful. Sometimes they need to be generated. How to do this is documented in the manual. But this is a bit awkward and we always get questions on this and the use of the
serial
type.Proposal: Create new types
id2
,id4
, andid8
(with aliasessmallid
,id
, andbigid
) that create integer identity columns of the specified size withGENERATED ALWAYS AS IDENTITY
. It will also setcreate_only
totrue
andnot_null
, i.e. osm2pgsql will not try to fill this column.Variant A: Use
identity
instead ofid
in the type names. Not so easy to confuse withint
.Variant B: Also automatically generate a unique index for all ID columns. Might be too "magic", and not every use case needs one.
Variant C: Add a
sequence
option, which allows setting the sequence name instead of creating a default one. For special use cases (say using the same ID space for several tables), users can do aCREATE SEQUENCE
before running osm2pgsql and then refer to that.Implementation: Only the new types need to be recognized and a few SQL templates extended. A bit more to do if a sequence can be set, but still easily done.
Arrays
PostgreSQL can store arrays of any type. We use this in osm2pgsql to store the list of member node IDs of a way, for instance. Currently you need to define them in Lua as
sql_type = 'int8[]'
etc. and then build the context as text yourself:nodes = '{' .. table.concat(object.nodes, ',') .. '}'
. See also the question in discussions forum on this.Proposal: Add new data types
int-array
and its variants for the different integer types and add an automatic conversion from Lua tables with integers. The user doesn't have to write the conversion any more and we can generate better error message if the data is invalid for the type.Variant A: Allow scalar values in conversion, will result in a single-element array.
Variant B: Also add this for some other scalar types. The only ones that make sense are probably
real
andtext
.Variant C: Use the PostgreSQL syntax
int[]
and its variants instead.Variant D: Make this a general feature that works for all data types, i.e. adding
[]
makes any type into an array.Implementation: Add the new data types and new conversion functions. Depends on the variant how much work this will be.
The text was updated successfully, but these errors were encountered: