read in date-time columns #43
I think we should definitely handle dates and datetimes. Internally, we'd have to represent datetimes as some sort of collection of columns that break down the components. For example, we'd have
Then there is the cyclic nature of dates and times: Sunday is close to Saturday, but Sunday = 0 and Saturday = 6. We should think about how to represent this, and the representation must also convert back exactly into a date(time). Please add suggestions. If we need to add a new model (e.g. cyclic) to make this work, feel free to propose it, and we can open another issue for it.
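A minimal Python sketch of both requirements, assuming standard-library `datetime` values: the datetime is split into integer component columns, which convert back exactly, and the day of the week gets an extra sin/cos pair so that Sunday (0) and Saturday (6) sit as close together as any two adjacent days. The column names and the helpers `encode_cyclic`, `decode_cyclic`, `decompose` and `recompose` are hypothetical illustrations, not part of Lace.

```python
import math
from datetime import datetime


def encode_cyclic(value: int, period: int) -> tuple[float, float]:
    """Place an integer in [0, period) on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def decode_cyclic(sin_v: float, cos_v: float, period: int) -> int:
    """Recover the integer from its (sin, cos) pair via atan2 and rounding."""
    angle = math.atan2(sin_v, cos_v)  # in (-pi, pi]
    return round(angle / (2.0 * math.pi) * period) % period


def decompose(dt: datetime) -> dict:
    """Hypothetical column layout: exact components plus a cyclic weekday."""
    weekday = (dt.weekday() + 1) % 7  # Sunday = 0 ... Saturday = 6
    wd_sin, wd_cos = encode_cyclic(weekday, 7)
    return {
        "year": dt.year, "month": dt.month, "day": dt.day,
        "hour": dt.hour, "minute": dt.minute, "second": dt.second,
        "microsecond": dt.microsecond,
        # Derived feature: redundant with year/month/day, kept for modeling.
        "weekday_sin": wd_sin, "weekday_cos": wd_cos,
    }


def recompose(cols: dict) -> datetime:
    """Exact inverse of decompose (the weekday columns are not needed)."""
    return datetime(cols["year"], cols["month"], cols["day"],
                    cols["hour"], cols["minute"], cols["second"],
                    cols["microsecond"])


if __name__ == "__main__":
    dt = datetime(2023, 5, 14, 9, 30, 15, 250)  # a Sunday
    cols = decompose(dt)
    assert recompose(cols) == dt
    assert decode_cyclic(cols["weekday_sin"], cols["weekday_cos"], 7) == 0
    # On the circle, Sunday is as close to Saturday as it is to Monday.
    sun, sat, mon = (encode_cyclic(d, 7) for d in (0, 6, 1))
    print(math.dist(sun, sat), math.dist(sun, mon))
```

The sin/cos pair alone is enough to recover the weekday exactly, because `atan2` gives back the angle and rounding removes floating-point noise; the same trick would work for month (period 12) or hour (period 24).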
Hi! I came here to create a similar issue about date-time columns, but I think this one captures it. I've been experimenting with the Lace package and I'm really enjoying it. Most of the data I work with is time-series sensor data, so I'm looking for a recommendation on how best to prepare that data for Lace. Is it better to leave it as is (categorical, as noted in the issue description), convert it to a sequential integer index, or break it out into several features with something like augment_timeseries_signature or tsfresh? Maybe it all depends on my use case, but I wanted to get your thoughts. Thanks!
Hi @joshualeond - glad you're enjoying lace! The rows of the table are modeled as independent observations, so the way we typically do timeseries is by keeping a certain amount of history and lookahead in the columns. For example, for sensor data a row might look like this
Here I've used n to represent the number of timesteps back and m the number of timesteps forward. You can of course use whatever granularity of data you like. The best way to represent a datetime depends on your application. You might represent it as the number of hours since an experiment started, or you can break it into several features depending on which components of the datetime share information with the things you're interested in. You could use a categorical day of the week or a float proportion of the week. It all depends. If the cyclic nature of days/weeks/months/years is important, you can apply sin and cos to 2π times the proportion.
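A rough Python sketch of that layout, assuming a regularly sampled sensor series held in plain lists: each row carries n past values, the current value and m future values, plus "hours since start" and a sin/cos encoding of the proportion of the week. The helper names and the `x(t-2)`-style column names are hypothetical; the resulting list of dicts could be turned into a table for Lace however you normally load data.

```python
import math
from datetime import datetime, timedelta


def week_proportion(dt: datetime) -> float:
    """Fraction of the week elapsed, with Monday 00:00 = 0.0."""
    seconds = (dt.weekday() * 86400 + dt.hour * 3600
               + dt.minute * 60 + dt.second)
    return seconds / (7 * 86400)


def window_rows(times, values, n, m):
    """One row per timestep t holding x(t-n) .. x(t+m) plus time features.

    Timesteps without a full n-step history or m-step lookahead are dropped,
    so every row is a complete observation.
    """
    rows = []
    for t in range(n, len(values) - m):
        row = {f"x(t{k:+d})": values[t + k] for k in range(-n, m + 1)}
        # Continuous time: hours since the first sample.
        row["hours_since_start"] = (times[t] - times[0]).total_seconds() / 3600
        # Cyclic time: angle = 2*pi * proportion of the week.
        angle = 2.0 * math.pi * week_proportion(times[t])
        row["week_sin"] = math.sin(angle)
        row["week_cos"] = math.cos(angle)
        rows.append(row)
    return rows


if __name__ == "__main__":
    start = datetime(2024, 1, 1)  # a Monday
    times = [start + timedelta(hours=h) for h in range(6)]
    values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    for row in window_rows(times, values, n=2, m=1):
        print(row)
```

Dropping the edges keeps every row complete; padding with missing values instead is also possible if losing the first n and last m timesteps is too costly.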
A lot of data have date (or date-time) columns. Right now Lace treats them as categorical (since they are strings). This is not ideal, both in terms of the number of distinct dates/times it can represent and in terms of their actual semantics (which are closer to a continuous variable).
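One way to give date-times continuous semantics while keeping them exactly reversible, sketched in Python under the assumption that all timestamps are timezone-aware UTC: store the number of microseconds since the Unix epoch as a float. Every integer up to 2^53 is exactly representable as a double, which covers roughly ±285 years around 1970 at microsecond resolution, so the round trip is lossless in that range. `to_continuous` and `from_continuous` are hypothetical helper names, not Lace functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_continuous(dt: datetime) -> float:
    """Microseconds since the Unix epoch, as a float.

    dt must be timezone-aware; subtracting EPOCH from a naive datetime
    raises TypeError.
    """
    micros = (dt - EPOCH) // ONE_MICROSECOND  # exact integer division
    return float(micros)


def from_continuous(x: float) -> datetime:
    """Inverse of to_continuous; rounds in case x is not a whole number."""
    return EPOCH + timedelta(microseconds=round(x))


if __name__ == "__main__":
    dt = datetime(2023, 5, 14, 9, 30, 15, 250, tzinfo=timezone.utc)
    x = to_continuous(dt)
    assert from_continuous(x) == dt
    print(x)
```

Microsecond values have very large magnitudes; rescaling to hours or days may suit a continuous model better, at the cost of more care with rounding on the way back.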