Improvements to send_dataframe API #8619

Open
jleibs opened this issue Jan 8, 2025 · 1 comment
Labels
enhancement New feature or request feat-dataframe-api Everything related to the dataframe API 🐍 Python API Python logging API

Comments


jleibs commented Jan 8, 2025

The send_dataframe API depends on arrow metadata tags to figure out the type of each column.

However, this adds an extra step for the user: converting to a metadata-preserving format and then manually applying all of the correct tags.

We should try to reduce this friction where possible.

Better timeline inference

Unlike arrow/polars/datafusion dataframes, which are pure tables with uniform columns, Pandas dataframes have a concrete index, which we could always map to a timeline of the corresponding name.

Entity/Component Tagging

As for the other columns, it would still be helpful to provide some way of informing Rerun of the entity/component for each column, especially where augmenting the arrow metadata may be non-trivial.

This could maybe look something like:

df_components = [rr.Position3D, rr.Colors, "user.Confidence"]

rr.send_dataframe(df, components=df_components)

where the components array must have exactly one entry per column in the dataframe.
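A minimal sketch of the length check this implies, written against plain column names so it runs without Rerun or pandas installed. The helper name `tag_columns` is hypothetical, and the strings stand in for component types; none of this is an existing Rerun API.

```python
from typing import Sequence


def tag_columns(columns: Sequence[str], components: Sequence[object]) -> dict:
    """Pair each column name with the component at the same position.

    Hypothetical helper: raises if the two sequences differ in length,
    since a positional list only makes sense with one entry per column.
    """
    if len(columns) != len(components):
        raise ValueError(
            f"expected {len(columns)} components (one per column), "
            f"got {len(components)}"
        )
    return dict(zip(columns, components))


# Plain strings stand in for component types in this sketch.
tags = tag_columns(
    ["xyz", "colors", "confidence"],
    ["Position3D", "Color", "user.Confidence"],
)
```

Failing early on a length mismatch keeps a misaligned list from silently tagging the wrong column.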

Or maybe with an object-helper similar to AnyValues:

rr.send_dataframe(rr.TaggedDataframe(df, df_components))

Famok commented Jan 9, 2025

I would love to see this!

Maybe you could also just assume it's a time series when the values are numeric and no components are given. And when only one component is given for several columns, apply that type to every column (the same way radii and colors are applied for some archetypes).
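Those two fallback rules could be sketched roughly like this. Everything here is an assumption for illustration: the helper name `expand_components`, the `"Scalar"` placeholder for a time-series component, and passing the numeric column names in explicitly instead of inspecting dtypes.

```python
from typing import Optional, Sequence, Set

SCALAR = "Scalar"  # placeholder name for a time-series component


def expand_components(
    columns: Sequence[str],
    components: Optional[Sequence[object]],
    numeric_columns: Set[str],
) -> list:
    """Return one component per column, applying the two fallback rules."""
    if not components:
        # Rule 1: no components given, so every column must be numeric
        # and is treated as a scalar time series.
        non_numeric = [c for c in columns if c not in numeric_columns]
        if non_numeric:
            raise ValueError(f"cannot infer components for {non_numeric}")
        return [SCALAR] * len(columns)
    if len(components) == 1:
        # Rule 2: a single component is applied to every column.
        return list(components) * len(columns)
    if len(components) != len(columns):
        raise ValueError("components must be empty, a single entry, or one per column")
    return list(components)


inferred = expand_components(["a", "b"], None, {"a", "b"})
broadcast = expand_components(["r1", "r2", "r3"], ["Radius"], set())
```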

Concerning the time column, you could solve it with:
rr.send_dataframe(df, components=df_components, time_column="index")  # with "index" as the default
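One way the `time_column` argument might be resolved, as a sketch only. `resolve_time_column` is a hypothetical helper, and it takes the column names and index name as plain values so it does not depend on pandas.

```python
from typing import Optional, Sequence, Tuple


def resolve_time_column(
    columns: Sequence[str],
    index_name: Optional[str],
    time_column: str = "index",
) -> Tuple[str, str]:
    """Return (source, timeline_name) for the requested time column.

    "index" (the default) selects the dataframe index; an unnamed index
    falls back to the timeline name "index". Any other value must name
    an existing column. A column literally called "index" would need
    extra disambiguation, which this sketch does not handle.
    """
    if time_column == "index":
        return ("index", index_name or "index")
    if time_column not in columns:
        raise KeyError(f"no column named {time_column!r}")
    return ("column", time_column)


from_index = resolve_time_column(["xyz", "colors"], "frame_nr")
from_column = resolve_time_column(["t", "xyz"], None, time_column="t")
```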

about the components: I'd prefer a dictionary, so that one can skip columns or they can change in order, e.g. for a df with cols = ['xyz', 'colors']:
df_components = {'xyz': rr.Position3D, 'colors': rr.Colors}
(maybe this could also cover the time column by adding {'index': rr.Time}, assuming rr.Time exists)
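A sketch of how a dictionary could be applied, to show the two properties mentioned: columns missing from the mapping are skipped, and the order of the mapping does not matter. `select_tagged` is a hypothetical name and the strings stand in for component types.

```python
from typing import List, Mapping, Sequence, Tuple


def select_tagged(
    columns: Sequence[str],
    components: Mapping[str, object],
) -> List[Tuple[str, object]]:
    """Return (column, component) pairs in dataframe column order.

    Columns without an entry in the mapping are skipped; mapping keys
    that match no column are treated as an error.
    """
    unknown = [name for name in components if name not in columns]
    if unknown:
        raise KeyError(f"mapping refers to unknown columns: {unknown}")
    return [(name, components[name]) for name in columns if name in components]


# "note" has no entry and is skipped; the mapping order differs from the columns.
pairs = select_tagged(
    ["xyz", "note", "colors"],
    {"colors": "Color", "xyz": "Position3D"},
)
```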
