Create geoclip_embedding_function.py #3353

Open: wants to merge 2 commits into main

Latticeworks1

This PR introduces a GeoClipEmbeddingFunction to Chroma, enabling the creation of embeddings from geographic coordinates (latitude and longitude). It supports various input formats (strings, lists, and dictionaries) and includes robust error handling and logging.

Description of changes

The new GeoClipEmbeddingFunction accepts coordinates as strings, lists, or dictionaries. It raises errors for malformed or out-of-range input and logs a warning when an input is invalid.

Test plan

The changes are covered by unit tests using pytest. The tests verify:

  • Successful embedding generation for valid "lat,lon" strings, [lat, lon] lists, and {"latitude": lat, "longitude": lon} dictionaries.
  • Correct handling of edge cases, such as coordinates at the poles and the antimeridian.
  • Proper error handling for invalid input formats (e.g., incorrect number of values, non-numeric values) and out-of-range coordinates.
  • Logging of warnings for invalid inputs.
  • Device handling (CPU/CUDA), where applicable.
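
The range and logging checks listed above could be exercised against a validator along these lines. This is only a sketch: the function name `validate_coordinates`, the logger name, and the choice of `ValueError` are assumptions for illustration and are not taken from the PR's code.

```python
import logging
from typing import Tuple

# Hypothetical logger name; the PR's module may use a different one.
logger = logging.getLogger("geoclip_sketch")


def validate_coordinates(lat: float, lon: float) -> Tuple[float, float]:
    """Return (lat, lon) unchanged if both are in range.

    Logs a warning and raises ValueError otherwise. The poles (lat = +/-90)
    and the antimeridian (lon = +/-180) are treated as valid boundary values.
    NaN fails both comparisons and is therefore rejected.
    """
    if not -90.0 <= lat <= 90.0:
        logger.warning("Latitude out of range: %r", lat)
        raise ValueError(f"latitude must be in [-90, 90], got {lat}")
    if not -180.0 <= lon <= 180.0:
        logger.warning("Longitude out of range: %r", lon)
        raise ValueError(f"longitude must be in [-180, 180], got {lon}")
    return lat, lon
```

With pytest, the warning path could be checked through the built-in `caplog` fixture, and the error path with `pytest.raises(ValueError)`.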

Documentation Changes

Yes, documentation changes are needed.

Purpose:

The GeoClipEmbeddingFunction allows you to create embeddings from geographic coordinates (latitude and longitude) using the GeoCLIP model. These embeddings can then be used within Chroma for various geospatial applications, such as:

  • Similarity Search: Find locations that are geographically close to a given query location.
  • Clustering: Group similar locations together based on their geographic proximity.
  • Geographic Data Analysis: Perform analysis on datasets with geographic components, leveraging the semantic understanding of location encoded by GeoCLIP.

GeoCLIP is a CLIP-inspired model that aligns locations and images, providing a rich representation of geographic space. By using GeoClipEmbeddingFunction, you can bring this powerful model's capabilities into Chroma.

To use the GeoClipEmbeddingFunction, you first need to install the geoclip and torch Python packages:

    pip install geoclip torch

Then, you can instantiate the embedding function and use it to generate embeddings from geographic coordinates. The function supports three input formats:

  • String: A string in the format "latitude,longitude" (e.g., "37.7749,-122.4194").
  • List: A list containing two floats: [latitude, longitude] (e.g., [37.7749, -122.4194]).
  • Dictionary: A dictionary with "latitude" and "longitude" keys (e.g., {"latitude": 37.7749, "longitude": -122.4194}).
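
As a rough illustration of how the three formats above might be normalized before being passed to the model, here is a minimal sketch. The helper name `parse_coordinate`, the `Coordinate` alias, and the use of `ValueError` for every failure are assumptions; the PR's actual parsing code may differ.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical alias covering the three input formats described above.
Coordinate = Union[str, List[float], Dict[str, float]]


def parse_coordinate(value: Coordinate) -> Tuple[float, float]:
    """Normalize one input into a (latitude, longitude) tuple of floats."""
    if isinstance(value, str):
        parts = value.split(",")
        if len(parts) != 2:
            raise ValueError(f"expected 'lat,lon', got {value!r}")
        lat, lon = parts
    elif isinstance(value, dict):
        if "latitude" not in value or "longitude" not in value:
            raise ValueError(f"missing 'latitude' or 'longitude' in {value!r}")
        lat, lon = value["latitude"], value["longitude"]
    elif isinstance(value, list):
        if len(value) != 2:
            raise ValueError(f"expected [lat, lon], got {value!r}")
        lat, lon = value
    else:
        raise ValueError(f"unsupported input type: {type(value).__name__}")
    try:
        # float() tolerates surrounding whitespace, e.g. "37.7749, -122.4194".
        return float(lat), float(lon)
    except (TypeError, ValueError):
        raise ValueError(f"non-numeric coordinate in {value!r}") from None
```

All three formats reduce to the same tuple, so range checks and the model call only need to handle one shape.
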
When instantiating the GeoClipEmbeddingFunction, you can optionally specify the device to use for computation ('cpu' or 'cuda'). If no device is provided, the function will automatically attempt to use a CUDA-enabled GPU if available and fall back to CPU otherwise.
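
The device fallback described above could look roughly like the following. The helper names `resolve_device` and `_torch_cuda_available` are hypothetical; only `torch.cuda.is_available()` is a real PyTorch call. The sketch returns an explicitly requested device as-is, since the description does not say what happens when 'cuda' is requested on a machine without a GPU.

```python
from typing import Callable, Optional


def _torch_cuda_available() -> bool:
    """Report CUDA availability, treating a missing torch install as no CUDA."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(
    device: Optional[str] = None,
    cuda_available: Callable[[], bool] = _torch_cuda_available,
) -> str:
    """Pick 'cuda' or 'cpu'.

    With no explicit device, prefer CUDA when the check reports it is
    available and fall back to CPU otherwise. The check is injectable so
    the fallback can be tested on machines without a GPU.
    """
    if device is None:
        return "cuda" if cuda_available() else "cpu"
    if device not in ("cpu", "cuda"):
        raise ValueError(f"device must be 'cpu' or 'cuda', got {device!r}")
    return device
```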


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?
