Schema and data format changes are a necessary evil of evolving software. We take changes seriously and make them infrequently and only when necessary.
Chroma's commitment is that whenever the schema or data format changes, we will provide a seamless and easy-to-use migration tool to move to the new schema/format.
Specifically, we will announce schema changes on:
- Discord (#migrations channel)
- GitHub (here)
- Email listserv (Sign up)
We will aim to provide:
- a description of the change and the rationale for the change.
- a CLI migration tool you can run
- a video walkthrough of using the tool
We are migrating:
metadata store
: where metadata is stored
index on disk
: how indexes are stored on disk
Metadata Store
: Previously, Chroma used DuckDB as the underlying storage engine for the in-memory version of Chroma and ClickHouse for the single-node server version. These choices were made when Chroma was addressing more batch analytical workloads, and they are no longer the best fit for users. The new metadata store for both the in-memory and single-node server versions of Chroma will be SQLite. (The forthcoming distributed version of Chroma will use a different, distributed metadata store.)
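To illustrate why an embedded, single-file database suits this role, here is a minimal sketch using Python's built-in `sqlite3` module. The table `embedding_metadata`, its columns, and the helpers `open_store`, `upsert_metadata`, and `ids_where` are hypothetical: this is not Chroma's actual schema, only a toy showing transactional upserts and metadata filtering in SQLite.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (embedding id, metadata key).
# This is NOT Chroma's real schema; it only illustrates the idea.
SCHEMA = """
CREATE TABLE IF NOT EXISTS embedding_metadata (
    embedding_id TEXT NOT NULL,
    key          TEXT NOT NULL,
    value        TEXT,
    PRIMARY KEY (embedding_id, key)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_metadata(conn: sqlite3.Connection, embedding_id: str, metadata: dict) -> None:
    """Insert or replace all metadata keys for one embedding in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO embedding_metadata (embedding_id, key, value) "
            "VALUES (?, ?, ?)",
            [(embedding_id, k, str(v)) for k, v in metadata.items()],
        )


def ids_where(conn: sqlite3.Connection, key: str, value: str) -> list:
    """Return the ids of embeddings whose metadata has key == value."""
    rows = conn.execute(
        "SELECT embedding_id FROM embedding_metadata "
        "WHERE key = ? AND value = ? ORDER BY embedding_id",
        (key, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store()
    upsert_metadata(conn, "doc1", {"source": "wiki", "lang": "en"})
    upsert_metadata(conn, "doc2", {"source": "blog", "lang": "en"})
    print(ids_where(conn, "lang", "en"))  # ['doc1', 'doc2']
```

Because SQLite runs in-process and stores everything in one file, the same code serves both an embedded deployment and a single-node server without a separate database service.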
Index store
: Previously, Chroma saved the entire index on every write. This became painfully slow once a collection grew to a reasonable number of embeddings. The new index store saves only the changes and should scale seamlessly!
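The difference between the two approaches can be modelled with a toy sketch. `FullRewriteIndex` and `DeltaLogIndex` are hypothetical classes, not Chroma's index implementation, and JSON stands in for a real serialization format. The first re-serializes every vector on each write. The second appends only the changed entry to a log that can be replayed to rebuild the index.

```python
import io
import json


class FullRewriteIndex:
    """Toy model of the old behaviour: serialize the whole index on every write."""

    def __init__(self) -> None:
        self.vectors = {}
        self.bytes_written = 0

    def upsert(self, key: str, vector: list) -> None:
        self.vectors[key] = vector
        snapshot = json.dumps(self.vectors)  # the entire index, every time
        self.bytes_written += len(snapshot.encode())


class DeltaLogIndex:
    """Toy model of incremental persistence: append only the change to a log."""

    def __init__(self, log: io.TextIOBase) -> None:
        self.vectors = {}
        self.log = log
        self.bytes_written = 0

    def upsert(self, key: str, vector: list) -> None:
        self.vectors[key] = vector
        record = json.dumps({"key": key, "vector": vector}) + "\n"
        self.log.write(record)  # cost depends on the change, not the index size
        self.bytes_written += len(record.encode())

    @staticmethod
    def replay(log_text: str) -> dict:
        """Rebuild the in-memory state by applying logged changes in order."""
        vectors = {}
        for line in log_text.splitlines():
            record = json.loads(line)
            vectors[record["key"]] = record["vector"]
        return vectors


if __name__ == "__main__":
    full, log = FullRewriteIndex(), io.StringIO()
    delta = DeltaLogIndex(log)
    for i in range(1000):
        full.upsert(f"id{i}", [float(i), 0.0])
        delta.upsert(f"id{i}", [float(i), 0.0])
    print("full rewrite bytes:", full.bytes_written)
    print("delta log bytes:   ", delta.bytes_written)
```

Because each full rewrite costs time proportional to the current index size, its total grows roughly quadratically with the number of writes, while the delta log grows linearly. A real incremental store would also need periodic compaction of the log, which this sketch omits.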
Here are the nine possible migration paths, with notes where applicable.
From (row) ➡️ To (column) | Persistent Chroma | Local Chroma Server | Remote Chroma Server |
---|---|---|---|
Persistent Chroma | ✅ | ✅ | 1️⃣ |
Local Chroma Server | ✅ | 2️⃣ | 1️⃣ |
Remote Chroma Server | ✅ | ✅ | 1️⃣ 2️⃣ |
1️⃣ - Make sure any auth headers are configured correctly.
2️⃣ - Run both the existing version of Chroma and the new 0.4.0 version of Chroma at the same time. If running locally, run the new version on a different port.
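For note 2️⃣, a quick pre-flight check can confirm that the port you plan to give the new 0.4.0 server is not already in use. This is a minimal sketch using only the standard `socket` module. `is_port_free` and `pick_port` are hypothetical helpers, not part of Chroma or `chroma_migrate`, and the candidate ports in the usage line assume the existing server is already running on port 8000.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port.

    The test socket is closed immediately, so this is a hint, not a
    reservation: another process could still take the port afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or otherwise unavailable)
    return True


def pick_port(candidates: list) -> int:
    """Return the first free candidate port for the new Chroma server."""
    for port in candidates:
        if is_port_free(port):
            return port
    raise RuntimeError(f"none of the ports {candidates} are free")


if __name__ == "__main__":
    # Assumption: the existing Chroma server already occupies port 8000.
    print("start the new server on port", pick_port([8001, 8002, 8003]))
```

Binding is a stricter test than trying to connect: it also detects ports held by sockets that are not yet accepting connections.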
- Install this utility with pip: `pip install chroma_migrate`
- Run the CLI. In your terminal, run: `chroma_migrate`
- Choose whether the data you want to migrate comes from local disk (DuckDB), from a ClickHouse instance used by Chroma, or directly from another Chroma server.
- Choose where you want to write the new data.
To test locally, run `python main.py`.