Using DeterministicFakeEmbedding with the Pinecone vector store fails with PineconeApiTypeError #28996

Open
5 tasks done
MichaelSkralivetsky opened this issue Jan 2, 2025 · 0 comments
Labels
Ɑ: vector store Related to vector store module

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import time
from pinecone import Pinecone, ServerlessSpec

pinecone_api_key = "mykey"
pc = Pinecone(api_key=pinecone_api_key)

index_name = "langchain-test-index"

existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=4096,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

from langchain_core.embeddings import DeterministicFakeEmbedding
embeddings = DeterministicFakeEmbedding(size=4096)

from langchain_pinecone import PineconeVectorStore

index = pc.Index(index_name)
vector_store = PineconeVectorStore(index=index, embedding=embeddings)

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

documents = [
    document_1
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

Error Message and Stack Trace (if applicable)

PineconeApiTypeError                      Traceback (most recent call last)
Cell In[9], line 15
     10 documents = [
     11     document_1
     12 ]
     13 uuids = [str(uuid4()) for _ in range(len(documents))]
---> 15 vector_store.add_documents(documents=documents, ids=uuids)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/langchain_core/vectorstores/base.py:287, in VectorStore.add_documents(self, documents, **kwargs)
    285     texts = [doc.page_content for doc in documents]
    286     metadatas = [doc.metadata for doc in documents]
--> 287     return self.add_texts(texts, metadatas, **kwargs)
    288 msg = (
    289     f"`add_documents` and `add_texts` has not been implemented "
    290     f"for {self.__class__.__name__} "
    291 )
    292 raise NotImplementedError(msg)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/langchain_pinecone/vectorstores.py:283, in PineconeVectorStore.add_texts(self, texts, metadatas, ids, namespace, batch_size, embedding_chunk_size, async_req, id_prefix, **kwargs)
    280 vector_tuples = zip(chunk_ids, embeddings, chunk_metadatas)
    281 if async_req:
    282     # Runs the pinecone upsert asynchronously.
--> 283     async_res = [
    284         self._index.upsert(
    285             vectors=batch_vector_tuples,
    286             namespace=namespace,
    287             async_req=async_req,
    288             **kwargs,
    289         )
    290         for batch_vector_tuples in batch_iterate(batch_size, vector_tuples)
    291     ]
    292     [res.get() for res in async_res]
    293 else:

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/langchain_pinecone/vectorstores.py:284, in <listcomp>(.0)
    280 vector_tuples = zip(chunk_ids, embeddings, chunk_metadatas)
    281 if async_req:
    282     # Runs the pinecone upsert asynchronously.
    283     async_res = [
--> 284         self._index.upsert(
    285             vectors=batch_vector_tuples,
    286             namespace=namespace,
    287             async_req=async_req,
    288             **kwargs,
    289         )
    290         for batch_vector_tuples in batch_iterate(batch_size, vector_tuples)
    291     ]
    292     [res.get() for res in async_res]
    293 else:

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/utils/error_handling.py:11, in validate_and_convert_errors.<locals>.inner_func(*args, **kwargs)
      8 @wraps(func)
      9 def inner_func(*args, **kwargs):
     10     try:
---> 11         return func(*args, **kwargs)
     12     except MaxRetryError as e:
     13         if isinstance(e.reason, ProtocolError):

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/data/index.py:175, in Index.upsert(self, vectors, namespace, batch_size, show_progress, **kwargs)
    168     raise ValueError(
    169         "async_req is not supported when batch_size is provided."
    170         "To upsert in parallel, please follow: "
    171         "https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel"
    172     )
    174 if batch_size is None:
--> 175     return self._upsert_batch(vectors, namespace, _check_type, **kwargs)
    177 if not isinstance(batch_size, int) or batch_size <= 0:
    178     raise ValueError("batch_size must be a positive integer")

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/data/index.py:206, in Index._upsert_batch(self, vectors, namespace, _check_type, **kwargs)
    201 args_dict = self._parse_non_empty_args([("namespace", namespace)])
    202 vec_builder = lambda v: VectorFactory.build(v, check_type=_check_type)
    204 return self._vector_api.upsert(
    205     UpsertRequest(
--> 206         vectors=list(map(vec_builder, vectors)),
    207         **args_dict,
    208         _check_type=_check_type,
    209         **{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
    210     ),
    211     **{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
    212 )

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/data/index.py:202, in Index._upsert_batch.<locals>.<lambda>(v)
    194 def _upsert_batch(
    195     self,
    196     vectors: Union[List[Vector], List[tuple], List[dict]],
   (...)
    199     **kwargs,
    200 ) -> UpsertResponse:
    201     args_dict = self._parse_non_empty_args([("namespace", namespace)])
--> 202     vec_builder = lambda v: VectorFactory.build(v, check_type=_check_type)
    204     return self._vector_api.upsert(
    205         UpsertRequest(
    206             vectors=list(map(vec_builder, vectors)),
   (...)
    211         **{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
    212     )

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/data/vector_factory.py:26, in VectorFactory.build(item, check_type)
     24     return item
     25 elif isinstance(item, tuple):
---> 26     return VectorFactory._tuple_to_vector(item, check_type)
     27 elif isinstance(item, Mapping):
     28     return VectorFactory._dict_to_vector(item, check_type)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/data/vector_factory.py:42, in VectorFactory._tuple_to_vector(item, check_type)
     38     raise ValueError(
     39         "Sparse values are not supported in tuples. Please use either dicts or Vector objects as inputs."
     40     )
     41 else:
---> 42     return Vector(
     43         id=id,
     44         values=convert_to_list(values),
     45         metadata=metadata or {},
     46         _check_type=check_type,
     47     )

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:33, in convert_js_args_to_python_args.<locals>.wrapped_init(_self, *args, **kwargs)
     31 if spec_property_naming:
     32     kwargs = change_keys_js_to_python(kwargs, _self if isinstance(_self, type) else _self.__class__)
---> 33 return fn(_self, *args, **kwargs)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/data/model/vector.py:289, in Vector.__init__(self, id, values, *args, **kwargs)
    286 self._visited_composed_classes = _visited_composed_classes + (self.__class__,)
    288 self.id = id
--> 289 self.values = values
    290 for var_name, var_value in kwargs.items():
    291     if (
    292         var_name not in self.attribute_map
    293         and self._configuration is not None
   (...)
    296     ):
    297         # discard variable.

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:156, in OpenApiModel.__setattr__(self, attr, value)
    154 def __setattr__(self, attr, value):
    155     """set the value of an attribute using dot notation: `instance.attr = val`"""
--> 156     self[attr] = value

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:432, in ModelNormal.__setitem__(self, name, value)
    429     self.__dict__[name] = value
    430     return
--> 432 self.set_attribute(name, value)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:132, in OpenApiModel.set_attribute(self, name, value)
    129     raise PineconeApiTypeError(error_msg, path_to_item=path_to_item, valid_classes=(str,), key_type=True)
    131 if self._check_type:
--> 132     value = validate_and_convert_types(
    133         value,
    134         required_types_mixed,
    135         path_to_item,
    136         self._spec_property_naming,
    137         self._check_type,
    138         configuration=self._configuration,
    139     )
    140 if (name,) in self.allowed_values:
    141     check_allowed_values(self.allowed_values, (name,), value)

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:1489, in validate_and_convert_types(input_value, required_types_mixed, path_to_item, spec_property_naming, _check_type, configuration)
   1487         inner_path = list(path_to_item)
   1488         inner_path.append(index)
-> 1489         input_value[index] = validate_and_convert_types(
   1490             inner_value,
   1491             inner_required_types,
   1492             inner_path,
   1493             spec_property_naming,
   1494             _check_type,
   1495             configuration=configuration,
   1496         )
   1497 elif isinstance(input_value, dict):
   1498     if input_value == {}:
   1499         # allow an empty dict

File ~/.pythonlibs/mlrun-base/lib/python3.9/site-packages/pinecone/core/openapi/shared/model_utils.py:1453, in validate_and_convert_types(input_value, required_types_mixed, path_to_item, spec_property_naming, _check_type, configuration)
   1451         return converted_instance
   1452     else:
-> 1453         raise get_type_error(input_value, path_to_item, valid_classes, key_type=False)
   1455 # input_value's type is in valid_classes
   1456 if len(valid_classes) > 1 and configuration:
   1457     # there are valid classes which are not the current class

PineconeApiTypeError: Invalid type for variable '0'. Required value type is float and passed type was float64 at ['values'][0]
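
The message says the first component of `values` has type `float64` where the client requires the built-in `float`. The check below is a minimal sketch that does not touch Pinecone: it reports which components of a vector are not exactly `float`. The helper name `non_builtin_floats` and the `float64` stand-in class are hypothetical and exist only for illustration. The stand-in mimics the fact that `numpy.float64` subclasses `float`, so an `isinstance` check would not catch it.

```python
from typing import Iterable, List, Tuple


def non_builtin_floats(values: Iterable[object]) -> List[Tuple[int, str]]:
    """Return (index, type name) for every component whose exact type is not float."""
    return [
        (i, type(v).__name__)
        for i, v in enumerate(values)
        if type(v) is not float  # exact-type check; isinstance would accept subclasses
    ]


class float64(float):
    """Hypothetical stand-in for numpy.float64, which also subclasses float."""


sample = [float64(0.1), 0.2, float64(0.3)]
report = non_builtin_floats(sample)
print(report)  # [(0, 'float64'), (2, 'float64')]
```

With the setup from the example code, `non_builtin_floats(embeddings.embed_query("test"))` should show whether the embedding class is the source of the `float64` values.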

Description

The error is unexpected. The code follows the example in https://python.langchain.com/docs/integrations/vectorstores/pinecone/ with DeterministicFakeEmbedding as the embedding model.
The same code works when OllamaEmbeddings is used instead.
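
A possible workaround, assuming the `float64` values come from the embedding class (the traceback only shows them arriving at Pinecone's `Vector` model), is to coerce every component to a built-in `float` before the vector store sees it. This is a minimal sketch. The wrapper name `PlainFloatEmbeddings` and the two stand-in classes are hypothetical. Whether `PineconeVectorStore` accepts a duck-typed object like this, rather than a subclass of `langchain_core.embeddings.Embeddings`, has not been verified here.

```python
from typing import List


class PlainFloatEmbeddings:
    """Hypothetical wrapper that converts each vector component to a built-in float."""

    def __init__(self, inner) -> None:
        # `inner` is any object exposing embed_documents and embed_query.
        self.inner = inner

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(x) for x in vec] for vec in self.inner.embed_documents(texts)]

    def embed_query(self, text: str) -> List[float]:
        return [float(x) for x in self.inner.embed_query(text)]


class Float64Like(float):
    """Hypothetical stand-in for numpy.float64 (a float subclass)."""


class StandInEmbedding:
    """Hypothetical stand-in for an embedding model that returns float subclasses."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [Float64Like(len(text)), Float64Like(0.5)]


wrapped = PlainFloatEmbeddings(StandInEmbedding())
vec = wrapped.embed_query("abc")
docs = wrapped.embed_documents(["a", "abcd"])
print([type(x).__name__ for x in vec])  # ['float', 'float']
```

In the example code this would mean passing `PlainFloatEmbeddings(DeterministicFakeEmbedding(size=4096))` as the `embedding` argument. The conversion costs one pass over each vector and leaves the values unchanged.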

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP Mon Dec 2 06:32:20 EST 2024
> Python Version:  3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10) 
[GCC 12.3.0]

Package Information
-------------------
> langchain_core: 0.3.28
> langchain: 0.3.13
> langchain_community: 0.3.13
> langsmith: 0.1.147
> langchain_chroma: 0.1.4
> langchain_milvus: 0.1.7
> langchain_ollama: 0.2.2
> langchain_pinecone: 0.2.0
> langchain_text_splitters: 0.3.4

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.10.11
> async-timeout: 4.0.3
> chromadb: 0.5.23
> dataclasses-json: 0.6.7
> fastapi: 0.115.6
> httpx: 0.27.2
> httpx-sse: 0.4.0
> jsonpatch: 1.33
> langsmith-pyo3: Installed. No version info available.
> numpy: 1.26.4
> ollama: 0.4.5
> orjson: 3.10.12
> packaging: 24.0
> pinecone-client: 5.0.1
> pydantic: 2.10.4
> pydantic-settings: 2.7.0
> pymilvus: 2.5.3
> PyYAML: 6.0.2
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> SQLAlchemy: 1.4.54
> tenacity: 8.5.0
> typing-extensions: 4.12.2
dosubot (bot) added the "Ɑ: vector store" label on Jan 2, 2025