Slow results #85

Open
BerserkerMother opened this issue Aug 26, 2024 · 2 comments

@BerserkerMother

Hi, I have checked the package and tested search in both Python and Rust; however, the Python version is significantly faster. I am using an Ultra 5 chip.

@Enet4 (Owner) commented Aug 27, 2024

There isn't anything about the Rust bindings that would influence performance so drastically, so this is likely a situation where you are linking against a version of the library with fewer optimizations or a reduced instruction set. If you can describe how you built faiss-rs and where you got the Python version from, we can draw some conclusions.
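
For instance, a quick runtime check of which SIMD levels the CPU advertises (a minimal sketch; it only tells you what the hardware supports, not which instruction set the installed libfaiss was actually compiled for):

fn main() {
    // Report the SIMD feature levels the CPU exposes. Faiss has kernels
    // specialized for AVX2/AVX-512, so a library built only for the
    // generic level will be noticeably slower on large flat searches.
    #[cfg(target_arch = "x86_64")]
    {
        println!("sse4.2:  {}", is_x86_feature_detected!("sse4.2"));
        println!("avx2:    {}", is_x86_feature_detected!("avx2"));
        println!("fma:     {}", is_x86_feature_detected!("fma"));
        println!("avx512f: {}", is_x86_feature_detected!("avx512f"));
    }
}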

@BerserkerMother (Author)

Thank you for your response and the great work.
On Arch Linux, I first ran:

sudo pacman -Sy intel-oneapi-mkl

then

cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF -DMKL_LIBRARIES=/opt/intel/mkl/lib/intel64/libmkl_rt.so
make -C build -j 4   
cd build
sudo make install

In the project directory, I ran cargo add faiss, and this code:

use std::time;

use faiss::{index_factory, Index, MetricType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let my_data = vec![1f32 / 1000.0; 256 * 100000];
    let mut index = index_factory(256, "Flat", MetricType::L2)?;
    index.add(&my_data)?;
    let my_query = vec![2f32 / 1000.0; 256 * 10000];
    let start = time::Instant::now();
    let result = index.search(&my_query, 5)?;
    let duration = start.elapsed();
    // for (i, (l, d)) in result
    //     .labels
    //     .iter()
    //     .zip(result.distances.iter())
    //     .enumerate()
    // {
    //     println!("#{}: {} (D={})", i + 1, *l, *d);
    // }
    println!("{:?}", duration);
    Ok(())
}

takes 3.702652552 s, but this Python code

import numpy as np

import time

d = 256                           # dimension
nb = 100000                       # database size
nq = 10000                        # number of queries
np.random.seed(1234)             # make reproducible
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.
xq = np.random.random((nq, d)).astype('float32')
xq[:, 0] += np.arange(nq) / 1000.

import faiss                   # make faiss available
index = faiss.IndexFlatL2(d)   # build the index
print(index.is_trained)
index.add(xb)                  # add vectors to the index
print(index.ntotal)

k = 10                         # we want to see the 10 nearest neighbors
start = time.time()
D, I = index.search(xq, k)     # actual search
duration = time.time() - start # stop the clock before printing results
print(I[:5])                   # neighbors of the 5 first queries
print(I[-5:])                  # neighbors of the 5 last queries
print(duration * 1000)         # search time in milliseconds

takes 1.55612159729004 s. For the Python version I just did pip install faiss-cpu.
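
For an apples-to-apples comparison, here is a sketch of the Rust benchmark aligned with the Python script (same d/nb/nq, k = 10, pseudo-random inputs; the pseudo_random helper is only there to avoid extra crates), meant to be run with cargo run --release. Note that the cmake invocation above does not set -DCMAKE_BUILD_TYPE=Release, so the C library itself may have been built without optimization flags (unless faiss's CMake sets a default build type), which would match the kind of slowdown described above.

use std::time::Instant;

use faiss::{index_factory, Index, MetricType};

// Tiny xorshift64 generator so the sketch needs no extra crates;
// returns values in [0, 1), roughly like np.random.random.
fn pseudo_random(state: &mut u64) -> f32 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    (*state >> 40) as f32 / (1u64 << 24) as f32
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (d, nb, nq, k): (usize, usize, usize, usize) = (256, 100_000, 10_000, 10);
    let mut state = 1234u64;
    let xb: Vec<f32> = (0..nb * d).map(|_| pseudo_random(&mut state)).collect();
    let xq: Vec<f32> = (0..nq * d).map(|_| pseudo_random(&mut state)).collect();

    let mut index = index_factory(d as u32, "Flat", MetricType::L2)?;
    index.add(&xb)?;

    let start = Instant::now();
    let result = index.search(&xq, k)?;
    println!("search took {:?}", start.elapsed());
    println!("{} labels returned", result.labels.len());
    Ok(())
}

The timed section runs entirely inside the C library, so the cargo profile mainly affects the data setup; the flags libfaiss_c was built with, and which BLAS it links against, are what determine the search time.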
