GridGain Developers Hub

Vector Search

GridGain can index vectors stored in a field and then search the cache based on the provided vector.

Requirements

  • GridGain must be running on Java 11 or later.

  • Your GridGain license must include the vector search feature.

  • Vector search is supported only for REPLICATED caches.

  • Vectors for the field must be generated by a separate model, as no model is provided with GridGain.

Installation

To start using vector search, enable the optional gridgain-vector-query module.

Vector Fields

To make a field searchable by vector, mark it with the QueryVectorField annotation. The field must be of the float[] type. GridGain creates a vector index from the embeddings stored in this field.

The example below shows a class that uses a text field and a vector field:

public class Article {
    /**
     * Content (indexed).
     */
    @QueryTextField
    private String content;

    @QueryVectorField
    private float[] contentVector;

    /**
     * Required for binary deserialization.
     */
    public Article() {
        // No-op.
    }

    public Article(String content, float[] contentVector) {
        this.content = content;
        this.contentVector = contentVector;
    }
}

Objects with vector fields can be stored as normal. GridGain will build an additional index for the vector column that can be queried.

Performing a Vector Query

To perform a vector query, you need a search vector produced by the same model that created the vectors stored with the database objects. This example assumes that you have already obtained the search vector. Once the vector is available, create a VectorQuery object and send it to the cluster with the query method:

float[] searchVector = ...; // get from model
// "contentVector" is the name of the vector field in the Article class
VectorQuery myQuery = new VectorQuery(Article.class, "contentVector", searchVector, 5);
cache.query(myQuery).getAll();
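Because the search vector has to come from the same model as the stored vectors, the two normally have the same length. The following is a minimal sketch of a client-side sanity check you could run before building the query. The SearchVectorCheck class, the requireCompatible method, and the expectedDimension parameter are hypothetical names used for illustration and are not part of the GridGain API; how GridGain itself reacts to a mismatched vector is not covered here.

```java
/** Hypothetical helper for illustration only; not part of the GridGain API. */
class SearchVectorCheck {
    /**
     * Returns the vector unchanged if it is non-null, has the expected length,
     * and contains only finite values; otherwise throws IllegalArgumentException.
     */
    static float[] requireCompatible(float[] searchVector, int expectedDimension) {
        if (searchVector == null)
            throw new IllegalArgumentException("Search vector is null");

        if (searchVector.length != expectedDimension)
            throw new IllegalArgumentException("Expected " + expectedDimension
                + " components but got " + searchVector.length);

        for (float v : searchVector) {
            if (Float.isNaN(v) || Float.isInfinite(v))
                throw new IllegalArgumentException("Search vector contains a non-finite value");
        }

        return searchVector;
    }
}
```

The expected dimension is whatever length your model produces for the stored vectors, so the check fails early if a vector from a different model is passed by mistake.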

The VectorQuery constructor accepts the following parameters:

  • The first parameter specifies the class of the cache entry value type (Article.class in this example).

  • The second parameter specifies the name of the vector field that will be searched.

  • The third parameter specifies the previously obtained search vector.

  • The fourth parameter specifies the maximum number of results to return. This parameter, often referred to as k in nearest neighbor searches, determines how many nearest neighbors the query will retrieve.

  • You can also specify an optional fifth threshold parameter to control the quality of results returned, for example:

    float[] searchVector = ...; // get from model
    // Using the overloaded constructor with the threshold parameter (a float, hence the f suffix)
    VectorQuery myQuery = new VectorQuery(Article.class, "contentVector", searchVector, 5, 0.75f);
    cache.query(myQuery).getAll();

    The threshold must be a float value between 0.0 and 1.0, where higher values mean the results must be more similar to the search vector. This example returns up to 5 nearest neighbors, but only those that have a similarity score of at least 0.75. If fewer than 5 neighbors meet this threshold, fewer results will be returned. Using a threshold can help ensure that your search only returns relevant results and filters out vectors that are too dissimilar from the search vector.
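The way k and the threshold interact can be illustrated with a brute-force search in plain Java. This is only a sketch of the semantics described above, not GridGain code: it assumes cosine similarity as the score, which this page does not specify, and the TopKSketch class with its Hit, cosine, and search members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration of top-k search with a similarity threshold; not GridGain code. */
class TopKSketch {
    /** Index of a stored vector paired with its similarity score. */
    static final class Hit {
        final int index;
        final double score;

        Hit(int index, double score) {
            this.index = index;
            this.score = score;
        }
    }

    /** Cosine similarity of two equal-length vectors (the metric assumed for this sketch). */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns at most k hits whose score is at least the threshold, most similar first. */
    static List<Hit> search(float[][] stored, float[] query, int k, float threshold) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < stored.length; i++) {
            double score = cosine(stored[i], query);
            // Vectors below the threshold are dropped before k is applied.
            if (score >= threshold)
                hits.add(new Hit(i, score));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits.size() > k ? new ArrayList<>(hits.subList(0, k)) : hits;
    }
}
```

In this sketch the threshold filters candidates first and k then caps the result size, which is why a query with k set to 5 can return fewer than five results when only a few stored vectors are similar enough.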