How to use OpenSearch with your Teammate

Connecting OpenSearch to a Teammate lets it run either standard OpenSearch queries or vector search queries to retrieve the data it needs to answer a natural language prompt. This document covers how to configure an OpenSearch cluster for use with Workforce and how to fine-tune your cluster's settings.
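
As a rough illustration of the two query styles, the sketch below builds one request body of each kind in Python. The field names description and embedding are placeholders, and the exact queries a Teammate generates are not documented here, so treat these only as examples of the general shape.

```python
def build_match_query(field, text, size=5):
    """Standard OpenSearch query DSL: full-text match on one field."""
    return {"size": size, "query": {"match": {field: text}}}


def build_knn_query(vector_field, query_vector, k=5):
    """k-NN query: the k nearest neighbours of an embedded prompt."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


# Placeholder field names; a real query vector would come from your embedding model
# and would have the same dimension as the vector field in your index.
match_body = build_match_query("description", "waterfront restaurants")
knn_body = build_knn_query("embedding", [0.1, 0.2, 0.3], k=3)
```

Either body could be passed to an opensearch-py client with client.search(index='<your index name>', body=...).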

What is OpenSearch

OpenSearch is a highly scalable NoSQL data store and search engine. It is a fork of Elasticsearch that AWS created in 2021. OpenSearch also has vector search capabilities.

For more information, see https://opensearch.org/

Prerequisites

Your own OpenSearch cluster running version 2.11 or later.

Your cluster must be publicly accessible, not behind a firewall or inside a non-public VPC.
If you plan to use vector search in OpenSearch, you must embed all vectors with one of the following supported models:

text-embedding-ada-002

BAAI/bge-small-en-v1.5
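
One way to check the version prerequisite is to compare the version number your cluster reports against 2.11. The helper below is a minimal sketch; the name meets_minimum_version is hypothetical, and it assumes the version string starts with numeric major and minor parts, as in 2.11.0.

```python
def meets_minimum_version(version, minimum=(2, 11)):
    """Return True if a version string such as '2.11.0' is at least `minimum`."""
    parts = version.split(".")
    major_minor = (int(parts[0]), int(parts[1]))
    return major_minor >= minimum


print(meets_minimum_version("2.11.0"))
print(meets_minimum_version("2.9.0"))

# With an opensearch-py client, the version is reported by the info API:
# info = client.info()
# print(meets_minimum_version(info["version"]["number"]))
```

Comparing integer tuples rather than raw strings matters here, because as strings "2.9.0" would sort after "2.11.0".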

How to configure your OpenSearch cluster

Setting up your index mappings

Create your index with the following mapping schema.

{
        "settings": {"index": {"knn": true, "knn.algo_param.ef_search": 512}},
        "mappings": {
            "properties": {
                // Your properties
                .....

                "<geo field name>": {
                    "type": "geo_point"
                },
                "<vector field name>": {
                    "type": "knn_vector",
                    "dimension": 1536,
                    "method": {
                        "engine": "nmslib",
                        "space_type": "l2",
                        "name": "hnsw",
                        "parameters": {
                            "ef_construction": 512,
                            "m": 16
                        }
                    }
                }
            }
        }
}

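Replace <vector field name> and <geo field name> with the names you want to give your vector and geo fields.
Both fields are optional. If you include the vector field, its dimension must match your embedding model: 1536 for text-embedding-ada-002 or 384 for BAAI/bge-small-en-v1.5.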
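
If you prefer to create the index from Python, the mapping above could be assembled along these lines. This is a sketch under assumptions: build_index_body is a hypothetical helper, the field names embedding and location are placeholders, and the commented-out call assumes an opensearch-py client.

```python
# Output dimensions of the two supported embedding models.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "BAAI/bge-small-en-v1.5": 384,
}


def build_index_body(model, vector_field=None, geo_field=None):
    """Build index settings and mappings, including only the optional fields requested."""
    properties = {}
    if geo_field:
        properties[geo_field] = {"type": "geo_point"}
    if vector_field:
        properties[vector_field] = {
            "type": "knn_vector",
            "dimension": MODEL_DIMENSIONS[model],
            "method": {
                "engine": "nmslib",
                "space_type": "l2",
                "name": "hnsw",
                "parameters": {"ef_construction": 512, "m": 16},
            },
        }
    return {
        "settings": {"index": {"knn": True, "knn.algo_param.ef_search": 512}},
        "mappings": {"properties": properties},
    }


body = build_index_body(
    "BAAI/bge-small-en-v1.5", vector_field="embedding", geo_field="location"
)

# With an opensearch-py client:
# client.indices.create(index="<your index name>", body=body)
```

Your own document properties would be added to the properties dictionary alongside the geo and vector fields.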

Updating your documents with embeddings

Below is a Python example of how to enrich your documents with OpenAI embeddings using the LangChain library.

Configure OpenAI embeddings.

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model='text-embedding-ada-002',
    openai_api_key='<your OpenAI API Key>'
)

Configure your OpenSearch client.

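from opensearchpy import OpenSearch

client = OpenSearch(
    # Replace the placeholders; the port should be an integer such as 9200 or 443
    hosts=[{'host': '<host>', 'port': '<port>'}],
    http_auth=('<username>', '<password>'),
    use_ssl=True,
    verify_certs=True,
    timeout=30,
)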

Scan your index and embed your text data.

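# Define a query to scan through the index (for example, match all documents)
query = {
    "query": {
        "match_all": {}
    }
}

# Open a scroll context and fetch the first page of documents
response = client.search(index='<your index name>', body=query, scroll='2m')
scroll_id = response['_scroll_id']
hits = response['hits']['hits']

# Keep fetching pages until the scroll returns no more documents
while hits:
    for doc in hits:
        doc_id = doc['_id']
        doc_source = doc['_source']

        # Skip documents that have no text to embed
        text = doc_source.get('<text field to embed>')
        if not text:
            continue

        # Embed the text and add the vector to the document
        embedding = embeddings.embed_query(text)
        doc_source.update({
            "<vector field name>": embedding,
        })

        # Write the updated document back into OpenSearch under the same ID
        client.index(index='<your index name>', id=doc_id, body=doc_source)

    response = client.scroll(scroll_id=scroll_id, scroll='2m')
    scroll_id = response['_scroll_id']
    hits = response['hits']['hits']

# Release the scroll context once every page has been processed
client.clear_scroll(scroll_id=scroll_id)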

Note that the example above embeds text from a single field, doc_source.get('<text field to embed>'). You can assemble the text to embed however you like, whether by concatenating multiple fields or by adding static values to the text before embedding it.
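
As one possible way to combine fields, the sketch below joins several document fields and an optional static prefix into a single string. The helper name build_embedding_text, the sample document, and the "field: value" layout are all illustrative assumptions, not a required format.

```python
def build_embedding_text(doc_source, fields, prefix=""):
    """Join the non-empty values of `fields` into one string, optionally led by static text."""
    parts = [prefix] if prefix else []
    for field in fields:
        value = doc_source.get(field)
        if value:  # skip missing or empty fields
            parts.append(f"{field}: {value}")
    return "\n".join(parts)


# Hypothetical document used only for illustration.
doc_source = {"name": "Harbor Cafe", "description": "Seafood by the pier", "notes": None}
text = build_embedding_text(
    doc_source, ["name", "description", "notes"], prefix="Restaurant listing"
)
print(text)
```

The resulting string can then be passed to embeddings.embed_query(text) in place of the single field value.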

Connect a Teammate with OpenSearch

Start by creating a new Teammate at https://app.buildworkforce.ai/my-teammates/new and select any agent type.
Change your model to gpt-4o.

In the Knowledge Base section, select OpenSearch.

Fill out the form fields as described below.

The Name field is the name the agent will know your dataset by. Choose a name that reflects what your dataset is about.
For OpenSearch clusters that use HTTP basic auth, define your host URL as https://<username>:<password>@myhostdomain.com.
The Geo and Vector fields should be the field names you defined in your index mapping.
The Description field explains to the agent what this dataset is about and when it should use it.
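
If your username or password contains characters such as @, : or /, embedding them directly in the host URL produces an ambiguous address. The sketch below percent-encodes the credentials before building the URL. The helper name build_host_url is hypothetical, and it is an assumption that the form decodes percent-encoded credentials the way standard URL parsers do; if it does not, choose a password without reserved characters.

```python
from urllib.parse import quote


def build_host_url(username, password, host, port=None):
    """Return an https URL with percent-encoded basic-auth credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    netloc = f"{host}:{port}" if port else host
    return f"https://{user}:{secret}@{netloc}"


print(build_host_url("admin", "p@ss:w/rd", "myhostdomain.com", 9200))
# prints https://admin:p%40ss%3Aw%2Frd@myhostdomain.com:9200
```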

Updated on: 06/02/2025
