Embedding Generation

Introduction

OpenAI embedding models represent words, sentences, or entire documents as dense vectors in a high-dimensional space. These vectors capture the semantic meaning of the text and can be used for natural language processing tasks such as semantic search, sentiment analysis, and text classification.

Because the underlying models are trained on large text corpora, the embeddings they produce are context-aware: texts with similar meanings map to nearby vectors. This makes it possible to measure semantic similarity between pieces of text, find nearest neighbors in the embedding space, and retrieve relevant context for a text-generation model.
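As a rough illustration of similarity and nearest-neighbor lookup, the sketch below compares vectors by cosine similarity. The three-dimensional vectors and their labels are made-up stand-ins for real embeddings, and the helper names `cosine_similarity` and `nearest_neighbors` are hypothetical, not part of any API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, corpus, k=1):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings (assumption: real ones have
# hundreds or thousands of dimensions, but the arithmetic is the same).
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
print(nearest_neighbors(query, corpus, k=2))
```

For larger collections a vector index or a numerical library would normally replace the linear scan; the plain-Python version is only meant to show the idea.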

This guide shows how to generate embeddings through the REST API and the Python OpenAI library, and how to read the response.

Best Practices

Embedding models are easy to call through the REST API or the Python OpenAI library. The following request generates an embedding for a piece of text:

curl --request POST \
  --url https://genaiapi.cloudsway.net/v1/ai/zUcfeMfrpNqyE/embeddings \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'content-type: application/json' \
  --data '{
    "input": "The quick brown fox jumps over the lazy dog."
}'

The parameters you can specify in the request are:

  • input: The text to embed, given as a string or a list of strings.
  • encoding_format (Optional): The format of the returned embeddings, either "float" or "base64". Defaults to "float".
  • dimensions (Optional): The number of dimensions of the returned embeddings. Supported only by text-embedding-3 models.
  • user (Optional): An identifier for the end user making the request.
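When encoding_format is set to "base64", the embedding arrives as a string rather than a list of numbers. The sketch below decodes such a string, assuming this service follows the OpenAI convention of packing the vector as little-endian 32-bit floats before base64-encoding it; that layout is an assumption worth verifying against a real response. The helper name `decode_base64_embedding` is hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string into a list of floats.

    Assumes the payload is a sequence of little-endian float32 values.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip on made-up values that float32 can represent exactly.
sample = [0.5, -0.25, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
print(decode_base64_embedding(encoded))
```

The base64 form is more compact on the wire than a JSON array of decimal numbers, which is the usual reason to request it.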

After sending the request, you will receive a response containing the embeddings for the input text. You can then use these embeddings for various natural language processing tasks.

An example response:

{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "embedding": [
                0.1,
                0.2,
                0.3,
                ...
            ],
            "index": 0
        }
    ],
    "model": "text-embedding-ada-002",
    "usage": {
        "prompt_tokens": 8,
        "total_tokens": 8
    }
}
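A response of this shape can be unpacked with the standard library alone. The sketch below parses a response body and returns the vectors ordered by their index field; the sample body is fabricated for illustration (shortened vectors, two items deliberately out of order), and `extract_embeddings` is a hypothetical helper name.

```python
import json


def extract_embeddings(response_body):
    """Parse a JSON response body and return the embedding vectors in index order."""
    payload = json.loads(response_body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Fabricated body mirroring the documented response structure.
sample_body = """
{
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.4, 0.5, 0.6], "index": 1},
        {"object": "embedding", "embedding": [0.1, 0.2, 0.3], "index": 0}
    ],
    "model": "text-embedding-ada-002",
    "usage": {"prompt_tokens": 8, "total_tokens": 8}
}
"""

vectors = extract_embeddings(sample_body)
print(len(vectors), vectors[0])
```

Sorting by index keeps each vector aligned with the input it belongs to when several texts are sent in one request.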

You can also use the Python OpenAI library to generate embeddings. Here is an example code snippet:

from openai import OpenAI

# The client appends "/embeddings" to base_url itself,
# so base_url should stop at the endpoint root.
client = OpenAI(
    base_url="https://genaiapi.cloudsway.net/v1/ai/zUcfeMfrpNqyE",
    api_key="YOUR_API_KEY",
)

response = client.embeddings.create(
    model="text-embedding-ada-002",  # use your deployment name here
    input=["plz introduce china in 10 words."],
)

print(response)
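In practice you usually want the vectors themselves rather than the whole response object. The sketch below wraps the call in a small helper that returns one vector per input text; `embed_texts` is a hypothetical name, and the client is passed in as an argument so the function can be exercised without network access.

```python
def embed_texts(client, texts, model="text-embedding-ada-002"):
    """Embed a list of strings and return one vector per input, in input order."""
    response = client.embeddings.create(model=model, input=texts)
    # Each item carries the position of the input it belongs to.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Usage with the client configured above (needs network access and a valid key):
# vectors = embed_texts(client, ["first text", "second text"])
# print(len(vectors), len(vectors[0]))
```

Passing the client in, instead of creating it inside the function, also makes it simple to swap in a different endpoint or a stub during testing.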