
Vertex AI

Vertex AI Extensions

Vertex AI Extensions are API endpoints your GenAI applications can call to bring in data that is not available within the model. Extensions often wrap another API.

An example is the Code Extension, which lets you execute code in your GenAI workflow.

Since each running VAC has its own API endpoints, VACs are candidates for becoming Vertex AI Extensions that can be called from other VACs or from GenAI applications not running on Multivac Cloud. Vertex AI Extensions offer different authentication options, ranging from open (no authentication) to an API key or OAuth2.

The VertexAIExtensions class provides methods for executing, creating and deploying Vertex AI extensions.

Set extensions within your vacConfig to use specific extensions in your VAC:

my_extension_powered_vac:
  llm: vertex
  model: gemini-1.5-pro-001
  agent: vertex-genai
  extensions:
    - operation_id: post_edmonbrain_invoke
      vac: edmonbrain # optional - if the extension is calling a VAC, this is used to determine the URL for the extension
      extension_display_name: 'Edmonbrain Database' # specify this or extension_id
      #extension_id: 123123123
      operation_params: # helps get_extension_content() know what schema to send in the data and how to parse out the reply
        output:
          answer: "output.content" # which key to use for the answer
          metadata: "output.metadata" # which key to use for metadata
        input:
          question: "" # placeholder for the question parameter
          chat_history: [] # optional chat history
          # other input parameters as needed by your extension
          animal: ""

You can then fetch data from the Vertex AI Extension from within your app using the helper function get_extension_content():

from sunholo.vertex import get_extension_content
from sunholo.utils import ConfigManager

config = ConfigManager('my_extension_powered_vac')
question = "What is in my database that talks about kittens?")

# maybe other params your extension handles i.e. 'animal'
extension_content = get_extension_content(question, config=config, animal="cat")
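For illustration, the returned content could then be used to ground a Gemini call. This is a minimal sketch, assuming get_extension_content() returns the extracted answer text as a string (per the output mapping above) and that Vertex AI has been initialised, e.g. via init_vertex():

from sunholo.vertex import init_vertex
from vertexai.preview.generative_models import GenerativeModel

init_vertex()  # initialise the Vertex AI client

# 'extension_content' comes from the get_extension_content() call above
model = GenerativeModel("gemini-1.5-pro-001")
response = model.generate_content(
    "Answer using only this context:\n"
    f"{extension_content}\n\n"
    "Question: summarise the kitten-related entries."
)
print(response.text)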

VertexAIExtensions()

The underlying VertexAIExtensions() class has methods to aid creating extensions and executing them. See its documentation for more information.

Vertex AI Search

Formerly called Enterprise Search, and later AI Search and Conversation, Vertex AI Search provides a data store that returns document chunks.

Set vectorstore: vertex_ai_search to use it in your application:

memory:
  - discovery_engine_vertex_ai_search:
      vectorstore: vertex_ai_search # or 'discovery_engine'
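Once documents are indexed in the data store, you can fetch chunks directly. A minimal sketch using the DiscoveryEngineClient that also appears in the Langchain section further below, assuming the data store id matches the VAC name from your vacConfig:

from sunholo.discovery_engine.discovery_engine_client import DiscoveryEngineClient
from sunholo.utils.gcp_project import get_gcp_project

# the data store id is assumed to match the VAC name in vacConfig
de = DiscoveryEngineClient("my_vac_name", project_id=get_gcp_project())

chunks = de.get_chunks("What is in my database that talks about kittens?")
print(chunks)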

LlamaIndex on Vertex AI

To use LlamaIndex on Vertex AI, set it as a memory within your vacConfig file.

Set vectorstore: llamaindex

memory:
  - llamaindex-native:
      vectorstore: llamaindex
      rag_id: 4611686018427387904

Calling Vertex AI Search and LlamaIndex

First add vectorstore: llamaindex and/or vectorstore: vertex_ai_search to your vacConfig file:

kind: vacConfig
apiVersion: v1
vac:
  personal_llama:
    llm: vertex
    model: gemini-1.5-pro-preview-0514
    agent: vertex-genai
    display_name: Gemini with grounding via LlamaIndex and Vertex AI Search
    memory:
      - llamaindex-native:
          vectorstore: llamaindex
          rag_id: 4611686018427387904 # created via cli beforehand
      - discovery_engine_vertex_ai_search:
          vectorstore: vertex_ai_search # or discovery_engine

Then you can call those memory types (vertex_ai_search or llamaindex) in your Vertex GenAI apps like this:

from sunholo.vertex import init_vertex, get_vertex_memories, vertex_safety
from sunholo.utils import ConfigManager

from vertexai.preview.generative_models import GenerativeModel, Tool

vac_name = "must_match_your_vacConfig"

# will init vertex client
init_vertex()

# get_vertex_memories() will look in your vacConfig for vertex_ai_search and llamaindex vectorstores
# Fetches a Vertex AI Search chunked memory (Discovery Engine)
# also fetches a LlamaIndex chunked memory (LlamaIndex on Vertex)
config = ConfigManager(vac_name)
corpus_tools = get_vertex_memories(config)

# load model from config
model = config.vacConfig("model")

# use the Vertex generative model with your grounding tools
rag_model = GenerativeModel(
    model_name=model or "gemini-1.5-flash-001",
    tools=corpus_tools,
)

# call the model - 'contents' is the prompt, e.g. a user question
contents = "What do my documents say about kittens?"
response = rag_model.generate_content(contents,
                                      safety_settings=vertex_safety(),
                                      stream=True)
for chunk in response:
    # each streamed chunk is a GenerationResponse
    print(chunk)

There is also a class for working with a Vertex Corpus, which can be accessed via the CLI or in Python code:

sunholo llamaindex -h
usage: sunholo llamaindex [-h] {create,delete,fetch,find,list} ...

positional arguments:
  {create,delete,fetch,find,list}
                        LlamaIndex subcommands
    create              Create a new corpus
    delete              Delete a corpus
    fetch               Fetch a corpus
    find                Find a corpus
    list                List all corpus

optional arguments:
  -h, --help            show this help message and exit
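Separately from the sunholo CLI, the corpora behind LlamaIndex on Vertex AI can also be managed directly with the vertexai RAG API. A minimal sketch, assuming a recent google-cloud-aiplatform release that ships vertexai.preview.rag:

import vertexai
from vertexai.preview import rag

# initialise Vertex AI for your project/region
vertexai.init(project="your-gcp-project", location="us-central1")

# create a new corpus - its resource name ends with the numeric rag_id used in vacConfig
corpus = rag.create_corpus(display_name="personal_llama")
print(corpus.name)  # e.g. projects/.../ragCorpora/4611686018427387904

# list existing corpora to look up a rag_id later
for c in rag.list_corpora():
    print(c.display_name, c.name)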

Calling Vertex AI Search via Langchain

The above example used the vertexai Python library, but you can use Vertex AI Search from any Python script.

LlamaIndex on Vertex can't be used from non-Vertex frameworks, but you can deploy a native LlamaIndex VAC and use that instead - perhaps via Vertex AI Extensions.

A popular GenAI framework is Langchain.

To use Vertex AI Search within Langchain, the DiscoveryEngineClient can be used to import or export chunks from the Vertex AI Search data store.

DiscoveryEngine is the old name for Vertex AI Search

An example vac_service.py file is below, based on a Langchain QA chat-to-docs tutorial.

from sunholo.components import pick_retriever, get_llm, get_embeddings
from sunholo.discovery_engine.discovery_engine_client import DiscoveryEngineClient
from sunholo.utils.gcp_project import get_gcp_project
from sunholo.utils.parsers import escape_braces

# Langchain imports used below
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

def vac(question: str, vector_name, chat_history=[], **kwargs):

    llm = get_llm(vector_name)
    embeddings = get_embeddings(vector_name)
    retriever = pick_retriever(vector_name, embeddings=embeddings)
    # load_prompt_from_yaml is a sunholo prompt helper - import it from your prompt utilities
    intro_prompt = load_prompt_from_yaml("intro", prefix=vector_name)

    # create data store client that has the vector_name VAC as its id
    de = DiscoveryEngineClient(vector_name, project_id=get_gcp_project())

    chunks = de.get_chunks(question)
    chunk_prompt = intro_prompt.format(context=chunks)

    # we stuff chunks into a langchain prompt that may contain { }
    # so use escape_braces() so it doesn't break langchain's PromptTemplate
    chunked_prompt = escape_braces(chunk_prompt) + "\n{context}\nQuestion:{input}\nYour Answer:\n"

    message_tuples = [
        ("system", "You are an assistant bot who is very helpful in your answers"),
        ("human", {"type": "text", "text": chunked_prompt})
    ]

    prompt = ChatPromptTemplate.from_messages(message_tuples)

    summarise_prompt = PromptTemplate.from_template(load_prompt_from_yaml("summarise", prefix=vector_name))

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, summarise_prompt
    )

    chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    response = chain.invoke({"input": question, "chat_history": chat_history})

    return {"answer": response}

Vertex Model Garden

To use GenAI models deployed to Vertex Model Garden, you can set your llm config to model_garden and supply an endpoint_id:

vac_model_garden:
  llm: model_garden
  gcp_config:
    project_id: model_garden_project
    endpoint_id: 12345678
    location: europe-west1
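Outside of Multivac, a Model Garden endpoint can also be called directly with the google-cloud-aiplatform client. A minimal sketch, assuming the deployed endpoint serves a text model that accepts a prompt field (the exact instance schema depends on the model you deployed):

from google.cloud import aiplatform

# match the gcp_config values from the vacConfig above
aiplatform.init(project="model_garden_project", location="europe-west1")

endpoint = aiplatform.Endpoint("12345678")

# the instance schema varies per model - 'prompt' is a common convention, check your model's docs
response = endpoint.predict(instances=[{"prompt": "Why is the sky blue?"}])
print(response.predictions)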