Vertex AI
Vertex AI Extensions
Vertex AI Extensions are API endpoints your GenAI applications can use to import data not within the model. Extensions often wrap another API.
An example is the Code Extension, which lets you execute code in your GenAI workflow.
Since each running VAC has its own API endpoints, they are candidates for becoming Vertex AI Extensions that can be called from other VACs or from GenAI applications not running on Multivac Cloud. Vertex AI Extensions have different authentication options, ranging from no authentication to an API key or OAuth2.
The VertexAIExtensions class provides methods for creating, deploying and executing Vertex AI Extensions.
Set extensions within your vacConfig to use specific extensions in your VAC:
my_extension_powered_vac:
  llm: vertex
  model: gemini-1.5-pro-001
  agent: vertex-genai
  extensions:
    - operation_id: post_edmonbrain_invoke
      vac: edmonbrain # optional - if the extension is calling a VAC, this is used to determine the URL for the extension
      extension_display_name: 'Edmonbrain Database' # specify this or extension_id
      #extension_id: 123123123
      operation_params: # helps get_extension_content() know what schema to send in the data and how to parse out its reply
        output:
          answer: "output.content" # which key to use for the answer
          metadata: "output.metadata" # which key to use for metadata
        input:
          question: "" # placeholder for the question parameter
          chat_history: [] # optional chat history
          # other input parameters as needed by your extension
          animal: ""
You can then fetch data from the Vertex AI Extension within your app using the helper function get_extension_content():
from sunholo.vertex import get_extension_content
from sunholo.utils import ConfigManager

config = ConfigManager('my_extension_powered_vac')

question = "What is in my database that talks about kittens?"

# maybe other params your extension handles e.g. 'animal'
extension_content = get_extension_content(question, config=config, animal="cat")
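One way to use the returned content is to stuff it into the prompt you send to your model. A minimal sketch (the prompt wording here is purely illustrative):

# a minimal sketch: combine the extension content with the user question
prompt = f"""Answer the question using only the context below.

Context:
{extension_content}

Question: {question}
"""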
VertexAIExtensions()
The underlying VertexAIExtensions() class has methods to aid in creating and executing extensions. See its documentation for more information.
Vertex AI Search
Formerly called Enterprise Search and then AI Search and Conversation, this is a data store that returns document chunks for grounding.
Set vectorstore: vertex_ai_search to use it in your application:
memory:
  - discovery_engine_vertex_ai_search:
      vectorstore: vertex_ai_search # or 'discovery_engine'
LlamaIndex on Vertex AI
To use LlamaIndex on Vertex AI, set it as a memory within your vacConfig file with vectorstore: llamaindex:
memory:
  - llamaindex-native:
      vectorstore: llamaindex
      rag_id: 4611686018427387904
Calling Vertex AI Search and LlamaIndex
First add vectorstore: llamaindex and/or vectorstore: vertex_ai_search to your vacConfig file:
kind: vacConfig
apiVersion: v1
vac:
  personal_llama:
    llm: vertex
    model: gemini-1.5-pro-preview-0514
    agent: vertex-genai
    display_name: Gemini with grounding via LlamaIndex and Vertex AI Search
    memory:
      - llamaindex-native:
          vectorstore: llamaindex
          rag_id: 4611686018427387904 # created via the CLI beforehand
      - discovery_engine_vertex_ai_search:
          vectorstore: vertex_ai_search # or discovery_engine
Then you can call those memory types (vertex_ai_search or llamaindex) in your Vertex GenAI apps like this:
from sunholo.vertex import init_vertex, get_vertex_memories, vertex_safety
from sunholo.utils import ConfigManager
from vertexai.preview.generative_models import GenerativeModel, Tool

vac_name = "must_match_your_vacConfig"

# will init the vertex client
init_vertex()

# get_vertex_memories() will look in your vacConfig for vertex_ai_search and llamaindex vectorstores
# It fetches a Vertex AI Search chunked memory (Discovery Engine)
# and a LlamaIndex chunked memory (LlamaIndex on Vertex)
config = ConfigManager(vac_name)
corpus_tools = get_vertex_memories(config)

# load the model name from config
model = config.vacConfig("model")

# use the Vertex generative model with your grounding tools
rag_model = GenerativeModel(
    model_name=model or "gemini-1.5-flash-001",
    tools=corpus_tools,
)

# call the model - contents is your user input / conversation so far
contents = "What is in my database that talks about kittens?"
response = rag_model.generate_content(contents,
                                      safety_settings=vertex_safety(),
                                      stream=True)
for chunk in response:
    print(chunk)
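Since the call above streams the response, each chunk is a partial response object. Instead of printing the raw chunks you can collect the generated text as it arrives; a small sketch (note a streamed response can only be iterated once, so this loop replaces the one above):

# accumulate the streamed answer text
answer = ""
for chunk in response:
    answer += chunk.text  # chunk.text may raise if the chunk was blocked by safety filters
    print(chunk.text, end="", flush=True)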
There is also a class for working with the Vertex corpus, which can be accessed via the CLI or in Python code:
sunholo llamaindex -h
usage: sunholo llamaindex [-h] {create,delete,fetch,find,list} ...
positional arguments:
{create,delete,fetch,find,list}
LlamaIndex subcommands
create Create a new corpus
delete Delete a corpus
fetch Fetch a corpus
find Find a corpus
list List all corpus
optional arguments:
-h, --help show this help message and exit
Calling Vertex AI Search via Langchain
The above example used the vertexai Python library, but you can use Vertex AI Search from any Python script.
LlamaIndex on Vertex can't be used from non-Vertex frameworks, but you can deploy a native LlamaIndex VAC and use that instead - perhaps via Vertex AI Extensions.
A popular GenAI framework is Langchain.
To use Vertex AI Search within Langchain, the DiscoveryEngineClient
can be used to import or export chunks from the Vertex AI Search data store.
DiscoveryEngine is the old name for Vertex AI Search
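As a quick standalone check that your data store returns chunks, you can call the client directly (a sketch using the same calls as the example below, with a hypothetical data store name):

from sunholo.discovery_engine.discovery_engine_client import DiscoveryEngineClient
from sunholo.utils.gcp_project import get_gcp_project

# the data store id matches the VAC name used when importing documents
de = DiscoveryEngineClient("personal_llama", project_id=get_gcp_project())
chunks = de.get_chunks("What is in my database that talks about kittens?")
print(chunks)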
An example vac_service.py file is below, based on a Langchain QA chat-over-docs tutorial.
from sunholo.components import pick_retriever, get_llm, get_embeddings
from sunholo.discovery_engine.discovery_engine_client import DiscoveryEngineClient
from sunholo.utils.gcp_project import get_gcp_project
from sunholo.utils.parsers import escape_braces

# the load_prompt_from_yaml import path may differ depending on your sunholo version
from sunholo.components.prompt import load_prompt_from_yaml

from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain


def vac(question: str, vector_name, chat_history=[], **kwargs):
    llm = get_llm(vector_name)
    embeddings = get_embeddings(vector_name)
    retriever = pick_retriever(vector_name, embeddings=embeddings)

    intro_prompt = load_prompt_from_yaml("intro", prefix=vector_name)

    # create a data store client that has the vector_name VAC as its id
    de = DiscoveryEngineClient(vector_name, project_id=get_gcp_project())
    chunks = de.get_chunks(question)

    chunk_prompt = intro_prompt.format(context=chunks)

    # we stuff chunks into a langchain prompt that may contain { }
    # so use escape_braces() so it doesn't break langchain's PromptTemplate
    chunked_prompt = escape_braces(chunk_prompt) + "\n{context}\nQuestion:{input}\nYour Answer:\n"

    message_tuples = [
        ("system", "You are an assistant bot who is very helpful in your answers"),
        ("human", {"type": "text", "text": chunked_prompt})
    ]
    prompt = ChatPromptTemplate.from_messages(message_tuples)

    summarise_prompt = PromptTemplate.from_template(load_prompt_from_yaml("summarise", prefix=vector_name))

    question_answer_chain = create_stuff_documents_chain(llm, prompt)
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, summarise_prompt
    )
    chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    response = chain.invoke({"input": question, "chat_history": chat_history})

    return {"answer": response.get("answer")}
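You could then call the function directly, for instance with the personal_llama VAC configured earlier (a usage sketch):

result = vac("What is in my database that talks about kittens?", "personal_llama")
print(result["answer"])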
Vertex Model Garden
To use GenAI models deployed to Vertex AI Model Garden, set your llm config to model_garden and supply an endpoint_id within gcp_config:
vac_model_garden:
  llm: model_garden
  gcp_config:
    project_id: model_garden_project
    endpoint_id: 12345678
    location: europe-west1
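This configuration points at a Vertex AI endpoint where the Model Garden model is deployed. As a rough sketch of what a direct call to such an endpoint looks like with the google-cloud-aiplatform SDK (the instance payload depends on the model you deployed, so the prompt schema below is an assumption):

from google.cloud import aiplatform

# values taken from the gcp_config block above
aiplatform.init(project="model_garden_project", location="europe-west1")
endpoint = aiplatform.Endpoint("12345678")

# the instance format varies per deployed model; a prompt payload is a common pattern
response = endpoint.predict(instances=[{"prompt": "Why is the sky blue?"}])
print(response.predictions)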