# VertexAI Grounding with LlamaIndex and Google Search
LlamaIndex is available within the VertexAI platform via a serverless integration. Grounding is also available using Google Search.

`sunholo` integrates with this application by providing HTTP endpoints for the indexing of new documents placed within Google Cloud Storage, and via streaming or static VAC endpoints. Whilst only some embedding features are implemented at the moment, the LlamaIndex on VertexAI integration takes care of many aspects such as chunking and embedding, with no server to set up. This makes it a good choice for quick and low-maintenance RAG applications.
The code for this VAC is available at the Public VAC GitHub repository.
## Setup
You need a corpus ID, which is returned when you create a corpus (only available via the API at the moment):
```python
import vertexai
from vertexai.preview import rag

# LlamaIndex on VertexAI is currently only available in us-central1
vertexai.init(project=<project_id>, location="us-central1")

corpus = rag.create_corpus(display_name=..., description=...)
print(corpus)
```
Use the `project_id`, `location` and `corpus_id` within your config below.
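The numeric `corpus_id` (used as `rag_id` in the config) is the final segment of the corpus resource name. Assuming the returned `corpus` object exposes that name via a `name` attribute, one way to extract it is:

```python
# Assumption: corpus.name holds the full resource name, e.g.
#   projects/<project_id>/locations/us-central1/ragCorpora/123123132
rag_id = corpus.name.split("/")[-1]
print(rag_id)  # use this numeric id as rag_id in the vacConfig below
```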
You need these `sunholo` modules:

```bash
pip install sunholo[gcp,http]
```

If you want to test using the CLI, also install `sunholo[cli]`, e.g.

```bash
pip install sunholo[gcp,http,cli]
```
## File Indexing
Once your configuration is loaded within Multivac, add files to your Google Cloud Storage bucket to have them embedded and indexed via `llamaindex.import_files.py`. This supports large volumes of files.
For more details on how to set up indexing, see the embedding pipeline documentation.
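If you want to trigger an import yourself, outside the Multivac pipeline, a minimal sketch using the `vertexai.preview.rag` API might look like the following. The bucket path and corpus resource name are placeholders, and the keyword arguments are assumptions based on the preview SDK, not sunholo internals:

```python
import vertexai
from vertexai.preview import rag

vertexai.init(project="your-gcp-project", location="us-central1")

# Hypothetical GCS folder - replace with wherever your documents land
rag.import_files(
    corpus_name="projects/your-gcp-project/locations/us-central1/ragCorpora/123123132",
    paths=["gs://your-bucket/your-folder"],
    chunk_size=1000,    # mirrors the chunker settings in the config below
    chunk_overlap=200,
)
```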
## Config
To use LlamaIndex on VertexAI, set up a memory store to send data to:
- `llm` - LlamaIndex on VertexAI is only available on "vertex"
- `model` - needs to be one of the supported models listed here
- `agent` - `vertex-genai` is the VAC code shown in this example
- `display_name` - for UI integrations
- `description` - for UI integrations
- `grounding` - set to add Google Search grounding to the context of the answers
- `memory` - configure the `vectorstore` setting to `llamaindex` to trigger sending data to the VertexAI rag corpus. You can also send data to other memory types, such as `alloydb`.
- `gcp_config` - settings that determine which VertexAI rag instance the data is sent to. Only available in `us-central1` at the moment. `rag_id` is the numeric identifier that you get when using `rag.create_corpus()` to make the RAG corpus.
- `chunker` - settings that configure how LlamaIndex splits the data.
```yaml
kind: vacConfig
apiVersion: v1
gcp_config: # reached via vac='global'
  project_id: your-gcp-project
  location: europe-west1
vac:
  personal_llama:
    llm: vertex
    model: gemini-1.5-pro-preview-0514
    agent: vertex-genai
    display_name: Gemini with grounding via Google Search and LlamaIndex
    description: Using LlamaIndex RAG and Google Search to ground the answers
    grounding:
      google_search: true # if true will use Google Search in grounding results
    memory:
      - llamaindex-native:
          vectorstore: llamaindex
          rag_id: 123123132 # you create this during setup
          gcp_config:
            project_id: your-gcp-project
            location: us-central1 # llamaindex is only available in us-central1 atm
    chunker:
      chunk_size: 1000
      overlap: 200
```
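For orientation, the grounded retrieval this config drives can also be reproduced directly with the preview SDK. The sketch below is an assumption-level illustration of the underlying VertexAI RAG retrieval tool, not sunholo's actual implementation; the corpus path echoes the example config and `similarity_top_k` is an illustrative value:

```python
import vertexai
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool

vertexai.init(project="your-gcp-project", location="us-central1")

# Build a retrieval tool backed by the rag corpus from the config above
rag_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_corpora=[
                "projects/your-gcp-project/locations/us-central1/ragCorpora/123123132"
            ],
            similarity_top_k=3,  # illustrative value
        ),
    )
)

model = GenerativeModel("gemini-1.5-pro-preview-0514", tools=[rag_tool])
response = model.generate_content("What do you know about MLOps?")
print(response.text)
```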
## Test calls

### Locally
Start up the Flask server:
```bash
python vertex-genai/app.py
```
Then send `curl` queries against the URLs:
```bash
curl http://127.0.0.1:8080/vac/personal_llama \
  -H "Content-Type: application/json" \
  -d '{
    "user_input": "What do you know about MLOps?"
}'
```

```bash
curl http://127.0.0.1:8080/vac/streaming/personal_llama \
  -H "Content-Type: application/json" \
  -d '{
    "user_input": "What do you know about MLOps?"
}'
```
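Equivalently from Python, here is a small sketch using the `requests` library to consume the streaming endpoint; the URL and payload mirror the curl calls above:

```python
import requests

# Stream the VAC response chunk by chunk, mirroring the curl call above
resp = requests.post(
    "http://127.0.0.1:8080/vac/streaming/personal_llama",
    json={"user_input": "What do you know about MLOps?"},
    stream=True,
)
resp.raise_for_status()

for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)
```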
### Deployed
If deployed on Multivac, you can use the `sunholo` CLI to chat with an instance via a proxy that handles the authenticated calls. Assuming the same config as above, which has a practitioner's guide to MLOps within its LlamaIndex corpus:
```bash
sunholo vac chat personal_llama
```

VAC Proxies - `sunholo proxy list`

```
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ VAC          ┃ Port ┃ PID   ┃ URL                   ┃ Local ┃ Logs                  ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ vertex-genai │ 8080 │ 48434 │ http://127.0.0.1:8080 │ No    │ No log file specified │
└──────────────┴──────┴───────┴───────────────────────┴───────┴───────────────────────┘
╭───────────────────────────────── Gemini with grounding via Google Search and LlamaIndex ─────────────────────────────────╮
│ Starting VAC chat session                                                                                                 │
╰────── stream: http://127.0.0.1:8080/vac/streaming/personal_llama invoke: http://127.0.0.1:8080/vac/personal_llama ───────╯

You: what is MLOps?
✺ Thinking...
```
The reply below takes the grounding from LlamaIndex:
```
personal_llama: MLOps, short for Machine Learning Operations, is a methodology bridging the gap between machine learning application
development and their real-world deployment and operation. It aims to streamline and automate the entire lifecycle of an ML
model, from its initial development to deployment and ongoing maintenance.

Here's a breakdown of what MLOps encompasses:

* **Standardized Processes:** MLOps establishes standardized workflows and best practices for building, deploying, and
managing ML systems, ensuring consistency and efficiency.
* **Automation:** It emphasizes automating repetitive tasks within the ML lifecycle, like model training, testing, and
deployment, to accelerate development and reduce errors.
* **Reliability and Scalability:** MLOps promotes practices that guarantee the reliability, scalability, and performance of
ML models in production environments.
* **Continuous Improvement:** By incorporating monitoring, logging, and feedback mechanisms, MLOps enables continuous model
improvement and adaptation over time.

**In essence, MLOps combines the best of software engineering and machine learning to deliver ML solutions in a robust,
scalable, and automated manner.**
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
```
Testing the Google Search grounding:
```
You: what happened this week in the run up to the UK election?
✺ Thinking...
personal_llama: Combining the information provided:

While I cannot access real-time information to confirm specific news from last week, it's likely that news regarding the UK
general election included:

* **Reform UK launching their campaign in Clacton.** This suggests the party is actively campaigning in specific areas.
* **The Liberal Democrats announcing an NHS funding pledge.** This indicates parties are revealing key policy points in
their election manifestos.
```