discovery_engine_client.py
Source: sunholo/discovery_engine/discovery_engine_client.py
Classes
DiscoveryEngineClient
Client for interacting with Google Cloud Discovery Engine.
Args:
- project_id (str): Your Google Cloud project ID.
- data_store_id (str): The ID of your Discovery Engine data store.
- location (str, optional): The location of the data store (default is 'eu').
Example:
client = DiscoveryEngineClient(project_id='your-project-id', data_store_id='your-data-store-id')
# Create a collection
collection_name = client.create_collection("my_new_collection")
# Perform a search
search_response = client.get_chunks("your query")
Parsing:
# Perform a search; parse_chunks_to_string=False returns the response object rather than one string
search_response = client.get_chunks("your query", parse_chunks_to_string=False)
# Iterate through the search results
for result in search_response.results:
    # Get the document (which contains the chunks)
    document = result.document
    # Iterate through the chunks within the document
    for chunk in document.chunks:
        chunk_text = chunk.snippet  # Extract the text content of the chunk
        chunk_document_name = chunk.document_name  # Get the name of the document the chunk belongs to
        # Do something with the chunk_text and chunk_document_name (e.g., print, store, etc.)
        print(f"Chunk Text: {chunk_text}")
        print(f"Document Name: {chunk_document_name}")
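The nested loop in the parsing example can be wrapped in a small helper. The following is a minimal sketch, not part of the library: `flatten_chunks` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that only mimic the `results`, `document`, `chunks`, `snippet` and `document_name` attributes used in the example, so the code runs without a Google Cloud project.

```python
from types import SimpleNamespace


def flatten_chunks(search_response):
    """Return (document_name, snippet) pairs from a response shaped like the parsing example."""
    pairs = []
    for result in search_response.results:
        for chunk in result.document.chunks:
            pairs.append((chunk.document_name, chunk.snippet))
    return pairs


# Hypothetical stand-in for a real search response (assumption: same attribute names).
fake_response = SimpleNamespace(results=[
    SimpleNamespace(document=SimpleNamespace(chunks=[
        SimpleNamespace(snippet="first chunk", document_name="doc-a"),
        SimpleNamespace(snippet="second chunk", document_name="doc-a"),
    ])),
    SimpleNamespace(document=SimpleNamespace(chunks=[
        SimpleNamespace(snippet="third chunk", document_name="doc-b"),
    ])),
])

pairs = flatten_chunks(fake_response)
for document_name, snippet in pairs:
    print(f"{document_name}: {snippet}")
```

With a real client, the same helper would presumably be called on the object returned by `get_chunks(..., parse_chunks_to_string=False)`.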
-
__init__(self, data_store_id, project_id, location='eu')
- Initialize self. See help(type(self)) for accurate signature.
-
async_get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')
- Retrieves chunks or documents based on a query.
Args:
- query (str): The search query.
- num_previous_chunks (int, optional): Number of previous chunks to return for context (default is 3).
- num_next_chunks (int, optional): Number of next chunks to return for context (default is 3).
- page_size (int, optional): The maximum number of results to return per page (default is 10).
- parse_chunks_to_string (bool, optional): If True, concatenates the chunks into one string; if False, returns the response object (default is True).
- serving_config (str, optional): The resource name of the Search serving config (default is 'default_serving_config').
Returns: discoveryengine.SearchResponse: The search response object containing the search results when parse_chunks_to_string is False; a single string of chunks when it is True.
Example:
# Run inside an async function
search_response = await client.async_get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
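An asynchronous variant is mostly useful when several queries should be in flight at once. The sketch below assumes `async_get_chunks` is a coroutine (the listing does not show this explicitly); `StubClient` and `search_many` are hypothetical names, and the stub replaces the real client so the example runs offline.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for DiscoveryEngineClient; returns canned text."""

    async def async_get_chunks(self, query: str, page_size: int = 10) -> str:
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"chunks for: {query}"


async def search_many(client, queries):
    # Start every search before awaiting any of them;
    # asyncio.gather returns results in the same order as the inputs.
    tasks = [client.async_get_chunks(q) for q in queries]
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(StubClient(), ["alpha", "beta"]))
print(results)
```

Swapping the stub for a real client should only require that its `async_get_chunks` be awaitable; error handling and rate limits are left out of this sketch.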
-
async_process_chunks(self, response)
- No docstring available.
-
chunk_format(self, chunk)
- No docstring available.
-
create_data_store(self, type='chunk', chunk_size: int = 500, collection: str = 'default_collection')
- No docstring available.
-
create_data_store_chunk(self, chunk_size: int = 500, collection: str = 'default_collection') -> str
- Creates a new data store with default configuration.
Args: chunk_size (int, optional): The size of the chunks to create for documents (default is 500).
Returns: str: The name of the long-running operation for data store creation.
-
create_engine(self, engine_id: str, data_store_ids: List[str], solution_type=None, search_tier=None, search_add_ons=None) -> str
- You only need this if calling Data Store via Vertex Tools.
-
data_store_path(self, collection: str = 'default_collection')
- No docstring available.
-
get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')
- Retrieves chunks or documents based on a query.
Args:
- query (str): The search query.
- num_previous_chunks (int, optional): Number of previous chunks to return for context (default is 3).
- num_next_chunks (int, optional): Number of next chunks to return for context (default is 3).
- page_size (int, optional): The maximum number of results to return per page (default is 10).
- parse_chunks_to_string (bool, optional): If True, concatenates the chunks into one string; if False, returns the response object (default is True).
- serving_config (str, optional): The resource name of the Search serving config (default is 'default_serving_config').
Returns: discoveryengine.SearchResponse: The search response object containing the search results when parse_chunks_to_string is False; a single string of chunks when it is True.
Example:
search_response = client.get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
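The `num_previous_chunks` and `num_next_chunks` arguments ask for neighbouring chunks around each hit. The following sketch only illustrates that windowing idea on a plain list; `with_context` is a hypothetical helper, and this is not a description of how the service selects context internally.

```python
def with_context(chunks, hit_index, num_previous_chunks=3, num_next_chunks=3):
    """Return the hit chunk plus its neighbours, clipped to the list bounds."""
    start = max(0, hit_index - num_previous_chunks)
    end = min(len(chunks), hit_index + num_next_chunks + 1)
    return chunks[start:end]


chunks = [f"chunk-{i}" for i in range(10)]

# Hit at index 5 with two chunks before and one after.
window = with_context(chunks, hit_index=5, num_previous_chunks=2, num_next_chunks=1)
print(window)

# A hit at the start of the document has no previous chunks to include.
head = with_context(chunks, hit_index=0)
print(head)
```

Larger windows give more surrounding text per hit at the cost of longer results.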
-
import_documents(self, gcs_uri: Optional[str] = None, data_schema='content', branch='default_branch', bigquery_dataset: Optional[str] = None, bigquery_table: Optional[str] = None, bigquery_project_id: Optional[str] = None) -> str
- Args:
- gcs_uri: Cloud Storage URI of the input files; required when importing from GCS. The URI can be up to 2000 characters long. It can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json. A request can contain at most 100 files (or 100,000 files if data_schema is content). Each file can be up to 2 GB (or 100 MB if data_schema is content).
- data_schema: The schema to use when parsing the data from the source. Must be one of 'user_event', 'custom' or 'document' if using BigQuery; the default 'content' applies only to GCS.
Supported values for document imports:
  - document (the API's default): One JSON Document per line. Each document must have a valid Document.id.
  - content: Unstructured data (e.g. PDF, HTML). Each file matched by input_uris becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
  - custom: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
  - csv: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
Supported values for user event imports:
  - user_event (default): One JSON UserEvent per line.
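The `content` schema derives each document ID from the file's URI: the first 128 bits of SHA256(URI) as a hex string. A minimal sketch of that derivation follows; `sketch_content_document_id` is a hypothetical helper, and hashing the UTF-8 bytes of the URI is an assumption, since the text does not state the encoding or any URI normalization the service applies.

```python
import hashlib


def sketch_content_document_id(uri: str) -> str:
    """First 128 bits (16 bytes) of SHA-256 over the URI, as 32 hex characters."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()  # assumption: UTF-8 bytes
    return digest[:16].hex()


doc_id = sketch_content_document_id("gs://bucket/directory/object.json")
print(doc_id, len(doc_id))
```

Because the ID depends only on the URI, re-importing the same path would be expected to map to the same document ID under this scheme.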
-
process_chunks(self, response)
- No docstring available.