discovery_engine_client.py

Source: sunholo/discovery_engine/discovery_engine_client.py

Classes

DiscoveryEngineClient

Client for interacting with Google Cloud Discovery Engine.

Args:
    project_id (str): Your Google Cloud project ID.
    data_store_id (str): The ID of your Discovery Engine data store.
    location (str, optional): The location of the data store (default is 'eu').

Example:

client = DiscoveryEngineClient(project_id='your-project-id', data_store_id='your-data-store-id')

# Create a chunked data store (returns the name of the long-running operation)
operation_name = client.create_data_store_chunk(chunk_size=500)

# Perform a search (by default the chunks come back joined into one string)
search_response = client.get_chunks("your query")

Parsing:

# Perform a search, asking for the response object rather than one big string
search_response = client.get_chunks("your query", parse_chunks_to_string=False)

# Iterate through the search results
for result in search_response.results:
    # Get the document (which contains the chunks)
    document = result.document

    # Iterate through the chunks within the document
    for chunk in document.chunks:
        chunk_text = chunk.snippet  # Extract the text content of the chunk
        chunk_document_name = chunk.document_name  # Get the name of the document the chunk belongs to
        
        # Do something with the chunk_text and chunk_document_name (e.g., print, store, etc.)
        print(f"Chunk Text: {chunk_text}")
        print(f"Document Name: {chunk_document_name}")
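The parsing loop above can be wrapped in a small helper. The following is a minimal sketch, not part of the library: `collect_chunks` is a hypothetical name, and `fake_response` is a stand-in built from `SimpleNamespace` objects that only mimics the attribute layout used in the example (`results`, `document`, `chunks`, `snippet`, `document_name`), so it runs without credentials.

```python
from types import SimpleNamespace


def collect_chunks(search_response):
    """Return (document_name, snippet) pairs for every chunk in the response."""
    pairs = []
    for result in search_response.results:
        for chunk in result.document.chunks:
            pairs.append((chunk.document_name, chunk.snippet))
    return pairs


# Stand-in response (assumption): same attribute layout as the example above,
# not a real discoveryengine.SearchResponse.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            document=SimpleNamespace(
                chunks=[
                    SimpleNamespace(snippet="First chunk.", document_name="doc-a"),
                    SimpleNamespace(snippet="Second chunk.", document_name="doc-a"),
                ]
            )
        )
    ]
)

for name, text in collect_chunks(fake_response):
    print(f"{name}: {text}")
```

With a real client, the same helper would presumably be fed the object returned by `get_chunks(..., parse_chunks_to_string=False)`.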

__init__(self, data_store_id, project_id, location='eu')
- Initializes the client with the data store ID, project ID, and location.
async_get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')
- Retrieves chunks or documents based on a query.

Args:
    query (str): The search query.
    num_previous_chunks (int, optional): Number of preceding chunks to return for context (default is 3).
    num_next_chunks (int, optional): Number of following chunks to return for context (default is 3).
    page_size (int, optional): The maximum number of results to return per page (default is 10).
    parse_chunks_to_string (bool, optional): If True, the chunks are joined into one string; if False, the response object is returned (default is True).
    serving_config (str, optional): The resource name of the Search serving config (default is 'default_serving_config').

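Returns: str: The chunks joined into one string, when parse_chunks_to_string is True. discoveryengine.SearchResponse: The search response object containing the search results, when parse_chunks_to_string is False.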

Example:

# Call from inside a coroutine
search_response = await client.async_get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
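One reason to prefer the async variant is to run several queries concurrently. The sketch below assumes `async_get_chunks` is a coroutine, as its name suggests. `FakeClient` and `search_many` are hypothetical names introduced for illustration; `FakeClient` stands in for `DiscoveryEngineClient` so that the snippet makes no network calls.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for DiscoveryEngineClient; performs no network I/O."""

    async def async_get_chunks(self, query: str, page_size: int = 10) -> str:
        await asyncio.sleep(0)  # yield control, as a real I/O call would
        return f"chunks for: {query}"


async def search_many(client, queries):
    """Start one async_get_chunks call per query and wait for all of them."""
    tasks = [client.async_get_chunks(q) for q in queries]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(FakeClient(), ["pricing", "refund policy"]))
print(results)
```

Swapping `FakeClient()` for a real client instance should be the only change needed, provided the assumption about the coroutine holds.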

async_process_chunks(self, response)
- No docstring available.
chunk_format(self, chunk)
- No docstring available.
create_data_store(self, type='chunk', chunk_size: int = 500, collection: str = 'default_collection')
- No docstring available.
create_data_store_chunk(self, chunk_size: int = 500, collection: str = 'default_collection') -> str
- Creates a new data store with default configuration.

Args: chunk_size (int, optional): The size of the chunks to create for documents (default is 500).

Returns: str: The name of the long-running operation for data store creation.

create_engine(self, engine_id: str, data_store_ids: List[str], solution_type=None, search_tier=None, search_add_ons=None) -> str
- You only need this if calling Data Store via Vertex Tools.
data_store_path(self, collection: str = 'default_collection')
- No docstring available.
get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')
- Retrieves chunks or documents based on a query.

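Returns: str: The chunks joined into one string, when parse_chunks_to_string is True. discoveryengine.SearchResponse: The search response object containing the search results, when parse_chunks_to_string is False.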

Example:

search_response = client.get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
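The `num_previous_chunks` and `num_next_chunks` parameters request neighbouring chunks around each match for context. The following sketch only illustrates that idea on a plain list; it is not how the service selects chunks, and `context_window` is a hypothetical helper.

```python
def context_window(chunks, hit_index, num_previous_chunks=3, num_next_chunks=3):
    """Return the matched chunk plus its neighbours, clamped to the list bounds."""
    start = max(0, hit_index - num_previous_chunks)
    end = min(len(chunks), hit_index + num_next_chunks + 1)
    return chunks[start:end]


chunks = [f"chunk-{i}" for i in range(10)]

# Two chunks before and one chunk after the match at index 5.
window = context_window(chunks, hit_index=5, num_previous_chunks=2, num_next_chunks=1)
print(window)
```

Larger values give each match more surrounding text at the cost of longer results.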

import_documents(self, gcs_uri: Optional[str] = None, data_schema='content', branch='default_branch', bigquery_dataset: Optional[str] = None, bigquery_table: Optional[str] = None, bigquery_project_id: Optional[str] = None) -> str
- Imports documents into the data store from Cloud Storage or BigQuery.

Args:

gcs_uri: Cloud Storage URI of the input files; required when importing from Cloud Storage. Each URI can be up to 2000 characters long. URIs can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json. A request can contain at most 100 files (or 100,000 files if data_schema is content). Each file can be up to 2 GB (or 100 MB if data_schema is content).
data_schema: The schema to use when parsing the data from the source. The default, 'content', is valid only for Cloud Storage imports; when importing from BigQuery it must be one of 'user_event', 'custom' or 'document'.
Supported values for document imports:
- document (the API default): One JSON Document per line. Each document must have a valid Document.id.
- content: Unstructured data (e.g. PDF, HTML). Each file matched by input_uris becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
- custom: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
- csv: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
Supported values for user event imports:
- user_event (the API default): One JSON UserEvent per line.
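The rules above can be checked before calling import_documents. The sketch below is a hedged illustration, not library code: `check_import_source` and `content_document_id` are hypothetical helpers, the "exactly one source" rule is an assumption drawn from the optional arguments in the signature, and UTF-8 encoding of the URI before hashing is likewise an assumption.

```python
import hashlib
from typing import Optional

# Schemas the description above allows for BigQuery imports.
BIGQUERY_SCHEMAS = {"user_event", "custom", "document"}


def check_import_source(
    gcs_uri: Optional[str],
    bigquery_table: Optional[str],
    data_schema: str = "content",
) -> str:
    """Return 'gcs' or 'bigquery', raising ValueError on an invalid combination."""
    # Assumption: exactly one of the two sources is expected.
    if bool(gcs_uri) == bool(bigquery_table):
        raise ValueError("Provide exactly one of gcs_uri or bigquery_table")
    if bigquery_table:
        if data_schema not in BIGQUERY_SCHEMAS:
            raise ValueError(f"data_schema {data_schema!r} is not valid for BigQuery")
        return "bigquery"
    if not gcs_uri.startswith("gs://"):
        raise ValueError("gcs_uri must start with gs://")
    if len(gcs_uri) > 2000:
        raise ValueError("gcs_uri can be at most 2000 characters long")
    return "gcs"


def content_document_id(uri: str) -> str:
    """First 128 bits of SHA256(uri) as hex, i.e. the first 32 hex characters."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()[:32]


print(check_import_source("gs://bucket/directory/*.json", None))
print(content_document_id("gs://bucket/directory/object.json"))
```

The 32-character truncation follows from 128 bits at 4 bits per hex digit.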

process_chunks(self, response)
- No docstring available.
