discovery_engine_client.py

Source: sunholo/discovery_engine/discovery_engine_client.py

Classes

DiscoveryEngineClient

Client for interacting with Google Cloud Discovery Engine.

Args:
    project_id (str): Your Google Cloud project ID.
    data_store_id (str): The ID of your Discovery Engine data store.
    location (str, optional): The location of the data store (default is 'eu').

Example:

client = DiscoveryEngineClient(project_id='your-project-id', data_store_id='your-data-store-id')

# Create a chunked data store; the return value is the name of the long-running operation
operation_name = client.create_data_store_chunk(chunk_size=500)

# Perform a search (the query is the only required argument)
search_response = client.get_chunks("your query")

Parsing:

# Perform a search; parse_chunks_to_string=False returns the response object
# instead of one concatenated string
search_response = client.get_chunks("your query", parse_chunks_to_string=False)

# Iterate through the search results
for result in search_response.results:
    # Get the document (which contains the chunks)
    document = result.document

    # Iterate through the chunks within the document
    for chunk in document.chunks:
        chunk_text = chunk.snippet  # Extract the text content of the chunk
        chunk_document_name = chunk.document_name  # Get the name of the document the chunk belongs to

        # Do something with chunk_text and chunk_document_name (e.g., print, store, etc.)
        print(f"Chunk Text: {chunk_text}")
        print(f"Document Name: {chunk_document_name}")

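The parsing loop above can be wrapped in a small helper that flattens a response into (document name, text) pairs. This is a minimal sketch, assuming the response exposes `results`, `document.chunks`, `snippet` and `document_name` exactly as in the example above. The helper name `collect_chunks` and the `SimpleNamespace` stand-in response are hypothetical; they exist only so the snippet runs without a Google Cloud project.

```python
from types import SimpleNamespace


def collect_chunks(search_response):
    """Flatten a response into a list of (document_name, text) tuples."""
    pairs = []
    for result in search_response.results:
        for chunk in result.document.chunks:
            pairs.append((chunk.document_name, chunk.snippet))
    return pairs


# Hypothetical stand-in shaped like the response in the example above.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            document=SimpleNamespace(
                chunks=[
                    SimpleNamespace(snippet="First chunk", document_name="doc-a"),
                    SimpleNamespace(snippet="Second chunk", document_name="doc-a"),
                ]
            )
        )
    ]
)

for name, text in collect_chunks(fake_response):
    print(f"{name}: {text}")
```

With a real client, the same helper would be given the object returned by get_chunks when parse_chunks_to_string is False.
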
  • __init__(self, data_store_id, project_id, location='eu')

    • Initialize self. See help(type(self)) for accurate signature.
  • async_get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')

    • Retrieves chunks or documents based on a query.

Args:
    query (str): The search query.
    num_previous_chunks (int, optional): Number of previous chunks to return for context (default is 3).
    num_next_chunks (int, optional): Number of next chunks to return for context (default is 3).
    page_size (int, optional): The maximum number of results to return per page (default is 10).
    parse_chunks_to_string (bool, optional): If True, the chunks are joined into one string; if False, the response object is returned (default is True).
    serving_config (str, optional): The resource name of the Search serving config (default is 'default_serving_config').

Returns: str if parse_chunks_to_string is True; otherwise discoveryengine.SearchResponse, the search response object containing the search results.

Example:

# Call from inside an async function
search_response = await client.async_get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
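Because async_get_chunks is a coroutine, several queries can be issued concurrently. The following is a sketch only: `StubClient` is a hypothetical stand-in that returns canned strings so the snippet runs offline, and `search_many` is an illustrative helper, not part of the library. With a real DiscoveryEngineClient the same pattern should apply, assuming its async_get_chunks can be awaited as the signature above suggests.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for DiscoveryEngineClient; returns canned text."""

    async def async_get_chunks(self, query: str, page_size: int = 10,
                               parse_chunks_to_string: bool = True):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"chunks for: {query}"


async def search_many(client, queries):
    """Run one async_get_chunks call per query concurrently, preserving order."""
    tasks = [client.async_get_chunks(q) for q in queries]
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(StubClient(), ["pricing", "onboarding"]))
print(results)
```

asyncio.gather returns results in the order the awaitables were passed, so each entry lines up with its query.
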
  • async_process_chunks(self, response)

    • No docstring available.
  • chunk_format(self, chunk)

    • No docstring available.
  • create_data_store(self, type='chunk', chunk_size: int = 500, collection: str = 'default_collection')

    • No docstring available.
  • create_data_store_chunk(self, chunk_size: int = 500, collection: str = 'default_collection') -> str

    • Creates a new data store with default configuration.

Args: chunk_size (int, optional): The size of the chunks to create for documents (default is 500).

Returns: str: The name of the long-running operation for data store creation.

  • create_engine(self, engine_id: str, data_store_ids: List[str], solution_type=None, search_tier=None, search_add_ons=None) -> str

    • You only need this if calling Data Store via Vertex Tools.
  • data_store_path(self, collection: str = 'default_collection')

    • No docstring available.
  • get_chunks(self, query: str, num_previous_chunks: int = 3, num_next_chunks: int = 3, page_size: int = 10, parse_chunks_to_string: bool = True, serving_config: str = 'default_serving_config')

    • Retrieves chunks or documents based on a query.

Args:
    query (str): The search query.
    num_previous_chunks (int, optional): Number of previous chunks to return for context (default is 3).
    num_next_chunks (int, optional): Number of next chunks to return for context (default is 3).
    page_size (int, optional): The maximum number of results to return per page (default is 10).
    parse_chunks_to_string (bool, optional): If True, the chunks are joined into one string; if False, the response object is returned (default is True).
    serving_config (str, optional): The resource name of the Search serving config (default is 'default_serving_config').

Returns: str if parse_chunks_to_string is True; otherwise discoveryengine.SearchResponse, the search response object containing the search results.

Example:

search_response = client.get_chunks('your query', parse_chunks_to_string=False)
for result in search_response.results:
    for chunk in result.document.chunks:
        print(f"Chunk: {chunk.snippet}, document name: {chunk.document_name}")
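Since parse_chunks_to_string switches the return value between a single string and a response object, calling code may want to accept either. The sketch below shows one way to normalise both shapes to text. `chunks_as_text`, the newline separator and the `SimpleNamespace` stand-in are assumptions made for illustration; the separator in particular is not necessarily how the library formats its own string output.

```python
from types import SimpleNamespace


def chunks_as_text(result, separator="\n"):
    """Return text whether `result` is already a string or a response object."""
    if isinstance(result, str):
        return result
    texts = [
        chunk.snippet
        for item in result.results
        for chunk in item.document.chunks
    ]
    return separator.join(texts)


# Hypothetical stand-in for the object returned when parse_chunks_to_string=False.
response_object = SimpleNamespace(
    results=[
        SimpleNamespace(
            document=SimpleNamespace(
                chunks=[
                    SimpleNamespace(snippet="alpha", document_name="doc-a"),
                    SimpleNamespace(snippet="beta", document_name="doc-a"),
                ]
            )
        )
    ]
)

print(chunks_as_text("already a string"))
print(chunks_as_text(response_object))
```
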
  • import_documents(self, gcs_uri: Optional[str] = None, data_schema='content', branch='default_branch', bigquery_dataset: Optional[str] = None, bigquery_table: Optional[str] = None, bigquery_project_id: Optional[str] = None) -> str
    • Args:
      • gcs_uri: Cloud Storage URI of the input files (required unless importing from BigQuery). Each URI can be up to 2000 characters long. URIs can match the full object path (for example, gs://bucket/directory/object.json) or a pattern matching one or more files, such as gs://bucket/directory/*.json. A request can contain at most 100 files (or 100,000 files if data_schema is content). Each file can be up to 2 GB (or 100 MB if data_schema is content).
      • data_schema: The schema to use when parsing the data from the source. The default, 'content', applies only to Cloud Storage imports; BigQuery imports must use one of 'user_event', 'custom' or 'document'.
        Supported values for document imports:
        - document (default in the underlying API): One JSON Document per line. Each document must have a valid Document.id.
        - content: Unstructured data (e.g. PDF, HTML). Each file matched by the URI becomes a document, with the ID set to the first 128 bits of SHA256(URI) encoded as a hex string.
        - custom: One custom data JSON per row in arbitrary format that conforms to the defined Schema of the data store. This can only be used by the GENERIC Data Store vertical.
        - csv: A CSV file with header conforming to the defined Schema of the data store. Each entry after the header is imported as a Document. This can only be used by the GENERIC Data Store vertical.
        Supported values for user event imports:
        - user_event (default in the underlying API): One JSON UserEvent per line.
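Two of the rules above are easy to show in code: the document ID derived for content imports, and the constraints linking data_schema to the source. This is a sketch under stated assumptions, not the library's implementation. `content_document_id` and `check_import_args` are hypothetical helpers; the URI is assumed to be hashed as UTF-8 bytes, and treating gcs_uri and the BigQuery arguments as mutually exclusive is an assumption as well.

```python
import hashlib
from typing import Optional

# Schemas the Args text allows for BigQuery imports.
BIGQUERY_SCHEMAS = {"user_event", "custom", "document"}


def content_document_id(uri: str) -> str:
    """First 128 bits of SHA256(uri) as hex: 16 bytes, i.e. 32 hex characters."""
    # Assumption: the URI is hashed as UTF-8 encoded bytes.
    return hashlib.sha256(uri.encode("utf-8")).digest()[:16].hex()


def check_import_args(gcs_uri: Optional[str] = None,
                      data_schema: str = "content",
                      bigquery_dataset: Optional[str] = None,
                      bigquery_table: Optional[str] = None) -> str:
    """Return 'gcs' or 'bigquery'; raise ValueError on inconsistent arguments."""
    using_bigquery = bigquery_dataset is not None or bigquery_table is not None
    if gcs_uri and using_bigquery:
        # Assumption: one source per call.
        raise ValueError("Pass either gcs_uri or the BigQuery arguments, not both")
    if gcs_uri:
        if not gcs_uri.startswith("gs://"):
            raise ValueError("gcs_uri must start with gs://")
        if len(gcs_uri) > 2000:
            raise ValueError("gcs_uri can be at most 2000 characters long")
        return "gcs"
    if bigquery_dataset and bigquery_table:
        if data_schema not in BIGQUERY_SCHEMAS:
            allowed = sorted(BIGQUERY_SCHEMAS)
            raise ValueError(f"data_schema must be one of {allowed} for BigQuery")
        return "bigquery"
    raise ValueError("Provide gcs_uri, or both bigquery_dataset and bigquery_table")


print(content_document_id("gs://bucket/directory/object.json"))
print(check_import_args(gcs_uri="gs://bucket/directory/*.json"))
```

Running such a check before calling import_documents would surface a mismatched schema (for example the default 'content' combined with a BigQuery table) locally rather than as an API error.
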
  • process_chunks(self, response)
    • No docstring available.
Copyright © Holosun ApS 2024