
loaders.py

Source: src/sunholo/chunker/loaders.py

Functions

convert_to_txt(file_path)

No docstring available.

convert_to_txt_and_extract(gs_file, split=False)

No docstring available.

ignore_files(filepath)

Returns `True` if the given path's file extension is listed in the `code_extensions` array of `config.json`, and `False` otherwise.
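
A minimal sketch of how `ignore_files` might be used to filter paths before chunking; the example paths and the `config.json` contents are illustrative assumptions, not taken from the package:

```python
from pathlib import Path
from sunholo.chunker.loaders import ignore_files

# Assumes a config.json whose "code_extensions" array lists extensions
# such as ".py" or ".js" (illustrative values only).
candidates = [Path("src/app.py"), Path("docs/notes.md")]
matched = [p for p in candidates if ignore_files(str(p))]
print(matched)  # paths whose extension appears in "code_extensions"
```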

read_file_to_documents(gs_file: pathlib.Path, metadata: dict = None)

No docstring available.

read_gdrive_to_document(url: str, metadata: dict = None)

No docstring available.

read_git_repo(clone_url, branch='main', metadata=None)

No docstring available.

read_url_to_document(url: str, metadata: dict = None)

No docstring available.
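
Since none of the loader functions above ship docstrings, the following sketch shows how they might be called based purely on their signatures; the file path, URL, and repository are placeholders, and the return values are assumed (not guaranteed) to be lists of langchain `Document` objects:

```python
from pathlib import Path
from sunholo.chunker import loaders

# Placeholder inputs; adjust to real files, URLs and repositories.
file_docs = loaders.read_file_to_documents(
    Path("data/report.pdf"), metadata={"source": "report.pdf"}
)

url_docs = loaders.read_url_to_document(
    "https://example.com/article", metadata={"source": "web"}
)

repo_docs = loaders.read_git_repo(
    "https://github.com/example/repo.git",
    branch="main",
    metadata={"source": "git"},
)
```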

Classes

MyGoogleDriveLoader

Deprecated since 0.0.32: use `langchain_google_community.GoogleDriveLoader` instead. It will not be removed until langchain-community==1.0.

Load Google Docs from Google Drive.
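
A minimal sketch of how the deprecated loader might be driven via `__init__` and `load_from_url`; the document URL is a placeholder, Google credentials must already be discoverable for the call to succeed, and the return value is assumed to be a list of langchain `Document` objects. New code should prefer `langchain_google_community.GoogleDriveLoader`, as noted above:

```python
from sunholo.chunker.loaders import MyGoogleDriveLoader

# Placeholder URL; replace <doc-id> with a real Google Docs ID.
url = "https://docs.google.com/document/d/<doc-id>/edit"

loader = MyGoogleDriveLoader(url=url)
docs = loader.load_from_url(url)  # assumed to return langchain Documents

for doc in docs:
    print(doc.metadata.get("source"), len(doc.page_content))
```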

  • __copy__(self) -> 'Self'

    • Returns a shallow copy of the model.
  • __deepcopy__(self, memo: 'dict[int, Any] | None' = None) -> 'Self'

    • Returns a deep copy of the model.
  • __delattr__(self, item: 'str') -> 'Any'

    • Implement delattr(self, name).
  • __eq__(self, other: 'Any') -> 'bool'

    • Return self==value.
  • __getattr__(self, item: 'str') -> 'Any'

    • No docstring available.
  • __getstate__(self) -> 'dict[Any, Any]'

    • Helper for pickle.
  • __init__(self, url, *args, **kwargs)

    • Create a new model by parsing and validating input data from keyword arguments.

Raises `ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

  • __iter__(self) -> 'TupleGenerator'

    • So `dict(model)` works.
  • __pretty__(self, fmt: 'typing.Callable[[Any], Any]', **kwargs: 'Any') -> 'typing.Generator[Any, None, None]'

  • __replace__(self, **changes: 'Any') -> 'Self'

    • No docstring available.
  • __repr__(self) -> 'str'

    • Return repr(self).
  • __repr_args__(self) -> '_repr.ReprArgs'

    • No docstring available.
  • __repr_name__(self) -> 'str'

    • Name of the instance's class, used in __repr__.
  • __repr_recursion__(self, object: 'Any') -> 'str'

    • Returns the string representation of a recursive object.
  • __repr_str__(self, join_str: 'str') -> 'str'

    • No docstring available.
  • __rich_repr__(self) -> 'RichReprResult'

  • __setattr__(self, name: 'str', value: 'Any') -> 'None'

    • Implement setattr(self, name, value).
  • __setstate__(self, state: 'dict[Any, Any]') -> 'None'

    • No docstring available.
  • __str__(self) -> 'str'

    • Return str(self).
  • _calculate_keys(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _copy_and_set_values(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _extract_id(self, url)

    • No docstring available.
  • _fetch_files_recursive(self, service: Any, folder_id: str) -> List[Dict[str, Union[str, List[str]]]]

    • Fetch all files and subfolders recursively.
  • _iter(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _load_credentials(self) -> Any

    • Load credentials. The order of loading credentials:
  1. Service account key, if the file exists
  2. Token path (for OAuth client), if the file exists
  3. Credentials path (for OAuth client), if the file exists
  4. Default credentials; if no credentials are found, raise `DefaultCredentialsError`
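
The fallback order above can be pictured with standard google-auth primitives; this is only a sketch under assumed parameter names, not the method's actual implementation:

```python
import os

import google.auth
from google.oauth2 import service_account
from google.oauth2.credentials import Credentials


def load_credentials_sketch(service_key, token_path, creds_path, scopes):
    # 1. Service account key if the file exists
    if service_key and os.path.exists(service_key):
        return service_account.Credentials.from_service_account_file(
            service_key, scopes=scopes
        )
    # 2. Token path (OAuth client) if the file exists
    if token_path and os.path.exists(token_path):
        return Credentials.from_authorized_user_file(token_path, scopes)
    # 3. Credentials path (OAuth client) would start an interactive OAuth
    #    flow here; omitted from this sketch.
    # 4. Application Default Credentials; google.auth.default raises
    #    DefaultCredentialsError if nothing can be found.
    creds, _project = google.auth.default(scopes=scopes)
    return creds
```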
  • _load_document_from_id(self, id: str) -> langchain_core.documents.base.Document

    • Load a document from an ID.
  • _load_documents_from_folder(self, folder_id: str, *, file_types: Optional[Sequence[str]] = None) -> List[langchain_core.documents.base.Document]

    • Load documents from a folder.
  • _load_documents_from_ids(self) -> List[langchain_core.documents.base.Document]

    • Load documents from a list of IDs.
  • _load_file_from_id(self, id: str) -> List[langchain_core.documents.base.Document]

    • Load a file from an ID.
  • _load_file_from_ids(self) -> List[langchain_core.documents.base.Document]

    • Load files from a list of IDs.
  • _load_sheet_from_id(self, id: str) -> List[langchain_core.documents.base.Document]

    • Load a sheet and all tabs from an ID.
  • _setattr_handler(self, name: 'str', value: 'Any') -> 'Callable[[BaseModel, str, Any], None] | None'

    • Get a handler for setting an attribute on the model instance.

Returns: A handler for setting an attribute on the model instance, used for memoization of the handler. Memoizing the handlers leads to a dramatic performance improvement in `__setattr__`. Returns `None` when memoization is not safe, in which case the attribute is set directly.

  • alazy_load(self) -> 'AsyncIterator[Document]'
    • A lazy loader for Documents.

Yields: the documents.

  • aload(self) -> 'list[Document]'
    • Load data into Document objects.

Returns: the documents.
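
Both async entry points can be exercised as follows; `loader` is assumed to be an already-configured loader instance (for example `MyGoogleDriveLoader`):

```python
import asyncio


async def stream_documents(loader):
    # alazy_load yields Documents one at a time, so large sources do not
    # need to be held in memory all at once.
    async for doc in loader.alazy_load():
        print(doc.metadata.get("source"), len(doc.page_content))


async def load_all(loader):
    # aload collects everything into a single list.
    return await loader.aload()

# asyncio.run(stream_documents(loader))
```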

  • copy(self, *, include: 'AbstractSetIntStr | MappingIntStrAny | None' = None, exclude: 'AbstractSetIntStr | MappingIntStrAny | None' = None, update: 'Dict[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
    • Returns a copy of the model.

Deprecated: this method is deprecated; use `model_copy` instead.

If you need include or exclude, use:

data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)

Args:
  • include: Optional set or mapping specifying which fields to include in the copied model.
  • exclude: Optional set or mapping specifying which fields to exclude in the copied model.
  • update: Optional dictionary of field-value pairs to override field values in the copied model.
  • deep: If True, the values of fields that are Pydantic models will be deep-copied.

Returns: A copy of the model with included, excluded and updated fields as specified.

  • dict(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False) -> 'Dict[str, Any]'

    • No docstring available.
  • json(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, encoder: 'Callable[[Any], Any] | None' = PydanticUndefined, models_as_dict: 'bool' = PydanticUndefined, **dumps_kwargs: 'Any') -> 'str'

    • No docstring available.
  • lazy_load(self) -> 'Iterator[Document]'

    • A lazy loader for Documents.

Yields: the documents.

  • load(self) -> List[langchain_core.documents.base.Document]

    • Load documents.
  • load_and_split(self, text_splitter: 'Optional[TextSplitter]' = None) -> 'list[Document]'

    • Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered to be deprecated!

Args: text_splitter: TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Raises: ImportError: If langchain-text-splitters is not installed and no text_splitter is provided.

Returns: List of Documents.
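
For instance, chunking with an explicit splitter; the chunk sizes are illustrative and `loader` is assumed to be a configured loader instance:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# One-step form, as documented above.
chunks = loader.load_and_split(text_splitter=splitter)

# Equivalent two-step form, useful when the unsplit Documents are also needed.
docs = loader.load()
chunks = splitter.split_documents(docs)
```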

  • load_from_url(self, url: str)

    • No docstring available.
  • model_copy(self, *, update: 'Mapping[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'

    • See the Pydantic usage documentation for `model_copy`.

Returns a copy of the model.

Note: the underlying instance's `__dict__` attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties).

Args:
  • update: Values to change/add in the new model. Note: the data is not validated before creating the new model; you should trust this data.
  • deep: Set to `True` to make a deep copy of the model.

Returns: New model instance.

  • model_dump(self, *, mode: "Literal['json', 'python'] | str" = 'python', include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'dict[str, Any]'
    • See the Pydantic usage documentation for `model_dump`.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Args:
  • mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
  • include: A set of fields to include in the output.
  • exclude: A set of fields to exclude from the output.
  • context: Additional context to pass to the serializer.
  • by_alias: Whether to use the field's alias in the dictionary key if defined.
  • exclude_unset: Whether to exclude fields that have not been explicitly set.
  • exclude_defaults: Whether to exclude fields that are set to their default value.
  • exclude_none: Whether to exclude fields that have a value of `None`.
  • round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
  • warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
  • fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` is raised.
  • serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

Returns: A dictionary representation of the model.

  • model_dump_json(self, *, indent: 'int | None' = None, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'str'
    • See the Pydantic usage documentation for `model_dump_json`.

Generates a JSON representation of the model using Pydantic's `to_json` method.

Args:
  • indent: Indentation to use in the JSON output. If None is passed, the output will be compact.
  • include: Field(s) to include in the JSON output.
  • exclude: Field(s) to exclude from the JSON output.
  • context: Additional context to pass to the serializer.
  • by_alias: Whether to serialize using field aliases.
  • exclude_unset: Whether to exclude fields that have not been explicitly set.
  • exclude_defaults: Whether to exclude fields that are set to their default value.
  • exclude_none: Whether to exclude fields that have a value of `None`.
  • round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
  • warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
  • fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` is raised.
  • serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

Returns: A JSON string representation of the model.

  • model_post_init(self, context: 'Any', /) -> 'None'
    • Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized.