
loaders.py

Source: src/sunholo/chunker/loaders.py

Functions

convert_to_txt(file_path)

No docstring available.

convert_to_txt_and_extract(gs_file, split=False)

No docstring available.

ignore_files(filepath)

Returns `True` if the given path's file extension is listed in the `code_extensions` array of `config.json`, and `False` otherwise.
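
A minimal sketch of how `ignore_files` might be used to filter paths before chunking; the example paths and the `config.json` contents are illustrative assumptions, not taken from the package:

```python
from pathlib import Path
from sunholo.chunker.loaders import ignore_files

# Assumes a config.json whose "code_extensions" array lists extensions
# such as ".py" or ".js" (illustrative values only).
candidates = [Path("src/app.py"), Path("docs/notes.md")]
matched = [p for p in candidates if ignore_files(str(p))]
print(matched)  # paths whose extension appears in "code_extensions"
```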

read_file_to_documents(gs_file: pathlib.Path, metadata: dict = None)

No docstring available.

read_gdrive_to_document(url: str, metadata: dict = None)

No docstring available.

read_git_repo(clone_url, branch='main', metadata=None)

No docstring available.

read_url_to_document(url: str, metadata: dict = None)

No docstring available.
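
Since none of the loader functions above ship docstrings, the following sketch shows how they might be called based purely on their signatures; the file path, URL, and repository are placeholders, and the return values are assumed (not guaranteed) to be lists of langchain `Document` objects:

```python
from pathlib import Path
from sunholo.chunker import loaders

# Placeholder inputs; adjust to real files, URLs and repositories.
file_docs = loaders.read_file_to_documents(
    Path("data/report.pdf"), metadata={"source": "report.pdf"}
)

url_docs = loaders.read_url_to_document(
    "https://example.com/article", metadata={"source": "web"}
)

repo_docs = loaders.read_git_repo(
    "https://github.com/example/repo.git",
    branch="main",
    metadata={"source": "git"},
)
```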

Classes

MyGoogleDriveLoader

Deprecated since 0.0.32: use `langchain_google_community.GoogleDriveLoader` instead. It will not be removed until langchain-community==1.0.

Load Google Docs from Google Drive.
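
A minimal sketch of how the deprecated loader might be driven via `__init__` and `load_from_url`; the document URL is a placeholder, Google credentials must already be discoverable for the call to succeed, and the return value is assumed to be a list of langchain `Document` objects. New code should prefer `langchain_google_community.GoogleDriveLoader`, as noted above:

```python
from sunholo.chunker.loaders import MyGoogleDriveLoader

# Placeholder URL; replace <doc-id> with a real Google Docs ID.
url = "https://docs.google.com/document/d/<doc-id>/edit"

loader = MyGoogleDriveLoader(url=url)
docs = loader.load_from_url(url)  # assumed to return langchain Documents

for doc in docs:
    print(doc.metadata.get("source"), len(doc.page_content))
```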

  • __copy__(self) -> 'Self'

    • Returns a shallow copy of the model.
  • __deepcopy__(self, memo: 'dict[int, Any] | None' = None) -> 'Self'

    • Returns a deep copy of the model.
  • __delattr__(self, item: 'str') -> 'Any'

    • Implement delattr(self, name).
  • __eq__(self, other: 'Any') -> 'bool'

    • Return self==value.
  • __getattr__(self, item: 'str') -> 'Any'

    • No docstring available.
  • __getstate__(self) -> 'dict[Any, Any]'

    • Helper for pickle.
  • __init__(self, url, *args, **kwargs)

    • Create a new model by parsing and validating input data from keyword arguments.

Raises `ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

  • __iter__(self) -> 'TupleGenerator'

    • So `dict(model)` works.
  • __pretty__(self, fmt: 'typing.Callable[[Any], Any]', **kwargs: 'Any') -> 'typing.Generator[Any, None, None]'

  • __replace__(self, **changes: 'Any') -> 'Self'

    • No docstring available.
  • __repr__(self) -> 'str'

    • Return repr(self).
  • __repr_args__(self) -> '_repr.ReprArgs'

    • No docstring available.
  • __repr_name__(self) -> 'str'

    • Name of the instance's class, used in __repr__.
  • __repr_recursion__(self, object: 'Any') -> 'str'

    • Returns the string representation of a recursive object.
  • __repr_str__(self, join_str: 'str') -> 'str'

    • No docstring available.
  • __rich_repr__(self) -> 'RichReprResult'

  • __setattr__(self, name: 'str', value: 'Any') -> 'None'

    • Implement setattr(self, name, value).
  • __setstate__(self, state: 'dict[Any, Any]') -> 'None'

    • No docstring available.
  • __str__(self) -> 'str'

    • Return str(self).
  • _calculate_keys(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _copy_and_set_values(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _extract_id(self, url)

    • No docstring available.
  • _fetch_files_recursive(self, service: Any, folder_id: str) -> List[Dict[str, Union[str, List[str]]]]

    • Fetch all files and subfolders recursively.
  • _iter(self, *args: 'Any', **kwargs: 'Any') -> 'Any'

    • No docstring available.
  • _load_credentials(self) -> Any

    • Load credentials. The order of loading credentials:
  1. Service account key, if the file exists
  2. Token path (for OAuth client), if the file exists
  3. Credentials path (for OAuth client), if the file exists
  4. Default credentials; if no credentials are found, raise `DefaultCredentialsError`
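
The fallback order above can be pictured with standard google-auth primitives; this is only a sketch under assumed parameter names, not the method's actual implementation:

```python
import os

import google.auth
from google.oauth2 import service_account
from google.oauth2.credentials import Credentials


def load_credentials_sketch(service_key, token_path, creds_path, scopes):
    # 1. Service account key if the file exists
    if service_key and os.path.exists(service_key):
        return service_account.Credentials.from_service_account_file(
            service_key, scopes=scopes
        )
    # 2. Token path (OAuth client) if the file exists
    if token_path and os.path.exists(token_path):
        return Credentials.from_authorized_user_file(token_path, scopes)
    # 3. Credentials path (OAuth client) would start an interactive OAuth
    #    flow here; omitted from this sketch.
    # 4. Application Default Credentials; google.auth.default raises
    #    DefaultCredentialsError if nothing can be found.
    creds, _project = google.auth.default(scopes=scopes)
    return creds
```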
  • _load_document_from_id(self, id: str) -> langchain_core.documents.base.Document

    • Load a document from an ID.
  • _load_documents_from_folder(self, folder_id: str, *, file_types: Optional[Sequence[str]] = None) -> List[langchain_core.documents.base.Document]

    • Load documents from a folder.
  • _load_documents_from_ids(self) -> List[langchain_core.documents.base.Document]

    • Load documents from a list of IDs.
  • _load_file_from_id(self, id: str) -> List[langchain_core.documents.base.Document]

    • Load a file from an ID.
  • _load_file_from_ids(self) -> List[langchain_core.documents.base.Document]

    • Load files from a list of IDs.
  • _load_sheet_from_id(self, id: str) -> List[langchain_core.documents.base.Document]

    • Load a sheet and all tabs from an ID.
  • _setattr_handler(self, name: 'str', value: 'Any') -> 'Callable[[BaseModel, str, Any], None] | None'

    • Get a handler for setting an attribute on the model instance.

Returns: A handler for setting an attribute on the model instance, used for memoization of the handler. Memoizing the handlers leads to a dramatic performance improvement in `__setattr__`. Returns `None` when memoization is not safe, in which case the attribute is set directly.

  • alazy_load(self) -> 'AsyncIterator[Document]'
    • A lazy loader for Documents.

Yields: the documents.

  • aload(self) -> 'list[Document]'
    • Load data into Document objects.

Returns: the documents.
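
Both async entry points can be exercised as follows; `loader` is assumed to be an already-configured loader instance (for example `MyGoogleDriveLoader`):

```python
import asyncio


async def stream_documents(loader):
    # alazy_load yields Documents one at a time, so large sources do not
    # need to be held in memory all at once.
    async for doc in loader.alazy_load():
        print(doc.metadata.get("source"), len(doc.page_content))


async def load_all(loader):
    # aload collects everything into a single list.
    return await loader.aload()

# asyncio.run(stream_documents(loader))
```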

  • copy(self, *, include: 'AbstractSetIntStr | MappingIntStrAny | None' = None, exclude: 'AbstractSetIntStr | MappingIntStrAny | None' = None, update: 'Dict[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
    • Returns a copy of the model.

Deprecated: this method is deprecated; use `model_copy` instead.

If you need include or exclude, use:

data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)

Args:
  • include: Optional set or mapping specifying which fields to include in the copied model.
  • exclude: Optional set or mapping specifying which fields to exclude in the copied model.
  • update: Optional dictionary of field-value pairs to override field values in the copied model.
  • deep: If True, the values of fields that are Pydantic models will be deep-copied.

Returns: A copy of the model with included, excluded and updated fields as specified.

  • dict(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False) -> 'Dict[str, Any]'

    • No docstring available.
  • json(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, encoder: 'Callable[[Any], Any] | None' = PydanticUndefined, models_as_dict: 'bool' = PydanticUndefined, **dumps_kwargs: 'Any') -> 'str'

    • No docstring available.
  • lazy_load(self) -> 'Iterator[Document]'

    • A lazy loader for Documents.

Yields: the documents.

  • load(self) -> List[langchain_core.documents.base.Document]

    • Load documents.
  • load_and_split(self, text_splitter: 'Optional[TextSplitter]' = None) -> 'list[Document]'

    • Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered to be deprecated!

Args: text_splitter: TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Raises: ImportError: If langchain-text-splitters is not installed and no text_splitter is provided.

Returns: List of Documents.
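
For instance, chunking with an explicit splitter; the chunk sizes are illustrative and `loader` is assumed to be a configured loader instance:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# One-step form, as documented above.
chunks = loader.load_and_split(text_splitter=splitter)

# Equivalent two-step form, useful when the unsplit Documents are also needed.
docs = loader.load()
chunks = splitter.split_documents(docs)
```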

  • load_from_url(self, url: str)

    • No docstring available.
  • model_copy(self, *, update: 'Mapping[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'

    • See the Pydantic usage documentation for `model_copy`.

Returns a copy of the model.

Note: the underlying instance's `__dict__` attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties).

Args:
  • update: Values to change/add in the new model. Note: the data is not validated before creating the new model; you should trust this data.
  • deep: Set to `True` to make a deep copy of the model.

Returns: New model instance.

  • model_dump(self, *, mode: "Literal['json', 'python'] | str" = 'python', include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'dict[str, Any]'
    • See the Pydantic usage documentation for `model_dump`.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Args:
  • mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
  • include: A set of fields to include in the output.
  • exclude: A set of fields to exclude from the output.
  • context: Additional context to pass to the serializer.
  • by_alias: Whether to use the field's alias in the dictionary key if defined.
  • exclude_unset: Whether to exclude fields that have not been explicitly set.
  • exclude_defaults: Whether to exclude fields that are set to their default value.
  • exclude_none: Whether to exclude fields that have a value of `None`.
  • round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
  • warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
  • fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` is raised.
  • serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

Returns: A dictionary representation of the model.

  • model_dump_json(self, *, indent: 'int | None' = None, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'str'
    • See the Pydantic usage documentation for `model_dump_json`.

Generates a JSON representation of the model using Pydantic's `to_json` method.

Args:
  • indent: Indentation to use in the JSON output. If None is passed, the output will be compact.
  • include: Field(s) to include in the JSON output.
  • exclude: Field(s) to exclude from the JSON output.
  • context: Additional context to pass to the serializer.
  • by_alias: Whether to serialize using field aliases.
  • exclude_unset: Whether to exclude fields that have not been explicitly set.
  • exclude_defaults: Whether to exclude fields that are set to their default value.
  • exclude_none: Whether to exclude fields that have a value of `None`.
  • round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
  • warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
  • fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` is raised.
  • serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.

Returns: A JSON string representation of the model.

  • model_post_init(self, context: 'Any', /) -> 'None'
    • Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized.