loaders.py
Source: src/sunholo/chunker/loaders.py
Functions
convert_to_txt(file_path)
No docstring available.
convert_to_txt_and_extract(gs_file, split=False)
No docstring available.
ignore_files(filepath)
Returns True if the given path's file extension is listed in the config.json "code_extensions" array; returns False otherwise.
read_file_to_documents(gs_file: pathlib._local.Path, metadata: dict = None)
No docstring available.
read_gdrive_to_document(url: str, metadata: dict = None)
No docstring available.
read_git_repo(clone_url, branch='main', metadata=None)
No docstring available.
read_url_to_document(url: str, metadata: dict = None)
No docstring available.
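The extension check described for ignore_files(filepath) above can be sketched roughly as follows. This is only an illustration, not the sunholo implementation: the function name `is_code_file`, its `code_extensions` parameter, and the sample extension list are assumptions, whereas the real function reads the list from config.json.

```python
from pathlib import Path

# Hypothetical stand-in for the "code_extensions" array in config.json.
SAMPLE_CODE_EXTENSIONS = [".py", ".js", ".java"]


def is_code_file(filepath, code_extensions=SAMPLE_CODE_EXTENSIONS):
    """Return True if filepath's extension is in code_extensions, else False."""
    # Path.suffix is the final extension including the dot, or "" if none.
    suffix = Path(filepath).suffix.lower()
    return suffix in {ext.lower() for ext in code_extensions}


print(is_code_file("src/app.py"))
print(is_code_file("notes/readme.txt"))
```

Whether the configured extensions include the leading dot is not stated in the source, so the sample list above is a guess at the format.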
Classes
MyGoogleDriveLoader
.. deprecated:: 0.0.32 Use :class:`~langchain_google_community.GoogleDriveLoader` instead. It will not be removed until langchain-community==1.0.
Load Google Docs from Google Drive.
-
__copy__(self) -> 'Self'
- Returns a shallow copy of the model.
-
__deepcopy__(self, memo: 'dict[int, Any] | None' = None) -> 'Self'
- Returns a deep copy of the model.
-
__delattr__(self, item: 'str') -> 'Any'
- Implement delattr(self, name).
-
__eq__(self, other: 'Any') -> 'bool'
- Return self==value.
-
__getattr__(self, item: 'str') -> 'Any'
- No docstring available.
-
__getstate__(self) -> 'dict[Any, Any]'
- Helper for pickle.
-
__init__(self, url, *args, **kwargs)
- Create a new model by parsing and validating input data from keyword arguments.
Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.
`self` is explicitly positional-only to allow `self` as a field name.
-
__iter__(self) -> 'TupleGenerator'
- So `dict(model)` works.
-
__pretty__(self, fmt: 'typing.Callable[[Any], Any]', **kwargs: 'Any') -> 'typing.Generator[Any, None, None]'
- Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.
-
__replace__(self, **changes: 'Any') -> 'Self'
- No docstring available.
-
__repr__(self) -> 'str'
- Return repr(self).
-
__repr_args__(self) -> '_repr.ReprArgs'
- No docstring available.
-
__repr_name__(self) -> 'str'
- Name of the instance's class, used in repr.
-
__repr_recursion__(self, object: 'Any') -> 'str'
- Returns the string representation of a recursive object.
-
__repr_str__(self, join_str: 'str') -> 'str'
- No docstring available.
-
__rich_repr__(self) -> 'RichReprResult'
- Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.
-
__setattr__(self, name: 'str', value: 'Any') -> 'None'
- Implement setattr(self, name, value).
-
__setstate__(self, state: 'dict[Any, Any]') -> 'None'
- No docstring available.
-
__str__(self) -> 'str'
- Return str(self).
-
_calculate_keys(self, *args: 'Any', **kwargs: 'Any') -> 'Any'
- No docstring available.
-
_copy_and_set_values(self, *args: 'Any', **kwargs: 'Any') -> 'Any'
- No docstring available.
-
_extract_id(self, url)
- No docstring available.
-
_fetch_files_recursive(self, service: Any, folder_id: str) -> List[Dict[str, Union[str, List[str]]]]
- Fetch all files and subfolders recursively.
-
_iter(self, *args: 'Any', **kwargs: 'Any') -> 'Any'
- No docstring available.
-
_load_credentials(self) -> Any
- Load credentials. Sources are tried in this order:
- Service account key, if the file exists
- Token path (for OAuth Client), if the file exists
- Credentials path (for OAuth Client), if the file exists
- Default credentials; if no credentials are found, DefaultCredentialsError is raised
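The fallback order listed for _load_credentials can be sketched as a first-match search over candidate files. This is a simplified illustration under stated assumptions, not the loader's code: `pick_credential_source` and its three path parameters are hypothetical names, the function only reports which source would be chosen, and it does not load real credentials or raise DefaultCredentialsError.

```python
import os
import tempfile


def pick_credential_source(service_account_key, token_path, credentials_path):
    """Return a label for the first credential source whose file exists."""
    # Candidates are listed in the documented priority order.
    candidates = [
        ("service_account_key", service_account_key),
        ("token_path", token_path),
        ("credentials_path", credentials_path),
    ]
    for label, path in candidates:
        if path and os.path.exists(path):
            return label
    # Nothing on disk: the real loader would fall back to default credentials.
    return "default_credentials"


with tempfile.TemporaryDirectory() as tmp:
    token = os.path.join(tmp, "token.json")
    open(token, "w").close()  # only the token file exists
    missing = os.path.join(tmp, "missing.json")
    first = pick_credential_source(missing, token, missing)
    fallback = pick_credential_source(missing, missing, missing)

print(first, fallback)
```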
-
_load_document_from_id(self, id: str) -> langchain_core.documents.base.Document
- Load a document from an ID.
-
_load_documents_from_folder(self, folder_id: str, *, file_types: Optional[Sequence[str]] = None) -> List[langchain_core.documents.base.Document]
- Load documents from a folder.
-
_load_documents_from_ids(self) -> List[langchain_core.documents.base.Document]
- Load documents from a list of IDs.
-
_load_file_from_id(self, id: str) -> List[langchain_core.documents.base.Document]
- Load a file from an ID.
-
_load_file_from_ids(self) -> List[langchain_core.documents.base.Document]
- Load files from a list of IDs.
-
_load_sheet_from_id(self, id: str) -> List[langchain_core.documents.base.Document]
- Load a sheet and all tabs from an ID.
-
_setattr_handler(self, name: 'str', value: 'Any') -> 'Callable[[BaseModel, str, Any], None] | None'
- Get a handler for setting an attribute on the model instance.
Returns: A handler for setting an attribute on the model instance. Used for memoization of the handler. Memoizing the handlers leads to a dramatic performance improvement in `__setattr__`. Returns `None` when memoization is not safe; in that case the attribute is set directly.
- alazy_load(self) -> 'AsyncIterator[Document]'
- A lazy loader for Documents.
Yields: the documents.
- aload(self) -> 'list[Document]'
- Load data into Document objects.
Returns: the documents.
- copy(self, *, include: 'AbstractSetIntStr | MappingIntStrAny | None' = None, exclude: 'AbstractSetIntStr | MappingIntStrAny | None' = None, update: 'Dict[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
- Returns a copy of the model.
Deprecated: this method is now deprecated; use `model_copy` instead.
If you need `include` or `exclude`, use:
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
Args:
    include: Optional set or mapping specifying which fields to include in the copied model.
    exclude: Optional set or mapping specifying which fields to exclude in the copied model.
    update: Optional dictionary of field-value pairs to override field values in the copied model.
    deep: If True, the values of fields that are Pydantic models will be deep-copied.
Returns: A copy of the model with included, excluded and updated fields as specified.
-
dict(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False) -> 'Dict[str, Any]'
- No docstring available.
-
json(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, encoder: 'Callable[[Any], Any] | None' = PydanticUndefined, models_as_dict: 'bool' = PydanticUndefined, **dumps_kwargs: 'Any') -> 'str'
- No docstring available.
-
lazy_load(self) -> 'Iterator[Document]'
- A lazy loader for Documents.
Yields: the documents.
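The difference between a lazy loader and an eager load() can be illustrated with a small standard-library sketch. `ToyLoader` is a hypothetical class and plain dicts stand in for Document objects; this shows the general generator pattern, not how MyGoogleDriveLoader itself is implemented.

```python
from typing import Iterator, List


class ToyLoader:
    """Hypothetical loader used only to illustrate lazy versus eager loading."""

    def __init__(self, texts):
        self.texts = texts
        self.produced = 0  # counts how many documents have been built so far

    def lazy_load(self) -> Iterator[dict]:
        # Each document is built only when the caller asks for the next item.
        for text in self.texts:
            self.produced += 1
            yield {"page_content": text, "metadata": {}}

    def load(self) -> List[dict]:
        # Eager variant: consume the whole generator into a list.
        return list(self.lazy_load())


loader = ToyLoader(["a", "b", "c"])
first = next(loader.lazy_load())
produced_after_first = loader.produced

all_docs = ToyLoader(["a", "b", "c"]).load()
print(first["page_content"], produced_after_first, len(all_docs))
```

The lazy form keeps memory use proportional to one document at a time, which matters when a source holds many large files.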
-
load(self) -> List[langchain_core.documents.base.Document]
- Load documents.
-
load_and_split(self, text_splitter: 'Optional[TextSplitter]' = None) -> 'list[Document]'
- Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered to be deprecated!
Args: text_splitter: TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Raises: ImportError: If langchain-text-splitters is not installed and no text_splitter is provided.
Returns: List of Documents.
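The load-then-split flow can be sketched without any third-party packages. Note the substitution: the documented default is RecursiveCharacterTextSplitter, whereas this sketch uses a naive fixed-width splitter. The names `split_text` and `load_and_split_sketch` are hypothetical, and dicts stand in for Document objects.

```python
def split_text(text, chunk_size=10):
    """Naive fixed-width splitter (a stand-in, not RecursiveCharacterTextSplitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def load_and_split_sketch(docs, chunk_size=10):
    """Split every document's text and return the chunks as new documents."""
    chunks = []
    for doc in docs:
        for piece in split_text(doc["page_content"], chunk_size):
            # Each chunk carries a copy of the parent document's metadata.
            chunks.append({"page_content": piece, "metadata": dict(doc["metadata"])})
    return chunks


docs = [{"page_content": "abcdefghijklmno", "metadata": {"source": "demo"}}]
chunks = load_and_split_sketch(docs, chunk_size=10)
print([c["page_content"] for c in chunks])
```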
-
load_from_url(self, url: str)
- No docstring available.
-
model_copy(self, *, update: 'Mapping[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
- Returns a copy of the model.
Note: The underlying instance's `__dict__` attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties created with `functools.cached_property`).
Args:
    update: Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
    deep: Set to `True` to make a deep copy of the model.
Returns: New model instance.
- model_dump(self, *, mode: "Literal['json', 'python'] | str" = 'python', include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'dict[str, Any]'
- Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Args:
    mode: The mode in which `to_python` should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
    include: A set of fields to include in the output.
    exclude: A set of fields to exclude from the output.
    context: Additional context to pass to the serializer.
    by_alias: Whether to use the field's alias in the dictionary key if defined.
    exclude_unset: Whether to exclude fields that have not been explicitly set.
    exclude_defaults: Whether to exclude fields that are set to their default value.
    exclude_none: Whether to exclude fields that have a value of `None`.
    round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
    warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
    fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` error is raised.
    serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.
Returns: A dictionary representation of the model.
- model_dump_json(self, *, indent: 'int | None' = None, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool | None' = None, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, fallback: 'Callable[[Any], Any] | None' = None, serialize_as_any: 'bool' = False) -> 'str'
- Generates a JSON representation of the model using Pydantic's `to_json` method.
Args:
    indent: Indentation to use in the JSON output. If None is passed, the output will be compact.
    include: Field(s) to include in the JSON output.
    exclude: Field(s) to exclude from the JSON output.
    context: Additional context to pass to the serializer.
    by_alias: Whether to serialize using field aliases.
    exclude_unset: Whether to exclude fields that have not been explicitly set.
    exclude_defaults: Whether to exclude fields that are set to their default value.
    exclude_none: Whether to exclude fields that have a value of `None`.
    round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
    warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a `PydanticSerializationError`.
    fallback: A function to call when an unknown value is encountered. If not provided, a `PydanticSerializationError` error is raised.
    serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.
Returns: A JSON string representation of the model.
- model_post_init(self, context: 'Any', /) -> 'None'
- Override this method to perform additional initialization after `__init__` and `model_construct`. This is useful if you want to do some validation that requires the entire model to be initialized.