sist2-python
A set of Python tools for interfacing with sist2 index files, intended for use in user scripts.
- class sist2.Sist2Descriptor(id, version_major, version_minor, version_patch, root, name, rewrite_url, timestamp)
Bases: Sist2Descriptor
Sist2 index descriptor
- class sist2.Sist2Document(id, version, mtime, size, json_data, rel_path, path, mime, parent)
Bases: Sist2Document
Sist2 document, instantiated by sist2.Sist2Index.document_iter
- class sist2.Sist2Index(filename: str)
Bases: object
- property descriptor: Sist2Descriptor
- Returns:
Index descriptor
- document_count(where: str = '') → int
Count the number of documents in the index.
- Parameters:
where – SQL WHERE clause (e.g. ‘size > 100’)
- Returns:
Number of documents in the index that match the clause
- document_iter(where: str = '')
Iterate over the documents in the index.
- Parameters:
where – SQL WHERE clause (e.g. ‘size > 100’)
- Returns:
Generator yielding Sist2Document objects
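As an illustration of how document_count and document_iter can share one where clause, here is a minimal sketch. The helper name summarize is hypothetical and not part of sist2-python. It accepts any object that provides the two methods documented above and assumes each yielded document exposes the size field listed in the Sist2Document constructor.

```python
def summarize(index, where: str = ""):
    """Return (count, total_size) for the documents matching `where`.

    `index` is expected to behave like sist2.Sist2Index, i.e. to provide
    document_count(where) and document_iter(where). This helper is
    hypothetical and only illustrates the calling pattern.
    """
    count = index.document_count(where)
    # Assumption: every yielded document has a numeric `size` attribute,
    # as suggested by the Sist2Document constructor signature.
    total_size = sum(doc.size for doc in index.document_iter(where))
    return count, total_size
```

In a user script this would presumably be called as summarize(Sist2Index(filename), 'size > 100'), where filename is the path to an existing index file.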
- get(key: str, default=None)
Get value from key-value table. This is used to store configuration or state in user scripts.
- Parameters:
key – Key
default – Default value to return if not found
- Returns:
Value or default
- register_model(id: int, name: str, url: str, path: str, size: int, type: str) → None
Register a machine learning model for this index.
- Parameters:
id – Model ID
name – Name of the model, at most 15 characters
url – HTTP(S) URL of the model, in .onnx format, used for inference in the web UI
path – Elasticsearch path. Must begin with idx_512. for an indexed dense vector (at most 1024 dimensions) or with 512. for dense vectors (replace 512 with the embedding size).
size – Size of the embedding in dimensions.
type – Must be either ‘flat’ (one embedding per document) or ‘nested’ (multiple embeddings per document).
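The constraints listed above can be checked before calling register_model. The following is a minimal sketch of such a check; the function validate_model_args is hypothetical and not part of sist2-python, and it only encodes the rules stated in the parameter descriptions.

```python
def validate_model_args(name: str, path: str, size: int, type: str) -> None:
    """Raise ValueError if the arguments break a documented constraint.

    Hypothetical helper: it mirrors the parameter descriptions of
    register_model and does not call into sist2 itself.
    """
    if len(name) > 15:
        raise ValueError("name must be at most 15 characters")
    if type not in ("flat", "nested"):
        raise ValueError("type must be 'flat' or 'nested'")
    if size <= 0:
        raise ValueError("size must be a positive number of dimensions")

    indexed_prefix = f"idx_{size}."  # indexed dense vector
    plain_prefix = f"{size}."        # dense vector
    if path.startswith(indexed_prefix):
        if size > 1024:
            raise ValueError("indexed dense vectors support at most 1024 dimensions")
    elif not path.startswith(plain_prefix):
        raise ValueError(
            f"path must begin with {indexed_prefix!r} or {plain_prefix!r}"
        )
```

For example, a 512-dimensional model with a path such as idx_512.clip (the suffix is only an illustration) passes the check, while the same path with size=768 does not.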
- set(key: str, value: str | int) → None
Set value in key-value table.
- Parameters:
key – Key
value – Value
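As a sketch of how get and set can keep state between runs of a user script, the helper below stores a run counter. The function name bump_run_counter and the key myscript.run_count are hypothetical. The type returned by get for a stored value is not stated here, so the sketch converts it with int() as a precaution.

```python
def bump_run_counter(index, key: str = "myscript.run_count") -> int:
    """Increment and return a counter stored in the key-value table.

    `index` is expected to behave like sist2.Sist2Index (get/set).
    The key name is an arbitrary example, not a reserved sist2 key.
    """
    # Fall back to 0 on the first run, when the key does not exist yet.
    previous = int(index.get(key, default=0))
    current = previous + 1
    index.set(key, current)
    return current
```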
- sync_tag_table() → None
Update the tags table. You must call this function for tag filtering to work when using the SQLite search backend. It has no effect when using an Elasticsearch backend.
- update_document(doc: Sist2Document) → None
Update a document
- Parameters:
doc – Document to update
- upsert_embedding(id: str, start: int, end: int | None, model_id: int, embedding: bytes) → None
Upsert an embedding
- Parameters:
id – Document ID
start – Start offset in .content
end – (optional) End offset in .content
model_id – Model ID
embedding – Encoded float32 embeddings (use serialize_float_array() to convert)
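To illustrate what "encoded float32 embeddings" can look like as bytes, here is a stand-in sketch using only the standard library. It assumes little-endian packed 32-bit floats. The byte order used by serialize_float_array() is not stated here, so in real scripts the library helper should be preferred. The names encode_float32 and decode_float32 are hypothetical.

```python
import struct


def encode_float32(values) -> bytes:
    """Pack a sequence of floats as consecutive 32-bit floats.

    Assumption: little-endian byte order ("<"). This is a stand-in for
    serialize_float_array(), not its actual implementation.
    """
    values = list(values)
    return struct.pack(f"<{len(values)}f", *values)


def decode_float32(data: bytes) -> list:
    """Inverse of encode_float32, useful for checking a round trip."""
    if len(data) % 4 != 0:
        raise ValueError("byte length must be a multiple of 4")
    return list(struct.unpack(f"<{len(data) // 4}f", data))
```

The resulting bytes object has four bytes per dimension, so a 512-dimensional embedding occupies 2048 bytes. It would then be passed as the embedding argument of upsert_embedding, together with the document ID, the offsets and the model ID.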
- property versions: List[Sist2Version]
Index version history. Versions start at 1 and are incremented after each incremental scan.
- class sist2.Sist2Version(id, date)
Bases: Sist2Version
Sist2 index version. Versions start at 1 and are incremented by one for each incremental scan.