sist2-python

A set of Python tools for interfacing with sist2 index files, intended for use in user scripts.


class sist2.Sist2Descriptor(id, version_major, version_minor, version_patch, root, name, rewrite_url, timestamp)

Bases: Sist2Descriptor

Sist2 index descriptor

class sist2.Sist2Document(id, version, mtime, size, json_data, rel_path, path, mime, parent)

Bases: Sist2Document

Sist2 document, instantiated by sist2.Sist2Index.document_iter()

class sist2.Sist2Index(filename: str)

Bases: object

commit() → None

Commit changes to the database.

property descriptor: Sist2Descriptor
Returns:

Index descriptor

document_count(where: str = '') → int

Count the number of documents in the index

Parameters:

where – SQL WHERE clause (e.g. ‘size > 100’)

Returns:

Number of documents in the index

document_iter(where: str = '')

Iterate over the documents in the index, optionally filtered by a WHERE clause.

Parameters:

where – SQL WHERE clause (e.g. ‘size > 100’)

Returns:

Generator of Sist2Document objects
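A minimal sketch of a user-script step that combines document_count() and document_iter() with the same WHERE clause. The helper name summarize_by_mime is hypothetical; it reads only the size and mime fields listed in the Sist2Document signature and assumes the index argument is an open sist2.Sist2Index.

```python
from collections import Counter


def summarize_by_mime(index, where: str = "") -> dict:
    """Count matching documents and total their sizes per MIME type.

    `index` is expected to be an open sist2.Sist2Index; `where` is passed
    through unchanged as the SQL WHERE clause (e.g. "size > 100").
    """
    expected = index.document_count(where)  # number of matching documents
    counts: Counter = Counter()
    sizes: Counter = Counter()
    seen = 0
    for doc in index.document_iter(where):  # yields Sist2Document objects
        counts[doc.mime] += 1
        sizes[doc.mime] += doc.size
        seen += 1
    return {
        "expected": expected,
        "seen": seen,
        "count_by_mime": dict(counts),
        "bytes_by_mime": dict(sizes),
    }
```

For example, summarize_by_mime(Sist2Index("index.sist2"), "size > 100") would summarize the documents matching that clause; the file name is a placeholder. Comparing expected with seen is a cheap sanity check that both calls saw the same set of documents.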

get(key: str, default=None)

Get a value from the key-value table. This table is used to store configuration or state in user scripts.

Parameters:
  • key – Key

  • default – Default value to return if not found

Returns:

Value or default

get_thumbnail(id: str) → bytes | None
Parameters:

id – Document id

Returns:

Thumbnail data, or None if no thumbnail is available
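Since get_thumbnail() is annotated as returning bytes | None, callers should handle a missing thumbnail. A sketch, with the helper name save_thumbnail and the destination path as hypothetical choices; the image format of the returned bytes is not specified here, so no file extension is assumed.

```python
from pathlib import Path


def save_thumbnail(index, doc_id: str, dest: Path) -> bool:
    """Write a document's thumbnail to `dest`.

    `index` is expected to be an open sist2.Sist2Index. Returns False
    (and writes nothing) when get_thumbnail() returns None.
    """
    data = index.get_thumbnail(doc_id)
    if data is None:
        return False
    dest.write_bytes(data)
    return True
```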

register_model(id: int, name: str, url: str, path: str, size: int, type: str) → None

Register a machine learning model for this index.

Parameters:
  • id – Model ID

  • name – Name of the model (maximum 15 characters)

  • url – HTTP(S) URL of the model in .onnx format, used for inference in the web UI.

  • path – Elasticsearch path. Must begin with idx_512. for an indexed dense vector (maximum 1024 dimensions) or with 512. for a non-indexed dense vector (replace 512 with the embedding size).

  • size – Size of the embedding in dimensions.

  • type – Must be either ‘flat’ (one embedding per document) or ‘nested’ (multiple embeddings per document).
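The parameter constraints above can be checked before calling register_model(). This sketch is a hypothetical wrapper, not part of sist2-python: the field argument (the part of path after the documented idx_<size>. or <size>. prefix) and the function name are illustrative assumptions.

```python
def register_embedding_model(index, model_id: int, name: str, url: str,
                             field: str, size: int, indexed: bool,
                             model_type: str = "flat") -> str:
    """Validate the documented register_model() constraints, then register.

    `index` is expected to be an open sist2.Sist2Index. `field` is a
    hypothetical suffix (e.g. "clip") appended to the required path prefix.
    Returns the Elasticsearch path that was registered.
    """
    if len(name) > 15:
        raise ValueError("model name must be at most 15 characters")
    if not url.lower().startswith(("http://", "https://")):
        raise ValueError("model URL must use HTTP(S)")
    if model_type not in ("flat", "nested"):
        raise ValueError("type must be 'flat' or 'nested'")
    if indexed and size > 1024:
        raise ValueError("indexed dense vectors are limited to 1024 dimensions")
    # "idx_<size>." for an indexed dense vector, "<size>." otherwise
    prefix = f"idx_{size}." if indexed else f"{size}."
    path = prefix + field
    index.register_model(id=model_id, name=name, url=url, path=path,
                         size=size, type=model_type)
    return path
```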

set(key: str, value: str | int) → None

Set a value in the key-value table.

Parameters:
  • key – Key

  • value – Value
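get() and set() make it straightforward to keep state between runs of a user script. A minimal sketch of a persistent run counter: the key name is hypothetical, the stored value is passed through int() in case get() hands it back as a string, and the commit() call assumes key-value writes are persisted like other changes to the database.

```python
def next_run_number(index, key: str = "my_script.runs") -> int:
    """Increment and persist a run counter kept in the key-value table.

    `index` is expected to be an open sist2.Sist2Index; the key is a
    hypothetical name chosen by the script.
    """
    runs = int(index.get(key, 0)) + 1  # 0 is returned on the very first run
    index.set(key, runs)
    index.commit()  # assumption: set() changes are written out by commit()
    return runs
```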

sync_tag_table() → None

Update the tags table. This function must be called for tag filtering to work with the SQLite search backend. It has no effect with the Elasticsearch backend.

update_document(doc: Sist2Document) → None

Update a document

Parameters:

doc – Document to update (a Sist2Document)
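A sketch of a typical tagging pass built from document_iter(), update_document(), sync_tag_table() and commit(). It assumes json_data is a mutable dict and that tags are stored as a list under its "tag" key; neither detail is stated in this reference, and the helper name is hypothetical.

```python
def tag_large_documents(index, tag: str, min_size: int) -> int:
    """Add `tag` to every document whose size exceeds `min_size`.

    `index` is expected to be an open sist2.Sist2Index. Assumes
    doc.json_data is a dict holding tags as a list under "tag".
    Returns the number of documents that were updated.
    """
    updated = 0
    # int() keeps the WHERE clause a plain numeric comparison
    for doc in index.document_iter(f"size > {int(min_size)}"):
        tags = doc.json_data.setdefault("tag", [])
        if tag not in tags:
            tags.append(tag)
            index.update_document(doc)
            updated += 1
    index.sync_tag_table()  # required for tag filtering with the SQLite backend
    index.commit()
    return updated
```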

upsert_embedding(id: str, start: int, end: int | None, model_id: int, embedding: bytes) → None

Insert or update (upsert) an embedding.

Parameters:
  • id – Document ID

  • start – Start offset in .content

  • end – (optional) End offset in .content

  • model_id – Model ID

  • embedding – Encoded float32 embedding (use serialize_float_array() to convert)
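For a 'nested' model, one way to use upsert_embedding() is to store one embedding per slice of .content. In this sketch the fixed-size chunking scheme, the embed callable (standing in for your model) and the helper name are hypothetical, offsets are assumed to be character positions in the content string, and serialize should be sist2.serialize_float_array in a real script.

```python
from typing import Callable, Sequence


def upsert_chunk_embeddings(index, doc_id: str, content: str, model_id: int,
                            embed: Callable[[str], Sequence[float]],
                            serialize: Callable[[Sequence[float]], bytes],
                            chunk_size: int = 1000) -> int:
    """Embed `content` in fixed-size character chunks and upsert each one.

    `index` is expected to be an open sist2.Sist2Index. Each chunk is stored
    with its [start, end) offsets. Returns the number of chunks stored.
    """
    stored = 0
    for start in range(0, len(content), chunk_size):
        end = min(start + chunk_size, len(content))
        vector = embed(content[start:end])  # hypothetical model call
        index.upsert_embedding(doc_id, start, end, model_id, serialize(vector))
        stored += 1
    return stored
```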

property versions: List[Sist2Version]

Index version history (versions start at 1 and are incremented after each incremental scan)

class sist2.Sist2Version(id, date)

Bases: Sist2Version

Sist2 index version (starts at 1 and is incremented by one for each incremental scan)
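Version numbers make it possible to restrict work to the most recent incremental scan. A sketch, assuming the version field of each Sist2Document holds the id of the Sist2Version in which that document was last added or changed (this reference does not say so explicitly); the helper names are hypothetical.

```python
def latest_version(index):
    """Return the most recent Sist2Version of an index (highest id)."""
    return max(index.versions, key=lambda v: v.id)


def documents_from_latest_scan(index):
    """Yield only documents whose version matches the latest index version.

    `index` is expected to be an open sist2.Sist2Index.
    """
    latest_id = latest_version(index).id
    for doc in index.document_iter():
        if doc.version == latest_id:  # assumption: doc.version is a version id
            yield doc
```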

sist2.print_progress(done: int = 0, count: int = 0) → None

Send the current progress to sist2-admin, where it is displayed on the Tasks page.

Parameters:
  • done – Number of files processed

  • count – Total number of files to process (including files that have been processed)
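A sketch of a processing loop that reports progress through print_progress(). The done/count semantics follow the parameter descriptions above; the handle callback, the reporting interval and the injectable report argument (so the loop can run without sist2-admin) are hypothetical additions.

```python
def process_all(index, handle, report=None, every: int = 100) -> int:
    """Call `handle(doc)` for every document, reporting progress periodically.

    `index` is expected to be an open sist2.Sist2Index. `report` defaults to
    sist2.print_progress. Returns the number of documents processed.
    """
    if report is None:
        from sist2 import print_progress as report
    total = index.document_count()  # count includes already-processed files
    done = 0
    for doc in index.document_iter():
        handle(doc)
        done += 1
        if done % every == 0:
            report(done=done, count=total)
    report(done=done, count=total)  # final update once the loop finishes
    return done
```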

sist2.serialize_float_array(array) → bytes
Parameters:

array – float32 array (e.g. a NumPy array)

Returns:

Encoded bytes, suitable for the sist2 embeddings table (e.g. as the embedding argument of upsert_embedding())
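To show what "encoded bytes" can look like in practice, the sketch below packs floats as consecutive little-endian IEEE 754 float32 values, 4 bytes per dimension. This layout is an assumption for illustration (it matches what NumPy's tobytes() yields for a float32 array on little-endian hardware), not sist2's implementation; real scripts should call serialize_float_array() so the encoding is whatever sist2 expects.

```python
import struct
from typing import Sequence


def pack_float32(values: Sequence[float]) -> bytes:
    """Pack values as consecutive little-endian float32 numbers.

    Illustrative only: an assumed byte layout, not sist2's own code.
    """
    return struct.pack(f"<{len(values)}f", *values)


def unpack_float32(data: bytes) -> list:
    """Inverse of pack_float32 (4 bytes per value)."""
    return list(struct.unpack(f"<{len(data) // 4}f", data))
```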