Why add a DB to ES?
The DB acts as a sync point between RDF and Koha (and eventually the CMS or anything else). Koha holds item and circulation data, including statistics; RDF holds publication-level metadata and the links between people and works.
It also simplifies inspecting and maintaining the index: MySQL queries are easier to master than inspecting ES, and changing the ES schema only requires a local reindex from the DB, without re-reading the data from Koha/RDF.
It also improves debugging: checking the DB makes it easier to tell whether an item is missing, and when and where the data was transmitted wrongly.
DB
documents
column | type | comments / examples |
---|---|---|
id | varchar PK | "pub:1a2b3c4d", "place:oslo", "person:p123" |
source | varchar PK | "catalog", "cms", ... |
update_at | datetime | |
koha_id | int | record ID in Koha |
availability | json | availability data to index |
metadata | json | {"title":"Abcd","pubYear":"1987"} |
pop_data | structured | [{n:10,d:1},{n:100,d:10},...] |
pop_data_stored | double | popularity stored on ES |
reindex_at | datetime | indexed column; when (and whether) the document needs to be reindexed |
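As a rough sketch of how the documents table could drive a local reindex, the following uses Python's built-in sqlite3 as a stand-in for MySQL. The SQL types, the composite primary key on (id, source), the index name, the ISO-8601 string timestamps, and the helper names `create_schema` and `due_for_reindex` are all assumptions made for illustration, not part of the design above.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a documents table mirroring the columns listed above (types are assumed)."""
    conn.execute(
        """
        CREATE TABLE documents (
            id              TEXT NOT NULL,  -- e.g. "pub:1a2b3c4d"
            source          TEXT NOT NULL,  -- e.g. "catalog", "cms"
            update_at       TEXT,           -- ISO-8601 timestamp (assumption)
            koha_id         INTEGER,        -- record ID in Koha
            availability    TEXT,           -- JSON stored as text
            metadata        TEXT,           -- JSON stored as text
            pop_data        TEXT,           -- structured data stored as text
            pop_data_stored REAL,           -- popularity stored on ES
            reindex_at      TEXT,           -- NULL means no reindex is pending
            PRIMARY KEY (id, source)
        )
        """
    )
    # Index on reindex_at so the "what is due?" query stays cheap.
    conn.execute("CREATE INDEX idx_documents_reindex_at ON documents (reindex_at)")


def due_for_reindex(conn: sqlite3.Connection, now: str) -> list:
    """Return (id, source) pairs whose reindex_at is set and not later than `now`."""
    rows = conn.execute(
        "SELECT id, source FROM documents "
        "WHERE reindex_at IS NOT NULL AND reindex_at <= ? "
        "ORDER BY reindex_at, id",
        (now,),
    )
    return rows.fetchall()
```

A reindex job would then read only these rows from the DB and push them to ES, leaving Koha and RDF untouched.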
links
The DB is responsible for mirroring the links inside the search container, so that scoring and popularity can bleed over from linked works. This might be solved in RDF instead, though.
column | type | comments / examples |
---|---|---|
from_node | varchar FK | foreign key to items.id |
to_node | varchar | may not exist in items yet (still to be obtained) |
type | varchar | "author", "work", "place", "genre", "translator"... |
weight | double | Default 1.0 |
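One possible reading of "bleeding" popularity over links is sketched below: a node keeps its own popularity and receives a weighted share of the popularity of each node it links to. The `damping` factor, the `Link` class, and the function name `bled_popularity` are hypothetical; the actual scoring rule is not specified here. Links whose to_node is not known yet are skipped.

```python
from dataclasses import dataclass


@dataclass
class Link:
    from_node: str
    to_node: str
    type: str
    weight: float = 1.0  # default weight, as in the links table


def bled_popularity(node_id, own_popularity, links, damping=0.5):
    """Own popularity plus a damped, weighted share from linked nodes.

    `own_popularity` maps node id -> popularity. `damping` is an assumed
    tuning knob, not something defined by the design above.
    """
    total = own_popularity.get(node_id, 0.0)
    for link in links:
        if link.from_node != node_id:
            continue
        linked = own_popularity.get(link.to_node)
        if linked is None:
            # to_node has not been obtained yet; nothing to bleed from.
            continue
        total += damping * link.weight * linked
    return total
```

Skipping unknown targets keeps the computation usable while to_node rows are still being fetched.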
stats
column | type | comments / examples |
---|---|---|
id | varchar FK | foreign key to items.id, e.g.: "recordId:12345" |
source | varchar PK | e.g.: "koha" |
popularity | structured | e.g.: items_count + loans_count*5 |
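The example formula in the popularity row can be written out as a small function. The function name and the `LOAN_WEIGHT` constant are illustrative; only the formula itself (items_count + loans_count*5) comes from the table.

```python
LOAN_WEIGHT = 5  # multiplier taken from the example formula above


def popularity(items_count: int, loans_count: int) -> int:
    """Example popularity: each loan counts five times as much as an item."""
    return items_count + loans_count * LOAN_WEIGHT
```

For instance, a record with 3 items and 4 loans would get 3 + 4*5 = 23.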