Last edited by Benjamin May 09, 2018

schema

Why add a DB to ES?

The DB acts as a sync point between RDF and Koha (and eventually the CMS or anything else). Koha holds item and circulation data, including statistics; RDF holds publication-level metadata and the links between people and works.

It also simplifies inspection and maintenance of the index: queries on MySQL are easier to master than inspecting ES, and changing the ES schema only requires a local reindex, without re-reading the data from Koha/RDF.

It also improves debugging: by checking the DB, it is easier to tell whether an item is missing, and when and where the data was wrongly transmitted.

DB

documents

| column | type | comments / examples |
|---|---|---|
| id | varchar | PK; "pub:1a2b3c4d", "place:oslo", "person:p123" |
| source | varchar | PK; "catalog", "cms", ... |
| update_at | datetime | |
| koha_id | int | record Id in Koha |
| availability | JSON | availability data to index |
| metadata | JSON | {"title":"Abcd","pubYear":"1987"} |
| pop_data | structured | [{n:10,d:1},{n:100,d:10},...] |
| pop_data_stored | double | popularity stored on ES |
| reindex_at | datetime | IDX; when and if the document needs to be reindexed |
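
As a rough illustration of the `documents` table, the sketch below creates it and selects the rows that are due for reindexing. It uses Python's built-in `sqlite3` as a stand-in for MySQL, so the column types are approximations (JSON is stored as text). The composite primary key on `(id, source)`, the index name, the helper `due_for_reindex`, and the reading of a NULL `reindex_at` as "no reindex pending" are all assumptions, not part of the schema as written.

```python
import json
import sqlite3

# SQLite stands in for MySQL here; JSON columns are stored as TEXT.
DDL = """
CREATE TABLE documents (
    id              TEXT NOT NULL,   -- e.g. "pub:1a2b3c4d"
    source          TEXT NOT NULL,   -- e.g. "catalog", "cms"
    update_at       TEXT,            -- datetime kept as an ISO-style string
    koha_id         INTEGER,         -- record Id in Koha
    availability    TEXT,            -- JSON
    metadata        TEXT,            -- JSON
    pop_data        TEXT,            -- JSON
    pop_data_stored REAL,            -- popularity stored on ES
    reindex_at      TEXT,            -- assumption: NULL means nothing pending
    PRIMARY KEY (id, source)         -- assumption: composite key
);
CREATE INDEX idx_documents_reindex_at ON documents (reindex_at);
"""


def due_for_reindex(conn, now):
    """Return (id, source) pairs whose reindex_at is set and not after `now`."""
    rows = conn.execute(
        "SELECT id, source FROM documents "
        "WHERE reindex_at IS NOT NULL AND reindex_at <= ? "
        "ORDER BY reindex_at",
        (now,),
    )
    return [tuple(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute(
    "INSERT INTO documents (id, source, metadata, reindex_at) VALUES (?, ?, ?, ?)",
    (
        "pub:1a2b3c4d",
        "catalog",
        json.dumps({"title": "Abcd", "pubYear": "1987"}),
        "2018-05-09 10:00:00",
    ),
)
conn.execute(
    "INSERT INTO documents (id, source, reindex_at) VALUES (?, ?, ?)",
    ("place:oslo", "catalog", None),
)

due = due_for_reindex(conn, "2018-05-09 12:00:00")
```

Only the row with a non-NULL `reindex_at` in the past is returned, which is the kind of check that is awkward to run against ES directly.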

links

The DB is responsible for mirroring links inside the search container, so that scoring and popularity can bleed over from linked works. This might be solved in RDF instead, though.

| column | type | comments / examples |
|---|---|---|
| from_node | varchar | FK; foreign key items.id |
| to_node | varchar | might be missing in items (still to be obtained) |
| type | varchar | "author", "work", "place", "genre", "translator"... |
| weight | double | default 1.0 |
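
One possible reading of "bleeding" popularity over links is sketched below. The page does not specify the direction or the formula, so everything here is an assumption: popularity flows from `to_node` to `from_node`, each link contributes `weight` times the linked node's popularity, and a hypothetical damping `factor` scales the result. The function name `bleed_popularity` is illustrative only.

```python
from collections import defaultdict


def bleed_popularity(own, links, factor=0.5):
    """Add a weighted share of each linked node's popularity to the linking node.

    own    -- dict mapping node id to its own popularity
    links  -- iterable of (from_node, to_node, type, weight) tuples
    factor -- hypothetical damping applied to everything that bleeds over
    """
    bled = defaultdict(float)
    for from_node, to_node, _type, weight in links:
        # to_node may not have been obtained yet; it then contributes nothing.
        bled[from_node] += factor * weight * own.get(to_node, 0.0)
    nodes = set(own) | set(bled)
    return {node: own.get(node, 0.0) + bled.get(node, 0.0) for node in nodes}


own = {"pub:1": 10.0, "work:w1": 40.0}
links = [
    ("pub:1", "work:w1", "work", 1.0),
    ("pub:1", "person:p123", "author", 1.0),  # target not known yet
]
result = bleed_popularity(own, links)
```

Here `pub:1` picks up half of the popularity of `work:w1`, while the link to the not-yet-obtained `person:p123` adds nothing.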

stats

| column | type | comments / examples |
|---|---|---|
| id | varchar | FK; foreign key items.id, e.g. "recordId:12345" |
| source | varchar | PK; e.g. "koha" |
| popularity | structured | e.g. items_count + loans_count*5 |
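
The example popularity formula from the table can be written out as a small function. This is a minimal sketch: the function name, the `loan_weight` parameter, and the sample counts are assumptions; only the expression `items_count + loans_count*5` comes from the table.

```python
def popularity(items_count, loans_count, loan_weight=5):
    """Example score from the stats table: items_count + loans_count * 5."""
    return items_count + loans_count * loan_weight


# Hypothetical stats row for a Koha record with 3 items and 12 loans.
row = {
    "id": "recordId:12345",
    "source": "koha",
    "popularity": popularity(items_count=3, loans_count=12),
}
```

With these sample counts the loans dominate the score, which matches the intent of weighting loans five times higher than holdings.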