PeerDB - a collaborative database

PeerDB is a database software layer that supports different types of collaboration out of the box. Build collaborative applications as you would build traditional applications, and leave collaboration to PeerDB. Common user interface components are included.

Features

Schemaless database [done]: Every document can have its own set of properties. Users can define more properties and add them to documents as part of their regular workflow, without involving database administrators. Property definitions are documents like any other that users can add and edit, individually or collaboratively.
Many types of properties [done]: Numerical and numerical ranges, dates and date ranges, keyword, textual (including rich-text), multi-media (files), relational between documents, identifiers (internal or external), references (to external sources).
Powerful search and discovery [done]: Structured data search combined with full-text search across a diverse set of documents. Automatically creates facets based on document properties for research and analysis.
Scalable [done]: Supports millions of documents and thousands of properties. Example: used with 6 million Wikipedia articles, 80 million files, 80 million Wikidata items described with 2000 properties.
Built on industrial strength technologies [done]: PostgreSQL, Elasticsearch. HTTP/2 & HTTPS. Go, TypeScript, Vue. Mobile friendly UI. OpenID Connect authentication.
Open source [done]: Apache 2 license. No vendor lock-in. Not open core.
Collaboration built-in [in progress]: Real-time & review-based collaboration within and between teams. Open/public collaboration.
History preserving [in progress]: All changes to data are recorded to keep an audit trail and allow provenance.
Supports many user flows [in progress]: Integration of data from various sources. Adding and iterating on documents. Conflicting properties can coexist until resolved through data cleaning.
Reactive and state-syncing [planned]: Subscribe to changes to documents to keep your UI and/or state up to date automatically.
Proactive federation [planned]: Two-way federation where data changes are proactively pushed to participating data sources.
History is not immutable [planned]: History can be changed at a later time (while preserving version identifiers) to support real-world use cases in collaborative apps where history might have to be redacted for legal or privacy reasons. Redactions are still logged and auditable.
Multiview [planned]: Multiple different versions of data can coexist at the same time (i.e., branches), enabling applications with diverse viewpoints.
Undo/redo [planned]: First-class support for global and local undo/redo for all types of actions.
Multilingual [planned]: Documents can be in different languages. Documents can be translated.
Annotations [planned]: Open annotation layer on top of any (part of) data.
Data and user organization [planned]: Users can organize documents into collections. Users can organize and discuss data in discussion groups.
Rating, feedback and data quality [planned]: Users can rate documents, properties and values. Users can discuss data. Users can report issues with data.
Notifications [planned]: Users can be notified of changes to data they care about.
LLM-powered [planned]: Large-language model (LLM) powered data cleaning, entity resolution and search query understanding.
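To illustrate how schemaless documents and automatic facets can fit together, here is a minimal sketch in Go. It is not PeerDB's API: the Document and Claim types, the BuildFacets function, and the property ID "prop-medium" are hypothetical. It assumes each document carries its own list of property-value claims, that a property is referenced by the ID of the document defining it, and that facets are simple per-value document counts.

```go
package main

import (
	"fmt"
	"sort"
)

// Document is a hypothetical, simplified schemaless document: each document
// carries its own claims, and each claim points to a property by the ID of
// the document that defines that property.
type Document struct {
	ID     string
	Claims []Claim
}

// Claim is a hypothetical property-value pair. Value is a string here for
// simplicity, although real properties can be numbers, dates, files, etc.
type Claim struct {
	PropertyID string
	Value      string
}

// Facet counts how many documents have each value of one property.
type Facet struct {
	PropertyID string
	Counts     map[string]int
}

// BuildFacets derives one facet per property that appears in any document,
// so the available facets follow whatever properties users have added.
func BuildFacets(docs []Document) map[string]*Facet {
	facets := map[string]*Facet{}
	for _, doc := range docs {
		// Count each (property, value) pair at most once per document.
		seen := map[[2]string]bool{}
		for _, c := range doc.Claims {
			key := [2]string{c.PropertyID, c.Value}
			if seen[key] {
				continue
			}
			seen[key] = true
			f, ok := facets[c.PropertyID]
			if !ok {
				f = &Facet{PropertyID: c.PropertyID, Counts: map[string]int{}}
				facets[c.PropertyID] = f
			}
			f.Counts[c.Value]++
		}
	}
	return facets
}

func main() {
	// The property definition is itself a document (hypothetical IDs).
	medium := Document{ID: "prop-medium", Claims: []Claim{{PropertyID: "prop-name", Value: "medium"}}}
	docs := []Document{
		medium,
		{ID: "a1", Claims: []Claim{{PropertyID: medium.ID, Value: "oil"}}},
		{ID: "a2", Claims: []Claim{{PropertyID: medium.ID, Value: "oil"}}},
		{ID: "a3", Claims: []Claim{{PropertyID: medium.ID, Value: "bronze"}}},
	}
	facets := BuildFacets(docs)

	// Print the "medium" facet in a stable (sorted) order.
	values := make([]string, 0)
	for v := range facets[medium.ID].Counts {
		values = append(values, v)
	}
	sort.Strings(values)
	for _, v := range values {
		fmt.Printf("%s: %d\n", v, facets[medium.ID].Counts[v])
	}
}
```

Running it prints "bronze: 1" and "oil: 2". Because facets are computed from the claims present rather than from a fixed schema, a property that a user adds tomorrow shows up as a facet without any schema change.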

Current limitations and trade-offs

Delayed search indexing: Search and query indexing is not immediate after a write but is done in batches every few seconds to improve performance.
Storing all versions and patches between them: PeerDB stores documents at all versions, and also the patches between those versions, which allows better merge conflict resolution at the cost of additional storage.
Precomputing and storing variations of documents: PeerDB precomputes and stores different variations of documents (e.g., compressed ones) to speed up retrieval.
Not relational: While PeerDB supports references between documents (and even to the web), it is not relational and does not enforce referential integrity (but you can retrieve the version at which the integrity held).
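The following Go sketch shows why keeping both full versions and the patches between them helps with merging. It is an illustration, not PeerDB's storage format: the Fields, Change, Patch and History types and the Diff, Apply, Commit and Merge functions are hypothetical. It assumes a document flattened to a property-to-value map. Every version stays retrievable by index, and two concurrent patches made against the same base version are merged key by key, with keys that both patches changed differently reported as conflicts (the first patch's value is kept provisionally).

```go
package main

import (
	"fmt"
	"sort"
)

// Fields is a hypothetical flattened document: property -> value.
type Fields map[string]string

// Change sets Key to Value, or removes Key when Delete is true.
type Change struct {
	Key    string
	Value  string
	Delete bool
}

// Patch lists the changes that turn one version into another.
type Patch []Change

// Diff computes the patch that turns from into to.
func Diff(from, to Fields) Patch {
	var p Patch
	for k, v := range to {
		if old, ok := from[k]; !ok || old != v {
			p = append(p, Change{Key: k, Value: v})
		}
	}
	for k := range from {
		if _, ok := to[k]; !ok {
			p = append(p, Change{Key: k, Delete: true})
		}
	}
	return p
}

// Apply returns a copy of f with p applied; f itself is not modified.
func Apply(f Fields, p Patch) Fields {
	out := Fields{}
	for k, v := range f {
		out[k] = v
	}
	for _, c := range p {
		if c.Delete {
			delete(out, c.Key)
		} else {
			out[c.Key] = c.Value
		}
	}
	return out
}

// History stores every version in full plus the patch between consecutive versions.
type History struct {
	Versions []Fields
	Patches  []Patch // Patches[i] turns Versions[i] into Versions[i+1]
}

// Commit stores a copy of f as a new version and returns its index.
func (h *History) Commit(f Fields) int {
	f = Apply(f, nil)
	if n := len(h.Versions); n > 0 {
		h.Patches = append(h.Patches, Diff(h.Versions[n-1], f))
	}
	h.Versions = append(h.Versions, f)
	return len(h.Versions) - 1
}

// Merge combines two patches made against the same base version. Keys that
// both patches change differently are returned as (sorted) conflicts.
func Merge(base Fields, a, b Patch) (Fields, []string) {
	byKey := map[string]Change{}
	for _, c := range a {
		byKey[c.Key] = c
	}
	merged := Apply(base, a)
	var conflicts []string
	for _, c := range b {
		if other, ok := byKey[c.Key]; ok && other != c {
			conflicts = append(conflicts, c.Key)
			continue
		}
		merged = Apply(merged, Patch{c})
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	var h History
	v0 := h.Commit(Fields{"title": "Starry Night", "year": "1889"})
	// Two users edit version v0 concurrently, touching different properties.
	a := Diff(h.Versions[v0], Fields{"title": "The Starry Night", "year": "1889"})
	b := Diff(h.Versions[v0], Fields{"title": "Starry Night", "year": "1889", "medium": "oil"})
	merged, conflicts := Merge(h.Versions[v0], a, b)
	v1 := h.Commit(merged)
	fmt.Println(h.Versions[v1]["title"], h.Versions[v1]["medium"], conflicts)
}
```

This prints "The Starry Night oil []": the two edits touch different properties, so both apply cleanly, while version v0 remains retrievable unchanged. Keeping full versions makes any past state (for example, the one at which a reference was still valid) cheap to read, and keeping patches lets a merge reason about what each side actually changed instead of comparing whole documents.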

Demos

wikipedia.peerdb.org: a search service for English Wikipedia articles, Wikimedia Commons files, and Wikidata data.
moma.peerdb.org: a search service for The Museum of Modern Art (MoMA) artists and artworks.

In Development

The project is in active development. It is open source. Follow and/or contribute to it at gitlab.com/peerdb/peerdb. Feel free to discuss and suggest ideas and use cases, too.

Acknowledgements

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.

The project gratefully acknowledges the HPC RIVR consortium and EuroHPC JU for funding this project by providing computing resources of the HPC system Vega at the Institute of Information Science.

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the granting authority can be held responsible for them. Funded within the framework of the NGI Search project under grant agreement No 101069364.