PhD Position F/M Decentralized Data Sharing with Hybrid Graph-Relational Views


Inria
Posted: one day ago
Experience level: N/A
Contract type: Full-time
Context and assets of the position

This PhD position is part of the IPCEI-CIS (Important Project of Common European Interest - Next Generation Cloud Infrastructure and Services) DXP (Data Exchange Platform) project, which involves Amadeus and three Inria research teams (Loreley, CEDAR and MAGELLAN). The project aims to design and develop an open-source management solution for a federated and distributed data exchange platform (DXP), operating in an open, scalable, and massively distributed environment (cloud-edge continuum).

The PhD will be located at the Inria Center of the University of Lorraine, in the Loreley team. It will be supervised by Claudia-Lavinia Ignat, Research Director at Inria in Nancy, and Stefania Dumbrava, Assistant Professor at ENSIIE/Inria Paris/Télécom SudParis.

The Inria Center of the University of Lorraine is one of Inria's nine centers and has twenty project teams, located in Nancy, Strasbourg and Saarbrücken. Its activities involve over 400 people of 45 different nationalities, including scientists and research and innovation support staff. The Inria Center is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, technological research institutes, etc.

Assignment

Context and Motivation

Modern data management is increasingly shaped by distribution, heterogeneity, and collaboration. Contemporary ecosystems, from scientific knowledge graphs and industrial digital twins to cross-organizational analytics platforms, must manage continuously evolving, jointly produced data shared across autonomous participants. This data often spans clouds, edges, and federated infrastructures and evolves without centralized coordination.

At the same time, data models are converging. The recent SQL/PGQ standard unifies relational querying with graph pattern matching, enabling property graphs ([Angles18]) to be defined as views over relational data. Systems such as Oracle PGQL, Google Cloud Spanner, and DuckPGQ ([Wolde23]) embody this shift, integrating tabular and graph-structured data within a single query framework.
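
As a rough illustration of a property graph defined as a view over relational data, the following Python sketch derives vertices and edges from two tables and evaluates a two-hop pattern over the result. It is not SQL/PGQ syntax and does not reflect any of the systems named above; the tables `persons` and `knows` and the helpers `graph_view` and `two_hop` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical relational tables, modelled as lists of row dictionaries.
persons = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cleo"},
]
knows = [
    {"src": 1, "dst": 2, "since": 2019},
    {"src": 2, "dst": 3, "since": 2021},
]


def graph_view(vertex_rows, edge_rows):
    """Derive a property graph (vertices plus adjacency lists) from two tables."""
    vertices = {row["id"]: {"name": row["name"]} for row in vertex_rows}
    adjacency = defaultdict(list)
    for row in edge_rows:
        # Keep only edges whose endpoints exist, mirroring a foreign-key constraint.
        if row["src"] in vertices and row["dst"] in vertices:
            adjacency[row["src"]].append((row["dst"], {"since": row["since"]}))
    return vertices, adjacency


def two_hop(vertices, adjacency, start):
    """Match the pattern (start)-[knows]->()-[knows]->(x) and return the names of x."""
    return sorted(
        vertices[far]["name"]
        for mid, _ in adjacency[start]
        for far, _ in adjacency[mid]
    )


vertices, adjacency = graph_view(persons, knows)
print(two_hop(vertices, adjacency, 1))
```

The base data stays relational; the graph exists only as a derived structure, which is the sense in which the text speaks of graphs as views.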

Replication techniques such as CRDTs ([Balegas18], [Preguiça19], [Yu20], [Rault22], [Ignat24]) demonstrate how distributed systems can converge under concurrent updates while preserving constraints. However, they focus on raw state synchronization, whereas in many real-world scenarios, data sharing is guided by views, i.e., query-defined abstractions that determine which parts of the underlying data are exposed. Views can already be incrementally maintained in centralized relational systems, with mature approaches such as DBSP ([Budiu22]) and OpenIVM ([Battiston24]) efficiently propagating changes from base data to derived query results. Recent work has begun to extend incremental view maintenance to decentralized and eventually consistent settings ([Thomassen23]), where replicas evolve independently and must reconcile divergent states. In the context of graph databases, prototype systems have shown how property graph views can be defined, materialized, and incrementally updated over underlying storage layers ([Han24]). However, existing techniques stop short of addressing the more complex challenge of constraint-aware, view-based replication in hybrid graph-relational environments, where data is not only distributed but also governed by specific constraints and access control policies ([Angles21], [Angles23], [Clark22]).
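
The contrast between raw state synchronization and view-guided sharing can be sketched with a textbook state-based CRDT. The example below uses a two-phase set (2P-set), which is a standard construction and not the design of any cited system; the `view` function and its price threshold are hypothetical. For simplicity the view is recomputed from the merged state rather than maintained incrementally.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """State-based 2P-set CRDT: an element is live if it was added and never removed."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items this replica has observed can be removed; removal is permanent.
        if item in self.added:
            self.removed.add(item)

    def merge(self, other):
        """Join two states by component-wise union (commutative, associative, idempotent)."""
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def value(self):
        return self.added - self.removed


def view(replica, min_price=100):
    """Hypothetical view: the (name, price) rows whose price is at least min_price."""
    return {row for row in replica.value() if row[1] >= min_price}


# Two replicas evolve independently.
a, b = TwoPhaseSet(), TwoPhaseSet()
a.add(("laptop", 900))
a.add(("pen", 2))
b.add(("phone", 500))
b.add(("laptop", 900))
b.remove(("laptop", 900))

# Merging in either order yields the same state, hence the same view.
ab, ba = a.merge(b), b.merge(a)
assert ab.value() == ba.value()
print(sorted(view(ab)))
```

Here convergence of the view follows trivially from convergence of the base state. The harder questions raised in the text arise when the view itself is what gets replicated and when constraints must hold on it.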

This PhD project addresses this gap by shifting the focus from replicating raw datasets to replicating constraint-aware, incrementally evolving hybrid views: representations that combine relational structure with graph connectivity, encode domain semantics, and remain composable and correct across distributed environments. This would enable semantics-preserving, collaborative data platforms for decentralized graph-relational ecosystems.

Objectives of the PhD

The overarching goal of this PhD is to establish the theoretical and system foundations for a replication framework centered on constraint-aware, incrementally evolving hybrid views, elevating views from static query results to first-class, shareable, and composable entities in distributed data systems. The project will pursue three main objectives:
  • Formalizing replicated hybrid views. Define a declarative and semantic foundation for views that unify relational and graph data, support selective exposure and structural transformation, and incorporate explicit constraints. This model will enable reasoning about the correctness of views under replication.
  • Designing incremental and constraint-preserving replication mechanisms. Extend replication semantics beyond raw data by introducing algorithms and protocols that propagate and merge views while preserving convergence, integrity, and domain-specific invariants. This includes defining merge operators, conflict-resolution strategies, and conditions under which replicated views remain consistent and incrementally updatable across sites.
  • Building a decentralized framework for view-based data sharing. Develop a framework in which hybrid views are first-class, shareable abstractions for data exchange among independent nodes. The framework will support their reliable management across distributed infrastructures, enabling collaborative data sharing while upholding correctness guarantees.
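
To make the notions of merge operator and conflict-resolution strategy from the second objective concrete, here is a minimal sketch of one possible approach: replicas are merged by set union, and a key-uniqueness constraint is then restored by a deterministic rule. This is only an illustration under stated assumptions, not the mechanism the project will necessarily adopt; the row layout `(key, value, timestamp, replica_id)` and the function names are hypothetical.

```python
def merge_rows(left, right):
    """Union two replica states; each row is (key, value, timestamp, replica_id)."""
    return left | right


def enforce_unique_key(rows):
    """Restore key uniqueness: per key, the row with the highest
    (timestamp, replica_id, value) wins, so every replica picks the same row."""
    winners = {}
    for key, value, ts, rid in rows:
        current = winners.get(key)
        if current is None or (ts, rid, value) > (current[2], current[3], current[1]):
            winners[key] = (key, value, ts, rid)
    return set(winners.values())


# Two sites concurrently write a row for the same key "k1".
site_a = {("k1", "alice@a.org", 5, "A"), ("k2", "bob@a.org", 3, "A")}
site_b = {("k1", "alice@b.org", 7, "B")}

resolved_ab = enforce_unique_key(merge_rows(site_a, site_b))
resolved_ba = enforce_unique_key(merge_rows(site_b, site_a))

# The outcome does not depend on the merge order.
assert resolved_ab == resolved_ba
print(sorted(resolved_ab))
```

Because the winner is chosen by a total order on the rows themselves, the result is independent of merge order. The rule silently discards the losing write, which is the kind of trade-off a constraint-preserving replication design has to make explicit.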

Research Questions

The project will address three key research questions:
  • How can hybrid graph-relational views be defined as replicable, constraint-aware abstractions, integrating both topology and domain-specific requirements?
  • How can such views be incrementally updated and reconciled across distributed replicas while preserving correctness and integrity?
  • How can replication models make such views the unit of synchronization, ensuring convergence under concurrency?

References

[Angles18] Renzo Angles: The Property Graph Database Model. AMW, 2018.

[Angles21] Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee, Bei Li, Leonid Libkin, Wim Martens, Filip Murlak, Josh Perryman, Ognjen Savkovic, Michael Schmidt, Juan F. Sequeda, Slawek Staworko, Dominik Tomaszuk: PG-Keys: Keys for Property Graphs. SIGMOD Conference 2021: 2423-2436.

[Angles23] Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savkovic, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, Dusan Zivkovic: PG-Schema: Schemas for Property Graphs. Proc. ACM Manag. Data 1(2): 198:1-198:25 (2023).

[Balegas18] Valter Balegas, Sérgio Duarte, Carla Ferreira, Rodrigo Rodrigues, Nuno M. Preguiça: IPA: Invariant-preserving Applications for Weakly consistent Replicated Databases. Proc. VLDB Endow. 12(4): 404-418 (2018).

[Battiston24] Ilaria Battiston, Karan Kathuria, Peter Boncz: OpenIVM: A SQL-to-SQL Compiler for Incremental Computations. SIGMOD Companion, 2024.

[Budiu22] Mihai Budiu, Frank McSherry, Leonid Ryzhyk, Val Tannen: DBSP: Automatic Incremental View Maintenance for Rich Query Languages. arXiv:2203.16684, 2022.

[Clark22] Stanley Clark, Nikolay Yakovets, George Fletcher, Nicola Zannone: ReLOG: A Unified Framework for Relationship-Based Access Control over Graph Databases. DBSec 2022: 303-315.

[Han24] Sunwoo Han, Zachary G. Ives: Implementation Strategies for Views over Property Graphs. SIGMOD, 2024.

[Ignat24] Claudia-Lavinia Ignat, Victorien Elvinger, Habibatou Ba: SynQL: A CRDT-Based Approach for Replicated Relational Databases with Integrity Constraints. DAIS, 2024.

[Preguiça19] Nuno M. Preguiça, Carlos Baquero, Marc Shapiro: Conflict-Free Replicated Data Types (CRDTs). Encyclopedia of Big Data Technologies, 2019.

[Rault22] Pierre-Antoine Rault, Claudia-Lavinia Ignat, Olivier Perrin: Distributed Access Control for Collaborative Applications Using CRDTs. PaPoC@EuroSys 2022: 33-38.

[Thomassen23] J. Thomassen, W. Yu: Eventually-Consistent Replicated Relations and Updatable Views. In: A. Abelló et al. (eds.), New Trends in Database and Information Systems, ADBIS 2023. Communications in Computer and Information Science, vol. 1850. Springer, Cham, 2023.

[Wolde23] Daniel Wolde, Gábor Szárnyas, Peter Boncz: DuckPGQ: Bringing SQL/PGQ to DuckDB. Proc. VLDB Endow. 16(12): 4034-4037 (2023).

[Yu20] Weijia Yu, Claudia-Lavinia Ignat: Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge. IEEE SMDS, 2020.

Main activities

The research will follow a structured approach combining theoretical modeling, algorithmic design, and system development:
  • Hybrid Graph-Relational View Replication. We will develop a formal foundation for replicated hybrid graph-relational views that captures how derived data should evolve and compose across distributed replicas under concurrency. This foundation will enable reasoning about views in a replicated setting.
  • Constraint-Preserving Replication. We will generalize CRDT principles to support replication of views and their associated constraints, defining merge operators and protocols that ensure convergence and correctness under concurrent updates.
  • System Design and Evaluation. We will prototype a replication framework on top of a SQL/PGQ-compliant backend. The evaluation will target real-world scenarios, measuring scalability, maintenance overhead, convergence guarantees, and constraint enforcement performance.

Skills

Candidates should hold a Master's degree in Computer Science or a related field, with a strong background in data management or distributed systems. Familiarity with graph database technologies or system prototyping is highly desirable.

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

€2200 gross/month

Location: Nancy