Improving database constraint performance.


I am currently working on a model that represents 'temporal' trees (nodes/edges + validity from/to dates).

To enforce integrity I use database constraints. Because database constraints are queries that access all the data of the base tables they reference (rather than just the subset of tuples affected by a change), I am running into performance problems.

So I have tried several solutions, but none is completely satisfying.

1. Instead of having one base table per entity, partition the data into two base tables, OLD and NEW. OLD contains 'validated, old' data and NEW contains 'new' data. The DB constraints refer to the NEW table only.

Create a view from the union of these 2 tables. Control the update propagation of that view so that:
- inserts are done on the NEW table; thus, inserts trigger the DB constraints on the NEW table only.
- updated/deleted tuples are first moved from OLD to NEW (deleted from OLD, inserted into NEW), and then updated in or deleted from NEW.

After a certain time, or once the NEW table grows too large, transfer the data from NEW to OLD.
This solution works fine provided that the DB constraints do not need to access the OLD table.
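To illustrate what I mean, here is a rough Python sketch of the propagation rules (this is not Dataphor code; the class and names are made up for the example):

```python
# Sketch of option 1: an OLD/NEW partition behind a union view, where
# constraint checks scan only the (small) NEW table.

class PartitionedTable:
    def __init__(self, constraint):
        self.old = {}                 # key -> row, 'validated, old' data
        self.new = {}                 # key -> row, recently modified data
        self.constraint = constraint  # predicate checked against NEW only

    def view(self):
        """The union view that clients actually query."""
        return {**self.old, **self.new}

    def insert(self, key, row):
        self.new[key] = row           # inserts go to NEW ...
        self._check()                 # ... so only NEW is scanned

    def update(self, key, row):
        # Move the tuple from OLD to NEW first, then apply the update in NEW.
        self.old.pop(key, None)
        self.new[key] = row
        self._check()

    def delete(self, key):
        self.old.pop(key, None)
        self.new.pop(key, None)

    def archive(self):
        """Periodic transfer of NEW into OLD, keeping NEW small."""
        self.old.update(self.new)
        self.new.clear()

    def _check(self):
        # The constraint sees only NEW, never OLD -- the limitation I mention.
        if not self.constraint(self.new.values()):
            raise ValueError("constraint violated")
```

The archive step is where the performance is bought: the cost of the constraint is proportional to the size of NEW, not of the whole entity.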

2. Precompute the queries (or parts of the queries) that appear in the DB constraints. This can be done using 'after' deferred event handlers.
This solution is rather complex because the precomputation must be done incrementally (otherwise the gain in performance would disappear). But the DB constraints can still be expressed as unrestricted queries, and we keep one table per entity.
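As a rough illustration of the incremental part (again a Python sketch, not Dataphor code; the constraint and all names are invented), consider a constraint limiting how many rows each team may have:

```python
# Sketch of option 2: a precomputed per-team row count, maintained
# incrementally by 'after' handlers, so the constraint check never
# re-scans the base table.

MAX_LOCATIONS_PER_TEAM = 3

rows = {}          # location_id -> team_id  (the base table)
team_counts = {}   # precomputed aggregate: team_id -> row count

def after_insert(location_id, team_id):
    rows[location_id] = team_id
    # Incremental maintenance: adjust only the affected group.
    team_counts[team_id] = team_counts.get(team_id, 0) + 1
    check(team_id)

def after_delete(location_id):
    team_id = rows.pop(location_id)
    team_counts[team_id] -= 1

def check(team_id):
    # The constraint check is O(1): it reads the cached count
    # instead of counting over 'rows'.
    if team_counts[team_id] > MAX_LOCATIONS_PER_TEAM:
        raise ValueError("too many locations for team %s" % team_id)
```

The complexity I complain about is exactly this: every handler must keep the precomputed value in sync for every kind of modification, or the gain disappears.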

3. For each base table, create a companion log table that stores, per transaction, all the insert/update/delete on that base table.
Fill these log tables using event handlers.
Express the DB constraint queries in terms of the base tables and log tables, using the tuples in the log table to restrict the queries.
The problem here is to clear the contents of the log tables before a transaction starts. I wonder whether it is possible to do that inside the server (and not from the frontend).
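Sketched in Python (not Dataphor code; names are made up), the idea looks like this:

```python
# Sketch of option 3: a per-transaction change log used to restrict
# the constraint check to the tuples touched in this transaction.

table = {}          # loc_id -> (team_id, date_from, date_to)
change_log = set()  # loc_ids inserted/updated in the current transaction

def begin_transaction():
    # This is the step I would like the server to do automatically.
    change_log.clear()

def insert(loc_id, team_id, date_from, date_to):
    table[loc_id] = (team_id, date_from, date_to)
    change_log.add(loc_id)

def commit():
    # Check the no-overlap constraint only for changed rows against the
    # full table, instead of comparing every pair of rows.
    for a in change_log:
        team_a, from_a, to_a = table[a]
        for b, (team_b, from_b, to_b) in table.items():
            if b != a and team_b == team_a and from_a <= to_b and to_a >= from_b:
                raise ValueError("overlapping periods for team %s" % team_a)
```

The constraint stays declarative in spirit (it is still a query), but its cost becomes proportional to the number of changed tuples rather than to the table size.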

4. Use event handlers instead of DB constraints. But in that case, the constraints are no longer declarative; it is much like putting business rules in stored procedures or in the business layer (à la Java or C#, with all the known bad consequences). It looks like a step backwards.
Moreover, when a DB constraint refers to several tables (as is often the case), the event handlers of each table must collaborate somehow to make sure that the constraint is not checked twice.
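For example, the collaboration between handlers could be a transaction-scoped set of pending checks (Python sketch, not Dataphor code; names invented):

```python
# Sketch of option 4's coordination problem: handlers on several tables
# all touch the same multi-table constraint, but a pending-check set
# ensures the constraint actually runs only once per transaction.

pending_checks = set()        # constraint names queued in this transaction
check_count = {"balance": 0}  # instrumentation for the example

def request_check(constraint_name):
    # Called from the event handler of every table the constraint touches.
    pending_checks.add(constraint_name)

def run_constraint(name):
    check_count[name] += 1    # stand-in for the real deferred check

def on_commit():
    # Deferred: run each queued constraint exactly once, then reset.
    for name in pending_checks:
        run_constraint(name)
    pending_checks.clear()

# Handlers on two different tables both touch the same constraint:
request_check("balance")      # e.g. after insert on table A
request_check("balance")      # e.g. after update on table B
on_commit()
```

This works, but the coordination logic itself is procedural plumbing, which is exactly why I say it feels like a step backwards.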

So far, I have got the best results with method 4 but, as I said, it is not really a solution.
I think that method 3 is the closest to a good solution but, to be usable, it would require some support from the server to collect all the data that is modified during a transaction.

Unless there is another solution that I am not aware of...

Thank you for your comments.



Improving Constraint Performance

Hi J-C,

Transition constraints are what the system uses internally to rewrite reference constraints, and they often result in significant performance improvements over equivalent database constraints. I agree that the development cost is high because of the complexity of transforming a database constraint into an equivalent set of transition constraints, but the run-time performance should be better than the equivalent implementation using deferred event handlers.

Bryn Rhodes

Re: Improving database constraint performance.

Have you tried transition constraints? That's what I use for my temporal data.

Improving database constraint performance

Thank you for your tip.

In fact, transition constraints and event handlers have pretty much the same cost in terms of development.

Anyway, I'd be interested in knowing exactly how you check these constraints.

Do you think we could exchange some code snippets?



Code snippet

Here's a simple example; it's simpler than what you need, but it's what I've got. I believe there are examples in the Dataphor documentation as well. The transition constraint here makes sure that no two rows with the same TeamID have overlapping LocationFrom/LocationTo intervals.

create table FieldTeamLocation
{
	LocationID : FieldTeamLocationID { default GetNextGenerator("FieldTeamLocationID") },
	KommuneCode : KommuneCode,
	LocationName : Name,
	TeamID : FieldTeamID,
	LocationAddress : LongDescription,
	LocationFrom : Date,
	LocationTo : Date,
	constraint ValidPeriod LocationTo >= LocationFrom
		tags { DAE.SimpleMessage = "'To' may not be earlier than 'From'" },
	transition constraint NoOverlap
		on insert
			not exists (FieldTeamLocation where LocationID <> new.LocationID
				and TeamID = new.TeamID
				and new.LocationFrom <= LocationTo
				and new.LocationTo >= LocationFrom)
		on update
			new.LocationID = old.LocationID and
			not exists (FieldTeamLocation where LocationID <> new.LocationID
				and TeamID = new.TeamID
				and new.LocationFrom <= LocationTo
				and new.LocationTo >= LocationFrom)
		tags { DAE.SimpleMessage = "Location periods for a particular Field Team may not overlap" },
	key { LocationID },
	key { LocationName },
	key { TeamID, LocationFrom },
	key { TeamID, LocationTo }
};