Semantics of SHACL

SHACL(SHApes Checking Language) is an RDF vocabulary for describing RDF graph structures. These graph structures are captured as shapes, which are expected to correspond to nodes in RDF graphs. These shapes identify predicates and their associated cardinalities and datatypes.

This document introduces the semantics of the SHACL core language features using inference rules.

This document is a work in progress proposal based on the core language features. The SHACL language constructs have yet to be defined by the WG

This document just assumes that some node in an RDF graph has been selected to check its shape in a schema. The triggering selection mechanisms are not defined in this document

Full SHACL syntax

The full SHACL syntax can be defined as:

A schema is a list of Labels associated with shapes

Schema ::= Label Shape *           // Schema associates a Label with a Shape

The abstract syntax of shapes can be recursively defined as:

Shape ::= Empty                          // No constraint
        | BasicArc Cardinality           // Arc with predicate, value and cardinality
		| BasicArc "+"                   // 1 or more basic arcs
		| BasicArc "?"                   // 0 or 1 basic arcs
		| BasicArc "*"                   // 0 or more basic arcs
	    | And (Shape1,Shape2,...ShapeN)  // Matches if E and F match
	    | Or (Shape1, Shape2,...ShapeN)  // Matches if either E or F match

Basic arcs can be direct arcs or inverse arcs.

	
BasicArc ::= Arc Predicate Value     // Direct arc
           | InvArc Predicate Value  // Inverse arc

Cardinality defines the minimum and maximum cardinality. The maximum cardinality can be unbounded

Cardinality ::= {Int,Int | unbounded}  // Min,Max-cardinality

Value specifies if an object matches with a set of values v1,...,vN, if it has a type identified by URI or if it is an node that conforms with the shape identified by Label

Value ::= ValueSet (v1 v2 ... vN) // Matches a set of values v1...vN
        | ValueType URI           // Matches if the value belongs to the type denoted by URI
        | ValueRef Label          // Matches if the value is a node that conforms with the shape identified by Label

Value should be extended to handle kinds like URI, BNode, Literal, and language tagged literals

Formal semantics

Description of the Formal Semantics

The semantics of a Shape schema consist of a specification of what nodes in an RDF graph match a specific shape with respect to that schema. The semantics is described formally. The formalism uses axioms and inference rules. Axioms are propositions that are provable unconditionally. An inference rule consists of one or more antecedents and exactly one consequent. An antecedent is either positive or negative. If all the positive antecedents of an inference rule are provable and none of the negative antecedents are provable, then the consequent of the inference rule is provable.

The notation for inference rules separates the antecedents from the consequent by a horizontal line: the antecedents are above the line; the consequent is below the line. If an antecedent is of the form not(p), then it is a negative antecedent; otherwise, it is a positive antecedent.

The formal semantics presented follows the same formalism as the one used for RelaxNG

Preliminaries

RDF graphs

The natural semantics infers a typing for a resource in an RDF graph.

We represent RDF graphs as sets of RDF triples, where an RDF triple is <subject,predicate,object> such that subject is a URI or a BNode, predicate is a URI and object is a URI, BNode or Literal.

A node is a subject or an object. For our purposes, an RDF graph has the following operations:

{}	Empty graph (empty set of triples)
{t}	Singleton graph with triple t
g1 ∪ g2	Union of graphs g1 and g2
g1 ∩ g2	Intersection of graphs g1 and g2
addTriple(t,ts)	The set of triples that results of adding triple t to the set of triples ts addTriple(t,ts) = {t} ∪ ts
removeTriple(ts)	A pair (t,ts') where t is a triple removed from the set of triples g and g' is the remaining triples removeTriple(ts) ∈ {(t,ts')\| {t} ∪ ts' = ts ∧ t ∉ ts' }
g.triplesAround(node)	Selects the triples that contain node as subject or as object in graph g g.triplesAround(node) = { <s,p,o> ∈ g \| s = node ∨ o = node }

Typings

The semantics of SHACL consists of assigning typings to nodes in an RDF graph. A typing is a map from RDF nodes (iri's or bNodes) to sets of labels in a Schema.

We define the following operations on shape typings:

{}	empty shape typing
addType(node,label,typing)	the result of associating type label to node in typing
combineTypings(t1, t2)	combination of typings t1 and t2
contains(t,node,label)	succeeds if typing t asserts that node has shape label

Context

The shape schema validator acts in a context. A context is a tuple (s, g, t) where s is a Schema, g is an RDF Graph and t is a Typing.

We define the following operations on contexts:

ctx.typing	current typing in ctx
ctx { node -> label }	Given a context with typing t, represents the same context where typing is addType(node,label,t)

matchNode

ctx |- matchNode(node, label) checks if a node conforms with the shape associated with label in context ctx. If it conforms, it evaluates to (t,cs,rs) where t is a typing, cs is a set of checked triples, rs is a set of remaining triples.

matchNode	ctx.schema(label) = shape ctx.graph.triplesAround(node) = ts ctx { node -> label } \|- matchShape(ts, shape) = (t,cs,rs)
	ctx \|- matchNode(node,label) = (t,cs,rs)

matchShape

ctx |- matchShape(triples, shape) checks if a set of triples triples conforms with shape in context ctx. If it conforms, it evaluates to (t,cs,rs) where t is a typing, cs is a set of checked triples, rs is a set of remaining triples.

The empty shape matches any set of triples without any constraint.

Empty
	ctx \|- matchShape(ts,Empty) = (ctx.typing,{},ts)

We distinguish 6 cases for basic arcs depending on the cardinality.

When the cardinality is {0,unbounded} and there is no arc that matches it matches the set of triples without consuming any triple

BasicArc_unbounded2 ctx |- noMatchAny(ts,basicArc)

ctx |- matchShape(ts,basicArc {0,unbounded}) = (ctx.typing,{},ts)

When the cardinality is {0,unbounded} and there is some arc that matches it consumes it

BasicArc_unbounded3	remove(ts) = (t,ts') ctx \|- matchBasicArc(t,basicArc) = t1 ctx \|- matchBasicArc(ts',basicArc {0,unbounded}) = (t2,cs,rs)
	ctx \|- matchShape(ts,basicArc {0,unbounded}) = (combineTypings(t1,t2),addTriple(t,cs),rs)

When the cardinality is {m,unbounded} for m > 0 it matches if there is one triple that matches and if the rest of triples matches with a cardinality of {m - 1, unbounded}

Notice that it only matches if there is at least one triple

BasicArc_unbounded3	m > 0 removeTriple(ts) = (t,ts') ctx \|- matchArc(t,basicArc) = t1 ctx \|- matchShape(ts',basicArc {m - 1,unbounded}) = (t2,cs,rs)
	ctx \|- matchShape(ts, basicArc {m,unbounded}) = (combineTypings(t1,t2),addTriple(t,cs),rs)

When the cardinality is {0, n} for n >= 0 it can match if there is not triple that matches or if there is one triple that matches and the rest of triples matches with a cardinality of {0, n-1}

BasicArc_bounded1 n >= 0 ctx |- noMatchAny(ts,basicArc))

ctx |- matchShape(ts, basicArc {0, n}) = (ctx.typing,{},ts)

BasicArc_bounded2	n > 0 removeTriple(ts) = (t,ts') matchArc(t,basicArc) = t1 ctx \|- matchShape(ts', basicArc {0, n - 1}) = (t2,cs,rs)
	ctx \|- matchShape(ts, basicArc {0, n}) = (combineTypings(t1,t2), addTriple(t,cs), rs)

If the cardinality if {m, n} it matches if there is one triple that matches and if the rest of triples match with the cardinality {m - 1, n - 1}

BasicArc_bounded3	m > 0, n >= m removeTriple(ts) = (t,ts') matchBasicArc(t, basicArc) = t1 ctx \|- matchShape(ts', basicArc {m-1, n-1} ) = (t2, cs, rs)
	ctx \|- matchShape(ts, basicArc {m, n}) = (combineTypings(t1,t2), addTriple(t,cs), rs)

If the shape is a conjunction it matches if it matches the first element and then, the second with the remaining triples

And	ctx \|- matchShape(ts, e1) = (t1, cs1, rs1) ctx \|- matchShape(ts, e2) = (t2, cs2, rs2)
	ctx \|- matchShape(ts, And(e1, e2) = (combineTypings(t1,t2), cs1 union cs2, rs1 intersection rs2)

If the shape is a disjunction it matches if either the first element matches or the second one does

Or_1	ctx \|- matchShape(ts, e1) = (t, cs, rs)
	ctx \|- matchShape(ts, Or(e1,e2) = (t, cs, rs)

Or_2	ctx \|- matchShape(ts, e2) = (t, cs, rs)
	ctx \|- matchShape(ts, Or(e1,e2) = (t, cs, rs)

matchBasicArc

This section covers how to match a basic Arc: ctx|-matchBasicArc(triple,basicArc).

There are two cases: direct arcs match the value with the object of the triple while inverse arcs match the value with the subject the triple

ObjectArc	ctx \|- matchValue(o,v) = t
	ctx \|- matchBasicArc(<s,p,o>, Arc p v) = t

SubjectArc	ctx \|- matchValue(s,v) = t
	ctx \|- matchBasicArc(<s,p,o>, InvArc p v) = t

matchValue

valueSet	x ∈ set
	ctx \|- matchValue(x, ValueSet(set)) = ctx.typing

noValueType needs more work to handle different possibilities

valueType	hastype(x,t)
	ctx \|- matchValue(x, ValueType(t)) = ctx.typing

ValueRef handles references to shapes. There are two cases, the first initial case, when the context already contains the declaration that x has shape label just evaluates to the current typing.

valueRef_1	contains(ctx.typing, x, label)
	ctx \|- matchValue(x, ValueRef(label)) = ctx.typing

The second case, when the context does not contain the declaration that x has the shape label is solved by trying to match x with shape label in the graph.

valueRef_2	not(contains(ctx, x, label)) ctx \|- matchNode(x,v) = (t,_,_)
	ctx \|- matchValue(x, ValueRef(label)) = t

noMatchAny

noMatchArcAny takes a set of triples and a basic arc and checks that there is no triple that matches the basic arc

It has two possibilities.

noMatchAny_1
	ctx \|- noMatchAny({}, basicArc)

noMatchAny_2	removeTriple(ts) = (t,ts') ctx \|- noMatchBasicArc(t,basicArc) ctx \|- noMatchAny(ts',basicArc)
	ctx \|- noMatchAny(ts, basicArc)

noMatchBasicArc

noMatchBasicArc

If the predicates are different, then it succeeds

noMatchBasicArc_1	p ≠ p'
	ctx \|- noMatchBasicArc(<s,p,o>, Arc p' v)

If the predicates are the same it succeeds if the value doesn't match. There are two cases depending on direct/inverse arcs

noMatchSubjectArc	ctx \|- noMatchValue(o,v)
	ctx \|- noMatchBasicArc(<s,p,o>, Arc p v)

noMatchObjectArc	ctx \|- noMatchValue(s,v)
	ctx \|- noMatchBasicArc(<s,p,o>, InvArc p v)

noMatchValue

It succeeds if the value doesn't match with the node.

noValueSet	x ∉ set
	ctx \|- matchValue(x, ValueSet(set))

noValueType needs more work to handle different possibilities

noValueType	hasNoType(x,t)
	ctx \|- matchValue(x, ValueType(t))

noValueRef succeeds if the the system can not determine that the node x has shape label.

noValueRef	ctx \|- matchValue(x, ValueRef(label)) = ctx.typing notContains(ctx.typing, x, label)
	ctx \|- noMatchValue(x, ValueRef(label))

Introduction