SHACL(SHApes Checking Language) is an RDF vocabulary for describing RDF graph structures. These graph structures are captured as shapes, which are expected to correspond to nodes in RDF graphs. These shapes identify predicates and their associated cardinalities and datatypes.
This document introduces the semantics of the SHACL core language features using inference rules.
This document is a work in progress proposal based on the core language features. The SHACL language constructs have yet to be defined by the WG
This document just assumes that some node in an RDF graph has been selected to check its shape in a schema. The triggering selection mechanisms are not defined in this document
This documents contains a formal semantic specification of SHACL. The formal semantics is defined in terms of inference rules and axioms. The semantics has been translated to Haskell to obtain an executable semantic prototype. The source code is available here.
The full SHACL syntax can be defined as:
A schema is a list of Labels associated with shapes
Schema ::= Label Shape * // Schema associates a Label with a Shape
The abstract syntax of shapes can be recursively defined as:
Shape ::= Empty // No constraint | BasicArc Cardinality // Arc with predicate, value and cardinality | BasicArc "+" // 1 or more basic arcs | BasicArc "?" // 0 or 1 basic arcs | BasicArc "*" // 0 or more basic arcs | And (Shape1,Shape2,...ShapeN) // Matches if E and F match | Or (Shape1, Shape2,...ShapeN) // Matches if either E or F match
Basic arcs can be direct arcs or inverse arcs.
BasicArc ::= Arc Predicate Value // Direct arc | InvArc Predicate Value // Inverse arc
Cardinality defines the minimum and maximum cardinality.
The maximum cardinality can be unbounded
Cardinality ::= {Int,Int | unbounded} // Min,Max-cardinality
Value specifies if an object matches with a set of values v1,...,vN, if it has a type identified by URI or if it is an node that conforms with the shape identified by Label
Value ::= ValueSet (v1 v2 ... vN) // Matches a set of values v1...vN | ValueType URI // Matches if the value belongs to the type denoted by URI | ValueRef Label // Matches if the value is a node that conforms with the shape identified by Label
Value should be extended to handle kinds like URI, BNode, Literal, and language tagged literals
# shapes (Turtle) sh:IssueShape a ldom:Shape ; ldom:property [ ldom:predicate ex:state ;ldom:allowedValue (ex:unassigned ex:assigned) ;ldom:allowedValue ex:unassigned , ex:assigned ;ldom:minCount 1 ; ldom:maxCount 1 ] ; ldom:property [ ldom:predicate ex:reportedBy ; ldom:valueShape sh:UserShape ; ldom:minCount 1 ; ldom:maxCount 1 ] . sh:UserShape a ldom:Shape ; ldom:property [ ldom:predicate foaf:name ; ldom:valueType xsd:string ; ldom:minCount 1 ; ldom:maxCount 1 ] ; ldom:property [ ldom:predicate foaf:mbox ; ldom:nodeType ldom:IRI ; ldom:minCount 1 ] .
The example can be defined in the abstract syntax as:
IssueShape (And (Arc ex:state (ValueSet (ex:unasigned ex:asigned)) {1,1}) (Arc ex:reportedBy (ValueRef sh:UserShape) {1,1}) ) UserShape (And (Arc foaf:name (ValueType xsd:string) {1,1}) (Arc foaf:mbox (ValueType URI) {1,unbounded}) )
basicArc+ = basicArc { 1,unbounded } basicArc? = basicArc { 0,1 } basicArc* = basicArc { 0,unbounded } And(e1,e2,...eN) = And(e1,And(e2,...And(eN,Empty))) Or(e1,e2,...eN) = Or(e1,Or(e2,...Or(eN,Empty)))
The full SHACL simplified syntax can be defined as
Shape ::= Empty // No constraint | BasicArc Cardinality // Basic arc with cardinality | And Shape Shape // Matches if E and F match | Or Shape Shape // Matches if either E or F match
The semantics of a Shape schema consist of a specification of what nodes in an RDF graph match a specific shape with respect to that schema. The semantics is described formally. The formalism uses axioms and inference rules. Axioms are propositions that are provable unconditionally. An inference rule consists of one or more antecedents and exactly one consequent. An antecedent is either positive or negative. If all the positive antecedents of an inference rule are provable and none of the negative antecedents are provable, then the consequent of the inference rule is provable.
The notation for inference rules separates the antecedents from the consequent by a horizontal line: the antecedents are above the line; the consequent is below the line. If an antecedent is of the form not(p), then it is a negative antecedent; otherwise, it is a positive antecedent.
The formal semantics presented follows the same formalism as the one used for RelaxNG
The natural semantics infers a typing for a resource in an RDF graph.
We represent RDF graphs as sets of RDF triples, where an RDF triple is <subject,predicate,object> such that subject is a URI or a BNode, predicate is a URI and object is a URI, BNode or Literal.
A node is a subject or an object. For our purposes, an RDF graph has the following operations:
{} | Empty graph (empty set of triples) |
{t} | Singleton graph with triple t |
g1 ∪ g2 | Union of graphs g1 and g2 |
g1 ∩ g2 | Intersection of graphs g1 and g2 |
addTriple(t,ts) | The set of triples that results of adding triple t to
the set of triples ts
addTriple(t,ts) = {t} ∪ ts |
removeTriple(ts) |
A pair (t,ts') where t is a triple removed from the set of triples g and g' is the remaining triples removeTriple(ts) ∈ {(t,ts')| {t} ∪ ts' = ts ∧ t ∉ ts' } |
g.triplesAround(node) |
Selects the triples that contain node as subject or as object in graph g
g.triplesAround(node) = { <s,p,o> ∈ g | s = node ∨ o = node } |
The semantics of SHACL consists of assigning typings to nodes in an RDF graph. A typing is a map from RDF nodes (iri's or bNodes) to sets of labels in a Schema.
We define the following operations on shape typings:
{} | empty shape typing |
addType(node,label,typing) | the result of associating type label to node in typing |
combineTypings(t1, t2) | combination of typings t1 and t2 |
contains(t,node,label) | succeeds if typing t asserts that node has shape label |
The shape schema validator acts in a context. A context is a tuple (s, g, t) where s is a Schema, g is an RDF Graph and t is a Typing.
We define the following operations on contexts:ctx.typing | current typing in ctx |
ctx { node -> label } | Given a context with typing t, represents the same context where typing is addType(node,label,t) |
matchNode | ctx.schema(label) = shape ctx.graph.triplesAround(node) = ts ctx { node -> label } |- matchShape(ts, shape) = (t,cs,rs) |
ctx |- matchNode(node,label) = (t,cs,rs) |
ctx |- matchShape(triples, shape) checks if a set of triples triples conforms with shape in context ctx. If it conforms, it evaluates to (t,cs,rs) where t is a typing, cs is a set of checked triples, rs is a set of remaining triples.
The empty shape matches any set of triples without any constraint.
Empty | |
ctx |- matchShape(ts,Empty) = (ctx.typing,{},ts) |
We distinguish 6 cases for basic arcs depending on the cardinality.
When the cardinality is {0,unbounded} and there is no arc that matches it matches the set of triples without consuming any triple
BasicArc_unbounded2 | ctx |- noMatchAny(ts,basicArc) |
ctx |- matchShape(ts,basicArc {0,unbounded}) = (ctx.typing,{},ts) |
When the cardinality is {0,unbounded} and there is some arc that matches it consumes it
BasicArc_unbounded3 | remove(ts) = (t,ts') ctx |- matchBasicArc(t,basicArc) = t1 ctx |- matchBasicArc(ts',basicArc {0,unbounded}) = (t2,cs,rs) |
ctx |- matchShape(ts,basicArc {0,unbounded}) = (combineTypings(t1,t2),addTriple(t,cs),rs) |
When the cardinality is {m,unbounded} for m > 0 it matches if there is one triple that matches and if the rest of triples matches with a cardinality of {m - 1, unbounded}
Notice that it only matches if there is at least one triple
BasicArc_unbounded3 | m > 0 removeTriple(ts) = (t,ts') ctx |- matchArc(t,basicArc) = t1 ctx |- matchShape(ts',basicArc {m - 1,unbounded}) = (t2,cs,rs) |
ctx |- matchShape(ts, basicArc {m,unbounded}) = (combineTypings(t1,t2),addTriple(t,cs),rs) |
When the cardinality is {0, n} for n >= 0 it can match if there is not triple that matches or if there is one triple that matches and the rest of triples matches with a cardinality of {0, n-1}
BasicArc_bounded1 | n >= 0 ctx |- noMatchAny(ts,basicArc)) |
ctx |- matchShape(ts, basicArc {0, n}) = (ctx.typing,{},ts) |
BasicArc_bounded2 | n > 0 removeTriple(ts) = (t,ts') matchArc(t,basicArc) = t1 ctx |- matchShape(ts', basicArc {0, n - 1}) = (t2,cs,rs) |
ctx |- matchShape(ts, basicArc {0, n}) = (combineTypings(t1,t2), addTriple(t,cs), rs) |
If the cardinality if {m, n} it matches if there is one triple that matches and if the rest of triples match with the cardinality {m - 1, n - 1}
BasicArc_bounded3 | m > 0, n >= m removeTriple(ts) = (t,ts') matchBasicArc(t, basicArc) = t1 ctx |- matchShape(ts', basicArc {m-1, n-1} ) = (t2, cs, rs) |
ctx |- matchShape(ts, basicArc {m, n}) = (combineTypings(t1,t2), addTriple(t,cs), rs) |
If the shape is a conjunction it matches if it matches the first element and then, the second with the remaining triples
And | ctx |- matchShape(ts, e1) = (t1, cs1, rs1) ctx |- matchShape(ts, e2) = (t2, cs2, rs2) |
ctx |- matchShape(ts, And(e1, e2) = (combineTypings(t1,t2), cs1 union cs2, rs1 intersection rs2) |
If the shape is a disjunction it matches if either the first element matches or the second one does
Or_1 | ctx |- matchShape(ts, e1) = (t, cs, rs) |
ctx |- matchShape(ts, Or(e1,e2) = (t, cs, rs) |
Or_2 | ctx |- matchShape(ts, e2) = (t, cs, rs) |
ctx |- matchShape(ts, Or(e1,e2) = (t, cs, rs) |
This section covers how to match a basic Arc: ctx|-matchBasicArc(triple,basicArc).
There are two cases: direct arcs match the value with the object of the triple while inverse arcs match the value with the subject the triple
ObjectArc | ctx |- matchValue(o,v) = t |
ctx |- matchBasicArc(<s,p,o>, Arc p v) = t |
SubjectArc | ctx |- matchValue(s,v) = t |
ctx |- matchBasicArc(<s,p,o>, InvArc p v) = t |
valueSet | x ∈ set |
ctx |- matchValue(x, ValueSet(set)) = ctx.typing |
noValueType needs more work to handle different possibilities
valueType | hastype(x,t) |
ctx |- matchValue(x, ValueType(t)) = ctx.typing |
ValueRef handles references to shapes. There are two cases, the first initial case, when the context already contains the declaration that x has shape label just evaluates to the current typing.
valueRef_1 | contains(ctx.typing, x, label) |
ctx |- matchValue(x, ValueRef(label)) = ctx.typing |
The second case, when the context does not contain the declaration that x has the shape label is solved by trying to match x with shape label in the graph.
valueRef_2 | not(contains(ctx, x, label)) ctx |- matchNode(x,v) = (t,_,_) |
ctx |- matchValue(x, ValueRef(label)) = t |
noMatchArcAny takes a set of triples and a basic arc and checks that there is no triple that matches the basic arc
It has two possibilities.
noMatchAny_1 | |
ctx |- noMatchAny({}, basicArc) |
noMatchAny_2 | removeTriple(ts) = (t,ts') ctx |- noMatchBasicArc(t,basicArc) ctx |- noMatchAny(ts',basicArc) |
ctx |- noMatchAny(ts, basicArc) |
noMatchBasicArc
If the predicates are different, then it succeeds
noMatchBasicArc_1 | p ≠ p' |
ctx |- noMatchBasicArc(<s,p,o>, Arc p' v) |
If the predicates are the same it succeeds if the value doesn't match. There are two cases depending on direct/inverse arcs
noMatchSubjectArc | ctx |- noMatchValue(o,v) |
ctx |- noMatchBasicArc(<s,p,o>, Arc p v) |
noMatchObjectArc | ctx |- noMatchValue(s,v) |
ctx |- noMatchBasicArc(<s,p,o>, InvArc p v) |
It succeeds if the value doesn't match with the node.
noValueSet | x ∉ set |
ctx |- matchValue(x, ValueSet(set)) |
noValueType needs more work to handle different possibilities
noValueType | hasNoType(x,t) |
ctx |- matchValue(x, ValueType(t)) |
noValueRef succeeds if the the system can not determine that the node x has shape label.
noValueRef | ctx |- matchValue(x, ValueRef(label)) = ctx.typing notContains(ctx.typing, x, label) |
ctx |- noMatchValue(x, ValueRef(label)) |
The core SHACL language can be extended with the following constructs:
Shape ::= Not Shape // Matches if the shape does not match | Close Shape // Matches if it matches the shape and there are no remaining triples | Xor Shape Shape // Matches if either the first shape or the second match, but not both
The axiomatic definition of these constructs are:
If the shape is a negation of a shape it matches if the shape does not match
Not | ctx |- not(matchShape(ts,e)) |
ctx |- matchShape(ts, Not(e)) = (t, ts, {}) |
A set of triples matches a closed shape if the set of triples matches the shape and there are no remaining triples
Close | ctx |- matchShape(ts,e) = (t,cs,{}) |
ctx |- matchShape(ts, Closed(e)) = (t, cs, {}) |
The XOr construct is defined in terms of negation as:
Xor(e1,e2) = Or(And(e1,not(e2)),And(e2,not(e2)))
We would like to acknowledge the contributions of Peter F. Patel-Schneider, Eric Prud'hommeaux, Iovka Boneva and the other members of the W3c Data Shapes Working group.