SHACL(SHApes Checking Language) is an RDF vocabulary for describing RDF graph structures. These graph structures are captured as shapes, which are expected to correspond to nodes in RDF graphs. These shapes identify predicates and their associated cardinalities and datatypes.

This document introduces the semantics of the SHACL core language features using inference rules.

This document is a work in progress proposal based on the core language features. The SHACL language constructs have yet to be defined by the WG

This document just assumes that some node in an RDF graph has been selected to check its shape in a schema. The triggering selection mechanisms are not defined in this document

Introduction

This documents contains a formal semantic specification of SHACL. The formal semantics is defined in terms of inference rules and axioms. The semantics has been translated to Haskell to obtain an executable semantic prototype. The source code is available here.

Full SHACL syntax

The full SHACL syntax can be defined as:

A schema is a list of Labels associated with shapes

Schema ::= Label Shape *           // Schema associates a Label with a Shape

The abstract syntax of shapes can be recursively defined as:

Shape ::= Empty                          // No constraint
        | BasicArc Cardinality           // Arc with predicate, value and cardinality
		| BasicArc "+"                   // 1 or more basic arcs
		| BasicArc "?"                   // 0 or 1 basic arcs
		| BasicArc "*"                   // 0 or more basic arcs
	    | And (Shape1,Shape2,...ShapeN)  // Matches if E and F match
	    | Or (Shape1, Shape2,...ShapeN)  // Matches if either E or F match

Basic arcs can be direct arcs or inverse arcs.

	
BasicArc ::= Arc Predicate Value     // Direct arc
           | InvArc Predicate Value  // Inverse arc

Cardinality defines the minimum and maximum cardinality. The maximum cardinality can be unbounded

Cardinality ::= {Int,Int | unbounded}  // Min,Max-cardinality

Value specifies if an object matches with a set of values v1,...,vN, if it has a type identified by URI or if it is an node that conforms with the shape identified by Label

Value ::= ValueSet (v1 v2 ... vN) // Matches a set of values v1...vN
        | ValueType URI           // Matches if the value belongs to the type denoted by URI
        | ValueRef Label          // Matches if the value is a node that conforms with the shape identified by Label 

Value should be extended to handle kinds like URI, BNode, Literal, and language tagged literals

Example

# shapes (Turtle)
sh:IssueShape a ldom:Shape ;
    ldom:property [                                     
        ldom:predicate ex:state ;
ldom:allowedValue (ex:unassigned ex:assigned) ;
ldom:allowedValue ex:unassigned , ex:assigned ;
ldom:minCount 1 ; ldom:maxCount 1 ] ; ldom:property [ ldom:predicate ex:reportedBy ; ldom:valueShape sh:UserShape ; ldom:minCount 1 ; ldom:maxCount 1 ] . sh:UserShape a ldom:Shape ; ldom:property [ ldom:predicate foaf:name ; ldom:valueType xsd:string ; ldom:minCount 1 ; ldom:maxCount 1 ] ; ldom:property [ ldom:predicate foaf:mbox ; ldom:nodeType ldom:IRI ; ldom:minCount 1 ] .

The example can be defined in the abstract syntax as:

IssueShape 
 (And (Arc ex:state (ValueSet (ex:unasigned ex:asigned)) {1,1})
      (Arc ex:reportedBy (ValueRef sh:UserShape) {1,1})
 )

UserShape 
 (And (Arc foaf:name (ValueType xsd:string) {1,1})
       (Arc foaf:mbox (ValueType URI) {1,unbounded})
 )

Simplification rules

basicArc+ = basicArc { 1,unbounded }
basicArc? = basicArc { 0,1 }
basicArc* = basicArc { 0,unbounded }

And(e1,e2,...eN) = And(e1,And(e2,...And(eN,Empty)))
Or(e1,e2,...eN) = Or(e1,Or(e2,...Or(eN,Empty)))

Simple syntax

The full SHACL simplified syntax can be defined as

Shape ::= Empty                   // No constraint
       | BasicArc Cardinality     // Basic arc with cardinality
	   | And Shape Shape          // Matches if E and F match
	   | Or Shape Shape           // Matches if either E or F match

Formal semantics

Description of the Formal Semantics

The semantics of a Shape schema consist of a specification of what nodes in an RDF graph match a specific shape with respect to that schema. The semantics is described formally. The formalism uses axioms and inference rules. Axioms are propositions that are provable unconditionally. An inference rule consists of one or more antecedents and exactly one consequent. An antecedent is either positive or negative. If all the positive antecedents of an inference rule are provable and none of the negative antecedents are provable, then the consequent of the inference rule is provable.

The notation for inference rules separates the antecedents from the consequent by a horizontal line: the antecedents are above the line; the consequent is below the line. If an antecedent is of the form not(p), then it is a negative antecedent; otherwise, it is a positive antecedent.

The formal semantics presented follows the same formalism as the one used for RelaxNG

Preliminaries

RDF graphs

The natural semantics infers a typing for a resource in an RDF graph.

We represent RDF graphs as sets of RDF triples, where an RDF triple is <subject,predicate,object> such that subject is a URI or a BNode, predicate is a URI and object is a URI, BNode or Literal.

A node is a subject or an object. For our purposes, an RDF graph has the following operations:

{}Empty graph (empty set of triples)
{t} Singleton graph with triple t
g1 ∪ g2 Union of graphs g1 and g2
g1 ∩ g2 Intersection of graphs g1 and g2
addTriple(t,ts) The set of triples that results of adding triple t to the set of triples ts

addTriple(t,ts) = {t} ∪ ts

removeTriple(ts)

A pair (t,ts') where t is a triple removed from the set of triples g and g' is the remaining triples

removeTriple(ts) ∈ {(t,ts')| {t} ∪ ts' = ts ∧ t ∉ ts' }

g.triplesAround(node) Selects the triples that contain node as subject or as object in graph g

g.triplesAround(node) = { <s,p,o> ∈ g | s = node ∨ o = node }

Typings

The semantics of SHACL consists of assigning typings to nodes in an RDF graph. A typing is a map from RDF nodes (iri's or bNodes) to sets of labels in a Schema.

We define the following operations on shape typings:

{}empty shape typing
addType(node,label,typing) the result of associating type label to node in typing
combineTypings(t1, t2) combination of typings t1 and t2
contains(t,node,label) succeeds if typing t asserts that node has shape label

Context

The shape schema validator acts in a context. A context is a tuple (s, g, t) where s is a Schema, g is an RDF Graph and t is a Typing.

We define the following operations on contexts:
ctx.typingcurrent typing in ctx
ctx { node -> label } Given a context with typing t, represents the same context where typing is addType(node,label,t)

matchNode

ctx |- matchNode(node, label) checks if a node conforms with the shape associated with label in context ctx. If it conforms, it evaluates to (t,cs,rs) where t is a typing, cs is a set of checked triples, rs is a set of remaining triples.
matchNode ctx.schema(label) = shape     ctx.graph.triplesAround(node) = ts     ctx { node -> label } |- matchShape(ts, shape) = (t,cs,rs)
ctx |- matchNode(node,label) = (t,cs,rs)

matchShape

ctx |- matchShape(triples, shape) checks if a set of triples triples conforms with shape in context ctx. If it conforms, it evaluates to (t,cs,rs) where t is a typing, cs is a set of checked triples, rs is a set of remaining triples.

The empty shape matches any set of triples without any constraint.

Empty  
ctx |- matchShape(ts,Empty) = (ctx.typing,{},ts)

We distinguish 6 cases for basic arcs depending on the cardinality.

If the shape is a conjunction it matches if it matches the first element and then, the second with the remaining triples

And ctx |- matchShape(ts, e1) = (t1, cs1, rs1)     ctx |- matchShape(ts, e2) = (t2, cs2, rs2)
ctx |- matchShape(ts, And(e1, e2) = (combineTypings(t1,t2), cs1 union cs2, rs1 intersection rs2)

If the shape is a disjunction it matches if either the first element matches or the second one does

Or_1 ctx |- matchShape(ts, e1) = (t, cs, rs)
ctx |- matchShape(ts, Or(e1,e2) = (t, cs, rs)
Or_2 ctx |- matchShape(ts, e2) = (t, cs, rs)
ctx |- matchShape(ts, Or(e1,e2) = (t, cs, rs)

matchBasicArc

This section covers how to match a basic Arc: ctx|-matchBasicArc(triple,basicArc).

There are two cases: direct arcs match the value with the object of the triple while inverse arcs match the value with the subject the triple

ObjectArc ctx |- matchValue(o,v) = t
ctx |- matchBasicArc(<s,p,o>, Arc p v) = t
SubjectArc ctx |- matchValue(s,v) = t
ctx |- matchBasicArc(<s,p,o>, InvArc p v) = t

matchValue

valueSet x ∈ set
ctx |- matchValue(x, ValueSet(set)) = ctx.typing

noValueType needs more work to handle different possibilities

valueType hastype(x,t)
ctx |- matchValue(x, ValueType(t)) = ctx.typing

ValueRef handles references to shapes. There are two cases, the first initial case, when the context already contains the declaration that x has shape label just evaluates to the current typing.

valueRef_1 contains(ctx.typing, x, label)
ctx |- matchValue(x, ValueRef(label)) = ctx.typing

The second case, when the context does not contain the declaration that x has the shape label is solved by trying to match x with shape label in the graph.

valueRef_2 not(contains(ctx, x, label))       ctx |- matchNode(x,v) = (t,_,_)
ctx |- matchValue(x, ValueRef(label)) = t

noMatchAny

noMatchArcAny takes a set of triples and a basic arc and checks that there is no triple that matches the basic arc

It has two possibilities.

noMatchAny_1  
ctx |- noMatchAny({}, basicArc)
noMatchAny_2 removeTriple(ts) = (t,ts')     ctx |- noMatchBasicArc(t,basicArc)     ctx |- noMatchAny(ts',basicArc)
ctx |- noMatchAny(ts, basicArc)

noMatchBasicArc

noMatchBasicArc

If the predicates are different, then it succeeds

noMatchBasicArc_1 p ≠ p'
ctx |- noMatchBasicArc(<s,p,o>, Arc p' v)

If the predicates are the same it succeeds if the value doesn't match. There are two cases depending on direct/inverse arcs

noMatchSubjectArc ctx |- noMatchValue(o,v)
ctx |- noMatchBasicArc(<s,p,o>, Arc p v)
noMatchObjectArc ctx |- noMatchValue(s,v)
ctx |- noMatchBasicArc(<s,p,o>, InvArc p v)

noMatchValue

It succeeds if the value doesn't match with the node.

noValueSet x ∉ set
ctx |- matchValue(x, ValueSet(set))

noValueType needs more work to handle different possibilities

noValueType hasNoType(x,t)
ctx |- matchValue(x, ValueType(t))

noValueRef succeeds if the the system can not determine that the node x has shape label.

noValueRef ctx |- matchValue(x, ValueRef(label)) = ctx.typing     notContains(ctx.typing, x, label)
ctx |- noMatchValue(x, ValueRef(label))

Extended features

The core SHACL language can be extended with the following constructs:

Shape ::= Not Shape                // Matches if the shape does not match
	    | Close Shape              // Matches if it matches the shape and there are no remaining triples
		| Xor Shape Shape          // Matches if either the first shape or the second match, but not both

The axiomatic definition of these constructs are:

If the shape is a negation of a shape it matches if the shape does not match

Not ctx |- not(matchShape(ts,e))
ctx |- matchShape(ts, Not(e)) = (t, ts, {})

A set of triples matches a closed shape if the set of triples matches the shape and there are no remaining triples

Close ctx |- matchShape(ts,e) = (t,cs,{})
ctx |- matchShape(ts, Closed(e)) = (t, cs, {})

The XOr construct is defined in terms of negation as:

Xor(e1,e2) = Or(And(e1,not(e2)),And(e2,not(e2)))

Acknowledgements

We would like to acknowledge the contributions of Peter F. Patel-Schneider, Eric Prud'hommeaux, Iovka Boneva and the other members of the W3c Data Shapes Working group.