The data model of "property graphs" or "attributed graphs" has emerged since the early 2000s as a common denominator of various models of graph-oriented databases [1]. It can be defined informally as follows:
Building upon widely adopted definitions [2] [3], a property graph/attributed graph can be defined by a 7-tuple (N, A, P, V, α, , π), where
A complementary construct, used in several implementations of property graphs in commercial graph databases, is that of labels, which can be associated with both nodes and arcs of the graph. Labels have a practical rather than theoretical justification: they were originally intended for users of Entity-Relationship models and relational databases, to facilitate the import of their legacy data sets into graph databases. Labels make it possible to associate the same identifier (that of the relational table, or of the ER entity) with all graph nodes that correspond to the different rows of this relational table, or to instances of the same generic entity / class. With the proposed definition, these labels can in fact be viewed as attributes defined only by a key, without an associated value (this is why the sixth element of the tuple is defined separately as a binary relation, and π as a partial function). The basic definition thus becomes clearer and simpler, and satisfies a principle of parsimony. Alternatively, and more consistently, labels can be defined through type graphs, as special types associated with nodes and arcs.
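The remark about labels can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the class name `PropertyGraph`, its field names, and the sample identifiers are hypothetical, and node and arc identifiers are assumed to be distinct. Keys are attached to nodes and arcs through a set of pairs (a binary relation), and values through a dictionary defined on only some of those pairs (a partial function), so a label is simply a key with no value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Set, Tuple

Element = Hashable  # identifier of a node or of an arc (assumed distinct)


@dataclass
class PropertyGraph:
    nodes: Set[Element] = field(default_factory=set)
    # Each arc identifier maps to its (source, target) pair of nodes.
    arcs: Dict[Element, Tuple[Element, Element]] = field(default_factory=dict)
    # Binary relation: which keys are attached to which nodes or arcs.
    keys: Set[Tuple[Element, str]] = field(default_factory=set)
    # Partial function: values exist only for some (element, key) pairs.
    values: Dict[Tuple[Element, str], Any] = field(default_factory=dict)

    def add_node(self, node: Element) -> None:
        self.nodes.add(node)

    def add_arc(self, arc: Element, source: Element, target: Element) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both endpoints must already be nodes")
        self.arcs[arc] = (source, target)

    def _check(self, element: Element) -> None:
        if element not in self.nodes and element not in self.arcs:
            raise KeyError(element)

    def set_property(self, element: Element, key: str, value: Any) -> None:
        """Attach a key together with a value."""
        self._check(element)
        self.keys.add((element, key))
        self.values[(element, key)] = value

    def add_label(self, element: Element, label: str) -> None:
        """Attach a key without a value: this is what plays the role of a label."""
        self._check(element)
        self.keys.add((element, label))

    def labels(self, element: Element) -> Set[str]:
        """Keys attached to the element for which no value is defined."""
        return {k for (e, k) in self.keys
                if e == element and (e, k) not in self.values}


# Hypothetical sample data.
g = PropertyGraph()
g.add_node("n1")
g.add_node("n2")
g.add_arc("a1", "n1", "n2")
g.add_label("n1", "Building")
g.set_property("n1", "height_m", 12.5)
g.set_property("a1", "since", 2019)  # arcs carry properties too
```

In this sketch no separate label mechanism is needed: `g.labels("n1")` is derived from the same key relation that carries ordinary properties.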
Attributed graphs, as defined above, are especially useful and relevant in that they provide an "umbrella" hypernymic concept (i.e. a common generalization) for several key graph-theoretic models, which have long been widely used in classical graph algorithms.
Knowledge graphs, usually represented as RDF graphs, are in fact hybrid labeled graphs, whose node labels correspond to instance identifiers (IRIs) or literals, and whose edge labels identify types (not instances) of predicates. They have now acquired a visibility which tends to obscure the longer-established use of graphs as direct models for systems of all kinds [4]. Attributed graphs are, by their versatility and expressivity, the best adapted to this type of modeling, where graphs which can rightly be called cyber-physical do not merely capture weakly structured information about a physical system, as would be the case with a knowledge graph, but attempt to directly capture the structure of the physical system, as matched by the connectivity structure of the graph. In contrast, an RDF graph would mix structural relationships with attached properties, and category / class information with instances / individuals, drowning out the structure. The expressivity of attributed graphs, on the level of higher-order logic, is also far above that of RDF graphs, which is limited to first-order logic. Properties of relationships, which are at the heart of the attributed graph model, require a very cumbersome reification process to be expressed in RDF.
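The cost of reification can be illustrated with a small Python sketch. The arc, its `flowRate` property, the `reify` helper and the `http://example.org/` base are hypothetical names chosen for illustration, and triples are represented as plain tuples rather than with any RDF library. Only the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) is standard.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Property graph view: a single arc, with the property stored on the arc itself.
arc = {
    "id": "a1",
    "source": "pump1",
    "target": "tank1",
    "type": "feeds",
    "properties": {"flowRate": 3.2},
}


def reify(arc, base="http://example.org/"):
    """Express an arc and its properties as RDF triples via standard reification."""
    s = base + arc["source"]
    p = base + arc["type"]
    o = base + arc["target"]
    stmt = base + arc["id"]  # extra resource standing for the statement itself
    triples = [
        (s, p, o),                                # the relationship
        (stmt, RDF + "type", RDF + "Statement"),  # the statement as a resource
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]
    # Each property of the relationship hangs off the statement resource.
    for key, value in arc["properties"].items():
        triples.append((stmt, base + key, value))
    return triples


triples = reify(arc)
```

In this sketch, one arc carrying one property becomes six triples, four of which exist only to describe the statement so that a property can be attached to it.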
The NGSI-LD data model specified by ETSI was the first attempt to standardize property graphs under a de jure standards body. Compared to the basic model defined here, the NGSI-LD meta-model adds a formal definition of basic categories (entity, relation, property) on the basis of Semantic Web standards (OWL, RDFS, RDF), which makes it possible to convert all data represented in NGSI-LD into RDF datasets, through JSON-LD serialization. NGSI-LD entities, relations and properties are thus defined by reference to types which can themselves be defined by reference to ontologies, thesauri, taxonomies or microdata vocabularies, for the purpose of ensuring the semantic interoperability of the corresponding information.
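As a rough illustration of how these categories map onto the basic model, the following Python sketch splits a simplified NGSI-LD entity, written as a dictionary in the normalized JSON-LD style, into one node with its properties and a list of arcs with theirs. The entity content and the `split_entity` helper are hypothetical, and the sketch deliberately ignores the `@context`, temporal attributes and multi-valued attributes.

```python
# Hypothetical, simplified entity in normalized NGSI-LD style.
entity = {
    "id": "urn:ngsi-ld:Vehicle:A4567",
    "type": "Vehicle",
    "brandName": {"type": "Property", "value": "Mercedes"},
    "isParked": {
        "type": "Relationship",
        "object": "urn:ngsi-ld:OffStreetParking:Downtown1",
        # A property attached to the relationship itself.
        "confidence": {"type": "Property", "value": 0.9},
    },
}


def split_entity(entity):
    """Return (node, arcs): the entity as a node, its relationships as arcs."""
    node = {"id": entity["id"], "type": entity["type"], "properties": {}}
    arcs = []
    for name, attr in entity.items():
        # Skip "id", "type", and JSON-LD keywords such as "@context".
        if name.startswith("@") or not isinstance(attr, dict):
            continue
        if attr.get("type") == "Property":
            node["properties"][name] = attr["value"]
        elif attr.get("type") == "Relationship":
            arc_properties = {
                key: sub["value"]
                for key, sub in attr.items()
                if isinstance(sub, dict) and sub.get("type") == "Property"
            }
            arcs.append({
                "source": entity["id"],
                "target": attr["object"],
                "type": name,
                "properties": arc_properties,
            })
    return node, arcs


node, arcs = split_entity(entity)
```

Here the relationship `isParked` becomes an arc whose own property (`confidence`) stays attached to the arc, which is the feature that distinguishes the property graph model from a plain labeled graph.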
The ISO/IEC JTC1/SC32/WG3 working group, which established the SQL standard, is in the process of specifying a new query language suitable for graph-oriented databases, called GQL (Graph Query Language). This standard will include the specification of a property graph data model, which should be along the lines of the basic model described here, possibly adding notions of labels, types, and schemas.
Graph-oriented databases are touted, in comparison with relational databases, for not requiring the prior definition of a schema before the database can be populated. This is desirable and suitable for environments and applications where one operates under an open world assumption, such as the description of complex systems and systems of systems, characterized by bottom-up organization and evolution rather than control by a single stakeholder. However, even in such environments, it may be necessary to constrain the representation of specific subsets of the information entered into the database, in a way that may resemble a traditional database schema, while keeping the overall graph open to the addition of unforeseen data or configurations. For example, the description of a smart city falls under the open world assumption and will be described by the upper level of a graph database, without a schema. However, specific technical sub-systems of this city remain top-down closed-world systems managed by a single operator, who may impose a stronger structuring of information, as customarily represented by a schema.
The notions of "type graphs" and schemas [2] make it possible to meet this need, with types playing a role similar to that of labels in classical graph databases, but with the added possibility of specifying relations between these types and constraining them by keys and properties. The type graph is itself a property graph, linked by a relation of graph homomorphism with the graphs of instances that use the types it defines, playing a role similar to that of a schema in a data definition language.
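The homomorphism condition can be sketched in a few lines of Python. This is a simplified check under stated assumptions, not the formal construction of [2]: graphs are reduced to dictionaries mapping each arc identifier to its (source, target) pair, the `typing` dictionary maps instance nodes and arcs to type-graph nodes and arcs, and all names (`is_homomorphism`, `Pump`, `Tank`, `feeds`) are hypothetical. Constraints on keys and properties are left out.

```python
def is_homomorphism(instance_arcs, type_arcs, typing):
    """Check that the typing map preserves the source and target of every arc.

    instance_arcs, type_arcs: dict mapping arc id -> (source node, target node)
    typing: dict mapping instance nodes and arcs to type-graph nodes and arcs
    """
    for arc, (source, target) in instance_arcs.items():
        type_arc = typing.get(arc)
        if type_arc not in type_arcs:
            return False  # the arc is untyped, or typed by an unknown arc type
        if type_arcs[type_arc] != (typing.get(source), typing.get(target)):
            return False  # the endpoints do not match what the type graph allows
    return True


# Type graph: a Pump may feed a Tank.
type_arcs = {"feeds": ("Pump", "Tank")}

# Instance graph that conforms to the type graph.
good_arcs = {"a1": ("p1", "t1")}
good_typing = {"p1": "Pump", "t1": "Tank", "a1": "feeds"}

# Instance graph that does not: the arc runs from a Tank to a Pump.
bad_arcs = {"a2": ("t1", "p1")}
bad_typing = {"p1": "Pump", "t1": "Tank", "a2": "feeds"}

conforms = is_homomorphism(good_arcs, type_arcs, good_typing)
violates = is_homomorphism(bad_arcs, type_arcs, bad_typing)
```

Only the typed subset of the data needs to pass such a check, so the rest of the graph can remain schema-free.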
The ontologies, thesauri or taxonomies used to reference NGSI-LD types are also defined by graphs, but these are RDF graphs rather than property graphs, and they typically have broader scopes than database schemas. The complementary use, possible with NGSI-LD types, of type graphs and references to external ontologies makes it possible to enforce strong data structuring and consistency, while affording semantic grounding and interoperability.