Christoph Bussler

Research Work

| SQL For NoSQL | Business Rule Computing | Multi-Tenant Computing and SaaS | Principles of Computing | Process Computing | Semantic Computing | Thoughts on ... | Transaction Computing |

SQL For NoSQL

NoSQL databases support data models that differ from the Relational Model, for example the document data model, the key/value data model and the graph data model, to name a few (a more comprehensive list can be found, e.g., on the DB-Engines site; see the list of resources below).

The document data model is generally based on JSON structures, with a JSON document being the stored unit (the "document"). Zero, one or more documents are stored in collections, and in general non-SQL interfaces are provided to select and retrieve documents. Many databases support the document data model; see the landscape map from The 451 Group in the list of resources below.

This raises the question why databases that support the document data model almost insist on providing a non-SQL query interface. Some have started to add SQL support, but those implementations appear to be an afterthought, with no attempt to be fully compliant with, or complete with respect to, SQL as a language and its existing standards.

I got curious and launched a research effort to extend SQL to support the JSON-based document data model specifically. My progress and findings are reported in detail here: https://realprogrammer.wordpress.com. As it turns out, support for the document data model fits quite naturally into the SQL language (where the "S" stands for "structured", not "relational").

When designing a query language, its semantics are defined in the context of a data model. One fundamental design question in this context is whether a document schema should be enforced. For example, do all documents in a collection have to follow the same schema (in terms of structure and data types)? In my research I opted not to force documents to follow a defined schema. Since each document can then have a different structure, this places interesting requirements on the SQL language extension for JSON documents, requirements that are worthwhile addressing from a general data schema perspective as well.
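To illustrate what giving up schema enforcement means for query semantics, here is a minimal Python sketch. It is not the SQL extension developed in this research; it only evaluates a query roughly equivalent to SELECT ... FROM ... WHERE ... over a collection of JSON documents with differing structures. It assumes one possible design choice: a path that does not exist in a document yields a distinct "missing" marker, a comparison against a missing or non-numeric value is false, and absent paths are simply left out of the projected row. The names `get_path`, `greater_than`, `select` and `MISSING` are hypothetical.

```python
import json
from typing import Any, Callable

# Sentinel meaning "this path does not exist in the document"; it is deliberately
# distinct from JSON null, which json.loads turns into Python None.
MISSING = object()


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'customer.name'; return MISSING if any step is absent."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return MISSING
    return current


def greater_than(path: str, bound: float) -> Callable[[dict], bool]:
    """Predicate 'path > bound'; false when the path is missing or not a number."""
    def predicate(doc: dict) -> bool:
        value = get_path(doc, path)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and value > bound
    return predicate


def select(collection: list, paths: list, where: Callable[[dict], bool]) -> list:
    """Roughly SELECT <paths> FROM <collection> WHERE <where>, for documents of any shape."""
    rows = []
    for doc in collection:
        if where(doc):
            row = {}
            for p in paths:
                value = get_path(doc, p)
                # Project only the paths that exist in this particular document.
                if value is not MISSING:
                    row[p] = value
            rows.append(row)
    return rows


# One collection, four documents, four different structures.
orders = [json.loads(text) for text in (
    '{"id": 1, "total": 120, "customer": {"name": "Ann"}}',
    '{"id": 2, "total": "n/a"}',
    '{"id": 3, "total": 80, "customer": {"name": "Bob", "vip": true}}',
    '{"id": 4, "customer": null}',
)]

# Roughly: SELECT id, customer.name FROM orders WHERE total > 100
print(select(orders, ["id", "customer.name"], greater_than("total", 100)))
# prints [{'id': 1, 'customer.name': 'Ann'}]
```

Whether a missing path should behave like SQL NULL, raise an error, or be treated as a value of its own is exactly the kind of decision a schema-less SQL extension has to settle; the sketch simply picks one option.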

Resources:



Business Rule Computing

Business rules are an area that is gaining more and more visibility in industry. The main idea is that business rules, such as those determining a rebate amount or a discount percentage, can be changed dynamically at runtime without changing, recompiling or redeploying the application code. Instead, a model-based approach is followed in which the underlying data and rules are defined declaratively and can be changed dynamically. This increases flexibility from a business perspective while reducing the load on the IT department.
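A minimal Python sketch of this idea, assuming the rules are stored as declarative data (here a JSON string that could just as well be loaded from a database at runtime) and interpreted by a small generic engine. The rule format, its field names and the functions `load_rules`, `condition_holds` and `discount_for` are hypothetical and not taken from any particular business rule product.

```python
import json
import operator

# Comparison operators understood by this hypothetical rule format.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt, "==": operator.eq}


def load_rules(text: str) -> list:
    """Parse declarative rules; in practice they might be read from a database at runtime."""
    return json.loads(text)


def condition_holds(order: dict, condition: dict) -> bool:
    """A condition holds only if the field exists and the comparison succeeds."""
    field = condition["field"]
    return field in order and OPS[condition["op"]](order[field], condition["value"])


def discount_for(order: dict, rules: list) -> float:
    """Return the discount of the first rule whose conditions all hold, else 0.0."""
    for rule in rules:
        if all(condition_holds(order, c) for c in rule["when"]):
            return rule["discount"]
    return 0.0


rules_v1 = load_rules("""[
  {"when": [{"field": "amount", "op": ">=", "value": 1000}], "discount": 10.0},
  {"when": [{"field": "amount", "op": ">=", "value": 500}],  "discount": 5.0}
]""")
order = {"customer": "ACME", "amount": 750}
print(discount_for(order, rules_v1))  # prints 5.0

# The business changes its policy at runtime: only the rule data is replaced,
# the code above stays exactly as it is.
rules_v2 = load_rules("""[
  {"when": [{"field": "amount", "op": ">=", "value": 500}], "discount": 7.5}
]""")
print(discount_for(order, rules_v2))  # prints 7.5
```

The design choice here is that the engine knows nothing about discounts in particular; it only evaluates conditions, so the business can change thresholds and percentages without touching the code.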



Multi-Tenant Computing and SaaS



Principles of Computing

Peter Denning and Craig Martell started working on defining the Great Principles of Computing (PoC). They put together a web site (http://denninginstitute.com/pjd/GP/GP-site/welcome.html) that contains the current status of their ongoing work. They identified seven categories of principles of computing; these are:

In the following I will discuss the category of coordination in more detail as much of my work is in this area and reading the definition and details of this category triggered some thoughts that I put down here.

Coordination is an important set of principles, as much time is spent every day on coordination: at work, at home, during the commute, and on business or vacation trips, with more and more support from software systems. Examples of such supporting systems are email, calendars and instant messengers. Cell phones, iPhones, PDAs and BlackBerry devices also carry software that supports human and system coordination, such as SMS, GPS, walkie-talkie and other functionality.

Coordination is communication with a specific purpose: communicating entities (agents) synchronize their activities in order to achieve a common goal. An example is the set of employees involved in a travel approval and expense reporting process, where each agent acts in a specific role, such as traveller or approver, in order to make a business trip happen successfully. One can argue that coordination does not take place in the absence of a purpose.

The coordination category by Denning and Martell outlines the basic elements of coordination. It contains two types of coordination: direct coordination between agents by speech acts, and indirect coordination of agents that synchronize access to a shared resource. The first type supports direct communication between the agents in such a way that each agent knows what is expected of it during the communication in order to achieve the goal. The second type, however, does not involve direct communication between the agents, and they do not even have to be aware of each other.

One observation about this categorization is that the involved computer systems do not have a representation of the coordination itself. For example, in direct coordination two employees can coordinate their actions over the phone or with an instant messenger. The coordination is neither formally defined nor executed in the form of ongoing instances. Instead, the coordination medium is unaware that coordination is ongoing and might even be stateless, i.e., the communication is not recorded. For example, an instant messenger could be used in a mode that does not allow the conversation to be recalled. In this case the speech acts are not defined in the instant messenger at all, and at runtime the individual messages are not related to each other in a way that would allow the instant messenger to retrieve them as a related sequence.

The same applies to the synchronization of a shared resource. The coordination in this case is indirect, as the agents do not communicate directly with each other but through a transaction that is not aware of the agents at all. The coordination is neither formally defined nor executed as such, and the agents are unaware of each other. The example used by Denning and Martell is access to a checking account through database transactions. In this case the involved agents do not have a common goal. Instead, the coordination is enforced by the data integrity constraints placed on the checking account.

Beyond these two types, however, there are additional types of coordination, shown in the following matrix. The matrix has two dimensions. The first is the 'awareness' dimension: 'Software Aware' means that the software system has a formal representation of the coordination itself and can recall it; 'Software Not Aware' means that the software system does not know that it is being used for coordination, has no formal definition of it and therefore cannot recall it. The second is the 'control' dimension: 'Agent Controlled' means that the involved agents control the steps and the progress of the coordination; 'Software Controlled' means that the software system controls them. The numbers in the fields are used to refer to them later on.

| | Software Not Aware of Coordination | Software Aware of Coordination |
|---|---|---|
| Agent Controlled Coordination | (1) Speech Acts | (2) Constraint Management |
| Software Controlled Coordination | (3) Resource Synchronization | (4) Workflow Management |

An example of constraint management (field (2)) is a software code management system like Perforce (http://www.perforce.com). It is aware of the agents (software engineers), as it keeps careful track of which engineer checks software in or out. It also keeps careful track of multiple checkouts of the same code and of conflicts upon check-in. If a conflict happens, it reports it, but without telling the software engineers how to resolve the conflict; it only states that a constraint was violated. In this sense a software management system has rules and constraints that define whether the whole software system and its parts are in a consistent or an inconsistent state. As engineers work on conflicts, the conflicts may indeed get resolved, or new constraint violations may appear elsewhere in the code. In this approach the software management system is aware of the ongoing coordination and can recall the history of what the software engineers did over time, but it does not control the coordination of the software engineers themselves.
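Field (2) can be illustrated with a much simplified Python sketch. It is not how Perforce works internally; it assumes a hypothetical repository that records every checkout and check-in (so the coordination can be recalled later) and enforces a single constraint: a check-in must be based on the current head revision of the file. When the constraint is violated, the repository reports a conflict and leaves the resolution entirely to the engineers. `Repository` and `CheckinResult` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class CheckinResult:
    ok: bool
    message: str


@dataclass
class Repository:
    """Hypothetical code repository that is aware of, but does not control, coordination."""
    # Head revision number per file path.
    revisions: dict = field(default_factory=dict)
    # Full record of who did what; this is what lets the system recall the coordination.
    history: list = field(default_factory=list)

    def checkout(self, engineer: str, path: str) -> int:
        """Record the checkout and return the revision the engineer's work is based on."""
        base = self.revisions.get(path, 0)
        self.history.append((engineer, "checkout", path, base))
        return base

    def checkin(self, engineer: str, path: str, base: int) -> CheckinResult:
        head = self.revisions.get(path, 0)
        if base != head:
            # Constraint violated: report it, but leave the resolution to the engineers.
            return CheckinResult(False, f"conflict on {path}: based on r{base}, head is r{head}")
        self.revisions[path] = head + 1
        self.history.append((engineer, "checkin", path, head + 1))
        return CheckinResult(True, f"{path} is now r{head + 1}")


repo = Repository()
ann_base = repo.checkout("ann", "main.c")
bob_base = repo.checkout("bob", "main.c")
print(repo.checkin("ann", "main.c", ann_base).message)  # prints main.c is now r1
print(repo.checkin("bob", "main.c", bob_base).message)  # prints conflict on main.c: based on r0, head is r1
```

Bob now has to merge his changes with r1 on his own and check in again; the repository only records what happened and which constraint was violated.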

In contrast, an example of (4) is a workflow management system, where the business process is formally encoded and the software system actually drives the coordination by indicating to agents what they have to do and when. It can report on the status of different ongoing workflows and in general keeps a full history of all coordination that took place.
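A minimal Python sketch of field (4), using the travel approval and expense reporting example mentioned earlier. The process model is explicit data, the system (not the agents) determines which role has to perform which step next, and every completed step is recorded. The step names, roles and the `WorkflowInstance` class are hypothetical; real workflow management systems provide far richer process languages.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical process model for the travel approval example: each step names the
# role that must perform it and the step that follows (None ends the process).
PROCESS = {
    "submit_request": ("traveller", "approve_request"),
    "approve_request": ("approver", "file_expense_report"),
    "file_expense_report": ("traveller", "approve_expenses"),
    "approve_expenses": ("approver", None),
}


@dataclass
class WorkflowInstance:
    case_id: str
    current_step: Optional[str] = "submit_request"
    history: list = field(default_factory=list)

    def next_task(self) -> Optional[Tuple[str, str]]:
        """The system, not the agents, determines who has to do what next."""
        if self.current_step is None:
            return None
        role, _ = PROCESS[self.current_step]
        return role, self.current_step

    def complete(self, role: str) -> None:
        """Record completion of the current step and advance the process."""
        if self.current_step is None:
            raise RuntimeError(f"{self.case_id} has already finished")
        expected_role, next_step = PROCESS[self.current_step]
        if role != expected_role:
            raise PermissionError(f"{self.current_step} must be done by the {expected_role}")
        self.history.append((role, self.current_step))
        self.current_step = next_step


trip = WorkflowInstance("trip-42")
while (task := trip.next_task()) is not None:
    role, step = task
    print(f"{trip.case_id}: the {role} must now {step}")
    trip.complete(role)  # in a real system, the agent would do the actual work first
print(trip.history)
```

In contrast to the repository sketch for field (2), the instance itself holds the process state, so it can always report where a given case stands.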

An example of (3) is a relational database system that coordinates transactions such that write access to a data item happens in a way that agents do not interfere with each other. In this case the actions of the agents are coordinated by the software; however, the software is not aware of the coordination. No history is recorded, and the agents are unknown to the software. It can even be the case that two actions of the same agent are synchronized.
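As a rough illustration of field (3), the Python sketch below uses a lock in place of a database's concurrency control (a simplification; database systems use transactions with locking or multiversioning). Several agents, modelled as threads, withdraw from one checking account. None of them knows about the others, and the account keeps no history of the coordination; it only enforces the integrity constraint that the balance must not become negative. `CheckingAccount` and `agent` are hypothetical names.

```python
import threading


class CheckingAccount:
    """Shared resource; the lock stands in for the database's concurrency control."""

    def __init__(self, balance: int) -> None:
        self._balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        # Check and update happen atomically, so concurrent withdrawals cannot
        # violate the integrity constraint that the balance never becomes negative.
        with self._lock:
            if amount > self._balance:
                return False
            self._balance -= amount
            return True

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def agent(account: CheckingAccount, amount: int, times: int, results: list) -> None:
    # An agent only knows the account, not the other agents; the account itself
    # records nothing about who withdrew what.
    for _ in range(times):
        results.append(account.withdraw(amount))


account = CheckingAccount(1000)
results: list = []
agents = [threading.Thread(target=agent, args=(account, 10, 50, results)) for _ in range(4)]
for t in agents:
    t.start()
for t in agents:
    t.join()
# 4 agents request 200 withdrawals of 10 in total; exactly 100 can succeed.
print(account.balance, results.count(True))  # prints 0 100
```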

An example of (1) is an instant messenger, which is not aware that the exchanged messages actually coordinate the agents typing them. As the agents are in control, they drive the coordination. A system was once implemented (by Action Technologies) that implemented speech acts literally and coordinated agents. However, this system really belongs in (4), as the speech acts were 'just' a specific process representation and the system exhibited all the characteristics of a workflow management system.



Process Computing

Process computing comprises a number of currently separate areas that will, in the future, come together into one technical area. These are:

The reason for my prediction is that the underlying principles and concepts of these areas, as well as their specific requirements, are precisely the same. For historical reasons these areas were developed separately; from a technology and conceptual perspective, however, they are the same, and one technology and one conceptual model are sufficient to provide a complete solution for all of them.



Semantic Computing

For me, Semantic Computing is a major shift in computer science in which all aspects, from language theory to operating systems, make use of semantic technology in such a way that semantic understanding, interpretation and interoperability are easily (!) achieved. Ideally, Semantic Computing starts at the microprocessor level by introducing more adequate data types and ends at a user interface level that can intelligently interact with users to obtain semantically correct data.

Semantic Computing, in my mind, requires examining all areas of computer science as a whole. This is in contrast to ongoing research efforts that try to apply semantic technology to a single domain or technology in isolation. To name just one discussion point: today it is not possible to retrieve an RDF triple and pass it through all software layers to the user interface without it being re-represented in various languages and type systems throughout the software and technology component stack (see the discussion in the column "Is Semantic Web Technology Taking the Wrong Turn?").

Major conferences in the space of semantics are the following:



Thoughts On ...


© Christoph Bussler, 1991 -