Christoph Bussler

Research Work


Big Data and NoSQL

Going with the times, Big Data and NoSQL have been on my radar for a while now. Aside from performance and size aspects, what is fascinating to me is the proliferation, rather than convergence, of data models that underlies the new breed of databases. It appears that databases and data models are going to follow the programming-language route: constant renewal through new languages, with the most significant languages staying in the game while the less significant ones disappear.
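To make the proliferation concrete, here is a minimal sketch of the same customer datum expressed in three currently popular data models; all names and values are made up for illustration:

    // Illustrative sketch only: the same customer datum in three data models.
    public class DataModelProliferation {

        // Relational model: fixed schema, declared up front.
        static final String RELATIONAL =
            "CREATE TABLE customer (id INT PRIMARY KEY, name VARCHAR(100), city VARCHAR(100));" +
            "INSERT INTO customer VALUES (42, 'Jane Doe', 'Palo Alto');";

        // Document model: schema implicit in each document.
        static final String DOCUMENT =
            "{ \"_id\": 42, \"name\": \"Jane Doe\", \"address\": { \"city\": \"Palo Alto\" } }";

        // Key-value model: opaque value, interpretation left entirely to the application.
        static final String KEY   = "customer:42";
        static final String VALUE = "Jane Doe|Palo Alto";

        public static void main(String[] args) {
            System.out.println(RELATIONAL);
            System.out.println(DOCUMENT);
            System.out.println(KEY + " -> " + VALUE);
        }
    }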

What are the core research aspects? There are plenty. An interesting initial one would be to categorize the field of Big Data and NoSQL databases along various concurrent dimensions (e.g., system performance or model/language expressiveness). Another interesting piece of work would be to compare existing systems for real, including system performance and language expressiveness. How do the new systems compare to established ones? More core research topics will appear over time for sure.

What are the eco-system research aspects? Aside from systems and models, there will probably be the "usual" set of topics appearing on the horizon: backup and recovery, transactions, triggers, stored procedures, streaming queries, queues, rules, and so on - all in the context of the new systems.

And, finally, I predict that research areas using databases as support technology will take notice: event processing systems on NoSQL databases, high-performance transaction systems on NoSQL databases, workflow management systems on NoSQL databases, just to name a few.

One thing is clear: Big Data and NoSQL represent a major shift in the area of databases and systems that depend on databases. This shift is as significant as the appearance of SQL and relational systems because it fundamentally changes how data are modeled, managed and used by application systems.

Resources:



Business Rule Computing

Business rules are an area that is gaining more and more visibility in industry. The main idea is that business rules like the rebate amount or discount percentage can be changed dynamically at runtime without changing the software code and without recompiling and redeploying the code. Instead, a model-based approach is followed where the underlying data and rules can be changed declaratively. This increases flexibility from a business perspective while reducing the load on the IT department.

Changing data independently of the software code is a well-known technique using, for example, database systems. In such a case the rebate amount is stored inside a database. Whenever the rebate amount in the above example needs to be changed, the rebate amount or percentage in the database is changed, and this change is picked up by the software code at runtime. Changing rules dynamically is also possible today without specialized business rules languages. For example, simple rules can be encoded as and/or logic inside databases. More complex rules can be implemented as stored procedures inside databases that can be updated dynamically, independent of the software code of the business application.
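As a minimal sketch of this approach (the table name, column names, and connection details are hypothetical), the rebate percentage can be read from the database at the point of use instead of being hard-coded:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Minimal sketch: the rebate percentage lives in the database, not in the code.
    public class RebateLookup {

        public static double rebatePercentage(Connection con, String customerClass) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT rebate_percentage FROM pricing_rule WHERE customer_class = ?")) {
                ps.setString(1, customerClass);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble(1) : 0.0;  // default: no rebate
                }
            }
        }

        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/shop", "app", "secret")) {
                // Changing the row in pricing_rule changes the rebate at runtime,
                // without recompiling or redeploying this code.
                double rebate = rebatePercentage(con, "GOLD");
                System.out.println("Gold customers currently get " + rebate + "% rebate");
            }
        }
    }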

Rules languages are declarative in nature and can be used to change data, too. In addition, they allow inferencing, which provides more expressiveness. However, they are not necessarily superior to the aforementioned approaches. If inferencing is not needed and a declarative language is not seen as beneficial, then other languages are appropriate, too.
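To illustrate what inferencing adds, here is a tiny hand-rolled forward-chaining sketch (not a real rules engine; the facts and rules are made up): new facts are derived from existing ones until a fixpoint is reached, something a plain if/else encoding would have to spell out step by step.

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    // Minimal forward-chaining sketch: a rule derives its conclusion whenever all of
    // its premises are known facts; derivation repeats until no new facts appear.
    public class TinyInference {

        record Rule(List<String> premises, String conclusion) {}

        static Set<String> infer(Set<String> facts, List<Rule> rules) {
            Set<String> known = new LinkedHashSet<>(facts);
            boolean changed = true;
            while (changed) {
                changed = false;
                for (Rule r : rules) {
                    if (known.containsAll(r.premises()) && known.add(r.conclusion())) {
                        changed = true;
                    }
                }
            }
            return known;
        }

        public static void main(String[] args) {
            List<Rule> rules = new ArrayList<>();
            rules.add(new Rule(List.of("order > 1000", "customer is GOLD"), "rebate 10%"));
            rules.add(new Rule(List.of("rebate 10%"), "requires manager approval"));

            Set<String> facts = new LinkedHashSet<>(List.of("order > 1000", "customer is GOLD"));
            // "requires manager approval" is inferred transitively from the two rules.
            System.out.println(infer(facts, rules));
        }
    }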

While in principle it is a good idea to use a business rule language (or another language for the same purpose), from an implementation viewpoint the main error made is that the rules language is used like a programming language. Very often one or several rules are put in place where, without a rules language, a procedure or method would have been implemented in a procedural programming language. The downside of this approach is that while the rules and their underlying data can be changed, the invocation location of the rules is fixed, as the rules are executed like a procedure or method. Adding a rule at a different place in the code in order to implement the appropriate business semantics requires a full code implementation cycle to make that change.

A far better approach is not to hardcode the time the rules are invoked, but to determine it declaratively, so that the programmer or software engineer does not have to implement the rule invocation itself in the regular business logic code. This allows rules to be modeled in such a way that their invocation location can be changed dynamically, too, not just their underlying data. Rules can then be added or removed throughout the business logic, not just at the locations that are pre-planned and hard-coded into the software.
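A minimal sketch of what this could look like (the event names and the registry are hypothetical, and the bindings could just as well be loaded from a database): rules are attached to named business events, and the business logic only raises events without knowing which rules exist or where they fire.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // Sketch only: rules are bound to named business events instead of being called
    // at hard-coded places. Which rules fire at which event is data and can change
    // without touching the business logic below.
    public class RuleBinding {

        static final Map<String, List<Consumer<Map<String, Object>>>> bindings = new HashMap<>();

        static void bind(String event, Consumer<Map<String, Object>> rule) {
            bindings.computeIfAbsent(event, e -> new ArrayList<>()).add(rule);
        }

        static void raise(String event, Map<String, Object> context) {
            bindings.getOrDefault(event, List.of()).forEach(rule -> rule.accept(context));
        }

        public static void main(String[] args) {
            // Declarative part: which rule runs at which point.
            bind("order.priced", ctx -> {
                if ((double) ctx.get("amount") > 1000.0) ctx.put("rebate", 0.10);
            });

            // Business logic only raises events; it does not know which rules exist.
            Map<String, Object> order = new HashMap<>(Map.of("amount", 1500.0));
            raise("order.priced", order);
            raise("order.shipped", order);  // no rule bound here today; one can be added later
            System.out.println(order);
        }
    }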



Multi-Tenant Computing and SaaS



Principles of Computing

Peter Denning and Craig Martell started working on defining the Great Principles of Computing (PoC). They put together a web site (see http://cs.gmu.edu/cne/pjd/GP) that contains the current status of their ongoing work. They identified seven categories of principles of computing: computation, communication, coordination, recollection, automation, evaluation, and design.

In the following I will discuss the category of coordination in more detail, as much of my work is in this area, and reading the definition and details of this category triggered some thoughts that I put down here.

Coordination is an important set of principles, as much time is spent every day on coordination - at work, at home, during the commute, while on business or vacation trips - with more and more support through software systems. Some examples of such supporting systems are email, calendars, or instant messengers. Cell phones, iPhones, PDAs, and Blackberries also carry software that supports human and system coordination, like SMS, GPS, walkie-talkie, and other functionality.

Coordination is communication with a specific purpose: communicating entities (agents) synchronize their activities in order to achieve a common goal. An example is the set of employees involved in a travel approval and expense reporting process, where each agent acts in a specific role, like a traveller or an approver, in order to make a business trip happen successfully. One can argue that in the absence of a purpose, coordination does not take place.

The coordination category by Denning and Martell outlines the basic elements of coordination. It contains two types of coordination: direct coordination through speech acts between agents, and indirect coordination of agents synchronizing access to a shared resource. The first case supports direct communication between the agents in such a way that each agent knows what is expected of it during the communication in order to achieve the goal. The second case, however, does not support direct communication between the agents; they do not even have to be aware of each other in this type of coordination.

One observation about this categorization is that the involved computer systems do not have a representation of the coordination itself. For example, in a direct coordination two employees can coordinate their actions over the phone or with an instant messenger. The coordination is not formally defined or executed in the form of ongoing instances. Instead, the medium for coordination is unaware of the fact that coordination is ongoing and might even be stateless, i.e., the communication is not recorded. For example, in the case of direct coordination an instant messenger could be used in a mode that does not allow recalling the conversation. In this case the speech acts are not defined in the instant messenger at all, and at runtime the individual messages sent are not related to each other in such a way that the instant messenger system could retrieve a sequence relating the messages.

The same applies to the synchronization of a shared resource. The coordination in this case is indirect, as the agents do not communicate directly with each other, but through a transaction that is not aware of the agents at all. The coordination is not formally defined, nor executed as such, and the agents are unaware of each other. The example used by Denning and Martell is checking account access through database transactions. In this case the involved agents do not have a common goal. Instead, the coordination is enforced by the data integrity constraints placed on a checking account.
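A minimal sketch of such indirect coordination (the table, column names, and withdrawal logic are hypothetical): neither agent knows about the other; the only "coordination" is the integrity constraint on the shared account row together with the database's transaction mechanism.

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // Sketch of indirect coordination through a data integrity constraint.
    public class CheckingAccount {

        // DDL (run once): the constraint encodes the consistent state.
        // CREATE TABLE account (id INT PRIMARY KEY,
        //                       balance NUMERIC NOT NULL CHECK (balance >= 0));

        static boolean withdraw(Connection con, int accountId, double amount) throws Exception {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?")) {
                ps.setDouble(1, amount);
                ps.setInt(2, accountId);
                ps.setDouble(3, amount);
                // Two agents calling this concurrently are serialized by the database's
                // transaction mechanism; the agents themselves never interact.
                return ps.executeUpdate() == 1;
            }
        }
    }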

In addition to the above mentioned types of coordination there are additional ones, however. The following matrix shows these additional types. The matrix has two dimensions. One is the 'awareness' dimension: 'Software Aware' means that the software system has a formal representation of the coordination itself and can recall it; 'Software Not Aware' means that the software system does not know that it is being used for coordination - it has no formal definition of the coordination and therefore cannot recall it. The other dimension is the 'control' dimension: 'Agent Controlled' means that the involved agents control the steps and the progress of the coordination; 'Software Controlled' means that the software system controls the progress and the steps of the coordination. (The numbering is used to refer to the fields later on.)

  | Software Not Aware of Coordination | Software Aware of Coordination
Agent Controlled Coordination | (1) Speech Acts | (2) Constraint Management
Software Controlled Coordination | (3) Resource Synchronization | (4) Workflow Management

An example of constraint management (field (2)) is a software code management system like Perforce (http://www.perforce.com). It is aware of agents (software engineers), as it carefully keeps track of which engineer checks in or checks out software. It also carefully keeps track of multiple checkouts of the same code and of conflicts upon checkin. If a conflict happens, it says so, however without telling the software engineers how to resolve the conflict. It only states that a constraint was violated. A software management system in this sense has rules and constraints that define a consistent or inconsistent state of the whole software system and its parts. As engineers resolve conflicts, the conflicts might actually be resolved, or constraint violations might appear elsewhere in the code. In this approach the software management system is aware of the ongoing coordination and can recall the history of what the software engineers did over time, but it does not control the coordination of the software engineers themselves.

In contrast, an example of (4) is a workflow management system where the business process is formally encoded and the software system actually drives the coordination by indicating to agents what they have to do and when. It can report on the status of different ongoing workflows and in general keeps a full history of all coordination that took place.

An example of (3) is a relational database system that coordinates transactions such that write access to a data item happens without agents interfering with each other. In this case the actions of agents are coordinated by the software; however, the software is not aware of the coordination. No history is recorded, and the agents are unknown to the software. It can even be the case that two actions of the same agent are synchronized.

An example of (1) is the instant messenger, where the instant messenger is not aware that the exchanged messages are actually coordinating the typing agents. As the agents are in control, they drive the coordination. There was, however, a system (Action Technologies) that implemented speech acts literally and coordinated agents. That system really belongs into (4), as the speech acts were 'just' a specific process representation and the system exhibited all characteristics of a workflow management system.



Process Computing

Process computing comprises a series of currently separate areas that in the future will come together and become one technical area.

The reason for my prediction is that the underlying principles and concepts of these different areas, as well as their specific requirements, are precisely the same. For historical reasons these areas were developed separately; however, from a technology and underlying concept perspective they are the same, and one technology and one conceptual model are sufficient to provide a complete solution for all of these areas.



Semantic Computing

For me, Semantic Computing is a major shift in computer science in which all aspects, from language theory to operating systems, make use of semantic technology in such a way that semantic understanding, interpretation, and interoperability are easily (!) achieved. Ideally, Semantic Computing starts at the microprocessor level by introducing more adequate data types and ends at the user interface level with interfaces that can intelligently interact with users to obtain semantically correct data.

Semantic Computing, in my mind, requires the examination of all areas of computer science as a whole. This is in contrast to ongoing research efforts that try to apply semantic technology to a single domain or technology in isolation. Just as one discussion point, it is not possible today to retrieve an RDF triple and pass it on through all software layers to the user interface without it being re-represented in various languages and type systems throughout the software and technology component stack (see the discussion in this column: Is Semantic Web Technology Taking the Wrong Turn? [Cached]).
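A minimal sketch of this re-representation problem (no real RDF library is used; the classes and the example data are illustrative only): the same statement is recast three times on its way from the store to the user interface, and each hop re-encodes or drops semantics.

    // Illustrative sketch of the re-representation problem across software layers.
    public class TripleRerepresentation {

        // Layer 1: the triple as retrieved from an RDF store.
        record Triple(String subject, String predicate, String object) {}

        // Layer 2: the business-logic layer maps it onto a typed domain object.
        record Person(String uri, String name) {}

        // Layer 3: the user-interface layer wants JSON text.
        static String toJson(Person p) {
            return "{ \"uri\": \"" + p.uri() + "\", \"name\": \"" + p.name() + "\" }";
        }

        public static void main(String[] args) {
            Triple t = new Triple("http://example.org/person/42",
                                  "http://xmlns.com/foaf/0.1/name", "Jane Doe");
            Person p = new Person(t.subject(), t.object());  // type system changes, semantics become implicit
            System.out.println(toJson(p));                   // representation changes again
            // Each hop loses or re-encodes semantics; nothing in the stack "understands" the triple as such.
        }
    }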

Major conferences in the space of semantics are the following:



Thoughts On ...



Transaction Computing

While transactions are widely used in the context of database management systems, transactions in the context of programming languages are not yet mainstream. An abort or a rollback of a database transaction does not have an effect in main memory or on the user interface the way it has within the transactional database management system itself.

However, from a programming perspective it would be desirable if the whole implementation stack, i.e., all layers (user interface - business logic - database) and their technologies, could participate in transactions with the same effect as in database transactions. A programmer would expect that, on abort or rollback, not only the data but also the user interface state and the main memory state (business logic) are rolled back to the consistent state they were in before the initiation of the transaction. So instead of only having transactions in a database, transaction boundaries would bracket all computation.
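The gap can be seen in a small sketch (connection URL, table, and logic are hypothetical): the database change is undone by the rollback, but the in-memory counter, and any user interface bound to it, keeps its new value.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch of today's gap: rolling back the database transaction does not roll back
    // the in-memory state of the program.
    public class RollbackGap {

        public static void main(String[] args) throws Exception {
            int itemsShipped = 0;   // main-memory (business logic) state

            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/shop", "app", "secret")) {
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {
                    st.executeUpdate("UPDATE inventory SET quantity = quantity - 1 WHERE item = 'X'");
                    itemsShipped++;                       // memory is updated in step with the database ...
                    throw new RuntimeException("something went wrong");
                } catch (RuntimeException e) {
                    con.rollback();                       // ... but only the database change is undone
                }
            }

            // itemsShipped is still 1: memory (and any UI bound to it) is now inconsistent
            // with the database. A holistic transaction would roll this back as well.
            System.out.println("itemsShipped = " + itemsShipped);
        }
    }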

Such holistic transactional behaviour, i.e., behaviour that encompasses all parts of the computation from the user interface to the database system, would truly make a difference in software engineering for dependable systems.

Sun's research work on transactional memory can be found here: http://research.sun.com/scalable/. Wikipedia has an overview of various approaches here: http://en.wikipedia.org/wiki/Software_transactional_memory.

It is interesting to observe that several works in this space emphasize the concurrency control problem, not the transactional problem. In frameworks like J2EE, in conjunction with application servers, concurrency is not really an issue programmers have to deal with, as it is taken care of by the application server. However, transactional behaviour is very important for programmers to deal with, as it establishes the correctness of the business logic's results. Based on this train of thought, it would be very undesirable to have transactional memory that is independent of the database transactions in programs that use both.
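As a toy sketch of coupling the two (this is not a real software transactional memory; the undo-log idea and all names are my own illustration): in-memory writes are recorded in an undo log so that they can be reverted together with the database rollback, giving abort the same meaning in both worlds.

    import java.sql.Connection;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy sketch only: in-memory changes go through an undo log so that they can be
    // reverted together with the database transaction.
    public class HolisticTx {

        private final Connection con;                                // database side of the transaction
        private final Deque<Runnable> undoLog = new ArrayDeque<>();  // memory side

        HolisticTx(Connection con) throws Exception {
            this.con = con;
            con.setAutoCommit(false);
        }

        // Register how to undo an in-memory change before making it.
        void recordUndo(Runnable undo) { undoLog.push(undo); }

        void commit() throws Exception {
            con.commit();
            undoLog.clear();
        }

        void rollback() throws Exception {
            con.rollback();                                  // undo database changes
            while (!undoLog.isEmpty()) undoLog.pop().run();  // undo memory changes, newest first
        }
    }

Before changing an in-memory object, the code would register the inverse change, e.g. tx.recordUndo(() -> order.setStatus(previous)); a rollback then restores both the database row and the object, which is the coupling argued for above.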


© Christoph Bussler, 1991 -