Tag Archives: Architecture

Revisiting Software Architecture

Quite recently I heard a statement similar to

“The application works, so there is no need to consider changing the architecture.”

I was a bit surprised and must admit that in this situation I had no proper response for someone who obviously held a view so different from everything I believe in. But when you think about it, there are obviously a number of reasons why this statement was a bit premature. Let’s have a look at this in more detail.

There are several assumptions and implicit connotations here, which in our case did not hold true. The very first is that the application actually works, and at the time that was not entirely clear. We had just gone through a rather bumpy go-live, and there had not yet been a single work item processed by the system from start to finish, let alone all the edge cases covered. (We had done a sanity test with a limited set of data, but that had been executed by folks who had been on the project for a long time, not by real end users.) So with all the issues that had surfaced during the project, nobody really knew how well the application would work in the real world.

The second assumption is that the chosen architecture is a good fit for the requirements. From a communication theory point of view this actually means “a good fit for what I understand the requirements to be”. So you could turn the statement in question around and say “You have not learned anything new about the requirements since you started the implementation?”. Because that is what it really means: I never look back and challenge my own thoughts or decisions. Rather dumb, isn’t it?

Interestingly, the statement was made in the context of a discussion about additional requirements. So there is a new situation, and of course I should re-evaluate my options. It might indeed be tempting to just continue “the old way” until you really hit a wall. But if that happens you have consciously increased your sunk costs. And even if you can “avoid the wall”, there is still a chance that a fresh look at things could have fostered a better result. So apart from the saved effort (and that is only the analysis, not a code change yet) you can only lose.

The next reason is difficulties with the original approach, and of those there had been plenty in our case. Of course people are happy that things finally sort of work. But the more difficulties there have been along the way, the bigger the risk that the current implementation is either fragile or still has some hidden issues.

And last but not least, there are new tools that have become available in the meantime. Whether they have an architectural impact obviously depends on the specific circumstances. And it is a fine line, because there is always the temptation to go for the new, cool thing. But does it provide enough added value to accept the risks that come with such a switch? Moving from a relational database to a graph-based one is an example that lends itself quite well to this discussion. When your use case is about “objects” and their relationships with one another (social networks are the standard example here), the move away from a relational database is probably a serious option. If you deal with financial transactions, things look a bit different.

So in a nutshell here are the situations when you should explicitly re-evaluate your application’s architecture:

  • Improved understanding of the original requirements (e.g. after the first release has gone live)
  • New requirements
  • Difficulties faced with the initial approach
  • New alternatives available

So even if you are not such a big fan of refactoring at the level of architecture, I hope I could show you some reasons why it is usually the way to go.

Choosing a Technology

I recently started a new hobby project (it is still in stealth mode, so no details yet) and went through the exercise of thinking really carefully about which technology to use for it. On a very high level the requirements are fairly standard: web UI, persistence layer, API focus, cross-platform, cloud-ready, continuous delivery, test automation, logging, user and role management, and all the other usual things.

Initially I was wondering about the programming language, but quickly settled on Java. I have reasonable experience with other languages, but Java is definitely where most of my knowledge lies these days. So much for the easy part, because the next question proved to be “slightly” more difficult to answer.

Looking at my requirements it was obvious that developing everything from the ground up would be nonsense. The world does not need yet another persistence framework, and I would not see any tangible result for years to come, thus losing interest too soon. So I started looking around and first turned to Spring. There is a plethora of tutorials out there, and they show impressive results really quickly. Java EE was not really on my radar then, probably because I still hear some former colleagues complain about J2EE 1.4 in the back of my mind. More importantly, though, I favored agility (Spring) over standards (Java EE). My perception of too many Java standards is that they never outgrow infancy, simply because they lack adoption in the real world. Spring, on the other hand, was created to solve real-world problems in the first place.

But then, when answering a colleague’s question about something totally different, I made the following statement:

I tend to avoid convenience layers, unless I am 100% certain that they can cope with all future requirements.

All too often I have seen first quick results being paid for later, when the framework proved not to be flexible enough (I call this the 4GL trap). So this gave me pause, and I more or less went back to the drawing board: What are the driving questions for technology selection?

  • Requirements: At the beginning of any non-trivial software project the requirements are never understood in detail. So unless your project falls into a specific category for which there is a proven standard set of technologies, you must keep your options open.
  • Future-proofing: This is a bit like crystal-ball gazing, but you can limit the risks. The chances of a tier-3 Apache project dying are bigger than those of an established (!) Java standard disappearing. And of course this means that any somewhat new and fancy piece must undergo extreme scrutiny before you select it; and you had better have a migration strategy, just in case.
  • Body of knowledge: Sooner or later you will need help, because the documentation (you had checked what is available, right?) does not cover your problem. Having a wealth of information available, typically by means of your favorite search engine, will make all the difference. Of course proper commercial support from a vendor is also critical for non-hobby projects.
  • Environment: Related to the last aspect is what the “landscape” surrounding your project looks like. This entails technology, but even more importantly the organization that has evolved around that technology. The synergies from staying with what is established will often outweigh the benefits that something new might have when looked at in isolation.

On a strategic level these are the critical questions in my opinion. Yes, there are quite a few others, but they are more concerned with specifics.

The Non-SOA View on SOA: Part 1

There are a bunch of folks out there who don’t like SOA (Service-Oriented Architecture) for various reasons. So I will try to look at things without all the buzz and distill a few aspects that in my view provide value. The goal is to offer an alternative perspective that is hopefully hype-free.

I want to split this into two parts: First (in this post) comes the conceptual side of things, which looks at what SOA in my view is about at its core. Second come the more practical benefits we get from the way SOA is approached in real life; that will be a separate post.

Concepts

As a “preface” let me point out that I look at things mostly from a technical perspective. So everything along the lines of “SOA is about business and not technology” gets pretty much ignored. There are two reasons for that: Firstly I am a technical person and more interested in the technical aspects. Secondly, the business argument is probably the one most resented by skeptics. So let’s get started …

The core thing about SOA is that all functionality is made available as a service (surprise, surprise). This is really trivial and hardly worth mentioning. However, it has far-reaching consequences once you dig a bit deeper. And it’s those secondary aspects that provide the advancements.

  • The right level of slicing things up: Of course there were many other approaches and technologies before (e.g. OO and CORBA). However, none of those has kept its promise. Admittedly, it remains to be seen to what extent SOA can fulfill expectations. On the other hand, those expectations are still in flux as we all continue to improve our understanding. So there is a chance that expectations and reality will actually meet sometime (where?). Also, the criticism I am aware of is not about this aspect. In fact it seems pretty much undisputed; at least I have never heard anything to the contrary from customers or prospects.
  • The service is the application: A direct consequence of the previous point is that the service gets elevated to the place that was formerly held by an entire application; at least from a user’s perspective. Whether the implementation reflects this or not is irrelevant to the consumer.
    For new development it is usually desirable that the internals match the services exposed to the outside. For “legacy” stuff that just gets a service interface, the wrapper logic takes this place. In either case, however, the exposed logic is much smaller than an entire application.
  • State management: There has been a lot of talk about loose coupling. This principle can be applied at many levels, transport protocol and data format being the obvious ones. A slightly more subtle place is the handling of state, which pretty much depends on the aforementioned transport protocol and data format.
    The interface, at least of the initial operation, must be exposed in a way that it can simply be called. In other words, it is effectively stateless. Of course, from that point on everything could be stateful; there is no difference between SOA and traditional applications here. In practice, however, people pay much more attention to keeping things stateless when working in a SOA context (see the sketch after this list).
  • Life cycle management: This one is strongly related to the point “the service is the application”. The notion of a life cycle has formerly existed mostly at the level of entire applications. The obvious disadvantage is that this huge scope almost demands big-bang approaches for new versions. With the effort and costs associated with bringing a new version into production, there are only very few such releases, and much stuff gets crammed into each of them. The risks increase, business agility is close to zero, and nobody is really happy. With a service as the “unit to be delivered”, the scope is drastically reduced, and so are things like risk, complexity etc. Thus I think the concept will gain much more practical relevance than before.
  • Standards: Although a real SOA can be built using entirely proprietary technology, in reality a set of standards has established itself. Yes, there are probably too many WS-* specifications around that only work together in particular versions. So I would not recommend jumping onto too many of them. But with the basics you are certainly on the right track.
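
To make the statelessness point above a bit more tangible, here is a minimal Java sketch; all the names are invented for illustration and not taken from any particular product:

    // A minimal sketch (all names invented) contrasting a stateless
    // and a stateful service contract.
    public class StatelessnessSketch {

        // Stateless: every call carries all the context it needs, so any
        // instance of the service can handle it.
        interface OrderService {
            String submitOrder(String customerId, String[] itemIds);
        }

        // Stateful: the consumer has to stick to one particular instance,
        // because state accumulates between calls.
        interface StatefulOrderService {
            void startOrder(String customerId); // state lives in the service
            void addItem(String itemId);        // depends on the earlier call
            String finish();                    // returns the confirmation
        }
    }

The stateless variant is what allows you to replace, scale or restart service instances without the consumer noticing.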

There is probably a whole lot more to say, but let’s keep it here for now. I am looking forward to any comments on the subject!

Michael T. Nygard: Release It! Design and Deploy Production-Ready Software

I bought this book about a year ago and (shame on me) only just read it. It is really great for everyone who is interested in designing and developing robust software. So in that sense it is a must-read for all of us.

The book is organized into four general sections: stability, capacity, general design issues, and operations. For each of them a number of typical scenarios are described and general approaches discussed. The author seems to have a pretty strong Java and web application background (at least those are the areas most of his examples come from), but the patterns and solutions apply to all systems, languages and use cases.

So overall what we have here is a book that is fun to read and at the same time offers great insight into large-scale software systems. In my view everyone who works in this field can benefit from this book.

The author is also blogging on Amazon.com and seems to cover quite a few interesting topics.

Lifecycle Management with SOA and BPM: 1 + 1 > 2

For some time the topics of SOA governance and BPM have been looked at as if they were two relatively unrelated things. And this perception is correct in the sense that you don’t have to have them together. However, more and more people realize what huge additional benefits are in it for them if they combine the two. In many cases the idea is that you need some logic to govern the actual work (design, development, testing etc.) for a process that has been modeled in a nice, fancy tool.

But you can also do it the other way around: Think about what you would get if you could govern your whole IT lifecycle management from one tool. The idea goes like this: You store all relevant information about “objects” that matter to your organization in a central repository, and the attributes that describe those objects (aka assets) are completely freely configurable. You probably need to attach additional information to them, like existing documentation etc. So in a way you can think of the information in the repository as a way to store the knowledge about all relevant aspects of the organization and then leverage that knowledge.
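
As a toy illustration of what “freely configurable attributes” could mean, here is a minimal Java sketch; the class and attribute names are hypothetical, and real repository products model this with much richer metadata:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch of a repository asset with freely configurable
    // attributes (all names are invented for illustration).
    public class Asset {
        private final String id;
        private final String type; // e.g. "Service", "Process", "Document"
        private final Map<String, String> attributes = new HashMap<String, String>();

        public Asset(String id, String type) {
            this.id = id;
            this.type = type;
        }

        public String getId() { return id; }
        public String getType() { return type; }

        // Attributes are plain key-value pairs, so each organization can
        // define its own set without changing the repository schema.
        public void setAttribute(String name, String value) {
            attributes.put(name, value);
        }

        public String getAttribute(String name) {
            return attributes.get(name);
        }
    }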

Now, based on that groundwork, whenever a request for a new feature in the IT landscape comes in, you can have it go through a “workflow”. The first steps would probably form an approval chain, so people from various functions (e.g. product management, operations, security, marketing) would need to either approve or reject the request. How the final outcome is determined can be a bit tricky (and is much more a political topic than a technical one).

Then come steps like gathering requirements, signing them off, doing the development etc. You probably also want to integrate this whole thing with your development chain (automated testing, continuous integration etc.). At any given point in time you know where your development stands in terms of the project plan.

So if you step back a bit and look at what you get, we are not talking about development tools any more. Instead this is true, real-time, end-to-end visibility. There are clear responsibilities for assigning tasks (a human being has to decide on something), and you no longer need to fear emails that get lost in someone’s inbox. Instead you get a view into the currently open tasks, their due dates etc. Other advantages exist, but for now those are the critical ones, because these capabilities allow you to automatically generate documentation that satisfies your compliance requirements. In most organizations these things eat up enormous amounts of resources and affect processes that should deliver value to the organization.

Let’s leave it here for now. I am quite interested in your comments on this, so please let me know.

David A. Fisher “An Emergent Perspective on Interoperation in Systems of Systems”

When David A. Fisher wrote this paper in 2006, the hype around Web Services and SOA had just begun. The point that struck me most when reading the executive summary was that Fisher does not limit his thoughts to technical systems. Instead he accepts the fact that the people involved are also an important part of the equation. This aspect seems to be ignored by most authors and is, in my view, a major reason why many theories seem to be so far from reality.

The paper is more than 60 pages long, so it is nothing for a quick lunch-break read. However, I recommend reading at least the executive summary. It made me curious enough to schedule dedicated time for the rest of the document.

Is a SAN Really the Silver Bullet for Your Performance Requirements?

All too often I have heard things like “You need fast storage? Use a SAN.” While this is not wrong as such, most people have a relatively vague understanding of SANs in general, and of what they offer performance-wise compared to local disks in particular. I will not go into the details of the various protocols etc. Rather, I want to highlight where this requirement for high performance comes from and why SANs are in many cases a good way to address it.

I am looking at all this from the point of view of commercial applications only. What they are doing, in a nutshell, is retrieving, processing and sending data. By the way, if you have never touched upon the semantics of terms like data, information and knowledge, and how they relate to one another, I suggest you do so. It is an interesting topic, and knowing the differences can be helpful in many discussions.

But back to the topic. For most commercial applications the overall performance is determined not so much by CPU power as by the throughput with which information can be received or sent (or read and written from/to disk when speaking of persistence). In technical terms this is commonly referred to as I/O throughput. And that is, by the way, a major reason why mainframes will most likely stay around for quite a few more years: they have been optimized for I/O performance for literally decades. This includes the hardware (special I/O subsystems that free the CPUs from processing I/O requests), the operating system and also the applications (because people have been much more aware of this than in the open systems world).

But I did not want to focus on mainframes today. Rather, this post is about a rather common misconception when it comes to I/O, or more specifically storage performance. Many people, whenever they hear “We need high-performance storage”, instantly think about putting the respective data onto a SAN. While this may be the right choice in many cases, the reasoning behind it often is not. For many folks SAN is just a synonym for “some centralized ultra-fast storage pool”. It is, in my view, important to understand how a SAN relates to other storage devices.

There is no reason that local hard disks are “by definition” slower than their counterparts in the SAN. At the end of the day, the main difference between a SAN and local hard disks is how the computer is connected to the disks. For a SAN this is either Fibre Channel (FC) or iSCSI (basically SCSI over IP). For local disks it can be SCSI (serial or good old parallel), Serial ATA or whatever. What determines the speed is the number and individual performance of the physical disks that make up a RAID array. The more disks you have and the faster each of them is, the faster your logical drive will be. This is largely independent of whether they are connected to a local RAID controller or “simply” sit in the SAN somewhere in your data center.
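
A quick back-of-the-envelope calculation, with invented but plausible numbers, illustrates this:

    8 local disks at ~100 MB/s each in a stripe set  ->  up to ~800 MB/s
    single 4 Gbit/s Fibre Channel link to the SAN    ->  at most ~400 MB/s

With these assumed numbers the local array could actually outrun the SAN connection; it is the number and speed of the spindles (and the weakest link on the path to them) that set the ceiling, not whether the disks are local or remote.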

So as soon as you compare the SAN to a local RAID system (and I am not talking about the RAID capabilities many motherboards offer these days, but about dedicated RAID controllers), the perceived performance gap pretty much goes away. You may even have an advantage with local disks, because you have direct control over how many drives your file system spreads across, while with a SAN this may be a bit more difficult to achieve.

I will leave it here for now. Hopefully this somewhat demystifies things for you.

Asynchronous Communication

In the context of SOA (Service-Oriented Architecture) there has been a revival of the asynchronous communication pattern. And that is really what it is: a pattern. First and foremost we are not talking about a specific product, protocol or API, but simply about a way to design systems and applications. With this (hopefully) clarified between us, let’s look at some aspects that are important in my opinion. I will start off with a (certainly non-academic) definition:

Asynchronous communication simply means that when making a call into an “external system”, my program/component/service etc. does not expect an immediate response with the actual result (this would be synchronous communication). I am only interested in getting a positive acknowledgement that my request has been received (but not processed) correctly. I don’t care about any further logic that is going to be executed in some other program. Instead things continue on my side and at some other place the result of my request may be received and processed further. (There are quite a few use-cases when I don’t expect a result at all.)
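
To make this concrete, here is a minimal Java sketch using the JMS API; the queue name is invented, and any JMS-capable message broker would do:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class AsyncSender {
        public void sendRequest(ConnectionFactory factory, String payload) throws Exception {
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("ORDER.REQUESTS"); // assumed queue name
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage(payload);
                // send() returns as soon as the broker has accepted the
                // message; nobody waits for the request to be processed.
                producer.send(message);
                // ... my program simply continues here ...
            } finally {
                connection.close();
            }
        }
    }

Note that the positive acknowledgement from the definition above is exactly what send() gives you: confirmation of receipt, not of processing.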

At first glance this may seem way more complicated than simply making a call, waiting for the result and then continuing. And to a degree this perception is correct, although I must say that in my view people greatly exaggerate here. In all the cases I have seen so far, the really bad problems were not caused by the pattern itself. Rather it was a cumbersome integration at the tool level. If you have to bother with, for example, complicated mappings before you can send data over to your message broker from the “regular” code, this will certainly inhibit the use of the pattern (and rightly so). But in this case you should blame your tool set, not the pattern!

What is indeed a bit challenging when you first start using this pattern is that it requires a fundamental shift in how you think about designing your system or application. What you need is loose coupling of your components. Technically speaking, this means that there must be another component that is ready to receive a response matching your original request and then continue working on it.

(If you have followed the discussion around SOA lately, you may have come across the term “loose coupling” more than once already. So you can think of asynchronous communication as a means of reaching this design goal. Bear in mind, though, that we are only looking at the communication layer here! Loose coupling should also be concerned with the semantics, so in technical terms the data model needs to support this as well. However, I wanted to discuss asynchronous communication here, so let’s leave it at loose coupling for now.)

With one component sending out a request and another one handling the response, we have a solid foundation for a highly distributed system. Yes, you can also design a distributed system with synchronous communication, but this is probably more difficult. What comes to mind when talking about distributed systems is scalability. More specifically, we are talking about horizontal scalability (scale-out), which means spreading the load over more machines instead of putting more resources into a single machine. Having many relatively small systems work on things typically has two main advantages over going for bigger machines: firstly the approach scales further, and secondly it is cheaper because you can use standard hardware.

Another big plus of asynchronous communication is robustness, provided you are using a message broker that offers guaranteed delivery. Once your message broker has confirmed receipt of the request, you can be sure that it will be delivered to the receiver. (You can use the publish-subscribe pattern so that the sender does not have to know about the receivers; more on that in a separate post.) And it is certainly easier to make only the message broker highly available than to do the same for each and every piece of code. So by having the message broker as a central piece of infrastructure, you can reach a high level of availability relatively easily.
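
The receiving side could, again as a hedged JMS sketch, look like this; for guaranteed delivery the sender would additionally call producer.setDeliveryMode(DeliveryMode.PERSISTENT), so the broker keeps the message even across a restart:

    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    // Sketch of the receiving component: the broker calls it back
    // whenever a matching request (or response) arrives.
    public class RequestHandler implements MessageListener {
        public void onMessage(Message message) {
            try {
                if (message instanceof TextMessage) {
                    String payload = ((TextMessage) message).getText();
                    // ... continue working on the request here ...
                    System.out.println("Received: " + payload);
                }
            } catch (Exception e) {
                // a real system would route the message to an error queue
                e.printStackTrace();
            }
        }
    }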

That’s it for now, there will be more posts on related topics.

Business Activity Monitoring (BAM) and Related Areas

When you work in the software industry you are probably used to seeing things change rapidly. One of the disadvantages that come with this is a “heterogeneous” use of terms. It is often due to the fact that we do not want to bother ourselves with precise definitions. A very sloppy use of language is the natural consequence.

As some people have noticed in the past, I tend to use language rather precisely. I can attribute this to my parents, who were very much concerned with correct usage of language (my father used to work as a German teacher, after all). So I may be a bit more aware of the implications of imprecise language than others. Therefore, today I want to look at some terms that are often used either without much differentiation or without a precise notion of the relationships between them. They are:

  • Monitoring,
  • Business Activity Monitoring (BAM) and
  • Reporting

Monitoring is mostly used to describe an operational activity that looks after currently ongoing things at the level of individual items. It typically deals with questions like “What activities in my organization are currently causing problems?”. In a Business Process Management (BPM) context this means checking that all process instances are running fine. An operator will query the BPM system and ask for process instances that failed entirely, exceeded timeout limits for single process steps, etc. Automated activities also fall into this category. So if you have set up an alert that notifies a human being or another system once some threshold has been exceeded, I would consider this monitoring as well.

However, we are gradually moving into a grey area here. Depending on the type of rule associated with such an alert, the focus moves towards the Business Activity Monitoring (BAM) area. So imagine a threshold that is not concerned with a single incident any more. Rather, it can be of a “composite” nature and look at an average over a certain time frame (e.g. the average order amount per day of the week). Such figures are usually called Key Performance Indicators (KPIs). Managers tend to be much more interested in KPIs than in the characteristics of a single, albeit very important, activity in the organization. So instead of asking “How are we doing with order 5297523?” they would like to know “How is our on-time order fulfillment rate today?”. You can think of this as monitoring for operations management. If you want to know how you are doing NOW from an overall perspective, BAM is probably what you need.
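
Just to illustrate the difference in nature, here is a toy Java sketch of such a composite KPI check; the window size and threshold are of course invented:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy sketch: a composite KPI that looks at the average order amount
    // over the last N orders instead of at a single instance.
    public class OrderKpiMonitor {
        private final Deque<Double> window = new ArrayDeque<Double>();
        private final int windowSize;
        private final double alertThreshold;

        public OrderKpiMonitor(int windowSize, double alertThreshold) {
            this.windowSize = windowSize;
            this.alertThreshold = alertThreshold;
        }

        public void onOrder(double amount) {
            window.addLast(amount);
            if (window.size() > windowSize) {
                window.removeFirst();
            }
            double avg = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
            if (window.size() == windowSize && avg < alertThreshold) {
                // BAM-style alert: the aggregate, not a single order, is off.
                System.out.println("ALERT: average order amount dropped to " + avg);
            }
        }
    }

A classic monitoring alert, in contrast, would fire on a single failed order, no matter how the aggregate looks.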

Take away the “now” attribute from this and you pretty much end up with Reporting. This answers questions like “How was our on-time order fulfillment rate this month?”. In many cases this will be closely linked to Business Intelligence (BI) and Data Warehousing (DWH) topics. Some of the BI/DWH vendors have also realized that their customers do not only want to know how they did last month, but also how things are going at the moment. The simple reason is that looking into the past is good for identifying structural issues that can then be tackled. However, it does not help to identify a current or upcoming problem and avoid its worst consequences by taking instant countermeasures.

Being able to react in almost real time to what’s going on is the real value of BAM. The later you realize that something is going wrong, the fewer options you have, and the more likely it becomes that you can only try to repair damage already done. Although implicitly mentioned in the paragraph above, I want to highlight that BAM is concerned not only with existing but also with upcoming problems. You should expect a BAM solution to be able to identify trends and alert you based on them.

So there are distinct, yet related use cases for all these functions, and you definitely want to combine them to get the most from them. I can well imagine that over time these things will, at least to a degree, merge into broader solutions. One last word here: The statements given here cannot reflect all possible variations, and in your particular case things may seem to be different. Don’t be disturbed by that feeling. Rather, analyze the situation and you will probably discover that the general lines are still the same, even though the details may vary.