Tag Archives: VCS

Coordinating Distributed Development in Projects

I wanted to quickly kick around an idea about how to make distributed development within a project easier. By “project” I do not mean a typical open source project here but something more along the lines of a commercial environment: usually 10-50 people working on a solution with access to common infrastructure (corporate network, central Version Control System, etc.).

So far we basically tell people that they can handle coordination either on an organizational or on a technical level. Organizational more or less means phone or email, while technical means pessimistic locking in the Version Control System (VCS). Both have a number of disadvantages and will increasingly be challenged.

I will start by looking at the VCS-based approach. This can be seen as doing the coordination ex post: only upon check-in do people realize that someone else has made conflicting changes or that objects are locked in the VCS. The result is unnecessary additional work using diff/merge functionality, i.e. we happily start a diff/merge exercise and try to make our work fit into the changes that others have made. The common view is that this is perfectly acceptable. What is often overlooked, though, is that many such statements are (or were) made in the context of open source projects, where there is no chance to coordinate upfront. So I see it as a last resort, not as something that should occur on a regular basis just because two people accidentally worked on the same thing. Diff/merge should rather be needed for porting changes between branches and similar scenarios, and should otherwise be avoided.

With the pessimistic locking approach we are (ab)using a VCS on a conceptual level to coordinate work. Also, technically not all VCS support pessimistic locking, which becomes increasingly important when we speak about distributed VCS like Git or Mercurial. Another issue with pessimistic locking is that it usually does not prevent you from checking out, only from checking in again. And if you forget to acquire a lock, you will have spent time on some work only to find out later that you are basically screwed, because someone else also worked on it and did not forget to acquire the lock.

So when you look at all these points, the question arises why most folks still favor the VCS locking approach over upfront coordination. The latter would be done by letting everyone know that I am now starting to work on something. My educated guess is that the most important reason for preferring VCS locking is that it is integrated into the toolchain and hence automated. Also, I neither need to think about who needs the information to populate my email’s To: field nor do I have to switch to another program.

So what if we could combine the ex ante aspect of informing people with the integrated, publish-subscribe nature of the VCS lock? You actually often find this in server-based development environments, where people do not work against a local workspace but against a central development instance of the system. These environments usually offer a command to lock certain objects on the server; the lock request is expressed not towards the VCS but towards the “live” system, and I cannot perform any change without having acquired such a lock. I have worked with such a system for many years, and while its shared nature has certain drawbacks, in most circumstances and from a productivity point of view it is just great.

So we need to find a way to incorporate this “live locking” into a setup with many disparate development environments. In terms of implementation, we could probably leverage the Eclipse Communication Framework (ECF) and, e.g., an XMPP-based IM server. The workflow would roughly look like this:

  • After installation the user configures a “Coordination Server” (which would be an XMPP server) and his/her account there
  • For each Eclipse project there is a “chat room” or something similar that basically plays the role of a topic (in JMS terms). Whenever someone opens a project of that name, the respective Eclipse instance will be added to that chat room.
  • There is a “lock/unlock” entry in the context menu of all objects (e.g. classes). Whenever someone clicks one of those, a respective message is sent to the chat room and picked up by all subscribers.
  • The open question for me is how to persist that message in a stateful manner, along with all the associated questions around conflict resolution etc. In general I would favour a mostly manual approach here, because it would make the design/implementation a whole lot easier.
  • These machine-generated messages adhere to some naming convention, so that they can be processed/filtered easily. All other messages go through and can be used for human-to-human communication.
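To make the last two points a bit more concrete, here is a minimal Bash sketch of what a subscriber could do with the messages arriving in the chat room. Everything in it is an assumption for illustration only: the `LOCK:`/`UNLOCK:` prefixes as naming convention, the `user:project:path` field layout and the function names are all hypothetical, and it does not talk to ECF or XMPP at all. It simply replays machine-generated messages into an in-memory lock table (one possible answer to the persistence question), reports conflicts for manual resolution and passes every other line through as human-to-human chat.

```bash
#!/usr/bin/env bash
# Hypothetical naming convention for machine-generated messages:
#   LOCK:<user>:<project>:<path>
#   UNLOCK:<user>:<project>:<path>
# Every other line is treated as ordinary human-to-human chat.

declare -A LOCKS   # "<project>:<path>" -> user holding the lock (Bash 4+)

# Build a lock or unlock message in the agreed format.
make_lock_msg() {   # usage: make_lock_msg LOCK|UNLOCK user project path
  printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# Apply one incoming message to the lock table; print chat lines unchanged.
# Returns 1 on a conflicting lock request, which is left to humans to resolve.
process_msg() {
  local line=$1 kind user project path key
  case $line in
    LOCK:*|UNLOCK:*)
      IFS=: read -r kind user project path <<< "$line"
      key="$project:$path"
      if [[ $kind == LOCK ]]; then
        if [[ -n ${LOCKS[$key]-} && ${LOCKS[$key]-} != "$user" ]]; then
          echo "CONFLICT: $key is already locked by ${LOCKS[$key]}" >&2
          return 1
        fi
        LOCKS[$key]=$user
      elif [[ ${LOCKS[$key]-} == "$user" ]]; then
        # Only the current holder can release a lock; other unlocks are ignored.
        unset 'LOCKS[$key]'
      fi
      ;;
    *)
      printf '%s\n' "$line"
      ;;
  esac
}

# Print the user currently holding the lock on <project> <path>, if any.
lock_holder() {
  printf '%s\n' "${LOCKS[$1:$2]-}"
}
```

To replay a saved chat log, the loop has to run in the current shell, e.g. `while IFS= read -r line; do process_msg "$line"; done < chat.log` (where `chat.log` is just a placeholder); piping the log into the loop instead would run it in a subshell and lose the lock table.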

These are my initial thoughts and I look forward to your comments, so please feel free to share them.

Version Control Systems and other Repositories

Recently, a few colleagues and I had a very interesting discussion about what should go into a Version Control System (VCS) and what should not. In particular we were arguing as to whether things like documents or project plans should go in. Here are a few things that I came up with in that context.

I guess the usage of VCS (and other repositories) somehow comes down to a few general desires (aka use-cases):

  • Single source of truth
  • History/time machine
  • Traceability
  • Collaboration
  • Automation of builds etc.

In today’s world with its many different repositories you can either go for a mix (best-of-breed) or for the lowest common denominator, which is usually the VCS. So what’s stopping people from doing it properly (i.e. best of breed)?

  • Lack of conceptual understanding:
    • Most people involved in these kinds of discussions come from a (Java) development background, so there is a “natural” tendency to think VCS. What this leaves out is that other repositories, which are often DB-based, offer additional capabilities. In particular, they enforce all sorts of cross checks and other constraints. Also, given their underlying architecture, they are usually easier to integrate with in terms of process-driven approaches.
    • Non-technical folks are mostly used to doing versioning-by-filename and require education to see the need for more.
  • Lack of repository integration: Interdependent artefacts spread over multiple repositories require interaction, esp. synchronisation. Unless some kind of standard has emerged, it is a tedious task to do custom development for these kinds of interfaces. Interestingly, this goes back to my post about ALM needing middleware.
  • Different repositories have clients that work fundamentally differently, both in terms of UI and underlying workflow (the latter is less obvious but far-reaching in its consequences). Trying to understand all this is really hard. BTW: This already starts with different VCS! As an example just compare SVN, TFS and Git (complexity increasing in that order, too) and have “fun”.
  • Lack of process: Multiple repositories interacting with each other also means that there is, at least implicitly, a process behind all this. Admittedly, there is also a process behind a VCS-only approach, but it is less obvious and its evolution is often ad hoc. With multiple repositories, a more coordinated approach to process development is required, also because this often means crossing organisational boundaries.

Overall, this means that there is considerable work to be done in this area. I will continue to post my ideas here and look forward to your comments!

Tooling for Agile and Traditional Development Methodologies

A hot topic of the last few years has been the debate as to whether traditional (aka waterfall-like) methodologies or agile ones (XP, SCRUM, etc.) deliver better results. Much of the discussion that I am aware of has focused on things like

  • Which approach fits the organization?
  • How strategic or tactical (both terms usually go undefined) is the project and how does this affect the suitability of one approach over the other?
  • What legal and compliance requirements must be taken into account?
  • How large and distributed is the development team?

This is all very important stuff and thinking about it is vital. Interestingly, though, what has largely been ignored, at least in the articles I have come across, is the tooling aspect. A methodology without proper tool support has relatively little practical value. Well, of course the tools exist. But can they be used effectively in the project? In my experience this is mostly not the case when we speak about the “usual suspects” for requirements and test management. The reason for that is simply money, which comes in many incarnations:

  • Few organizations have enterprise licenses for the respective tools and normally no budget is available for buying extra licenses for the project. The reason for the latter is either that this part of the budget was rejected, or that it was forgotten altogether.
  • Even if people are willing to invest in the project, there is still the purchasing process, which in itself can be quite prohibitive.
  • If there are licenses, most of these comprehensive tools have a steep learning curve (no blame meant, this is a complicated subject).
  • No project manager, unless career-wise suicidal, is willing to have his budget pay for people getting to know this software.
  • Even if budget were available (in terms of cash-flow), it takes time, and often more than one project, to become proficient with the tools.

Let’s be clear, this is not product or methodology bashing. It is simply my personal, 100% subjective experience from many projects.

Now let’s compare this with Version Control Systems (VCS), where the situation looks quite different. Products like Subversion (SVN) are well-established and widely used. Their value is not questioned and every non-trivial project uses them. Why are things so different here, and since when? (The second part of the question is very important.) VCSes have been around for many years (RCS, CVS and many commercial ones) but none of them really gained the acceptance that SVN has today. I cannot present a scientific study here, but my gut feeling is that the following points were crucial:

  • Freely available
  • Very simple to use, compared to other VCS. This causes issues for more advanced use-cases, especially merging, but allows for a fast start. And this is certainly better than avoiding a VCS in the first place.
  • Good tool support (e.g. TortoiseSVN for Windows)

Many people started using SVN under the radar for the aforementioned reasons, and from there it gradually made its way into the official corporate arena. It is now widely accepted as the standard. A similar pattern can be observed for unit testing (as opposed to full-blown integration and user acceptance testing): many people use JUnit or something comparable with huge success. Or look at Continuous Integration with Hudson. CruiseControl had been around quite a bit longer, but its configuration was perceived as cumbersome. And on top of its ease of use, Hudson added something else: extensibility via plug-ins. The Hudson guys accepted upfront that people would want to do more than what the core product could deliver.

All these tools were designed bottom-up by people who knew exactly what they needed. And by “sheer coincidence” much of this stuff is what’s needed for an agile approach. My hypothesis is that more and more of these tools (narrow scope, free, extensible) will be coming and moving up the value chain. A good example is the Framework for Integrated Test (FIT), which addresses user acceptance tests. As this happens and the integration of the various tools at different levels progresses, the different methodologies will also converge.

USVN with CentOS 5

If you are looking for a Subversion web interface, chances are you will come across USVN (User-friendly SVN). I first used it in August 2009 during a complex proof-of-concept (PoC). The current version at the time was 0.7.2 and it was of great help. Nevertheless there were a few things missing, esp. LDAP support. So I was really happy to learn recently that the project is being continued (it is an end-of-studies project) and in fact one of the first new features is support for LDAP.

One of the challenges I came across during the installation was the system check reporting “Subversion has not been detected”. This simply means that the Subversion client binary (svn) was not found on the search path (PATH). In my case the reason was that I had done a custom installation of Subversion instead of relying on the one that comes with CentOS. For details on this please check this post, where I also present a way to custom-define environment variables for the Apache web server. Here is the respective snippet from /etc/init.d/httpd with the search path added (the LD_LIBRARY_PATH and PATH assignments in front of daemon are the additions to the stock script):

```bash
start() {
    echo -n $"Starting $prog: "
    check13 || exit 1
    LANG=$HTTPD_LANG LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/CollabNet_Subversion/lib PATH=$PATH:/opt/CollabNet_Subversion/bin daemon --pidfile=${pidfile} $httpd $OPTIONS
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch ${lockfile}
    return $RETVAL
}
```

With this amendment the system check passed just fine. Note, however, that at least in v1.0.1 this check is not complete. For example, it does not check for PHP’s database support, so you will most likely also want to install php-pdo and php-mysql:

```bash
yum install php-pdo php-mysql
```

SQLite did not work on my first try whereas MySQL did, so I went with the latter.
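If you want to reproduce the “Subversion has not been detected” situation by hand, the question is simply whether an svn executable can be found on the PATH that Apache ends up with. The following Bash sketch is only an approximation under assumptions (I do not know how USVN implements its check internally), and the function names `path_append` and `find_on_path` are hypothetical. `path_append` also avoids a subtle issue with the `PATH=$PATH:/opt/...` idiom used in the init script: if the variable is empty, the result starts with `:`, and an empty element is interpreted as the current directory (for PATH and, in glibc’s dynamic linker, for LD_LIBRARY_PATH as well).

```bash
#!/usr/bin/env bash
# Append a directory to a colon-separated search path (PATH, LD_LIBRARY_PATH),
# skipping duplicates and avoiding the leading ':' that "$VAR:/dir" yields
# when VAR is empty (an empty element would mean the current directory).
path_append() {     # usage: path_append "$CURRENT_VALUE" /new/dir
  local current=$1 dir=$2
  case ":$current:" in
    *":$dir:"*) printf '%s\n' "$current" ;;               # already present
    *)          printf '%s\n' "${current:+$current:}$dir" ;;
  esac
}

# Print the first executable named $1 on search path $2; return 1 if none.
# Empty elements are skipped here instead of being treated as ".".
find_on_path() {    # usage: find_on_path svn "$SEARCH_PATH"
  local name=$1 dir
  local -a dirs
  IFS=: read -r -a dirs <<< "$2"
  for dir in "${dirs[@]}"; do
    if [[ -n $dir && -x $dir/$name && ! -d $dir/$name ]]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example: would svn be found on the PATH Apache gets after the change?
# The base PATH below is an assumption, not necessarily what your init script uses.
apache_path=$(path_append "/sbin:/usr/sbin:/bin:/usr/bin" /opt/CollabNet_Subversion/bin)
if svn_bin=$(find_on_path svn "$apache_path"); then
  echo "svn found: $svn_bin"
else
  echo "Subversion has not been detected on: $apache_path"
fi
```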

Use CollabNet Subversion with Regular Apache

CollabNet provides up-to-date binary packages of Subversion for many platforms. In my case this is CentOS 5, which by itself only ships a rather dated version of Subversion. So I downloaded and installed the client, server and extras packages from CollabNet. The server package comes with a bundled Apache and a pretty nice installation script. However, I wanted to use my regular Apache for hosting the Subversion repositories, which means that I had to include the Apache modules from the CollabNet installation. Here are the respective lines from /etc/httpd/conf/httpd.conf:

```apache
LoadModule dav_svn_module /opt/CollabNet_Subversion/modules/mod_dav_svn.so
LoadModule authz_svn_module /opt/CollabNet_Subversion/modules/mod_authz_svn.so
```

Those modules require access to additional libraries from /opt/CollabNet_Subversion/lib, so Apache needs to be told to include this directory in the library search path (LD_LIBRARY_PATH). The LD_LIBRARY_PATH assignment in front of daemon in the snippet below from /etc/init.d/httpd is what needs to be added:

```bash
start() {
    echo -n $"Starting $prog: "
    check13 || exit 1
    LANG=$HTTPD_LANG LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/CollabNet_Subversion/lib daemon --pidfile=${pidfile} $httpd $OPTIONS
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch ${lockfile}
    return $RETVAL
}
```

Simply sourcing in LD_LIBRARY_PATH does not work, because the daemon function runs the command in a separate Bash instance. The only way I found to feed environment variables into Apache was to prepend them as shown above. This is also the approach to take for extending the PATH variable (which I needed to do to include /opt/CollabNet_Subversion/bin).
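The behaviour behind this can be reproduced outside the init script. In the following Bash sketch, `run_in_child_bash` is a hypothetical stand-in for daemon that, like daemon, runs its command in a separate Bash instance; `DEMO_VAR` is an arbitrary variable name chosen for the demonstration. A variable that is merely assigned, which is what sourcing a file with plain `NAME=value` lines amounts to, is not passed on to the child, whereas an assignment prefixed to the call is.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the init script's daemon function, which likewise
# hands the command it is given to a separate Bash instance.
run_in_child_bash() {
  bash -c "$*"
}

# Case 1: the variable is only assigned, as a sourced file would do it.
# It is not exported, so the child Bash does not see it.
demo_assigned() (
  unset DEMO_VAR                      # start from a clean, unexported state
  DEMO_VAR=/opt/CollabNet_Subversion/lib
  run_in_child_bash 'echo "${DEMO_VAR:-<unset>}"'
)

# Case 2: the assignment is prefixed to the call, as in the init script.
# Bash places it in the environment of the function and of its children.
demo_prefixed() {
  DEMO_VAR=/opt/CollabNet_Subversion/lib run_in_child_bash 'echo "${DEMO_VAR:-<unset>}"'
}

demo_assigned    # prints <unset>
demo_prefixed    # prints /opt/CollabNet_Subversion/lib
```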


Git Links

Here are a number of links to resources I found useful.

Using Git without Shell Access

If you host Git repositories, you normally do not want to give shell access to everyone who needs access to them. So far many people have used gitosis to achieve this. Now there is a “new kid on the block” called gitolite. I have not really used it in practice so far, but the added functionality looks promising, and I also like the fact that it’s written in Perl. There is also a chapter about it in the Pro Git book.

Why Subversion’s “svn:externals” is bad

Subversion provides a property (svn:externals) to include references to other projects at a given location within your source code tree. This is pretty much the same as a symbolic link (symlink) in Unix/Linux. But while using symlinks is good practice for de-coupling things in the file system, it is just the other way around for svn:externals, at least in my opinion.

Interestingly enough, there are a number of sources that recommend its use. I disagree and strongly discourage people from using it, for a number of reasons:

  1. It creates lock-in to Subversion, because many other Version Control Systems (VCS) do not have a comparable feature. And even if they did, an automated migration would most likely be cumbersome, to say the least. One would have to find and extract all svn:externals properties, build a dependency tree (hopefully without circular dependencies) and process things in the appropriate order. This is far from trivial!
  2. On a conceptual level svn:externals is not about version control but about dependency management. This is an even more important argument than the lock-in effect.
    • Dependency management and version control are two entirely different things, which should not be mixed. A somewhat implicit mechanism for defining dependencies makes it easy for people to lose sight of this separation.
    • Dependencies are hidden and only show up during VCS operations. To find out a project’s dependencies, in theory one could dig through the repository with a special browser but this is not feasible for a large enough project.
    • Different dependencies can occur at different stages of an artifact’s life-cycle: compilation, unit testing, run-time etc. svn:externals offers no way to reflect this.
    • Other dependency management systems (e.g. Maven or Ivy for Ant) offer way more functionality and can be extended for additional requirements. Those customisations would have to go into hook scripts for Subversion (which, on top of things, would probably be OS-specific).
  3. Quite often the dependencies will be about artifacts that are not source code at all (usually third-party libraries). You may not want to have compiled artifacts in your VCS.
  4. Also, the dependency could be about source code that is maintained by an external organisation. If they are not using Subversion you could not link directly there but would have to set up a mirror internally. (Admittedly, you may want to do that anyway.)
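The migration work described in point 1 essentially boils down to a topological sort. The following Bash sketch is only an illustration under assumptions: the externals definitions can be listed with `svn propget -R svn:externals <repository-URL>`, but turning that output into “dependency dependent” pairs (one project name per side) is left out, because the format of the definitions differs between Subversion versions (1.5 introduced a new syntax). Given such pairs, the POSIX `tsort` utility produces an order in which every project comes after the projects it depends on, and it fails if there is a circular dependency. The function name `migration_order` and the project names in the example are made up.

```bash
#!/usr/bin/env bash
# Read "dependency dependent" pairs (one pair per line) on stdin and print the
# projects so that every dependency comes before the projects that use it.
# Returns 1 if the pairs contain a cycle, since no valid order exists then.
migration_order() {
  local order
  # tsort exits non-zero when the input contains a loop; its diagnostic is
  # replaced by our own message below.
  if ! order=$(tsort 2>/dev/null); then
    echo "circular svn:externals dependencies, no valid migration order" >&2
    return 1
  fi
  printf '%s\n' "$order"
}

# Hypothetical example: webapp pulls in libutil, which pulls in libcore.
printf '%s\n' 'libcore libutil' 'libutil webapp' | migration_order
```

With the extracted pairs in a file, `migration_order < pairs.txt` would print the projects in the order in which they could be migrated; `pairs.txt` is a placeholder.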

I used svn:externals when I started out with Subversion and have had quite a few headaches because of it since then. In practice, most of them were around the lock-in effect. Nevertheless, I still think the conceptual argument is more important in the long run.