Sunday, November 27, 2005

Can you base your Enterprise Architecture completely on open source software?

Last week, I touched on the media's coverage of open source software, and gave my answer to a question that seems to be becoming more common in the media: "Is open source better?"

This week, I want to ask whether it is possible to build your entire enterprise architecture around open source software.

First, let me say that I think it is not only possible, but may be the best thing you could do. When you look at any decent-sized company today, they all have very similar needs, and mostly similar problems as well.

Let's talk about needs first. Everyone needs a basic computing infrastructure, composed of servers, network and storage. While there is a lot that can be discussed on these needs alone, I'll save that for another time. Where servers are concerned, though, what server platforms are not supported by Linux these days? Really, there are no major platforms that one would consider in an enterprise context that do not support Linux today (with maybe one exception). They range from the Intel and AMD industry standard servers, through Itanium based systems from vendors like Hewlett-Packard, to Power 5 and 5+ systems from IBM. In the case of the Power 5 based systems, Linux even supports SMT (Simultaneous Multi-Threading) and Micro-partitioning (the most advanced features of the platform). The one notable exception is Sun Microsystems with their UltraSPARC line of hardware. Of course, Linux is supported on their AMD based products, which are quite good. So, as far as servers are concerned, you cast the widest possible net if you base your server infrastructure on the Linux operating system, and you can even have a mix of hardware from various vendors and still manage it through one operating system. That is a major advantage of a Linux based OS strategy!

Where network infrastructure is concerned, a Linux based server infrastructure will work with any network infrastructure, up to 10 gigabit Ethernet solutions.

Where storage infrastructure is concerned, storage from all the major players supports Linux, from EMC to Hitachi to IBM and Network Appliance. The major HBA (Host Bus Adapter) vendors also support Linux, and you can attach storage over Fibre Channel, iSCSI or just plain Ethernet via the NFS protocol. Of course, there are some interesting products from smaller vendors, covering both network and storage infrastructure, that have actually been built from an open source base, such as storage built on Linux with an XFS file system, or network switches and routers that use Linux internally as the OS. These vendors would not typically find themselves in a large enterprise setting today, but I think this will change over time. These products will be commoditized just like infrastructure software, and eventually players in this space will find themselves penetrating the enterprise.

So, we have set the basic compute infrastructure, and we need to address the layers above that. There are various choices for the middleware in the enterprise. In no particular order, you can use what is called LAMP (Linux, Apache, MySQL, and Perl, Python or PHP). This stack is becoming increasingly popular and there are a couple of variations on it.

One variation is LAMJ (Linux, Apache, MySQL, Java). The Java part of this stack is typically composed of Tomcat as a standalone servlet container, or Tomcat and JBoss (typically with Hibernate for Object/Relational mapping) if your needs extend beyond the servlet container into the full Java enterprise edition stack. In fact, I have used Tomcat embedded in JBoss even when the application did not require the other components of the Java enterprise edition stack, because JBoss has very robust connection pooling and high-availability clustering built in. Therefore, you can take advantage of those attributes without having to add those types of features to Tomcat.

There is even a second variation of LAMP, which I have not seen an acronym for, but for lack of a better name I will call it LAMR (Linux, Apache, MySQL and Ruby on Rails). Ruby on Rails is becoming very popular with developers for its sheer productivity over pretty much everything else. While it is an interpreted environment, much the way PHP, Perl and Python are, it provides a complete abstraction for all tiers of an n-tier architecture. Interpreted environments will certainly be slower than non-interpreted environments, but hardware has become fast enough that many enterprise environments could easily be run on Ruby or the other interpreted environments. Eventually, this will be true of all workloads, and the debate over the speed and scalability of these types of environments versus others will be a moot point. Of course, they won't be faster than the other approaches, but all anyone needs is for them to be fast enough for their workload. The productivity improvements will far outweigh the speed disadvantages.
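To make the connection pooling point above concrete, here is a minimal sketch of the idea in Python. It is not how JBoss implements pooling. The ConnectionPool class and the open_demo_connection helper are hypothetical names invented for illustration, and an in-memory SQLite database stands in for a real MySQL or PostgreSQL server.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A tiny fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        self.created = 0
        for _ in range(size):
            self._idle.put(factory())
            self.created += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Wait for a free connection instead of opening a new one per request.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


def open_demo_connection():
    # Assumption: SQLite in memory stands in for a real database server.
    return sqlite3.connect(":memory:")


pool = ConnectionPool(open_demo_connection, size=2)
results = []
for n in range(5):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT ? * 2", (n,)).fetchone()[0])
```

Five requests are served here by only two connections. The expensive step of opening a connection is paid once up front rather than on every request, which is the benefit you get for free when the container manages the pool for you.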

Aside from the middle-tier components, we mentioned the MySQL database. Of course, there is an alternative to MySQL, and that is PostgreSQL. PostgreSQL has always been more feature complete, and has conformed better to the ANSI SQL standard than MySQL. MySQL has been considered the fastest open source database. My own experience seems to back this up, although PostgreSQL is improving on the performance front and MySQL is improving on the feature front. I don't think you can go wrong with either choice, but you should test both to make sure the one you pick fits your needs. In both cases, you can now get commercial support. Of course, you could always get that for MySQL from MySQL AB. In the PostgreSQL world, quite a few companies have come and gone trying to create businesses from PostgreSQL support. Recently there have been some interesting developments in the PostgreSQL community with companies like EnterpriseDB (http://www.enterprisedb.com) and Greenplum (http://www.greenplum.com). Greenplum specializes in data warehousing uses for PostgreSQL.

With the basic compute infrastructure in place, and now the middleware for the middle-tier and the persistence mechanism, through a relational database, we have a very complete platform for delivering enterprise applications. Besides the basic middleware pieces, other needs include the ability to horizontally scale, vertically scale, and have the infrastructure be highly available.

In terms of horizontal scalability, all of the above solutions will work in a horizontally scaled deployment. Any of the hardware platforms can be purchased in small node sizes, like two processor configurations, and used in this way. The higher-end servers can be partitioned for this type of deployment. The middleware, such as Apache, Tomcat, JBoss, and the Apache plugins for PHP, Perl, Python and Ruby, can all be deployed in a horizontal scaling solution, and you can even use Apache as the load balancer to distribute the workload. In that case, though, you will need an OS level clustering solution for the load balancer itself, as I am unaware of a built-in Apache solution for failover. If there is one, someone please let me know.
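As a rough illustration of what a load balancer does in a horizontally scaled deployment, here is a minimal round-robin sketch in Python. It is not how Apache distributes requests. The RoundRobinBalancer class and the app1/app2/app3 host names are hypothetical, and a real setup would mark nodes down from health checks rather than by hand.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


# Hypothetical application server nodes behind the balancer.
balancer = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
first_pass = [balancer.next_backend() for _ in range(4)]

balancer.mark_down("app2:8080")
after_failure = [balancer.next_backend() for _ in range(4)]
```

Note that this only covers the failure of a node behind the balancer. The balancer itself is still a single point of failure, which is exactly the gap that an OS level clustering solution has to fill.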

In terms of vertical scalability, Apache and the Java middleware (Tomcat, JBoss, Hibernate) will scale vertically on very large systems quite nicely. In my own experience, I have seen Tomcat and JBoss scale linearly on 16 CPU systems, and there doesn't really seem to be a limit here. On the database side of things, PostgreSQL has made recent improvements and scales nicely on 8 CPU systems. I haven't seen any data that goes beyond that, but I would bet that it will scale beyond 8 CPUs (if anyone has any experience with this, let me know). In my testing, and in third party testing of MySQL, I have seen very good vertical scaling on 8 CPU systems as well, and I also believe that it would scale beyond that (again, if anyone has any experience with this, let me know). While I think that horizontal scaling is becoming the way to go, because the price/performance of vertical scaling doesn't match it, vertical scaling may be necessary for some applications because of design constraints within the applications, so both are probably necessary in the near term.
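If you want to check a "scales linearly" claim against your own test results, one simple way is to compare throughput per CPU at each system size with throughput per CPU on the smallest system. The sketch below does that in Python. The scaling_efficiency function is a hypothetical helper, and the sample numbers are made up for illustration, not measurements from any of the products mentioned here.

```python
def scaling_efficiency(measurements):
    """Map each CPU count to its per-CPU throughput relative to the smallest system.

    measurements: dict of cpu_count -> measured throughput (e.g. requests/sec).
    A value of 1.0 means perfectly linear scaling; lower values mean
    each added CPU is contributing less than the first ones did.
    """
    base_cpus = min(measurements)
    per_cpu_baseline = measurements[base_cpus] / base_cpus
    return {
        cpus: (throughput / cpus) / per_cpu_baseline
        for cpus, throughput in sorted(measurements.items())
    }


# Made-up sample data: throughput measured at 1, 2, 4 and 8 CPUs.
sample = {1: 500.0, 2: 1000.0, 4: 1900.0, 8: 3200.0}
efficiency = scaling_efficiency(sample)
```

Running the same calculation on both a large partitioned server and a set of small nodes, together with the cost of each, gives you a like-for-like way to compare the price/performance of vertical and horizontal scaling for your own workload.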

In terms of high availability, all of these components have clustering, replication and failover capabilities. You can use things like Red Hat's cluster suite for clustering at the OS level. JBoss has a sophisticated clustering mechanism that even provides transaction-aware persistence clustering. MySQL has had clustering and replication for quite some time, and the latest release (version 5.0.x) has brought that to a higher level still. Sabre is using these capabilities to scale horizontally, and to provide high availability for its Travelocity travel website. Certainly, that is a ringing endorsement for an enterprise setting. That brings us to some of the common problems that can be addressed with an open source based enterprise architecture.

Some of the common problems enterprises face are identity (many sources of identity), reliability of infrastructure (middleware) and manageability. These are just scratching the surface, but I will stick to these three problem areas.

Where identity is concerned, there are now a couple of open source projects that can do Web single sign-on (SSO), and there is one in particular that can do full-blown identity management with federated identity, both within the enterprise and across enterprises. That solution is from Ping Identity (http://www.pingidentity.com). They have an appliance-like solution that is based on the open source project hosted at http://www.sourceid.org/. This is by far the most complete solution I have seen in the open source community, and it also seems to be one of the easiest solutions to set up. They have over 1,000 deployments in some very large organizations, so it also seems to be tested in an enterprise setting.

One of the other problems that all enterprises face is the reliability of their infrastructure. As we talked about last week, open source software is more reliable, and has fewer bugs, than its closed source counterparts. In my experience, I have seen a Snort intrusion detection system, running on Linux, stay operational in a production environment without a single incident for almost 600 days. This system is actually still running, so we will probably surpass the two year mark before too long. I gave much more data in last week's post, so if you are interested in more detail, see that post.

Finally, we will discuss manageability. Red Hat, JBoss, MySQL and other companies actually have service offerings that help with patch management, monitoring, alerting, deployment, configuration, etc. These offerings allow you to do very large scale deployments, from hundreds up to thousands of server instances, with few administrators. These tools are rapidly maturing, and help you not only to do the initial deployment, but to manage it from a central location at very large scale. If you add something like GroundWork's (http://www.itgroundwork.com) monitoring suite, based on open source projects like Cacti and Nagios, you can also monitor your entire environment very efficiently.

Put it all together, and you have a rather complete open source based enterprise architecture, that will be highly reliable, performant, scalable, and manageable. What more could you ask for?

4 comments:

pheadron said...

This is a very complete article!

Andrig T Miller said...

Thanks. I should actually create a revised version of this, since this post is quite old now. Of course, the basic tenets remain the same.

Anonymous said...

We are in the process of migrating a character based application to a web based application on Java and wish to know the suitability of JBoss for the same. Is JBoss a match for the commercial application servers?

Is it ok for a huge user base of over 20000 users across a wan? Any network bandwidth enhancement needed?

Andrig T Miller said...

Certainly, JBoss is up to pretty much any kind of application you may have. I deployed JBoss very early in its life with some very demanding use. Since I joined JBoss as an employee back in 2006, I have personally seen a lot of large companies and applications deployed on JBoss. Of course, every deployment and application has different needs, but I see no reason why you cannot use JBoss middleware for your application.