Sunday, November 27, 2005

Can you base your Enterprise Architecture completely on open source software?

Last week, I touched on the media's coverage of open source software, and also gave my answer to a question that seems to be becoming more common in the media: "Is open source better?"

This week, I want to touch on the topic of building your entire enterprise architecture around open source software, and whether this is possible.

First, let me say that I think it is not only possible, but may be the best thing that you could do. When you look at decent-sized companies today, they all have very similar needs, and mostly similar problems as well.

Let's talk about needs first. Everyone needs a basic computing infrastructure, composed of servers, network and storage. While there is a lot that can be discussed on these needs alone, I'll save that for another time. Where servers are concerned, though, what server platforms are not supported by Linux these days? Really, there is no major platform that one would consider in an enterprise context that does not support Linux today (with maybe one exception). They range from the Intel and AMD industry-standard servers and Itanium-based systems from vendors like Hewlett-Packard through to Power 5 and 5+ systems from IBM. In the case of the Power 5 based systems, Linux even supports SMT (Simultaneous Multi-Threading) and Micro-partitioning (the most advanced features of the platform). The one notable exception is Sun Microsystems with its UltraSparc line of hardware. Of course, Linux is supported on Sun's AMD-based products, which are quite good. So, as far as servers are concerned, you cast the widest possible net if you base your server infrastructure on the Linux operating system, and you can even have a mix of hardware from various vendors and still manage it through one operating system. A major advantage to a Linux-based OS strategy!

Where network infrastructure is concerned, a Linux-based server infrastructure will work with any network infrastructure, up to 10 gigabit Ethernet solutions.

Where storage infrastructure is concerned, storage from all the major players supports Linux, from EMC to Hitachi to IBM and Network Appliance. The major HBA (Host Bus Adapter) vendors also support Linux, whether that is Fibre Channel, iSCSI or just plain Ethernet via the NFS protocol. Of course, there are some interesting products from smaller vendors, covering both network and storage infrastructure, that have actually been built from an open source base, such as Linux with an XFS file system for storage, or network switches and routers that use Linux internally as the OS. These vendors would not typically find themselves in a large enterprise setting, but I think this will change over time. These products will be commoditized just like infrastructure software, and eventually players in this space will find themselves penetrating the enterprise.

So, we have set the basic compute infrastructure, and we need to address the layers above that. There are various choices for the middleware in the enterprise. In no particular order, you can use what is called LAMP (Linux, Apache, MySQL, Perl, Python or PHP). This stack is becoming increasingly popular, and there are a couple of variations on it. One variation is LAMJ (Linux, Apache, MySQL, Java). The Java part of this stack is typically composed of Tomcat as a standalone servlet container, or Tomcat and JBoss (typically with Hibernate for object/relational mapping) if your needs extend beyond the servlet container into the full Java Enterprise Edition stack. In fact, I have used Tomcat embedded in JBoss even when the application did not require the other components of the Java Enterprise Edition stack, because JBoss has very robust connection pooling and high-availability clustering built in. That way, you can take advantage of those attributes without having to add those types of features to Tomcat.

There is even a second variation of LAMP, which I have not seen an acronym for, but for lack of a better name I will call it LAMR (Linux, Apache, MySQL and Ruby on Rails). Ruby on Rails is becoming very popular with developers for its sheer productivity over pretty much everything else. While it is an interpreted environment, much the way PHP, Perl and Python are, it provides a complete abstraction for all tiers of an n-tier architecture. Interpreted environments will certainly be slower than non-interpreted environments, but hardware has become fast enough that many enterprise workloads could easily be run on Ruby or the other interpreted environments. Eventually, this will be true of all workloads, and the debate over the speed and scalability of these environments versus others will be a moot point. Of course, they won't be faster than the other approaches, but all anyone needs is for them to be fast enough for the workload. The productivity improvements will far outweigh the speed disadvantages.

Aside from the middle-tier components, we mentioned the MySQL database. Of course, there is an alternative to MySQL, and that is PostgreSQL. PostgreSQL has always been more feature complete, and has conformed better to the ANSI SQL standard, than MySQL. MySQL has been considered the fastest open source database. My own experience seems to back this up, with PostgreSQL improving on the performance front and MySQL improving on the feature front. I don't think you can go wrong with either choice, but you should test the two to see which one fits your needs. In both cases, you can now get commercial support. Of course, you could always get that for MySQL from MySQL AB. In the PostgreSQL world, quite a few companies have come and gone trying to create businesses from PostgreSQL support. Recently there have been some interesting developments in the PostgreSQL community with companies like EnterpriseDB (http://www.enterprisedb.com) and Greenplum (http://www.greenplum.com). Greenplum specializes in data warehousing uses for PostgreSQL.

With the basic compute infrastructure in place, and now the middleware for the middle-tier and the persistence mechanism, through a relational database, we have a very complete platform for delivering enterprise applications. Besides the basic middleware pieces, other needs include the ability to horizontally scale, vertically scale, and have the infrastructure be highly available.

In terms of horizontal scalability, all of the above solutions will work in a horizontally scaled deployment. Any of the hardware platforms can be purchased in small node sizes, like two-processor configurations, and used in this way. The higher-end servers can be partitioned for this type of deployment. The middleware, such as Apache, Tomcat, JBoss, and the Apache plugins for PHP, Perl, Python and Ruby, can all be deployed in a horizontal scaling solution, and you can even use Apache as the load balancer to distribute the workload. In that case, though, you will need an OS-level clustering solution, as I am unaware of a built-in Apache solution for failover. If there is one, someone please let me know.
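
To make the load-distribution idea above a little more concrete, here is a minimal sketch in Python of round-robin balancing that skips nodes marked as down. This is not how Apache does it internally; the class name RoundRobinBalancer, the node names app1 to app3, and the manual mark_down call are all assumptions made up for illustration. It also does nothing about the balancer itself failing, which is the gap an OS-level clustering solution would have to cover.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips nodes marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        # In a real deployment a health check would call this, not a person.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
lb.mark_down("app2")
print([lb.pick() for _ in range(3)])  # ['app3', 'app1', 'app3']
```

The point of the sketch is only that adding or removing small nodes changes a list, not the application, which is what makes horizontal scaling attractive.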

In terms of vertical scalability, Apache and the Java middleware (Tomcat, JBoss, Hibernate) will scale vertically on very large systems quite nicely. In my own experience, I have seen Tomcat and JBoss scale linearly on 16 CPU systems, and there doesn't really seem to be a limit here. On the database side of things, PostgreSQL has made recent improvements and scales nicely on 8 CPU systems. I haven't seen any data that goes beyond that, but I would bet that it will scale beyond 8 CPUs (if anyone has any experience with this, let me know). In my testing, and in third-party testing of MySQL, I have seen very good vertical scaling on 8 CPU systems as well, and I also believe that it would scale beyond that (again, if anyone has any experience with this, let me know). While I think that horizontal scaling is becoming the way to go, because the price/performance of vertical scaling doesn't match it, vertical scaling may be necessary for some applications because of design constraints within the applications, so both are probably necessary in the near term.

In terms of high availability, all of these components have clustering, replication and failover capabilities. You can use things like Red Hat's cluster suite for clustering at the OS level. JBoss has a sophisticated clustering mechanism that even provides for transaction-aware persistence clustering. MySQL has had clustering and replication for quite some time, and the latest release (version 5.0.x) has brought that to a higher level still. Sabre is using these capabilities to horizontally scale, and provide high availability for, its Travelocity travel website. Certainly, that is a ringing endorsement for an enterprise setting. That brings us to some of the common problems that can be addressed with an open source based enterprise architecture.

Some of the common problems enterprises face are identity (many sources of identity), reliability of infrastructure (middleware) and manageability. These are just scratching the surface, but I will stick to these three problem areas.

Where identity is concerned, there are now a couple of open source projects that can do Web single sign-on (SSO), and there is one in particular that can do full-blown identity management with federated identity, both within the enterprise and across enterprises. That solution is from Ping Identity (http://www.pingidentity.com). They have an appliance-like solution that is based on the open source project hosted at http://www.sourceid.org/. This is by far the most complete solution I have seen in the open source community, and it also seems to be one of the easiest solutions to set up. They have over 1,000 deployments in some very large organizations, so it also seems to be tested in an enterprise setting.

Another problem that all enterprises face is the reliability of their infrastructure. As we talked about last week, open source software is more reliable, and has fewer bugs, than its closed source counterparts. In my experience, I have seen a Snort intrusion detection system, running on Linux, stay operational in a production environment without a single incident for almost 600 days. This system is actually still running, so we will probably surpass the two-year mark before too long. I gave much more data in last week's post, so if you are interested in more detail, see that post.

Finally, we will discuss manageability. Red Hat, JBoss, MySQL and other companies actually have service offerings that help with patch management, monitoring, alerting, deployment, configuration, etc. These offerings allow you to do very large scale deployments with few administrators, up to and including hundreds to thousands of server instances. These tools are rapidly maturing, and help you not only to do the initial deployment, but to manage it from a central location at very large scale. If you add something like GroundWork's (http://www.itgroundwork.com) monitoring suite, based on open source projects like Cacti and Nagios, you can also monitor your entire environment very efficiently.

Put it all together, and you have a rather complete open source based enterprise architecture, that will be highly reliable, performant, scalable, and manageable. What more could you ask for?

Saturday, November 19, 2005

Media Coverage of Open Source Software

I am constantly reading technology magazines online. What I find most interesting about the coverage of open source software is how it all seems to be the same. If the magazine is about open source, the coverage is always positive, with something negative said rarely, if ever. If the magazine is a more general publication, then the coverage is always straining to be "fair", in that they always have to try to point out something negative in mostly positive coverage. Many times these statements are hackneyed phrases that are used over and over. Some of the statements may have once been true, but aren't true anymore. Even analyst firms seem to be stuck in a time warp where open source is concerned. I think that the progress of open source projects is just too fast for the general media to keep up with.

I once participated in a research study about Linux, and the firm conducting the research had a scalability chart for operating systems that included the commercial UNIXes, Windows and Linux. They had Linux scaling to only four CPUs. I pointed out in a conference call that Linux had surpassed four-CPU scalability some time ago, and they were amazed. Then I pointed them at various benchmarks showing results from 8 CPUs (an SAP benchmark) all the way to 64 CPUs (an SGI LINPACK benchmark). Shortly after our conference call, a 32 CPU TPC-C result was also published.

One of the questions the media seems intent on asking recently is the following:

Is open source software better?

I will tell you unequivocally that it is better in all the aspects that really matter. What are those things that really matter? This can be quite subjective, but I believe that there are a number of aspects that everyone cares about, even if they haven't thought about it in the way that I do.

First and foremost, open source software is of higher quality. What I mean by that is it has fewer bugs than equivalent commercial closed source products. This has been backed up many times now. The University of Wisconsin did a study entitled "Fuzz Revisited" that tested various UNIX operating systems and Linux, along with the GNU tools. What they did in the study is called fuzz testing. This is when you throw random input at software and see how it reacts. To pass a test case, the software couldn't hang or crash. Each hang or crash was considered a failed test case. The results were pretty amazing. The best commercial closed source offering failed 20% of the tests, and the worst failed 45% of the tests. Linux and the GNU tools failed 9% and 6% respectively. That is from 55% fewer test failures (Linux against the best commercial offering) to 87% fewer (the GNU tools against the worst)! This study was done in 1995. Since then there have been other studies done, using static code analysis. Examples of these are the Reasoning studies, which examined the Linux TCP stack, the MySQL database, etc. They also show very low defect rates compared to closed source software. Here are some links to some of this information:

http://www.cs.wisc.edu/~bart/fuzz/fuzz.html
http://www.reasoning.com/pdf/Open_Source_White_Paper_v1.1.pdf
http://www.reasoning.com/pdf/MySQL_White_Paper.pdf
http://www.reasoning.com/pdf/Linux_Defect_Report.pdf
http://www-106.ibm.com/developerworks/linux/library/l-rel/
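
For anyone who has not seen fuzz testing before, the idea described above can be sketched in a few lines of Python. This is only an illustration and not the harness the Wisconsin study used: the two decode functions are hypothetical stand-ins for the programs under test, an uncaught exception stands in for a crash, and hangs are not detected at all. The last helper simply reproduces the "55% to 87% fewer failures" arithmetic from the failure rates quoted above.

```python
import random


def fuzz(target, trials=1000, max_len=64, seed=0):
    """Call `target` with random byte strings; return the fraction that raise."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    failures = 0
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception:
            # An uncaught exception stands in for a crash in this sketch.
            failures += 1
    return failures / trials


def fragile_decode(data):
    # Hypothetical fragile target: raises on any byte outside ASCII.
    return data.decode("ascii")


def robust_decode(data):
    # Hypothetical robust target: replaces bytes it cannot decode.
    return data.decode("ascii", errors="replace")


def relative_reduction(baseline, observed):
    """Fractional drop in failure rate from `baseline` to `observed`."""
    return (baseline - observed) / baseline


print(fuzz(robust_decode))       # 0.0, since it never raises
print(fuzz(fragile_decode) > 0)  # True
print(round(100 * relative_reduction(20, 9)))  # 55
print(round(100 * relative_reduction(45, 6)))  # 87
```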


So this is one area where open source software is clearly better than its closed source competitors. Another aspect to look at is performance and scalability.

I have done many performance and scalability tests over the years, and open source software has always been very competitive. While I cannot give specifics for many of these tests, I can point at others that have talked publicly about their results. The web site Weather.com switched from WebSphere to Tomcat some time ago (http://www.computerworld.com/printthis/2004/0,4814,92583,00.html), and they stated that they achieved a substantial performance improvement on the same hardware. La Quinta switched from BEA WebLogic to JBoss (http://www.jboss.com/pdf/La_Quinta_Case_Study_FINAL.pdf), and they have a testimonial on the JBoss.com web site that states they saw 30% better performance with lower CPU utilization running their application. So in these two examples we see performance and scalability that are not merely competitive with, but superior to, those of both commercial application servers.

The final aspects are service and support. I can tell you, from personal experience, that companies like Red Hat, JBoss, etc., give much better support than any commercial ISV that I have ever been involved with. What's the reason for this? It is quite simple. The business model of these companies aligns completely with the customer's needs. They don't get subscription renewals if they don't give good support, whereas with closed source ISVs, you have to renew if you want to get any support at all (usually contractually mandated), whether it is good or not. Closed source ISVs also got your money up front in the license fee, so they have less of an incentive to give good support. They also staff their support organizations with people who are paid much less than the developers of their products, which means you get much less knowledgeable help in general. And they only support you if you have a certified deployment, according to what they want to support. Open source companies will support you based on what your needs are, not theirs. Finally, if an open source company doesn't give you good support, you usually have the option of going elsewhere for support, turning to the community at large, or supporting the software yourself, if you have the proper in-house skills. These factors all add up to superior service and support!

To wrap things up, we have seen that open source generally provides higher quality software (fewer bugs), is competitive, and many times superior, in performance and scalability, and provides better service and support. All of this, without the upfront costs of license fees! Who thinks open source is not better?

Monday, November 14, 2005

Business Person's Understanding of Open Source Software

I have been involved, mostly from a user's perspective, with open source software in an enterprise setting for the last five years. Over this time, I have dealt with lots of misunderstandings, and sometimes plain ignorance, of open source software. What is it? Is it any good? Should you ever use it? Is it safe? Who writes it? How do you get support? You name it, and I have had the questions.

Having said that, I thought I was past most of the questioning, but something popped up recently that caught me by surprise. It was, again, a complete lack of understanding of open source software vs. traditional closed source software.

So once again, I find myself answering questions about open source software, and educating people on its merits, on the business models that open source companies use (what companies like JBoss, Inc. call "Professional Open Source"), and on how pervasive open source software is, and even how dependent on it we have become.

I find it awesome that 70% of all publicly accessible web servers are running the Apache web server, and that 77% of those are running the Linux operating system, which works out to roughly 54% of all publicly accessible web servers running Apache on Linux. It is just as impressive that Linux, which is certainly the best-known open source project, is growing faster than any other operating system and continues to mature so quickly in all respects.

I am also a big proponent of Java, at least in an enterprise context, and I love the fact that the BZ Research survey shows JBoss as the number one application server in their survey. I saw another survey that was based on 95 Fortune 1000 companies, and it showed 23% of those companies were deploying JBoss as their application server.

I am also very encouraged by the SugarCRM project, and company, as they have built a customer base of some 300 companies already. I never would have thought that open source application software could be successful. Just goes to show you, that open source is here to stay, and it transcends any limits we may have thought it had.

Even with all of these examples, and with open source companies like Red Hat, JBoss, MySQL, SugarCRM, and others finding success, business people still need to be educated on open source software.

Have you experienced this, and if so, how did you deal with it? I have lots of research that I have collected that gives fact-based explanations for all of the questions I have received, but it would be nice to see what others have collected and used in support of open source software.