Monday, December 12, 2005

Open Source vs. Open Standards

Over the years, I have often had discussions about open standards versus open source, and whether you can consider them the same. Last week, I talked about an open source implementation of EJB3 from JBoss. In that case, EJB3 could be considered an open standard (though some people don't consider the Java Community Process a true standards organization), and the JBoss implementation is certainly open source.

While open standards and open source are certainly not the same, you have to ask yourself whether the two are synonymous from the standpoint of their intent. Open standards are intended to bring interoperability to multiple implementations of the same technology. Open source is all about freedom, and one of the freedoms inherent in open source is interoperability. I say that because, if you want to create an interoperable implementation, nothing makes it easier than having access to the source code of what you are trying to interoperate with!

I have had many experiences with closed source software that was standards based but did not implement the standard in an interoperable way. When you try to make things work, you get stuck on some incompatibility that breaks something. Typically, standards do not have a compatibility test or requirement, so vendors play an interesting game: they can claim to be standards compliant without the customer being able to actually realize the benefits!

Over the past four or five years, I have come to the conclusion that an open source implementation is as good as, or better than, a closed source implementation of an open standard, and I have elevated open source to be considered equal to anything that implements an open standard. Specifically, I feel this way because of the ease with which you can actually achieve interoperability with the solution. Of course, open source implementations and open standards are not mutually exclusive, as the JBoss EJB3 example shows. Jabber and the IETF standard for XMPP are another good example, and Google's adoption of XMPP will no doubt fuel true interoperability in the instant messaging market.

Monday, December 05, 2005

Software Productivity in the Java World

Recently, I have been playing around with the JBoss EJB3 implementation, and I have to say that I have been impressed with where the specification is heading. I have long followed the J2EE, and now the Java EE, specifications. In the beginning, EJB was better than what I had been doing with CORBA, and seemed to have great promise. Of course, once we started to develop large-scale applications, the weaknesses of the component model started to become evident.

The component model led to many anti-patterns: in order to have any reasonable application, you needed to implement value or transfer objects so you could pass results up to the web tier of an application. You needed business delegates and value list handlers to make business logic easier to implement, and to make result sets from queries reasonable for the web tier to deal with. These are only some of the patterns that became common in J2EE development. With all of these, things just became more complicated for developers, and less productive than any of the original specification's developers anticipated. Also, the deployment descriptor approach, which on the surface seemed like it would make things easier for developers, really just made things harder: it wasn't just coding Java classes or components anymore, because you also had to describe the behavior you wanted external to the code. This gave rise to XDoclet which, as anyone who has done J2EE development knows, helped alleviate some of the complexity but introduced complexities of its own. That brings us to the new EJB3 specification, and the JBoss implementation.

The first thing you will notice is that the entire component model (at least for the most part) is gone, replaced with just Plain Old Java Objects (POJOs). This is a great step forward, as it removes the complexity of home interfaces and remote and local interfaces, and you don't need separate transfer objects and the like, because your POJO can just implement Serializable and be moved between any of the tiers of your application. Of course, the magic that makes all this happen is the use of annotations, together with reasonable defaults, so you don't have to specify things that you shouldn't have to. In fact, they went to a development model based on convention rather than on declaring everything through deployment descriptors. For example, this is what an entity bean looks like:

package services.entities;

import java.io.Serializable;
import java.math.BigDecimal;
import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratorType;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Order implements Serializable {

    private long orderId;
    private long customerId;
    private Address shippingAddress;
    private BigDecimal totalOrderAmount;
    private List orderLines;

    public Order() {

    }

    public Order(long customerId
            , Address shippingAddress
            , BigDecimal totalOrderAmount) {

        this.customerId = customerId;
        this.shippingAddress = shippingAddress;
        this.totalOrderAmount = totalOrderAmount;

    }

    @Id(generate = GeneratorType.AUTO)
    public long getOrderId() {

        return orderId;

    }

    public void setOrderId(long orderId) {

        this.orderId = orderId;

    }

    public long getCustomerId() {

        return customerId;

    }

    public void setCustomerId(long customerId) {

        this.customerId = customerId;

    }

    public String getShippingAddressLine1() {

        return shippingAddress.getAddressLine1();

    }

    public void setShippingAddressLine1(String shippingAddressLine1) {

        shippingAddress.setAddressLine1(shippingAddressLine1);

    }

    public String getShippingAddressLine2() {

        return shippingAddress.getAddressLine2();

    }

    public void setShippingAddressLine2(String shippingAddressLine2) {

        shippingAddress.setAddressLine2(shippingAddressLine2);

    }

    public String getShippingCity() {

        return shippingAddress.getCity();

    }

    public void setShippingCity(String shippingCity) {

        shippingAddress.setCity(shippingCity);

    }

    public String getShippingState() {

        return shippingAddress.getState();

    }

    public void setShippingState(String shippingState) {

        shippingAddress.setState(shippingState);

    }

    public int getShippingZipCode() {

        return shippingAddress.getZipCode();

    }

    public void setShippingZipCode(int shippingZipCode) {

        shippingAddress.setZipCode(shippingZipCode);

    }

    public int getShippingZipCodePlusFour() {

        return shippingAddress.getZipCodePlusFour();

    }

    public void setShippingZipCodePlusFour(int shippingZipCodePlusFour) {

        shippingAddress.setZipCodePlusFour(shippingZipCodePlusFour);

    }

    public BigDecimal getTotalOrderAmount() {

        return totalOrderAmount;

    }

    public void setTotalOrderAmount(BigDecimal totalOrderAmount) {

        this.totalOrderAmount = totalOrderAmount;

    }

    @OneToMany(mappedBy="order", cascade=CascadeType.ALL, fetch=FetchType.EAGER)
    public List getOrderLines() {

        return orderLines;

    }

    public void setOrderLines(List orderLines) {

        this.orderLines = orderLines;

    }

    public boolean equals(Object o) {

        if (this == o) {
            return true;
        }

        if (o == null || getClass() != o.getClass()) {
            return false;
        }

        final Order order = (Order) o;

        if (!(orderId == order.orderId)) {
            return false;
        }

        if (!(customerId == order.customerId)) {
            return false;
        }

        if (!shippingAddress.equals(order.shippingAddress)) {
            return false;
        }

        if (!totalOrderAmount.equals(order.totalOrderAmount)) {
            return false;
        }

        if (!orderLines.equals(order.orderLines)) {
            return false;
        }

        return true;

    }

    public int hashCode() {

        int hashValue;

        hashValue = Long.toString(orderId).hashCode()
                + Long.toString(customerId).hashCode()
                + shippingAddress.hashCode()
                + totalOrderAmount.hashCode()
                + orderLines.hashCode();

        return hashValue;

    }

}

Let me point out some interesting things. This is just a plain Java object, nothing special. The thing that makes it an entity bean is the @Entity annotation. That is all there is to it. There are really only three other things in this example that are meaningful. First is the primary key of the object, which for persistence purposes is identified by the @Id annotation. It specifies that orderId is the primary key, and that it is autogenerated. In this case, I am using a MySQL database underneath, with an auto-increment column type. Of course, there are annotations that work across every relational database that matters, such as DB2 identity column types, Oracle sequences, etc. You can even customize it to create your own autogenerated types, so it is completely extensible in this regard. The second thing to notice is that I implement Serializable, and implement the hashCode and equals methods. Just standard Java coding, nothing special. The final thing to notice is the @OneToMany annotation. This annotation specifies that this entity has an association (actually it is an aggregate) with another entity, and which attribute that association is based on. It also specifies the fetch behavior that you want for the collection. Once again, this is very simple, and it is in the code, no special deployment descriptor required!
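To make the Serializable and equals/hashCode point concrete, here is a minimal sketch using only the standard library. OrderSummary is a hypothetical, cut-down stand-in for the Order entity above (it is not part of the original example and carries no EJB3 annotations), and the use of java.util.Objects assumes a newer JDK than the one the listing was written for. It shows null-tolerant equals and hashCode, plus a serialization round trip of the kind that happens when an object crosses a tier boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical stand-in for the Order entity; plain Java, no persistence annotations.
class OrderSummary implements Serializable {

    private static final long serialVersionUID = 1L;

    private final long orderId;
    private final BigDecimal totalOrderAmount; // may be null, e.g. before it is set

    OrderSummary(long orderId, BigDecimal totalOrderAmount) {
        this.orderId = orderId;
        this.totalOrderAmount = totalOrderAmount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        OrderSummary other = (OrderSummary) o;
        // Objects.equals treats two nulls as equal and never throws.
        return orderId == other.orderId
                && Objects.equals(totalOrderAmount, other.totalOrderAmount);
    }

    @Override
    public int hashCode() {
        // Objects.hash tolerates null fields.
        return Objects.hash(orderId, totalOrderAmount);
    }

    /** Serializes the object to bytes and reads it back into a new instance. */
    static OrderSummary roundTrip(OrderSummary original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (OrderSummary) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

The copy returned by roundTrip is a different instance, so it only compares equal to the original because equals and hashCode are defined on the field values. One thing to keep in mind is that BigDecimal.equals also compares scale, so 2.0 and 2.00 are not considered equal.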

Everything else, such as the table that it maps to and the column mapping, is done by convention. If you simply make the table and object names the same, you don't have to specify anything. If the column and attribute names are the same, you don't have to specify mapping for the columns either. Of course, you can use an @Table annotation and an @Column annotation to specify these things, but why do that? Just follow the convention and save yourself all the trouble.
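To illustrate what mapping by convention means, here is a rough sketch in plain Java that derives a default table name from a class name and default column names from its getters. This is only an illustration of the idea under my own assumptions: ConventionMapping and Customer are hypothetical names, and this is not how the JBoss implementation actually derives its defaults.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class ConventionMapping {

    /** Default table name: the simple class name, unchanged. */
    static String tableNameFor(Class<?> type) {
        return type.getSimpleName();
    }

    /** Default columns: one per public getXxx() method, named after the property. */
    static Map<String, Class<?>> columnsFor(Class<?> type) {
        Map<String, Class<?>> columns = new TreeMap<>();
        for (Method method : type.getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get")
                    && name.length() > 3
                    && method.getParameterCount() == 0
                    && method.getDeclaringClass() != Object.class; // skips getClass()
            if (isGetter) {
                // getCustomerId -> customerId
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                columns.put(property, method.getReturnType());
            }
        }
        return columns;
    }

    // Hypothetical entity used only for this demonstration.
    public static class Customer {
        private long customerId;
        private String lastName;

        public long getCustomerId() { return customerId; }
        public void setCustomerId(long customerId) { this.customerId = customerId; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
    }

    public static void main(String[] args) {
        System.out.println(tableNameFor(Customer.class)
                + " " + columnsFor(Customer.class).keySet());
    }
}
```

An explicit annotation such as @Table or @Column would simply override one of these derived names, which is why you only need them when your schema departs from the convention.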

In the old EJB model, everyone eschewed the use of entity beans, because you could have a lot of trouble creating a scalable application using them. In this new model, you have a full-featured query language and very simple persistence. You have a persistence context that you can specify through an annotation (@PersistenceContext), and an EntityManager interface. You can just call the EntityManager persist method, passing the entity object, and it will get saved to the database. You can create as fine-grained or as coarse-grained a model as you want, and still be able to create a scalable application. What a difference!

Of course, I'm just scratching the surface with the entity bean example, but you start getting the picture. One of the other really nice things about EJB3 is that, by specification, it is embeddable in any application, not just ones running in an application server. This opens up the capability to build all sorts of applications, and it makes unit testing much easier. In fact, I recently figured out how to use the JBoss embeddable version of EJB3 and JUnit from within Eclipse to create unit test suites and unit tests that test the POJOs of EJB3. This gives another boost to developer productivity, especially if you follow a test driven development paradigm.

This is a great step forward. If you have any experience with this, let me know what you think.

Sunday, November 27, 2005

Can you base your Enterprise Architecture completely on open source software?

Last week, I touched on the media's coverage of open source software, and also gave my answer to a question that seems to be becoming more common in the media, which was "Is open source better?".

This week, I want to touch on the topic of building your entire enterprise architecture around open source software, and whether this is possible.

First, let me say that I think it is not only possible, but may be the best thing that you could do. When you look at decent-sized companies today, they all have very similar needs, and mostly similar problems as well.

Let's talk about needs first. Everyone needs a basic computing infrastructure, composed of servers, network and storage. While there is a lot that can be discussed on these needs alone, I'll save that for another time. Where servers are concerned, though, what server platforms are not supported by Linux these days? Really, there are no major platforms that one would consider in an enterprise context that do not support Linux today (with maybe one exception). They range from the Intel and AMD industry standard servers, and Itanium based systems from vendors like Hewlett-Packard, through to Power 5 and 5+ systems from IBM. In the case of the Power 5 based systems, Linux even supports SMT (Simultaneous Multi-Threading) and Micro-partitioning (the most advanced features of the platform). The one notable exception is Sun Microsystems with their UltraSparc line of hardware. Of course, Linux is supported on their AMD based products, which are quite good. So, as far as servers are concerned, you cast the widest possible net if you base your server infrastructure on the Linux operating system, and you can even have a mix of hardware from various vendors and still manage it through one operating system. A major advantage of a Linux based OS strategy!

Of course, where network and storage infrastructure are concerned, a Linux based server infrastructure will work with any network infrastructure, up to 10 gigabit Ethernet solutions.

Where storage infrastructure is concerned, storage from all the major players supports Linux, from EMC to Hitachi to IBM and Network Appliance. The major HBA (Host Bus Adapter) vendors also support Linux, whether that is Fibre Channel, iSCSI or just plain Ethernet via the NFS protocol. Of course, there are some interesting products from smaller vendors, covering both network and storage infrastructure, that have actually been built from an open source base, such as storage that uses Linux with an XFS file system, and network switches and routers that use Linux internally as the OS. These vendors would not typically find themselves in a large enterprise setting, but I think over time this will change. These products will be commoditized just like infrastructure software, and eventually players in this space will find themselves penetrating the enterprise.

So, we have set the basic compute infrastructure, and we need to address the layers above that. There are various choices for the middleware in the enterprise. In no particular order, you can use what is called LAMP (Linux, Apache, MySQL, Perl, Python or PHP). This stack is becoming increasingly popular, and there are a couple of variations on it. One variation is LAMJ (Linux, Apache, MySQL, Java). The Java part of this stack is typically composed of Tomcat as a standalone servlet container, or Tomcat and JBoss (typically with Hibernate for Object/Relational mapping) if your needs extend beyond the servlet container into the full Java enterprise edition stack. In fact, I have used Tomcat embedded in JBoss even when the application did not require the other components of the Java enterprise edition stack, because JBoss has very robust connection pooling and high-availability clustering built in. Therefore, you can take advantage of those attributes without having to add those types of features to Tomcat. There is even a second variation of LAMP, which I have not seen an acronym for, but for lack of a better name I will call it LAMR (Linux, Apache, MySQL and Ruby on Rails). Ruby on Rails is becoming very popular with developers for its sheer productivity over pretty much everything else. While it is an interpreted environment, much the way PHP, Perl and Python are, it provides a complete abstraction for all tiers of an n-tier architecture. Interpreted environments will certainly be slower than non-interpreted environments, but hardware has become fast enough that many enterprise environments could easily be run by Ruby or the other interpreted environments. Eventually, this will be true of all workloads, and the debate over the speed and scalability of these types of environments versus others will be a moot point. Of course, they won't be faster than the other approaches, but all anyone needs is for them to be fast enough for their workload. The productivity improvements will far outweigh the speed disadvantages.

Aside from the middle-tier components, we mentioned the MySQL database. Of course, there is an alternative to MySQL, and that is PostgreSQL. PostgreSQL has always been more feature complete, and better conforming to the ANSI SQL standard, than MySQL. MySQL has been considered the fastest open source database. My own experience seems to back this up, with PostgreSQL improving on the performance front, and MySQL improving on the feature front. I don't think you can go wrong with either choice, but you should test the two to make sure that the one you pick fits your needs. In both cases, you can now get commercial support. Of course, you could always get that for MySQL from MySQL AB. In the PostgreSQL world, there have been quite a few companies who have come and gone trying to create businesses from PostgreSQL support. Recently there have been some interesting developments in the PostgreSQL community with companies like EnterpriseDB (http://www.enterprisedb.com) and Greenplum (http://www.greenplum.com). Greenplum specializes in data warehousing uses for PostgreSQL.

With the basic compute infrastructure in place, and now the middleware for the middle-tier and the persistence mechanism, through a relational database, we have a very complete platform for delivering enterprise applications. Besides the basic middleware pieces, other needs include the ability to horizontally scale, vertically scale, and have the infrastructure be highly available.

In terms of horizontal scalability, all of the above solutions will work. Any of the hardware platforms can be purchased in small node sizes, like two processor configurations, and used in this way. The higher-end servers can be partitioned for this type of deployment. The middleware, such as Apache, Tomcat, JBoss, and the Apache plugins for PHP, Perl, Python and Ruby, can all be deployed in a horizontal scaling solution, and you can even use Apache as the load balancer to distribute the workload. In that case, though, you will need an OS level clustering solution, as I am unaware of a built-in Apache solution for failover. If there is one, someone please let me know.
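As a rough illustration of what distributing the workload across a set of small nodes involves, here is a minimal round-robin sketch in Java that skips nodes marked as down. It is only a sketch of the general idea: RoundRobinBalancer and the node names are hypothetical, and it does not represent how Apache or any of its plugins actually balance or fail over requests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class RoundRobinBalancer {

    private final List<String> nodes;
    private final Set<String> down = new HashSet<>();
    private int next = 0;

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** Takes a node out of rotation, for example after a failed health check. */
    synchronized void markDown(String node) {
        down.add(node);
    }

    /** Puts a node back into rotation. */
    synchronized void markUp(String node) {
        down.remove(node);
    }

    /** Returns the next healthy node in rotation, or empty if every node is down. */
    synchronized Optional<String> pick() {
        for (int attempts = 0; attempts < nodes.size(); attempts++) {
            String candidate = nodes.get(next);
            next = (next + 1) % nodes.size();
            if (!down.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Note that this only covers failure of the nodes behind the balancer. The balancer itself is still a single point of failure, which is where something like an OS level clustering solution comes in.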

In terms of vertical scalability, Apache and the Java middleware (Tomcat, JBoss, Hibernate) will scale vertically on very large systems quite nicely. In my own experience, I have seen Tomcat and JBoss scale linearly on 16 CPU systems, and there doesn't really seem to be a limit here. On the database side of things, PostgreSQL has made recent improvements and scales nicely on 8 CPU systems. I haven't seen any data that goes beyond that, but I would bet that it will scale beyond 8 CPUs (if anyone has any experience with this, let me know). In my testing, and in third party testing of MySQL, I have seen very good vertical scaling on 8 CPU systems as well, and I also believe that it would scale beyond that (again, if anyone has any experience with this, let me know). While I think that horizontal scaling is becoming the way to go, because the price/performance of vertical scaling doesn't match it, vertical scaling may be necessary for some applications because of design constraints within the applications, so both are probably necessary in the near term.

In terms of high availability, all of these components have clustering, replication and failover capabilities. You can use things like Red Hat's cluster suite for clustering at the OS level. JBoss has a sophisticated clustering mechanism that even provides for transaction-aware persistence clustering. MySQL has had clustering and replication for quite some time, and the latest release (version 5.0.x) has brought that to a higher level still. Sabre is using these capabilities to horizontally scale, and provide high availability for, its Travelocity travel website. Certainly, that is a ringing endorsement for an enterprise setting. That brings us to some of the common problems that can be addressed with an open source based enterprise architecture.

Some of the common problems enterprises face are identity (many sources of identity), reliability of infrastructure (middleware) and manageability. These are just scratching the surface, but I will stick to these three problem areas.

Where identity is concerned, there are now a couple of open source projects that can do Web single sign-on (SSO), and there is one in particular that can do full blown identity management with federated identity, both within the enterprise and across enterprises. That solution is from Ping Identity (http://www.pingidentity.com). They have an appliance-like solution that is based on the open source project hosted at http://www.sourceid.org/. This is by far the most complete solution I have seen in the open source community, and also seems to be one of the easiest solutions to set up. They have over 1,000 deployments in some very large organizations, so it also seems to be tested in an enterprise setting.

One of the other problems that all enterprises face is the reliability of their infrastructure. As we talked about last week, open source software is more reliable, and has fewer bugs, than its closed source counterparts. In my experience, I have seen a Snort intrusion detection system, running on Linux, stay operational in a production environment without a single incident for almost 600 days. This system is actually still running, so we will probably surpass the 2 year mark before too long. I gave much more data in last week's post, so if you are interested in more detail, see that post.

Finally, we will discuss manageability. Red Hat, JBoss, MySQL and other companies actually have service offerings that help with patch management, monitoring, alerting, deployment, configuration, etc. These offerings allow you to do very large scale deployments with few administrators, up to and including hundreds to thousands of server instances. These tools are rapidly maturing, and help you not only to do the initial deployment, but to manage it from a central location at very large scale. If you add something like GroundWork's (http://www.itgroundwork.com) monitoring suite, based on open source projects like Cacti and Nagios, you can also monitor your entire environment very efficiently.

Put it all together, and you have a rather complete open source based enterprise architecture, that will be highly reliable, performant, scalable, and manageable. What more could you ask for?

Saturday, November 19, 2005

Media Coverage of Open Source Software

I am constantly reading technology magazines online. What I find most interesting about the coverage of open source software is how it all seems to be the same. If the magazine is about open source, the coverage is always positive, with something negative said rarely, if ever. If the magazine is a more general publication, then the coverage is always straining to be "fair", in that they always have to try to point out something negative in mostly positive coverage. Many times these statements are hackneyed phrases that are used over and over. Some of the statements may have once been true, but aren't true anymore. Even analyst firms seem to be stuck in a time warp where open source is concerned. I think that the progress of open source projects is just too fast for the general media to keep up with.

I once participated in a research study about Linux, and the firm conducting the research had a scalability chart for operating systems that included the commercial UNIXes, Windows and Linux. They had Linux scaling to only four CPUs, and I pointed out in a conference call that Linux had surpassed four CPU scalability some time ago, and they were amazed. Then I pointed them at various benchmarks which showed scaling from 8 CPUs (an SAP benchmark) all the way to 64 CPUs (an SGI LINPACK benchmark). Shortly after our conference call, a 32 CPU TPC-C result was also published.

One of the questions the media seems intent on asking recently is the following:

Is open source software better?

I will tell you unequivocally that it is better in all the aspects that really matter. What are those things that really matter? This can be quite subjective, but I believe that there are a number of aspects that everyone cares about, even if they haven't thought about it in the way that I do.

First and foremost, open source software is of higher quality. What I mean by that is it has fewer bugs than equivalent commercial closed source products. This has been backed up many times now. The University of Wisconsin did a study entitled "Fuzz Revisited" that tested various UNIX operating systems and Linux, along with the GNU tools. What they did in the study is called fuzz testing. This is when you throw random input at software and see how it reacts. To pass a test case, the software couldn't hang or crash. Each hang or crash was counted as a failed test case. The results were pretty amazing. The best commercial closed source offering failed 20% of the tests, and the worst failed 45% of the tests. Linux and the GNU tools failed 9% and 6% respectively. That is from 55% fewer test failures (9% versus the best commercial result of 20%) to 87% fewer (6% versus the worst commercial result of 45%)! This study was done in 1995. Since then there have been other studies done, using static code analysis. Examples of these are the Reasoning studies, which examined the Linux TCP stack, the MySQL database, etc. They also show very low defect rates compared to closed source software. Here are some links to some of this information:

http://www.cs.wisc.edu/~bart/fuzz/fuzz.html
http://www.reasoning.com/pdf/Open_Source_White_Paper_v1.1.pdf
http://www.reasoning.com/pdf/MySQL_White_Paper.pdf
http://www.reasoning.com/pdf/Linux_Defect_Report.pdf
http://www-106.ibm.com/developerworks/linux/library/l-rel/
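For readers who have not seen fuzz testing before, here is a minimal sketch of the idea in Java. It is not the harness used in the study: FuzzSketch and keyOf are hypothetical names, an uncaught runtime exception stands in for a crash, and detecting hangs would additionally need a timeout, which is left out to keep the sketch short.

```java
import java.util.Random;
import java.util.function.Consumer;

class FuzzSketch {

    /** Builds a random string of 0..maxLength characters, control characters included. */
    static String randomInput(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder input = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            input.append((char) random.nextInt(256));
        }
        return input.toString();
    }

    /** Runs the target against random inputs and counts the runs that crashed. */
    static int countFailures(Consumer<String> target, int runs, long seed) {
        Random random = new Random(seed); // a fixed seed keeps a failing run reproducible
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                target.accept(randomInput(random, 64));
            } catch (RuntimeException crash) {
                failures++; // an uncaught exception plays the role of a crash
            }
        }
        return failures;
    }

    /** Hypothetical buggy target: assumes every line contains '='. */
    static String keyOf(String line) {
        return line.substring(0, line.indexOf('=')); // throws when '=' is missing
    }

    public static void main(String[] args) {
        int runs = 1000;
        int failures = countFailures(FuzzSketch::keyOf, runs, 42L);
        System.out.println("failed " + failures + " of " + runs + " random inputs");
    }
}
```

The failure percentages quoted above are simply this kind of count divided by the number of test cases, computed per product.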


So this is one area where open source software is clearly better than its closed source competitors. Another aspect to look at is performance and scalability.

I have done many performance and scalability tests over the years, and open source software has always been very competitive. While I cannot give specifics for many of these tests, I can point at others that have talked publicly about their results. The web site Weather.com switched from WebSphere to Tomcat some time ago (http://www.computerworld.com/printthis/2004/0,4814,92583,00.html), and they stated that they achieved a substantial performance improvement on the same hardware. La Quinta switched from BEA WebLogic to JBoss (http://www.jboss.com/pdf/La_Quinta_Case_Study_FINAL.pdf), and they have a testimonial on the JBoss.com web site that states they saw 30% better performance with lower CPU utilization running their application. So in these two examples we see performance and scalability that are not merely competitive, but superior to both commercial application servers.

The final aspects are service and support. I can tell you, from personal experience, that companies like Red Hat, JBoss, etc., give much better support than any commercial ISV that I have ever been involved with. What's the reason for this? It is quite simple. The business model of these companies aligns completely with the customer's needs. They don't get subscription renewals if they don't give good support, whereas with closed source ISVs, you have to renew if you want to get any support at all (usually contractually mandated), whether it is good or not. Closed source ISVs also got your money up front in the license fee, so they have less of an incentive to give good support. They also staff their support organizations with people who are much lower paid than the developers of their products, which means you get much less knowledgeable help in general. They also only support you if you have a certified deployment, according to what they want to support. Open source companies will support you based on what your needs are, not theirs. Finally, if an open source company doesn't give you good support, you usually have the option of going elsewhere for support, turning to the community at large, or supporting the software yourself, if you have the proper in-house skills. These factors all add up to superior service and support!

To wrap things up, we have seen that open source generally provides higher quality software (fewer bugs), is competitive, and many times superior, in performance and scalability, and provides better service and support. All of this without the up front costs of license fees! Who thinks open source is not better?

Monday, November 14, 2005

Business Person's Understanding of Open Source Software

I have been involved with open source software in an enterprise setting, mostly from a user's perspective, for the last five years. Over this time, I have dealt with lots of misunderstandings, and sometimes plain ignorance, of open source software. What is it? Is it any good? Should you ever use it? Is it safe? Who writes it? How do you get support? You name it, and I have had the questions.

Having said that, I thought I was past most of the questioning, but something popped up recently that caught me by surprise. It was again, a complete lack of understanding of open source software vs. traditional closed source software.

So once again, I find myself answering questions about open source software, and educating people on its merits and on the business models that open source companies use, such as what companies like JBoss, Inc. call "Professional Open Source". I also find myself explaining how pervasive open source software has become, and how dependent on it we have become.

I find it awesome that 70% of all publicly accessible web servers are running the Apache web server, and that 77% of those are running the Linux operating system. I find it equally awesome that Linux, which is certainly the best known open source project, is growing faster than any other operating system, and continues to mature so quickly in all respects.

I am also a big proponent of Java, at least in an enterprise context, and I love the fact that the BZ Research survey shows JBoss as the number one application server in their survey. I saw another survey that was based on 95 Fortune 1000 companies, and it showed 23% of those companies were deploying JBoss as their application server.

I am also very encouraged by the SugarCRM project, and company, as they have built a customer base of some 300 companies already. I never would have thought that open source application software could be successful. Just goes to show you, that open source is here to stay, and it transcends any limits we may have thought it had.

Even with all of these examples, and with open source companies like Red Hat, JBoss, MySQL, SugarCRM, and others finding success, business people still need to be educated on open source software.

Have you experienced this, and if so, how did you deal with it? I have lots of research that I have collected, which gives fact based explanations for all of the questions I have received, but it would be nice to see what others have collected and used in support of open source software.