Can industry-standard benchmarks, or even application benchmarks like SAP's, be relied upon to make technology choices? I have come to the conclusion that benchmarks are not reliable measuring sticks for decision makers, regardless of whether they run a synthetic application, like SPECjAppServer2004, or an ISV-specific application benchmark like SAP's (http://www.sap.com/solutions/benchmark/index.epx). Why do I believe this is true?
With the industry-standard benchmarks, whether from SPEC or the Transaction Processing Performance Council (TPC), the applications are far too simple to simulate a real-world workload. Real-world applications have far more complex business logic and are usually highly data driven. By data driven, I mean that the application's logic branches are almost always determined by querying a database for what to do in a given business case; they are really automated business processes. I have seen cases where the customer setup in an application had over 60,000 locations, and another where over 100,000 specific products were listed in a single customer contract. These are just two simple examples, and what they lead to is a read/write ratio heavily tilted toward the read side. In two major applications I have been involved with, the read/write ratios were 98% read / 2% write and 93% read / 7% write. Industry-standard benchmarks do not have such ratios because they do not simulate these kinds of complex, data-driven, large-dataset applications.
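If you want to know your own application's mix rather than a benchmark's, one crude way is to count statement verbs in a captured SQL log. The sketch below is only illustrative: it assumes you can export a one-statement-per-line log from your database or driver, and the log format and file name are my assumptions, not anything SPEC or TPC provide.

import sys

def read_write_ratio(log_path):
    """Count SELECTs vs. data-modifying statements in a one-statement-per-line SQL log."""
    reads = writes = 0
    with open(log_path) as log:
        for line in log:
            tokens = line.split()
            if not tokens:
                continue
            verb = tokens[0].upper()
            if verb == "SELECT":
                reads += 1
            elif verb in ("INSERT", "UPDATE", "DELETE", "MERGE"):
                writes += 1
    total = reads + writes
    return (reads / total, writes / total) if total else (0.0, 0.0)

if __name__ == "__main__":
    r, w = read_write_ratio(sys.argv[1])
    print(f"reads: {r:.0%}  writes: {w:.0%}")

Run it against a day's worth of captured statements and you will quickly see whether your workload looks anything like the benchmark you are being sold.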
With ISV-specific benchmarks, even though they run a real business application, they don't represent a customized deployment of that technology. They are specifically crafted to produce the highest possible numbers because they are really marketing tools, not something that can be relied upon for your own implementation. If you look at some of the SAP benchmark results, you see numbers like 29 million and more dialog steps per hour. This is a dead giveaway to anyone who has half a brain. Does any SAP implementation in the world do 29 million of anything in one hour? I think not! My entire career (until just recently) has been spent in high-volume, transaction-oriented businesses, and believe me, these kinds of numbers are completely off the chart, and meaningless.
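To put a headline figure like that in perspective, here is the back-of-the-envelope arithmetic. The 29 million number is the round figure from this post, not any particular published SAP result.

# What "29 million dialog steps per hour" works out to per second,
# sustained for the entire hour.
dialog_steps_per_hour = 29_000_000
per_second = dialog_steps_per_hour / 3600
print(f"{per_second:,.0f} dialog steps per second")  # roughly 8,000 per second

Ask yourself whether your business pushes anything through its systems at that sustained rate, every second, for a full hour.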
There is one other aspect to both types of benchmarks: the configurations used are ones that no customer in their right mind would deploy in a production environment. You will see things like raw disk being used with RAID level 0 (just striping); undocumented database features being turned on that exist specifically for benchmarks but make the database unsupported by the vendor in production; data being striped over hundreds or even thousands of disk drives; and all logging of any kind, whether for the database, the application servers, the OS, etc., being turned off to lower overhead as much as possible. These are just some of the tricks used in so-called audited benchmark results. Where does that lead us where these benchmarks are concerned?
It leads us to one place and one place only: "numbers don't lie, but liars use numbers." These benchmarks are marketing tools, and no more. They don't represent anything remotely close to a production deployment, and the numbers will always be higher than what can be achieved in a real-world deployment that can actually be managed. Don't rely on these marketing ploys to make decisions; instead, run your own workload in a proper, production-like configuration and make your decisions based on facts, not fiction.
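If you do run your own workload, even a crude driver that replays your real transaction mix against a production-like configuration will tell you more than any published benchmark. The sketch below is purely illustrative: the read weight, the stand-in transaction functions, and the duration are placeholders you would replace with calls into your own application.

import random
import time

READ_WEIGHT = 0.98  # use the ratio measured from YOUR application, not a benchmark's

def run_read():
    """Stand-in for a real read transaction against your own system."""
    time.sleep(0.002)

def run_write():
    """Stand-in for a real write transaction against your own system."""
    time.sleep(0.010)

def drive(duration_s=10):
    """Issue reads and writes in the measured ratio and report sustained throughput."""
    ops = 0
    start = time.time()
    while time.time() - start < duration_s:
        (run_read if random.random() < READ_WEIGHT else run_write)()
        ops += 1
    elapsed = time.time() - start
    print(f"{ops / elapsed:,.0f} ops/sec at a {READ_WEIGHT:.0%} read mix")

if __name__ == "__main__":
    drive()

The point is not the harness itself; it is that the configuration under test, the data volumes, and the read/write mix are yours, with logging and supported settings left exactly as they would be in production.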