The next wave in BI


TPC-H fun with Greenplum (single node edition)

27/01/2010 23:12

Introduction

There's a nice quote on the Greenplum site from Brian Dolan of Fox Interactive Media, which says “Very impressed with the speed... 3 minutes to do a sum on 100 million rows of data”. That pretty much sums it up. I wouldn't consider that anything to be proud of; 3 minutes to do a simple sum over a measly 100 million rows of data is SLOW nowadays. A quick look at the TPC site will tell you that more complex queries on at least 600 million records run in seconds, not in minutes. Heck, with Excel 2010 PowerPivot summing 100 million rows takes less than a second! And you don't need expensive hardware or software to do any of this. My own benchmark machine is a dual Xeon 5520 with 64 GB RAM and 12 Intel X-25M SSD drives of 80 GB each, connected to 2 Adaptec 5805 RAID controllers. The system is running CentOS 5.4 and the total cost is way below $10K. Anyway, it's always nice to see how a database behaves in practice, so I downloaded the free, single-node Greenplum edition, installed it and ran the TPC-H SF100 benchmark. By the way, typing that last sentence took a lot less time than the actual work...

 

Installation

Before you install the Greenplum (GP) database it's good to first plan your disk layout. GP likes to distribute its data over so-called data segments, and each data segment is tied to a GP process which is tied to a CPU core. In my case I had 8 cores (16 with hyperthreading), which allowed me to use 8 disks or partitions. No RAID, no failover (although GP has facilities for that); I was just interested in testing load and query speed. The installation process itself is pretty straightforward and well explained in the single-node install manual. What's not explained at all, not even in the more voluminous admin guide, is how the system should be tuned. Remember that GP is based on PostgreSQL, so all the same settings are available. However, the settings advised for PostgreSQL are geared towards OLTP systems, not data warehousing. As a result, in my first run the system used only about 3 GB of the available 64 GB of RAM. It turned out that the most important configuration parameter (shared_buffers, which determines how much memory a database process can use for caching data) was set at only 32 MB (!). I toyed around with several parameters, including the system kernel settings, but couldn't use a value over 1920 MB (GP won't start when it's set to e.g. 2048 MB), so I suspect that value is capped somewhere.
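
For what it's worth, the relevant settings can be inspected from any SQL session. The sketch below is just an illustration (the comments reflect my install, not a recommendation), and as far as I can tell actually raising shared_buffers means editing postgresql.conf for the master and each segment instance and restarting the database:

    -- Checking the memory-related settings from psql (values are illustrative):
    SHOW shared_buffers;          -- 32MB out of the box on my install
    SHOW work_mem;                -- per-operator sort/hash memory
    SHOW effective_cache_size;    -- planner hint, not an actual allocation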

 

Database creation

Again, if you're familiar with PostgreSQL, GP's DDL and DML shouldn't hold a lot of secrets. They did add a few extras, though, such as the ability to define a table as column-oriented instead of the standard row orientation, and the ability to compress data. But, unlike modern column stores such as Vertica or ParAccel, the DBA is responsible for thinking out the best storage and compression strategy, as everything has to be explicitly created using DDL statements. What's more, column orientation and compression are only available for append-only tables. That's right: if you want columns and/or compression and still need to update your data, it's dump and reload. In that respect it's a bit similar to Infobright Community Edition, which lacks all DML capabilities. For the benchmarks I went with the default row orientation.
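
To give an idea of what that DDL looks like, here's a rough sketch of a column-oriented, compressed table definition (the column list is abbreviated and the storage options are taken from the GP docs; exact option names may differ per release):

    CREATE TABLE lineitem_col (
        l_orderkey      bigint,
        l_partkey       bigint,
        l_quantity      numeric,
        l_extendedprice numeric,
        l_shipdate      date
    )
    WITH (appendonly=true, orientation=column,
          compresstype=zlib, compresslevel=5)   -- only valid on append-only tables
    DISTRIBUTED BY (l_orderkey);                -- spreads rows over the data segments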

 

Loading & database size

The data loaded into the database is generated using the dbgen tool, which can be compiled from the source available at www.tpc.org. Dbgen creates 8 ASCII files that make up the TPC-H schema, which can then be loaded into the created database. GP lets you use the standard COPY statement to load files into a table, but also offers a facility called 'external tables' to support fast, parallel data loading. I didn't use external tables but plain COPY, which took about 45 minutes to load the 100 GB dataset. The tables already had indexes defined, so that's actually pretty good. What's not so good is the resulting database size. As I couldn't use compression, the resulting database size was about 160 GB, which is 10 times as much as you need when loading the same dataset into an Infobright database. It's also about two to three times as much as less aggressively compressed databases need.
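
For reference, a load of this kind boils down to one COPY per table. The path below is illustrative, and depending on how dbgen was built you may first need to strip the trailing '|' it appends to every line:

    -- Loading dbgen's pipe-delimited output with plain COPY (illustrative path):
    COPY lineitem FROM '/data/tpch/lineitem.tbl' WITH DELIMITER '|';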

 

Running queries

I started by running some individual queries to test whether the system was working and whether it was fast. It was working, but fast? TPC-H query 1 took 7 minutes and 7 seconds, which is more than 60 times slower than the fastest result (5.5 secs) I got with another database on this machine. I decided to fire up my TPC-H script anyway, which runs the official TPC-H benchmark consisting of a power test (a single stream) and a throughput test (5 parallel streams). After a couple of minutes I got this message:

 

ERROR: Greenplum Database does not yet support that query. DETAIL: The query contains a correlated subquery

 

Ouch, that hurts. I had the same problem with Infobright some time ago but haven't tested recent editions. What's funny is that query 4 runs fine (that one also contains a CSQ but is probably easier to rewrite), but Q2, Q17, Q20, Q21 and Q22 return an error, which also invalidates the TPC results I got. I also got a couple of 'out of memory' errors while running the throughput test, even though memory utilization never exceeded 45 GB. So, as disappointing as it may be, I'm not going to publish the results here. What I can tell you is that if you're running PostgreSQL and want a similar but faster database to run your data warehouse, GP single node edition will be a major improvement. It's also much faster than MySQL for typical BI queries. On the other hand, it's also (a lot) slower than SQL Server 2008, Kickfire or Sybase IQ, the current leaders in the single node SF100 benchmarks.
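
To show what GP is choking on: Q17, for example, correlates an aggregate subquery with the outer part key. Below is a sketch of the pattern and one possible decorrelated rewrite (abbreviated from the official query text, using the standard validation parameters); presumably this is the kind of rewrite you would have to do by hand for the failing queries:

    -- Q17-style correlated subquery: this is what GP SNE rejects
    SELECT sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM lineitem, part
    WHERE p_partkey = l_partkey
      AND p_brand = 'Brand#23'
      AND p_container = 'MED BOX'
      AND l_quantity < (SELECT 0.2 * avg(l_quantity)
                        FROM lineitem
                        WHERE l_partkey = p_partkey);

    -- One possible rewrite: join against a pre-aggregated derived table
    SELECT sum(l.l_extendedprice) / 7.0 AS avg_yearly
    FROM lineitem l
    JOIN part p ON p.p_partkey = l.l_partkey
    JOIN (SELECT l_partkey, 0.2 * avg(l_quantity) AS qty_limit
          FROM lineitem
          GROUP BY l_partkey) a ON a.l_partkey = l.l_partkey
    WHERE p.p_brand = 'Brand#23'
      AND p.p_container = 'MED BOX'
      AND l.l_quantity < a.qty_limit;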

 

Conclusion

Greenplum might be an interesting product if you are using the MPP version and can invest time and money in optimizing and tuning the system, but I'm not very impressed with the single node edition. Some of the issues probably apply to the MPP version as well: it lacks intelligent auto-tuning capabilities, query optimization doesn't seem to work very well, and it does a poor job of utilizing available memory (which might also be due to my lack of experience with tuning PostgreSQL databases, btw). Note though that TPC-H is just 'a' benchmark: you should always do your own testing with your own data! Overall I would say it's 'not bad', but not very good either.

—————



Topic: TPC-H fun with Greenplum

Date: 03/02/2010

By: Luke Lonergan

Subject: Re: Final update

See it here:
https://bit.ly/bkzgyK

- Luke

—————

Date: 05/02/2010

By: What point are you trying to make?

Subject: Re: Re: Final update

Hi Luke,

I appreciate your efforts, but my points are even more valid now: you rewrote the queries to circumvent the fact that GP doesn't support correlated subqueries (that was one point of failure), and you didn't run multiple streams (the 'throughput' test) simulating a concurrent workload (the second point of failure). So it seems there was nothing wrong with my setup in the first place (I did a second run with all the default settings). Anyway, just let me know when GP is capable of handling CSQs and I'll be happy to give it another try.

best, Jos

—————

Date: 08/02/2010

By: Luke Lonergan

Subject: Re: Re: Re: Final update

Hi Jos,

I can run the throughput test (as can you).

My point here was to measure "out of the box" performance for Greenplum SNE on analytical queries of the type found in TPC-H compared with another system mentioned in your blog post, Postgres 8.4.2. The results show a 19-fold performance improvement from running Greenplum SNE compared with Postgres without tuning, partitioning or indexes.

If you were to run the other DBs (Oracle, MSFT, Sybase, etc.) on TPC-H, I think you would find a similar situation: without exhaustive tuning and tricks you will not get near the performance posted in formal TPC-H filings. I know what those tricks are, and some of them are downright crazy and in my opinion have no bearing on real-world scenarios.

I also think that TPC-H has been gamed so much by the various upstart companies using pre-aggregations, projections and materialized views that a fast TPC-H result says very little about how well a given DB will perform on a realistic workload. The only real answer comes from running real data and queries using standard maintenance practices.

Lastly, have you seen Teradata or Netezza posting TPC-H results lately?

It's true - we don't yet support the kind of CSQs present in 5 of the TPC-H queries. Neither does a well known publicly held appliance company, yet we're both handling the world's largest workloads under tools including Business Objects, Cognos, Microstrategy, Microsoft, etc. We will support CSQs, they just aren't that high on our priority list.

I enjoy your posts, please engage us with more benchmarking and we'll support your efforts :-)

- Luke

—————

Date: 30/01/2010

By: Jos van Dongen

Subject: Good comments, but....

@all: thanks for the feedback, really appreciated. As for the comments regarding premature conclusions: I'd be happy to run this again but could use some help with parameter settings. The machine has enough horsepower so that shouldn't be the problem. You're also right about PostgreSQL; latest releases have many performance improvements so for a fair comparison PG8.4 (I'll use PG+ from EnterpriseDB then) and GP SNE should be evaluated side by side.

@Luke: The top TPC-H results with the 5.5 secs on Q1 I mentioned were obtained with a plain vanilla installation of an analytical DB, but I agree, all the DBs you mention suck out of the box. E.g. all the top TPC-H results were obtained by running everything in memory, and you do need to set that up. Note that MonetDB does this automatically, though I haven't tested the Feb. 2010 build yet (which compresses about 2:1, so SF100 will fit in memory).

Other than that: CSQs are required, as Mark pointed out, and only offering column/compression options on append-only tables severely limits the use of these options.

To be continued!

—————

Date: 31/01/2010

By: Luke Lonergan

Subject: Re: Good comments, but....

Hi Jos,

I just posted results of a comparison between Postgres 8.4.2 and Greenplum SNE run on the same server and using the 22 queries from TPC-H here:
https://community.greenplum.com/showthread.php?p=404#post404

Greenplum SNE is between 13 and 14 times faster on the queries and 3-4 times faster on loading using COPY.

These results were gained with default settings of Greenplum and tuned settings for Postgres 8.4.2.

I've only had time to run a 1 GB scale factor test, but I'll have a 100 GB test up when the results are finished (it takes a long time for PG). Based on past results, I only expect the difference to increase, as GP SNE is optimized for maximal performance on disk access.

- Luke

—————

Date: 29/01/2010

By: Luke Lonergan

Subject: Some hints

Thanks for testing this out - I think your conclusions are premature.

For a clear example with runnable code of GP SNE as the fastest DB on the planet, see this case:
https://community.greenplum.com/showthread.php?t=111&referrerid=9

As in that example, GP SNE is hundreds of times faster than Postgres and many times faster than the fastest column stores.

WRT TPC-H, it's a poor way to evaluate DB vendor A versus B. If you unpack MSFT, Oracle, Sybase or others as you've suggested and run TPC-H out of the box, they will all suck, although they'll run the CSQs. Try it - even use Postgres 8.4 and see what you get.

If you run real workloads with GP, you should expect to be pleasantly surprised at how much faster it is - that's been the experience across the world so far. When GP SNE is compared apples-apples, it wins.

Cheers,

Luke

—————

Date: 30/01/2010

By: pguser

Subject: Re: Some hints

Luke,

My conclusion was that with my setup + dataset (i.e. real workloads), PostgreSQL 8.4 beats GreenPlum SNE, especially in cases where GreenPlum SNE chooses some bad plans due to enable_seqscan = 0. I don't understand what can be premature with that conclusion, and what that has to do with TPC-H.

On the other hand, Jos made a general statement about how PostgreSQL sucks compared to Greenplum SNE. That's premature.

—————

Date: 30/01/2010

By: Luke Lonergan

Subject: Re: Re: Some hints

Hi,

A few things:

- No parameter changes should be needed in the postgresql.conf files for Greenplum, especially not shared_buffers. This is true of all cases including the original poster and the TPC-H test. enable_seqscan is not set to 0 by default, so this indicates that the parameters have been changed.

- The number of segments should generally be set to the number of cores on the machine. GP is all about parallelism and the number of segments used controls how much of the CPU will be used to answer queries. If you increase from 2 to 8, you should see something like a 400% speedup of your queries, depending on how fast your disk is. Note that because of the parallelism outrunning your disk, you should (must) use the XFS filesystem to avoid the fragmentation pollution that ext2/3/4 have.

- In the general case, we find that GP SNE is faster than Postgres 8.x due to faster core elements like sort, hashagg, etc, even when not using a lot of parallelism, though the key advantage of GP is parallelism.

- GP implements the full SQL 2003 analytical query support with windowing, cube, etc. This is very useful and important for analytical workloads.

So, in short, in your case I'd also say you should try again, and this time don't change the parameters and set the number of segments equal to the number of CPU cores; GP will be faster on every query.

- Luke

—————

Date: 30/01/2010

By: Jos van Dongen

Subject: Re: Re: Re: Some hints

Hi Luke,

shared_buffers at 32MB sounds like a bad idea if you have 64GB available but if you say so, I'll give it a go. As you can read in the post, I used 8 segments with 8 X-25M SSD drives using XFS, so that couldn't have been the problem.

Jos

—————

Date: 03/02/2010

By: Luke Lonergan

Subject: Re: Re: Re: Re: Some hints

Hi Jos,

Shared buffers is not 32MB unless you set it that way. There is no reason to tune shared_buffers, as I've described here:
https://bit.ly/bwXUgq

Perhaps you are referring to work_mem?

I just posted results (untuned, no indexes) up to 1TB for a single server compared with Postgres 8.4.2 here:
https://bit.ly/bkzgyK

- Luke

—————