If you are a DBA or a database programmer, your manager may have asked you to prepare a breakdown of employees by location and department, or a breakdown of products by category and provider. If your boss is a techie, instead of a breakdown she might have mentioned a pivot table, or, if she likes to use “the right words,” a cross-tabulation, Giuseppe Maxia writes. Elsewhere on OnLamp.com another article advises on “Quick and Dirty RDBMS Tuning“.
My recommendation to any of you database app developers out there, and DBAs – learn this stuff – you WILL use it, I promise you. I guarantee you that at some point you WILL be asked to write cross-tab queries (MS Access has this feature built in, in the form of the TRANSFORM SQL keyword statement, but SQL Server, DB2 and Oracle people will have to do this the hard way, unfortunately). TRANSFORM is not standard SQL99 SQL code and is a custom extension by MSFT, so that is why the ‘real’ databases don’t have it, presumably.
In any big company the analysts are generally addicted to pivot-tables and crosstabs in Excel and Access for looking at data, so get used to them.
OnLamp seems to be “OSNews’d” right now, I’ll have to check the articles later and can’t directly comment on them now.
Just as I thought, it is pretty painful to create cross-tabs in SQL, but there is just no getting away from it in the so-called real world.
I would KILL to have the simple MS Access TRANSFORM keyword available to me in SQL Server (and the other big databases), but I guess MSFT just expects you to move on to “cubes” in Analysis Services and maybe start using MDX statements there (a special quasi-SQL dialect used for manipulating data cubes). At least cubes are very easily digested in Excel, as you might expect, so that is nice anyway.
when i need ‘crosstabs’ i have two options, use essbase (cube db) or use pivot tables in excel…
often if you have < 60000 rows to just use excel and create an interactive piviot table. managers love this.
but something built into SQL as STANDARD would be very very nice.
also as soon as you start to do things like summing, you need to use temporary tables otherwise you still get multiple rows per record.
Well, once you have you tables, what do you do to interface your database with the web? Do you know of any php/whatever program that makes your life easier to convert SQL->Web forms. I know of this one: phpMyedit. Do you recommend any other one?
The right way to approach this is to create a view, counting, summing and averaging. You then group by your dimensions. This produces every number and column and row header that would be in the cross tab, but does not format it at all like that. However, it does reduce the volume of the data to the point that making the crosstab happen in the spreadsheet or application layer becomes reasonable and fast. I know this is not truely server side crosstab, but it is the way that works in the real world.
i.e.
select count(*),avg(salary),country,location,dept,gender
from person p,depts d,locs l,countries c
where
p.dept_id=d.dept_id
and p.loc_id=l.loc_id
and l.country_id=c.country_id
group by country,location,dept,gender;
Now it is a simple matter of formatting.
First, let me say that the article was a nice refresher to a common task done in corp. daily.
But as the author describes the ‘pain’ of having to write the query .. was the query really that difficult to write. If you look – it’s really just repetition on sorting out the data.
e.g.
COUNT(CASE WHEN dept = ‘pers’ AND gender = ‘f’ THEN id ELSE NULL END)
AS ‘pers-f’,
COUNT(CASE WHEN dept = ‘pers’ AND gender = ‘m’ THEN id ELSE NULL END)
AS ‘pers-m’,
COUNT(CASE WHEN dept = ‘pers’ THEN id ELSE NULL END)
AS ‘pers’,
COUNT(CASE WHEN dept = ‘sales’ AND gender = ‘f’ THEN id ELSE NULL END)
AS ‘sales-f’,
COUNT(CASE WHEN dept = ‘sales’ AND gender = ‘m’ THEN id ELSE NULL END)
AS ‘sales-m’,
COUNT(CASE WHEN dept = ‘sales’ THEN id ELSE NULL END)
AS ‘sales’,
COUNT(CASE WHEN dept = ‘dev’ AND gender = ‘f’ THEN id ELSE NULL END)
AS ‘dev-f’,
COUNT(CASE WHEN dept = ‘dev’ AND gender = ‘m’ THEN id ELSE NULL END)
AS ‘dev-m’,
COUNT(CASE WHEN dept = ‘dev’ THEN id ELSE NULL END)
This really is not that difficult. And I find it funny that the author found it non-trivial (in his words) to write the follow query:
SELECT dept,
COUNT(CASE WHEN gender =’m’ THEN id ELSE NULL END) AS ‘m’,
COUNT(CASE WHEN gender =’f’ THEN id ELSE NULL END) AS ‘f’
FROM
person
GROUP BY
dept
Nonetheless, the article was interesting and informative.
Right on!
However, with the right way, the Perl monk can no longer show off the DBIx::SQLCrosstab module.
The problem is that crosstabs are never really static in terms of what the end user expects from the DBA. Sure, you may fairly easily write a specific piece of SQL that creates a certain specific crosstab output, but you will find that many users will ask for the crosstab to be tweaked in various ways to emphasize different dimensions at different times, and re-writing your SQL for each user request can become tedious very quickly.
Loading everything into Excel and giving the user full control over pivot-tables and pivot-charts is a luxury, if/when you can get to that point and tweaking a simple TRANSFORM statement in MS Access is much, much less SQL code to deal with than any of these other solutions.
TRANSFORM Statement
Creates a crosstab query.
Syntax
TRANSFORM aggfunction
selectstatement
PIVOT pivotfield [IN (value1[, value2[, …]])]
The TRANSFORM statement has these parts:
Part Description
aggfunction: An SQL aggregate function that operates on the selected data.
selectstatement: A SELECT statement.
pivotfield: The field or expression you want to use to create column headings in the query’s result set.
value1, value2: Fixed values used to create column headings.
Remarks
When you summarize data using a crosstab query, you select values from specified fields or expressions as column headings so you can view data in a more compact format than with a select query.
TRANSFORM is optional but when included is the first statement in an SQL string. It precedes a SELECT statement that specifies the fields used as row headings and a GROUP BY clause that specifies row grouping. Optionally, you can include other clauses, such as WHERE, that specify additional selection or sorting criteria. You can also use subqueries as predicates — specifically, those in the WHERE clause — in a crosstab query.
The values returned in pivotfield are used as column headings in the query’s result set. For example, pivoting the sales figures on the month of the sale in a crosstab query would create 12 columns. You can restrict pivotfield to create headings from fixed values (value1, value2 ) listed in the optional IN clause. You can also include fixed values for which no data exists to create additional columns.