You'll all be familiar with normal aggregation.
For example I can get the number of subcategories per category in AdventureWorks:
SELECT c.Name,
COUNT(*)
as SubCategoryCount
FROM Production.ProductCategory c
JOIN Production.ProductSubCategory s
ON c.ProductCategoryID = s.ProductCategoryID
GROUP
BY c.Name
Which returns:
Name | SubCategoryCount |
Accessories | 12 |
Bikes | 3 |
Clothing | 8 |
Components | 14 |
SQL 2005 supports 13 aggregation functions: AVG, CHECKSUM, CHECKSUM_AGG, COUNT, COUNT_BIG, GROUPING, MAX, MIN, SUM, STDEV, STDEVP, VAR and VARP but nothing to aggregate strings.
You can create custom CLR aggregation functions but on some occasions, a simple SQL recursive query may be able to do what you need.
Let's have a look first at a case that doesn't require recursive queries.
Let's create a comma delimited string of all the available categories.
The data comes from the Product.ProductCategory table:
select
*
from Production.ProductCategory
And returns
ProductCategoryID | Name | rowguid | ModifiedDate |
1 | Bikes | CFBDA25C-DF71-47A7-B81B-64EE161AA37C | 1998-06-01 00:00:00.000 |
2 | Components | C657828D-D808-4ABA-91A3-AF2CE02300E9 | 1998-06-01 00:00:00.000 |
3 | Clothing | 10A7C342-CA82-48D4-8A38-46A2EB089B74 | 1998-06-01 00:00:00.000 |
4 | Accessories | 2BE3BE36-D9A2-4EEE-B593-ED895D97C2A6 | 1998-06-01 00:00:00.000 |
To get the comma delimited list of Categories, we can use the COALESCE function and a string variable:
DECLARE @categories varchar(200)
SET @categories =
NULL
SELECT @categories =
COALESCE(@categories +
',','')
+
Name
FROM Production.ProductCategory
SELECT @categories
And returns, as expected a string "Accessories,Bikes,Clothing,Components"
But now, how would we return the recordset of categories with a second column containing the list of subcategories.
I can get the list of subcategories for a single category but not for each category using the COALESCE method.
This is where recursive queries help. First I need to number each subcategory within the category. I can do that using RANK and ROW_NUMBER functions.
SELECT
c.Name as Category,
s.Name as SubCategory,
ROW_NUMBER()
OVER(ORDER
BY c.Name,s.Name)
+ 1
-
RANK()
OVER(ORDER
BY c.Name)
as SubCategoryNumber
FROM Production.ProductCategory c
JOIN Production.ProductSubCategory s
ON c.ProductCategoryID = s.ProductCategoryID
And now the recursive query where each loop of the query adds to the comma delimited string with the initial value equal to the first subcategory
WITH
SubCategories(Category, Subcategory, SubCategoryNumber)
AS
(
SELECT
c.Name as Category,
s.Name as SubCategory,
ROW_NUMBER()
OVER(ORDER
BY c.Name,s.Name)
+ 1
-
RANK()
OVER(ORDER
BY c.Name)
as SubCategoryNumber
FROM Production.ProductCategory c
JOIN Production.ProductSubCategory s
ON c.ProductCategoryID = s.ProductCategoryID ),
TempCategories(Category, Subcategories, SubCategoryNumber)
AS
(
SELECT Category,
CAST(Subcategory as
varchar(max)), SubCategoryNumber
FROM SubCategories
WHERE SubCategoryNumber = 1
UNION ALL
SELECT TempCategories.Category,
CAST(TempCategories.SubCategories +
','
+ SubCategories.Subcategory as
varchar(max)), TempCategories.SubCategoryNumber + 1
FROM TempCategories
JOIN SubCategories
ON TempCategories.Category = SubCategories.Category and TempCategories.SubCategoryNumber + 1 = SubCategories.SubCategoryNumber ),
MaxCategories(Category, SubCategoryNumber)
AS
(
SELECT Category,
MAX(SubCategoryNumber)
AS SubCategoryNumber
FROM SubCategories
GROUP
BY Category ),
Categories (Category, Subcategories)
AS
(
SELECT TempCategories.Category, TempCategories.Subcategories
FROM TempCategories
JOIN MaxCategories
ON TempCategories.Category = MaxCategories.Category AND TempCategories.SubCategoryNumber = MaxCategories.SubCategoryNumber )
SELECT
*
FROM Categories
Which returns:
Category | Subcategories |
Components | Bottom Brackets,Brakes,Chains,Cranksets,Derailleurs,Forks,Handlebars,Headsets,Mountain Frames,Pedals,Road Frames,Saddles,Touring Frames,Wheels |
Clothing | Bib-Shorts,Caps,Gloves,Jerseys,Shorts,Socks,Tights,Vests |
Bikes | Mountain Bikes,Road Bikes,Touring Bikes |
Accessories | Bike Racks,Bike Stands,Bottles and Cages,Cleaners,Fenders,Helmets,Hydration Packs,Lights,Locks,Panniers,Pumps,Tires and Tubes |
If we look at each query:
SubCategories, we've seen already, returns the full list of categories and subcategories with a subcategory number, string at 1, for each category.
TempCategories is the recursive query and adds at each level the previous subcategory list with the current subcategory, so the first subcategory contains itself, the second catains the first and itself, the third contains the first, the second and itself and so on. We then need to only keep the last record of each category that contains every subcategory.
MaxCategories contains the highest SubcategoryNumber for each category.
Categories joins TempCategories and MaxCategory to only retain the highest subcategory number per category. The result we need.
The aggregation operation defined in TempCategories (here adding the string together) could be replaced with any custom aggregation. It iteratively, in a known sequence, combines the previous calculated value with the new value for that row.
0 comments:
Post a Comment