Greedy Test Case Algorithm in a SQL Stored Proc
Here's a straightforward problem: I have a table with a lot of fields in it (in this case, several tables -- new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).
I want to extract a few real-world test records that exercise the entire table... a "covering set" of test cases. So if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!
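OK, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem (the classic set cover problem): select the smallest subset of rows such that the union of their viable (non-null, non-zero) columns covers every column. Finding the true minimum is expensive in general, so the greedy approach settles for a small set: keep grabbing the row that covers the most still-uncovered columns until nothing is left to cover. That is usually close to the smallest set, though not guaranteed to be it.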
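To make the idea concrete before getting to the SQL, here is a minimal sketch of the greedy selection in Python. It is not the stored proc, just an illustration under some assumptions: rows arrive as a dict of dicts keyed by a hypothetical record id, and the names `is_viable`, `greedy_cover`, and `sample_rows` are made up for this example. Columns that no row fills are left out of the target set, so the loop always terminates.

```python
from typing import Any, Dict, List, Set


def is_viable(value: Any) -> bool:
    """A column value counts as 'viable' when it is neither null nor zero."""
    return value is not None and value != 0


def greedy_cover(rows: Dict[str, Dict[str, Any]]) -> List[str]:
    """Pick record ids greedily until every coverable column is covered.

    rows maps a (hypothetical) record id to a {column_name: value} dict.
    """
    # Precompute the set of viable columns for each record.
    viable: Dict[str, Set[str]] = {
        rec_id: {col for col, val in rec.items() if is_viable(val)}
        for rec_id, rec in rows.items()
    }

    # Only columns that at least one record fills can ever be covered.
    uncovered: Set[str] = set().union(*viable.values())

    chosen: List[str] = []
    while uncovered:
        # Greedy step: take the record covering the most uncovered columns.
        best = max(viable, key=lambda rec_id: len(viable[rec_id] & uncovered))
        chosen.append(best)
        uncovered -= viable[best]
    return chosen


# Made-up sample data: A fills c1-c2, B fills c3-c4, C only fills c1.
sample_rows = {
    "A": {"c1": 5, "c2": "x", "c3": None, "c4": 0},
    "B": {"c1": 0, "c2": None, "c3": 7, "c4": "y"},
    "C": {"c1": 1, "c2": None, "c3": None, "c4": 0},
}
selected = greedy_cover(sample_rows)
```

With the sample data, records A and B together cover all four columns, so C is never picked. Ties go to whichever record comes first in the dict, which is an arbitrary choice in this sketch.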
There's some code below. Note that it is very bad code. I use the wrong scope on global temporary tables, I don't do much error checking, I generate SQL and execute it, and I debug with print statements. It is also formatted poorly, but that's actually more of a WordPress/plugin issue than anything else.
But it's mine, and I love it...