Mapping fields of weakly related tables in SQL

Question

I'm searching for an SQL-Query that can map a set of items of an individual size to a set off buckets of individual size.

I would like to satisfy the following conditions:

The size of a bucket has to be bigger or equal the size of an item.
Every bucket can contain only one item or it is left empty.
Every item can only be placed in one bucket.
No item can be split to multiple buckets.
I want to fill the buckets in a way, that the smallest unused buckets are filled first.
Then initial item and bucket sets can be ordered by size or id, but are not incremental
Sizes and ids of initial bucket and item sets can be arbitrary and do not start at a known minimum value
The result has to be always correct, when there is a valid mapping
The result is allowed to be incorrect if the is no valid mapping (for example if there are more items than buckets), but I would appreciate, when the result is an empty set or has another property/signal that indicates an incorrect result.

To give you an example, let's say my bucket and items tables look like that:

Bucket:                     Item:
+---------------------+     +---------------------+
| BucketID | Size     |     | ItemID   | Size     |
+---------------------+     +---------------------+
| 1        | 2        |     | 1        | 2        |
| 2        | 2        |     | 2        | 2        |
| 3        | 2        |     | 3        | 5        |
| 4        | 4        |     | 4        | 11       |
| 5        | 4        |     | 5        | 12       |
| 6        | 7        |     +---------------------+
| 7        | 9        |
| 8        | 11       |
| 9        | 11       |
| 10       | 12       |
+---------------------+

Then, I'd like to have a mapping that is returning the following result table:

Result:
+---------------------+
| BucketID | ItemID   |
+---------------------+
| 1        | 1        |
| 2        | 2        |
| 3        | NULL     |
| 4        | NULL     |
| 5        | NULL     |
| 6        | 3        |
| 7        | NULL     |
| 8        | 4        |
| 9        | NULL     |
| 10       | 5        |
+---------------------+

Since there is no foreign key relation or something I could fix the columns to their corresponding bucket (but only the relation Bucket.Size >= Item.Size) I'm have a lot of trouble describing the result with a valid SQL query. Whenever I use joins or sub selects, I get items in buckets, that are to big (like having an item of size 2 in a bucket of size 12, while a bucket of size 2 is still available) or I get the same item in multiple buckets.

I spent some time now to find the solution myself and I am close to say, that it is better not to declare the problem in SQL but in an application, that is just fetching the tables.

Do you think this task is possible in SQL? And if so, I would really appreciate if you can help me out with a working query.

Edit : The query should be compatible to at least Oracle, Postgres and SQLite databases

Edit II: An SQL Fiddle with the given test set above an example query, that returns a wrong result, but is close, to what the result could look like http://sqlfiddle.com/#!15/a6c30/1

SoulTrain · Answer 1 · 2014-03-26T15:08:21.387

4

Try this... I was able to implement this using recursive CTE, all in 1 single SQL statement

The only assumption I had was that the Bucket and Item data set are sorted.

DECLARE @BUCKET TABLE
    (
     BUCKETID INT
     , SIZE INT
    )

    DECLARE @ITEM TABLE
    (
     ITEMID INT
     , SIZE INT
    )
    ;  
    INSERT INTO @BUCKET
    SELECT 1,2 UNION ALL
    SELECT 2,2 UNION ALL
    SELECT 3,2 UNION ALL
    SELECT 4,4 UNION ALL
    SELECT 5,4 UNION ALL
    SELECT 6,7 UNION ALL
    SELECT 7,9 UNION ALL
    SELECT 8, 11 UNION ALL
    SELECT 9, 11 UNION ALL
    SELECT 10,12 

    INSERT INTO @ITEM
    SELECT 1,2 UNION ALL
    SELECT 2,2 UNION ALL
    SELECT 3,5 UNION ALL
    SELECT 4,11 UNION ALL
    SELECT 5,12;

    WITH TOTAL_BUCKETS
    AS (
        SELECT MAX(BUCKETID) CNT
        FROM @BUCKET
        ) -- TO GET THE TOTAL BUCKETS COUNT TO HALT THE RECURSION
        , CTE
    AS (
        --INVOCATION PART
        SELECT BUCKETID
            , (
                SELECT MIN(ITEMID)
                FROM @ITEM I2
                WHERE I2.SIZE <= (
                        SELECT SIZE
                        FROM @BUCKET
                        WHERE BUCKETID = (1)
                        )
                ) ITEMID --PICKS THE FIRST ITEM ID MATCH FOR THE BUCKET SIZE
            , BUCKETID + 1 NEXT_BUCKETID --INCREMENT FOR NEXT BUCKET ID 
            , (
                SELECT ISNULL(MIN(ITEMID), 0)
                FROM @ITEM I2
                WHERE I2.SIZE <= (
                        SELECT SIZE
                        FROM @BUCKET
                        WHERE BUCKETID = (1)
                        )
                ) --PICK FIRST ITEM ID MATCH
            + (
                CASE 
                    WHEN (
                            SELECT ISNULL(MIN(ITEMID), 0)
                            FROM @ITEM I3
                            WHERE I3.SIZE <= (
                                    SELECT SIZE
                                    FROM @BUCKET
                                    WHERE BUCKETID = (1)
                                    )
                            ) IS NOT NULL
                        THEN 1
                    ELSE 0
                    END
                ) NEXT_ITEMID --IF THE ITEM IS PLACED IN THE BUCKET THEN INCREMENTS THE FIRST ITEM ID
            , (
                SELECT SIZE
                FROM @BUCKET
                WHERE BUCKETID = (1 + 1)
                ) NEXT_BUCKET_SIZE --STATES THE NEXT BUCKET SIZE
        FROM @BUCKET B
        WHERE BUCKETID = 1

        UNION ALL

        --RECURSIVE PART
        SELECT NEXT_BUCKETID BUCKETID
            , (
                SELECT ITEMID
                FROM @ITEM I2
                WHERE I2.SIZE <= NEXT_BUCKET_SIZE
                    AND I2.ITEMID = NEXT_ITEMID
                ) ITEMID -- PICKS THE ITEM ID IF IT IS PLACED IN THE BUCKET
            , NEXT_BUCKETID + 1 NEXT_BUCKETID --INCREMENT FOR NEXT BUCKET ID 
            , NEXT_ITEMID + (
                CASE 
                    WHEN (
                            SELECT I3.ITEMID
                            FROM @ITEM I3
                            WHERE I3.SIZE <= NEXT_BUCKET_SIZE
                                AND I3.ITEMID = NEXT_ITEMID
                            ) IS NOT NULL
                        THEN 1
                    ELSE 0
                    END
                ) NEXT_ITEMID --IF THE ITEM IS PLACED IN THE BUCKET THEN INCREMENTS THE CURRENT ITEM ID
            , (
                SELECT SIZE
                FROM @BUCKET
                WHERE BUCKETID = (NEXT_BUCKETID + 1)
                ) NEXT_BUCKET_SIZE --STATES THE NEXT BUCKET SIZE
        FROM CTE
        WHERE NEXT_BUCKETID <= (
                SELECT CNT
                FROM TOTAL_BUCKETS
                ) --HALTS THE RECURSION
        )
    SELECT 
        BUCKETID
        , ITEMID
    FROM CTE

edited Mar 26 '14 at 15:08

answered Mar 24 '14 at 19:12

SoulTrain

1,846
1
10
10

Awesome! Works for my test set. I replaced ISNULL to COALESCE and added "WITH RECURSIVE" to run the query in PostgreSQL. Thank you very much! – Mar 25 '14 at 10:53
Cool. How big was your test set? Did it take long to run? – SoulTrain Mar 25 '14 at 14:41
^_^ Na, I meant the initial test set mentioned in the question on top. – Mar 25 '14 at 14:49
Ohh ok.. I did try a few different sets and was a little surprised that it returned the expected output :) – SoulTrain Mar 25 '14 at 14:52
I'll post my results here to inform you and to make sure that your answer will be awarded the bounty, before the deadline ends : ) Of course there will be cases, where the query will return incorrect results. Especially when there are more items then buckets or items, that are to big for the given bucket set. But thats ok. – Mar 25 '14 at 15:09
Sure. Yup, there will be edge cases that need to accounted for as an when they are found. Thanks. – SoulTrain Mar 25 '14 at 18:37
Hey SoulTrain, I found a set, where your query produces an invalid allocation. See http://sqlfiddle.com/#!15/f6d6a/1 (item 1 of size 4 is mapped to bucket 1 of size 2). – Mar 26 '14 at 09:00
Fixed it. In the Invocation part, the ISNULL/COALEASE function was setting the ItemID to 1 in case of a NULL, which was causing a ripple effect in the rest of the buckets. See resolution [SQLFIDDLE](http://sqlfiddle.com/#!15/f6d6a/3/0) – SoulTrain Mar 26 '14 at 14:37
Hello! The more I am thinking about this query, the more impressed I am! :D But, I found another problem. Let's assume, that the bucket set is reduced (for example, when some where just used before). In this case, the result is empty... http://sqlfiddle.com/#!15/d87d2/1/0 – Mar 26 '14 at 14:56
Thanks. Actually my initial assumption was that both the sets should be sorted and in incremental order, but i missed stating the later. To fix that, I could generate a new order before initiating the sort. Try this [SQLFIDDLE](http://sqlfiddle.com/#!15/2b4df/1) – SoulTrain Mar 26 '14 at 15:44
Sadly the mapping to an incremental list is necessary then, because my bucket set is persistent over several insertions of different item sets. So, k, thanks again for spending your time. By retrying the test set from my last comment, Item 1 of size 4 was sortet to bucket 4 of size 2 http://sqlfiddle.com/#!15/bcdea/3/0 After writing this comment, I will just give that 50 points reputation to you even though the answer is not complete yet. Your effort is worth it. If you are still in the mood, I would of course appreciate further help on this problem : ) – Mar 27 '14 at 09:38
Always liked challenges, who doesn't :). Refering your latest SQL fiddle, I think the sort algo. is working correctly. The item 1 of size 4 was placed in bucket 5 with size of 6, which is the expected logic. Thanks for the Reps. – SoulTrain Mar 27 '14 at 14:40
Hm, do I have to map the result buckets to an incremental list or vice versa? Do we talk about the same sql fiddle? ^_^ http://sqlfiddle.com/#!15/bcdea/3/0 – Mar 28 '14 at 08:21
@deviolog I revisited your comment after a break and found the disconnect. [Fiddle](http://sqlfiddle.com/#!15/bcdea/10). So what was happenning was that the sorting Algo was working fine but when I was displaying the results I was showing the newly sorted Bucketid instead of the original bucketid. – SoulTrain Apr 03 '14 at 14:58
@deviolog Contd.. So when you were refering to Bucket id 4 in the result - I would assume you are referring to the Bucket id 5 from the original table coz once I resorted my table to establish the new order - Bucket id 5 in the original table got a new id of 4. I just tweaked the code to display the old bucket id in the result. hope I am making sense... Break gives one such different perspectives to look from !! Happy coding... – SoulTrain Apr 03 '14 at 14:58

score 3 · Answer 2 · answered Mar 20 '14 at 22:14

3

I'd say that a single SQL query is maybe not the tool for the job as the buckets are "consumed" by allocating items to them. You can use SQL, but not a single query. suggestion in pseudocode below:

Have a cursor on ITEM: 
    within the FETCH loop for that {
    SELECT in BUCKET the bucket with minimum bucket id and bucket size >= item size 
    INSERT bucket id, item id to MAPPING
}

If you need the NULL (unoccupied) buckets, you can locate them via a further 
INSERT into MAPPING (....)
SELECT <bucket id>, NULL
from    BUCKET 
where <bucket id> not in (SELECT <bucket id> from MAPPING);

answered Mar 20 '14 at 22:14

Angst

294
1
7
12

Well thanks for your answer! I didn't try this yet and to be honest, it is a bit beyond my knowledge about SQL. But I will try to understand. Do you think I could use such an "imperative" approach as a sub query? Having the statement as a sub query of higher level queries is one of the reasons I'm searching for a solution as a single SQL statement. – Mar 21 '14 at 15:44
Hm. Well I still don't see how to do it in one shot, but you could come at it in several passes with SQL statements: for example, statement 1, create a temporary table matching all the items with the potential buckets they could go in. statement 2, try and see if you can get the minimum matching row per item from the temporary table. Something like that anyway. – Angst Mar 21 '14 at 22:10

score 2 · Accepted Answer · answered Apr 02 '14 at 21:02

Using the table definition from @SoulTrain (but requiring the data to be sorted in advance):

; WITH ORDERED_PAIRINGS AS (
    SELECT i.ITEMID, b.BUCKETID, ROW_NUMBER() OVER (ORDER BY i.SIZE, i.ITEMID, b.SIZE, b.BUCKETID) AS ORDERING, DENSE_RANK() OVER (ORDER BY b.SIZE, b.BUCKETID) AS BUCKET_ORDER, DENSE_RANK() OVER (PARTITION BY b.BUCKETID ORDER BY i.SIZE, i.ITEMID) AS ITEM_ORDER
    FROM @ITEM i
    JOIN @BUCKET b
      ON i.SIZE <= b.SIZE
), ITEM_PLACED AS (
    SELECT ITEMID, BUCKETID, ORDERING, BUCKET_ORDER, ITEM_ORDER, CAST(1 as int) AS SELECTION
    FROM ORDERED_PAIRINGS
    WHERE ORDERING = 1
    UNION ALL
    SELECT *
    FROM (
        SELECT op.ITEMID, op.BUCKETID, op.ORDERING, op.BUCKET_ORDER, op.ITEM_ORDER, CAST(ROW_NUMBER() OVER(ORDER BY op.BUCKET_ORDER) as int) as SELECTION
        FROM ORDERED_PAIRINGS op
        JOIN ITEM_PLACED ip
          ON op.ITEM_ORDER = ip.ITEM_ORDER + 1
         AND op.BUCKET_ORDER > ip.BUCKET_ORDER
    ) AS sq
    WHERE SELECTION = 1
)
SELECT *
FROM ITEM_PLACED

Mapping fields of weakly related tables in SQL

3 Answers3

Linked