Transposition of ranges in SQL and charts
I had this database table containing ranges (a start value and an end value). The challenge was creating a query that overlaps and transposes those ranges so I can say how many ranges are at any point in the total interval or values. As an example, "SELECT * FROM Ranges" would result in a table like:
and I am looking for something like this:
A naive implementation would get the minimum Start (or start with 0, as I did) and the maximum End, create an in memory or temporary table (Values) from min to max using an ugly WHILE block, then join it with the Ranges tables something like:
This kind of works, but for large ranges it becomes difficult. It takes a lot to create the Values table and the join and for extreme cases, like I had with values from 0 to 6 billion, it becomes impossible. The bottleneck here is this Values table, which is pretty much a horror to create and maintain. But what if you don't need all the values?
Before I tell you the solution I found, be warned that you have to properly define what a range is. Is a range 10-20 actually 10-19? In my case it was so, so that is why there are some subtractions with 1 or less than rather than less or equal conditions.
The solution is this:
The idea is that after you compute the ranges count per each of the start and end values you know that between one and the next the count of ranges will remain the same. The join is significantly faster, there is no ugly WHILE block and you don't need a 6 billion value table. It's easier to plot on a chart as well, with either of these variations:
See the variations:
The result of the query above will be:
Hope this helps you
Start | End |
---|---|
10 | 20 |
10 | 30 |
25 | 35 |
20 | 40 |
Value | Count |
---|---|
0 | 0 |
1 | 0 |
... | ... |
10 | 2 |
11 | 2 |
... | ... |
24 | 2 |
25 | 3 |
26 | 3 |
A naive implementation would get the minimum Start (or start with 0, as I did) and the maximum End, create an in memory or temporary table (Values) from min to max using an ugly WHILE block, then join it with the Ranges tables something like:
SELECT v.Val,Count(1) as Nr
FROM #Values v
INNER JOIN Ranges r
ON r.Start<=v AND r.[End]>=v
This kind of works, but for large ranges it becomes difficult. It takes a lot to create the Values table and the join and for extreme cases, like I had with values from 0 to 6 billion, it becomes impossible. The bottleneck here is this Values table, which is pretty much a horror to create and maintain. But what if you don't need all the values?
Before I tell you the solution I found, be warned that you have to properly define what a range is. Is a range 10-20 actually 10-19? In my case it was so, so that is why there are some subtractions with 1 or less than rather than less or equal conditions.
The solution is this:
SELECT DISTINCT Val
INTO #Values
FROM (
SELECT 0 as Val
UNION ALL
SELECT Start FROM Ranges
UNION ALL
SELECT [End]-1 FROM Ranges
) x
ORDER BY Val
The idea is that after you compute the ranges count per each of the start and end values you know that between one and the next the count of ranges will remain the same. The join is significantly faster, there is no ugly WHILE block and you don't need a 6 billion value table. It's easier to plot on a chart as well, with either of these variations:
See the variations:
SELECT v.Val,
Count(r.Start) as Nr
FROM #Values v
LEFT JOIN Ranges r
ON r.Start<=v.Val AND r.[End]>v.Val
GROUP BY v.Val
The result of the query above will be:
Value | Nr |
---|---|
0 | 0 |
10 | 2 |
19 | 2 |
20 | 2 |
25 | 3 |
29 | 3 |
34 | 2 |
39 | 1 |
0 comments:
Post a Comment