Case Study CS00013: Break numeric vector into ranges.

The problem: Given a vector of numbers, break it into ranges and list the ranges. If your bank uses this technique to list cleared checks it makes quick work of marking off in your reconcilliation.

The solution: This example illustrates the steps one might take in sculpturing this problem. The first step is to develop an algorithm. The last step is to make it into routine so that the next time you need it, the first step becomes using the routine you created.

  1. Remove the duplicates and order ascending
  2. Find positions where first difference is not 1
  3. Segment into vectors at those positions
  4. Format ranges as (begin - end)
  5. Format singletons as (num)
  6. Separate the ranges by semicolons
  7. Make it a reusable routine

Note: You can cut and paste these code fragments into the code pane of the Glee interpreter and experiment as you go along to see the actual operations live.

The Glee code:

20 %%20 ?3 & =>theData;'theData: '(theData)$;

theData -- =>firstDiff;'firstDiff: '(firstDiff)$;

'Segs: '$;
theData \(firstDiff *~=1``)=>segs %**$;

'Ranges:'$;segs @&(=>r;:?(r# =1){r}::{(r<-)(r->)})=>ranges %**$;
'firstCut:' $; ranges,,(('a')('b')('c')('d'))=>firstCut$;

'finalCut:' $; ranges,,(('-')('; '))=>finalCut$;

'Ranges'#pgm{
  n& =>n\(n-- *~=1``)@&
    (=>r;:?(r# =1){r}::{(r<-)(r->)})
    ,,(('-')('; '))%*};

'Data ranges are: ' (theData[.n]Ranges) $;

The Output:
theData: 2 3 4 5 6 9 10 11 13 17 18
firstDiff: 0 1 1 1 1 3 1 1 2 4 1
Segs:
Seq[R1C4T:K]
[1]Num[R2C5:I]2 3 4 5 6
[2]Num[R2C3:I]9 10 11
[3]Num[R2C1:I]13
[4]Num[R2C2:I]17 18
Ranges:
Seq[R1C4T:K]
[1]Seq[R2C2T:K]
*[1]Num[R1C1:I]2
*[2]Num[R1C1:I]6
[2]Seq[R2C2T:K]
*[1]Num[R1C1:I]9
*[2]Num[R1C1:I]11
[3]Num[R3C1:I]13
[4]Seq[R2C2T:K]
*[1]Num[R1C1:I]17
*[2]Num[R1C1:I]18
firstCut:
2a6b9a11b13b17a18
finalCut:
2-6; 9-11; 13; 17-18
Data ranges are: 2-6; 9-11; 13; 17-18

The play-by-play:

  1. 20 %%20 ?3 & =>theData;'theData: '(theData)$;: We begin by creating some data. Reading left to right I take the number 20 and reshape it by 20 (i.e. make 20 copies into a vector) (20 %%20 ). Using a random seed of 3 I generate a random vector using each element (i.e. each 20) as the upper limit ( ?3 ). I use the unique operator to retain just unique numbers &. This also sorts my data in ascending order for me and I assign it to =>theData;. This just gives me some data for my example which I display: theData: 2 3 4 5 6 9 10 11 13 17 18
  2. theData -- =>firstDiff;'firstDiff: '(firstDiff)$;. My data is in order so taking the first difference yields 0, then 1 for contiguous numbers and something else for non-contiguous numbers. theData -- =>firstDiff;. I display this result: firstDiff: 0 1 1 1 1 3 1 1 2 4 1. Where I don't see 1's, a contiguous range is broken.
  3. theData \(firstDiff *~=1``)=>segs %**$;. Reading left to right, I take my data and cut it into segments theData \. My cutting positions are the indices where the first difference is not equal to 1(firstDiff *~=1. . I want field indices so I use the indices-of operator `` to turn the bit vector into a numeric vector of indexes . These are the beginnings of my fields of contiguous numbers. I save this result and display it verbosely to see the segmentation.=>segs %**$. My result Segs:
    Seq[R1C4T:K]
    [1]Num[R2C5:I]2 3 4 5 6
    [2]Num[R2C3:I]9 10 11
    [3]Num[R2C1:I]13
    [4]Num[R2C2:I]17 18

    I have a sequence of numeric vectors.
  4. 'Ranges:'$;segs @&(=>r;:?(r# =1){r}::{(r<-)(r->)})=>ranges %**$; Where there is just one element in the vector I want it. Where there are 2 or more I want just the first and last. I work on each item of segs @&. To begin my work I save the item =>r;. If r contains one element I use it :?(r# =1){r}.; else I use the first and last elements ::{(r<-)(r->)}. This result I save and display verbosely =>ranges %**$; to see the segmentation.
    Ranges:
    Seq[R1C4T:K]
    [1]Seq[R2C2T:K]
    *[1]Num[R1C1:I]2
    *[2]Num[R1C1:I]6
    [2]Seq[R2C2T:K]
    *[1]Num[R1C1:I]9
    *[2]Num[R1C1:I]11
    [3]Num[R3C1:I]13
    [4]Seq[R2C2T:K]
    *[1]Num[R1C1:I]17
    *[2]Num[R1C1:I]18
  5. 'firstCut:' $; ranges,,(('a')('b')('c')('d'))=>firstCut$;. I want to format out these ranges. I use the dyadic segment formatting operator  ,,. The fastest way to do this is to use some easily distinguishable strings to insert between segments at various levels(('a')('b')('c')('d')). You then look at the result and introduce the actual separators you need at those levels as shown in the next step.
    firstCut:
    2a6b9a11b13b17a18
  6. 'finalCut:' $; ranges,,(('-')('; '))%* =>finalCut$; In the previous step, I supplied more separators than I needed. Looking at the result I see only the a and b separators were used 2a6b9a11b13b17a18. Where I see the a's I want a dash('-'); where I see the b's I want a semicolon and a space('; '). I remove the sequence depth by converting to a simple string %*. Now I have the result I'm looking for.
    finalCut:
    2-6; 9-11; 13; 17-18
  7. 'Ranges'#pgm{
      n& =>n\(n-- *~=1``)@&
        (=>r;:?(r# =1){r}::{(r<-)(r->)})
        ,,(('-')('; '))%*};
    Having quickly carved out my solution, I'm now ready to create a Glee program that does this for me. I name my program 'Ranges'#pgm{. It assumes a variable n in its namespace. It just keeps reusing this variable and acting on it until all the steps of the sculpture are reproduced.
  8. 'Data ranges are: ' (theData[.n]Ranges) $;. It is now ready to be used any time I need to display numeric data in ranges.
    Data ranges are: 2-6; 9-11; 13; 17-18

This completes the example. To better understand these operators and other things you can do with them, consult the operator pages according to the type of data you see being operated on.