Case Study CS00013: Break numeric vector into ranges.
The problem: Given a vector of numbers, break
it into ranges and list the ranges. If your bank uses this technique to list
cleared checks it makes quick work of marking off in your reconcilliation.
The solution: This example illustrates the
steps one might take in sculpturing this problem. The first step is to develop
an algorithm. The last step is to make it into routine so that the next time
you need it, the first step becomes using the routine you created.
- Remove the duplicates and order ascending
- Find positions where first difference is not 1
- Segment into vectors at those positions
- Format ranges as (begin - end)
- Format singletons as (num)
- Separate the ranges by semicolons
- Make it a reusable routine
Note: You can cut and paste these code fragments into the code pane of
the Glee interpreter and experiment as you go along to see the actual
operations live.
The Glee code:
20 %%20 ?3 & =>theData;'theData:
'(theData)$;
theData -- =>firstDiff;'firstDiff: '(firstDiff)$;
'Segs: '$;
theData \(firstDiff *~=1``)=>segs %**$;
'Ranges:'$;segs @&(=>r;:?(r# =1){r}::{(r<-)(r->)})=>ranges
%**$;
'firstCut:' $; ranges,,(('a')('b')('c')('d'))=>firstCut$;
'finalCut:' $; ranges,,(('-')('; '))=>finalCut$;
'Ranges'#pgm{
n& =>n\(n-- *~=1``)@&
(=>r;:?(r# =1){r}::{(r<-)(r->)})
,,(('-')('; '))%*};
'Data ranges are: ' (theData[.n]Ranges) $;
The Output:
theData: 2 3 4 5 6 9 10 11 13 17 18
firstDiff: 0 1 1 1 1 3 1 1 2 4 1
Segs:
Seq[R1C4T:K]
[1]Num[R2C5:I]2 3 4 5 6
[2]Num[R2C3:I]9 10 11
[3]Num[R2C1:I]13
[4]Num[R2C2:I]17 18
Ranges:
Seq[R1C4T:K]
[1]Seq[R2C2T:K]
*[1]Num[R1C1:I]2
*[2]Num[R1C1:I]6
[2]Seq[R2C2T:K]
*[1]Num[R1C1:I]9
*[2]Num[R1C1:I]11
[3]Num[R3C1:I]13
[4]Seq[R2C2T:K]
*[1]Num[R1C1:I]17
*[2]Num[R1C1:I]18
firstCut:
2a6b9a11b13b17a18
finalCut:
2-6; 9-11; 13; 17-18
Data ranges are: 2-6; 9-11; 13; 17-18
The play-by-play:
- 20 %%20 ?3 & =>theData;'theData:
'(theData)$;: We begin by creating some data. Reading left to right
I take the number 20 and reshape it by 20 (i.e. make 20 copies into a vector)
(20 %%20 ). Using a random seed of 3 I
generate a random vector using each element (i.e. each 20) as the upper limit (
?3 ). I use the unique operator to
retain just unique numbers &. This
also sorts my data in ascending order for me and I assign it to
=>theData;. This just gives me some
data for my example which I display: theData: 2 3 4 5
6 9 10 11 13 17 18
- theData -- =>firstDiff;'firstDiff:
'(firstDiff)$;. My data is in order so taking the first difference
yields 0, then 1 for contiguous numbers and something else for non-contiguous
numbers. theData -- =>firstDiff;. I
display this result: firstDiff: 0 1 1 1 1 3 1 1 2 4
1. Where I don't see 1's, a contiguous range is broken.
- theData \(firstDiff *~=1``)=>segs
%**$;. Reading left to right, I take my data and cut it into
segments theData \. My cutting positions
are the indices where the first difference is not equal to 1(firstDiff *~=1. . I want field indices so I
use the indices-of operator `` to
turn the bit vector into a numeric vector of indexes . These are the beginnings
of my fields of contiguous numbers. I save this result and display it verbosely
to see the segmentation.=>segs %**$.
My result Segs:
Seq[R1C4T:K]
[1]Num[R2C5:I]2 3 4 5 6
[2]Num[R2C3:I]9 10 11
[3]Num[R2C1:I]13
[4]Num[R2C2:I]17 18
I have a sequence of numeric vectors.
- 'Ranges:'$;segs @&(=>r;:?(r#
=1){r}::{(r<-)(r->)})=>ranges %**$; Where there is just
one element in the vector I want it. Where there are 2 or more I want just the
first and last. I work on each item of segs
@&. To begin my work I save the item =>r;. If r contains one element I
use it :?(r# =1){r}.; else I use the
first and last elements ::{(r<-)(r->)}. This result I save and
display verbosely =>ranges %**$;
to see the segmentation.
Ranges:
Seq[R1C4T:K]
[1]Seq[R2C2T:K]
*[1]Num[R1C1:I]2
*[2]Num[R1C1:I]6
[2]Seq[R2C2T:K]
*[1]Num[R1C1:I]9
*[2]Num[R1C1:I]11
[3]Num[R3C1:I]13
[4]Seq[R2C2T:K]
*[1]Num[R1C1:I]17
*[2]Num[R1C1:I]18
- 'firstCut:' $;
ranges,,(('a')('b')('c')('d'))=>firstCut$;. I want to format out
these ranges. I use the dyadic segment formatting
operator ,,. The fastest
way to do this is to use some easily distinguishable strings to insert between
segments at various levels(('a')('b')('c')('d')). You then look at the
result and introduce the actual separators you need at those levels as shown in
the next step.
firstCut:
2a6b9a11b13b17a18
- 'finalCut:' $; ranges,,(('-')('; '))%*
=>finalCut$; In the previous step, I supplied more separators
than I needed. Looking at the result I see only the a and b
separators were used 2a6b9a11b13b17a18.
Where I see the a's I want a dash('-'); where I see the b's I want a
semicolon and a space('; '). I remove the
sequence depth by converting to a simple string %*. Now I have the result I'm looking for.
finalCut:
2-6; 9-11; 13; 17-18
- 'Ranges'#pgm{
n& =>n\(n-- *~=1``)@&
(=>r;:?(r# =1){r}::{(r<-)(r->)})
,,(('-')('; '))%*};
Having quickly carved out my solution, I'm now ready to create a
Glee program that does this for me. I name my program
'Ranges'#pgm{. It assumes a variable
n in its namespace. It just keeps reusing this variable and acting on
it until all the steps of the sculpture are reproduced.
- 'Data ranges are: ' (theData[.n]Ranges)
$;. It is now ready to be used any time I need to display numeric
data in ranges.
Data ranges are: 2-6; 9-11; 13; 17-18
This completes the example. To better understand these operators and other
things you can do with them, consult the operator pages according to the type of
data you see being operated on.