General: In
Glee, sequences are a way of collecting dissimilar things into an
ordered object. A sequence is a vector of pointers to Glee
objects. At any point in time a sequence is either homogeneous (all elements
are the same Glee object type, for example all numerics) or
heterogeneous (at least one element differs from the others). Homogeneous
sequences can be disclosed into a single object holding their elements. For example, a
sequence containing two numeric vectors can be disclosed to a single numeric
vector containing the elements of the two.
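The homogeneous/heterogeneous distinction can be sketched outside Glee. What follows is a Python analogy, not Glee code: I assume a list of objects stands in for a sequence, a list of numbers for a numeric vector, and a str for a string. The helper name is_homogeneous is hypothetical.

```python
# Python analogy only; not Glee code.
# Assumptions: a sequence is a list of objects, a numeric vector is a list
# of numbers, and a string is a str.

def is_homogeneous(seq):
    """Return True when all elements share one Python type."""
    return len({type(item) for item in seq}) <= 1

two_vectors = [[1, 2, 3], [4, 5, 6]]   # every element is a numeric vector
mixed = [[1], "a"]                     # a numeric vector and a string

print(is_homogeneous(two_vectors))     # True
print(is_homogeneous(mixed))           # False
```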
Catenate
Column: Glee doesn't really have columns, but it does have
sequences of sequences. The catenate column operator helps you form
these constructs.
Depth ( ** ): Depth is
not a difficult concept. These examples illustrate it clearly.
Disclose ( <
):The first three example pairs just repeat the previous example
showing how disclose (<) undoes the work of enclose
(>). The example pair (1 'a' > < $; 1 'a' > < <
< < $;) shows that no matter how hard we try, we can't disclose away
heterogeneity. However, we can disclose away mere depth or homogeneity, as in
( (1 2 3>)(4 5 6>) $; (1 2 3>)(4 5 6>)< $; ), where we
do obtain a base object with combined elements (Num[ I719R1C6:I] 1 2 3 4 5
6).
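The behavior described above can be imitated in a small Python model. This is a sketch under my own assumed rules, not Glee's implementation: the class Seq and the functions enclose and disclose are hypothetical stand-ins, plain lists model numeric vectors, and the rules were chosen only so that the cases in the text come out the same way.

```python
# Python model only; not Glee code. The rules below are assumptions.

class Seq(list):
    """Stand-in for a Glee sequence. Plain lists model numeric vectors."""

def enclose(obj):
    """Wrap obj in a one-element sequence."""
    return Seq([obj])

def disclose(obj):
    """One disclose step under this model's assumed rules."""
    if not isinstance(obj, Seq) or not obj:
        return obj                                # base object: nothing to open
    if all(isinstance(e, Seq) for e in obj):
        obj = Seq(x for e in obj for x in e)      # remove one layer of depth
    if obj and all(type(e) is list for e in obj):
        return [x for e in obj for x in e]        # merge numeric vectors
    if obj and all(type(e) is str for e in obj):
        return "".join(obj)                       # merge strings
    return obj                                    # heterogeneous: unchanged

mixed = Seq([[1], "a"])                           # models 1 'a'
pair = Seq([enclose([1, 2, 3]), enclose([4, 5, 6])])

print(disclose(enclose([1, 2, 3])))               # [1, 2, 3]
print(disclose(disclose(enclose(mixed))))         # still the mixed sequence
print(disclose(pair))                             # [1, 2, 3, 4, 5, 6]
```

In this model, repeated disclosure of the mixed sequence always returns the same two-element sequence, which mirrors the point that heterogeneity cannot be disclosed away.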
Disclose Deep (
<** ):Sequences can get built up in many implicit ways. Often
you just want to display output and the depth and complexity of the sequences
is not a concern. The disclose deep operator (<**) is formed by
combining the glyphs for disclose "<" and depth
"**". It peels away depth and homogeneity, bringing the
sequence to as shallow a level as possible or even reducing it to a single
object combining vector elements.
Dyadic Segment ( string
\| delimiters) :With the dyadic segmentation operator, you
have the freedom to choose your own delimiters. They are given as a string of
delimiting characters in the right argument. This operator glyph is made up of
the segmenting glyph "\" and the any glyph
"|". So it reads, segment-on-any delimiters.
Dyadic Segment and eat (
string \|~ delimiters): Extending the previous
example and staying with the same philosophy, adding the without glyph "~" to our
operator yields a result that has eaten the delimiters it finds.
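The two dyadic forms above can be compared with a short Python sketch. It is an analogy, not Glee code: the function name segment_on_any and its eat flag are hypothetical, and I assume that a kept delimiter stays at the end of its field, as the marking example elsewhere in this section describes.

```python
# Python analogy only; not Glee code.

def segment_on_any(text, delimiters, eat=False):
    """Split text after any character found in delimiters.

    With eat=False the delimiter stays at the end of its field (assumed);
    with eat=True the delimiter is dropped.
    """
    fields, current = [], []
    for ch in text:
        if ch in delimiters:
            if not eat:
                current.append(ch)
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:                      # trailing text after the last delimiter
        fields.append("".join(current))
    return fields

print(segment_on_any("a,b;c", ",;"))             # ['a,', 'b;', 'c']
print(segment_on_any("a,b;c", ",;", eat=True))   # ['a', 'b', 'c']
```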
Enclose ( > ): The
enclose operator symbol " > " was chosen for the way it
visually depicts pinching or closing off what is to its left. The disclose
" < " operator opens up or discloses what is to its left.
Now in this example, the expression (1 2 3 $ ;) is a simple numeric
vector whose verbose display is (Num[ I227R1C3:I] 1 2 3). Using the
enclose (>) operator in (1 2 3 > $;) makes a one
element sequence. Its verbose display is (Seq[ I246R1C1T:P]: [1]Num[
I242R1C3:I] 1 2 3). The example for the string ('abc' $; 'abc' >
$;) does the same. The next example starts with a heterogeneous sequence
(1 'a' $;). It contains a numeric object in the first element and a
string object in the second element. The verbose display is (Seq[
I270R1C2T:P]: [1]Num[ I261R1C1:I]1 [2]String[ I263R2C1:C]a). Enclosing
this (1 'a' > $;) yields a single element sequence containing the
two element heterogeneous sequence (Seq[ I288R1C1T:P]: [1]Seq[
I284R1C2T:P]: *[1]Num[ I275R1C1:I]1 *[2]String[ I277R2C1:C]a). The stars
(e.g. *[1]Num) show depth.
Expose with explicit separator
(Dyadic) ( obj ,, separators): This example repeats
the previous example except that I explicitly specify the separators (e.g.
'<1>' and '<2>'). The first separator is used at
the deepest points, with subsequent separators at lesser depths. The last
separator is used repeatedly if there is more depth than separators. If we
replace the '<1>' separator with space and the
'<2>' with newline, we get the behavior of the monadic form of this
operator.
Expose with default separator
(Monadic) ( ,, ): Glee is unusual in the way it
implicitly forms sequences. It is common in Glee to take
advantage of this to assemble output. If you just display deep sequences, the
element displays are catenated with no separation. It is common to want
space separation at the deepest level and newline separation at
all other levels. This example shows displays with and without monadic expose.
In the next example you can see more clearly what is really happening.
Expose with default separator
(Monadic) ( ,,\ ): This is a variation on the ,, theme.
Rather than separating by space-lf, this operator separates by lf-lf, that is, lf at every depth. It is
commonly used when listing words, lines, or filenames down the page rather than
across it.
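The separator rule shared by the three expose forms can be sketched in Python. This is an analogy, not Glee code: nested lists stand in for sequences, the function names are hypothetical, and I assume "deepest" means the level whose elements are all base objects.

```python
# Python analogy only; not Glee code. Nested lists model deep sequences.

def _expose(obj, separators):
    """Return (text, height), where height counts levels above the leaves."""
    if not isinstance(obj, list):
        return str(obj), 0
    parts = [_expose(e, separators) for e in obj]
    height = 1 + max((h for _, h in parts), default=0)
    # First separator at the deepest level; the last one is reused above it.
    sep = separators[min(height, len(separators)) - 1]
    return sep.join(t for t, _ in parts), height

def expose(obj, separators=(" ", "\n")):
    """Join a nested list; the default mimics space below, newline above."""
    return _expose(obj, separators)[0]

table = [[1, 2], [3, 4]]
print(expose(table))                      # space, then newline
print(expose(table, ("<1>", "<2>")))      # explicit separators
print(expose(table, ("\n", "\n")))        # newline at every level
```

Passing ("\n", "\n") corresponds to the lf-lf variation, which lists every item down the page.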
Make Column (Monadic) (
,| ):Glee really doesn't have columns but it does
have sequences of sequences. The make column operator helps you form
these constructs.
Monadic Segmentation ( \
): This example makes use of Glee's "raw capture" ( $R
... $r ...) facility to produce a source object for demonstration. The
expression " $R*** newline $rNow ... " does
raw capturing. The leading "$R" switches the parser into
capture mode. It begins looking for a string given between
"$R" and "$r" (in this case
"***newline"). Armed with that string, it then
searches the remaining text for the string (finding it at " country.***
newline"). It then peels off this "delimiter" string
("*** newline") and assigns its catch
("=>text;"). The variable text now contains the raw data. This example
and those that follow will use this technique to make up objects for
demonstration. So you see, I have a string object with embedded newline
characters (i.e. 13 10 #asc). The monadic segmentation operator
"\" will take a string and produce a sequence. Each item of
the sequence is a string broken where newlines occur. The facility is
robust. It looks for combinations of return and newline
characters and makes logical breaks. This is commonly needed to break text or
program code into lines for subsequent comparison and analysis.
Note: In this example and those that follow, I have abbreviated the verbose
output (e.g. "[1]String[ I526R1C17:C]" to
"[1]Str[]") to make the results more readable and to take up
less space.
Monadic Segment and Eat (
\~ ):Often when segmenting text, you are only interested in
material between the delimiters; the delimiters themselves you want to discard.
This example shows the monadic version of the "segment and eat"
operator ( \~ ). It removes the newline-type whitespace. The
genesis of this operator is the combination of the glyph for segmentation
"\" and the glyph for without "~"
yielding "\~". You see, there is some method in the choice
of operator symbols.
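For readers who know Python, the two monadic forms resemble str.splitlines with and without keepends. This is only an analogy, not Glee code: the sample text is hypothetical, and I assume plain segmentation keeps the line ending on each piece, since only the eating form is said to remove it.

```python
# Python analogy only; not Glee code.

# Hypothetical sample text with return + newline (13 10) line endings.
text = "first line\r\nsecond line\r\nthird line"

with_delimiters = text.splitlines(keepends=True)   # like segmenting
without_delimiters = text.splitlines()             # like segment and eat

print(with_delimiters)
print(without_delimiters)

# splitlines also copes with mixed return/newline combinations.
print("a\rb\nc\r\nd".splitlines())
```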
Segment words (
\& ):I've chosen a compound example to give you a taste of
Glee's power. Remember "|" is the glyph for
"any". Similarly, "&" is the glyph for
"all". It doesn't make much sense to segment on all delimiters so I
have chosen the "\&" to mean word segmentation (segment
all letters together into words). This example begins by capturing some raw
text into the variable named text ($R____$r To be or not to be; that
is the question. ____ =>text;). I segment it into its words and save
them (text \& =>w;). Display of the word count is simply
((w#) ' total words' $;). The unique operator
("&") gives me the unique words ((w & => u#)
' unique words' $;) which I save, count, and display. Using indices of
group ("``&"), I collect in a sequence the indices where
each word begins in the text (w ``& =>aggr;). It is interesting
in an article to see how special the words are. I count the occurrences of the
words and save the counts (aggr /# =>cnts;). I determine special words with
( 'Words Used Once:' (cnts *= 1 \+) $;). To see the words in
decreasing order of occurrence, I sort (aggr[cnts ``<]=>aggr;).
Using the :for control structure, I display the words and their counts
(:for(aggr){aggr< => idx; idx # =>n; w[idx <-]':' n ' time'
('s'[n>1]) (13#asc)} $;). Notice how I use the count to report with
proper grammar (' time'('s'[n>1])). In an article of 1505 words,
583 were unique and 364 were used once. This little Glee program is
384 bytes. It analyzes the article in 2.25 seconds on my 400MHz Pentium II. So
if anyone asks about Glee, tell them it is lightning fast ... 171
bytes per second.
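The same word analysis can be sketched in Python for comparison. It is not a translation of the Glee program: I assume a word is a run of ASCII letters and that counting is case-sensitive, neither of which the text confirms for the \& operator.

```python
# Python sketch of the same analysis; not a translation of the Glee program.
import re
from collections import Counter

text = "To be or not to be; that is the question."

# Assumption: a word is a run of ASCII letters, compared case-sensitively.
words = re.findall(r"[A-Za-z]+", text)
counts = Counter(words)

total = len(words)
unique = len(counts)
used_once = [w for w, n in counts.items() if n == 1]

print(total, "total words")
print(unique, "unique words")
print("Words Used Once:", len(used_once))

# Decreasing order of occurrence, with the plural chosen from the count.
report = [f"{w}: {n} time{'s' if n > 1 else ''}"
          for w, n in counts.most_common()]
print("\n".join(report))
```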
Ravel ( , ): There are
four example pairs here to illustrate the behavior of the ravel
operator. I use the %** operator to display the object's
contents as a verbose string. The first pair <1a,1b> shows
ravel removing depth. It differs from the disclose deep operator
(<**) in that it always returns a sequence. The disclose deep
operator can return a simple object if the sequence is homogeneous.
<2a,2b> and <3a,3b> illustrate creating a sequence
from a simple object. <4a,4b> shows that ravel has no effect on
depth 1 sequences.
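The contrast between ravel and the <** operator can be modelled in Python. This is a sketch under assumed rules, not Glee code: Seq, ravel, and disclose_deep are hypothetical names, and plain lists stand in for numeric vectors.

```python
# Python model only; not Glee code. The rules below are assumptions.

class Seq(list):
    """Stand-in for a Glee sequence. Plain lists model numeric vectors."""

def _leaves(obj):
    """Yield the base objects of obj in order, ignoring nesting."""
    if isinstance(obj, Seq):
        for element in obj:
            yield from _leaves(element)
    else:
        yield obj

def ravel(obj):
    """Always return a depth 1 sequence, even for a base object."""
    return Seq(_leaves(obj))

def disclose_deep(obj):
    """Flatten, then merge into one base object when homogeneous."""
    flat = ravel(obj)
    if flat and all(type(e) is list for e in flat):
        return [x for e in flat for x in e]
    if flat and all(type(e) is str for e in flat):
        return "".join(flat)
    return flat

nested = Seq([Seq([[1, 2]]), Seq([Seq([[3]])])])
print(ravel(nested))            # [[1, 2], [3]]: still a sequence
print(disclose_deep(nested))    # [1, 2, 3]: a single vector
print(ravel([1, 2, 3]))         # [[1, 2, 3]]: sequence made from a base object
```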
Segment on index: This example
is very simple. You would typically find white space between the words; so that you
can see what is going on, I use "^" in its place in
this and the next example. I save a string in text. I mark in text where a
character is "^". This produces a bit vector, or
"marks". I then use those marks to segment the text. With marking,
the marked character is considered to be at the end of the field. The second part of
this example assumes a known record layout. It uses integers defining the
beginning of fields for doing the segmentation.
Segment on Field: Often you
have a string you wish to segment into fields defined by fixed positions.
This example illustrates a simple way of doing this.
Segment on index and
eat:This example is the same as the previous one except it eats the
characters at the marked positions.
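Segmenting by marks, by marks with eating, and by field positions can be sketched together in Python. This is an analogy, not Glee code: the function names are hypothetical, and the start positions are 0-based as is usual in Python, which may not match Glee's index origin.

```python
# Python analogy only; not Glee code. Indices here are 0-based.

def segment_on_marks(text, marks, eat=False):
    """Cut text after each marked position; the marked character ends its
    field, or is dropped when eat=True."""
    fields, start = [], 0
    for i, marked in enumerate(marks):
        if marked:
            end = i if eat else i + 1
            fields.append(text[start:end])
            start = i + 1
    if start < len(text):            # text after the last mark
        fields.append(text[start:])
    return fields

def segment_on_starts(text, starts):
    """Cut text into fields that begin at the given positions."""
    bounds = list(starts) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]

text = "one^two^three"
marks = [ch == "^" for ch in text]   # the bit vector of marks

print(segment_on_marks(text, marks))             # ['one^', 'two^', 'three']
print(segment_on_marks(text, marks, eat=True))   # ['one', 'two', 'three']
print(segment_on_starts(text, [0, 4, 8]))        # ['one^', 'two^', 'three']
```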
Set Depth ( obj **
depth): As the previous example showed, you can achieve depth
through repeated enclosure. However, the dyadic use of the depth (**)
operator allows you to add or reduce depth in a single step. It is limited by
the maximum depth allowed and by the minimum depth achievable. This operator
may be used when indexing into a sequence where each element is to have the
same depth.
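Setting depth in one step can be modelled in Python as repeated wrapping or unwrapping. This is a sketch under my own assumptions, not Glee code: Seq, depth, and set_depth are hypothetical names, a base object is taken to have depth 0, only one-element sequences can be unwrapped, and the maximum depth limit is not modelled.

```python
# Python model only; not Glee code. The rules below are assumptions.

class Seq(list):
    """Stand-in for a Glee sequence. Plain lists model numeric vectors."""

def depth(obj):
    """Depth 0 for a base object (assumed), 1 for a flat sequence, and so on."""
    if not isinstance(obj, Seq):
        return 0
    return 1 + max((depth(e) for e in obj), default=0)

def set_depth(obj, target):
    """Wrap or unwrap obj until it reaches target, as far as possible."""
    while depth(obj) < target:
        obj = Seq([obj])                      # add depth by enclosing
    while depth(obj) > target and len(obj) == 1:
        obj = obj[0]                          # remove a one-element wrapper
    return obj

vector = [1, 2, 3]
deep = set_depth(vector, 3)
mixed = Seq([[1], "a"])

print(depth(deep))                  # 3
print(set_depth(deep, 1))           # [[1, 2, 3]]
print(depth(set_depth(mixed, 0)))   # 1: the minimum achievable here
```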
Contains ( ^&): When
dealing with sequences there are contexts. First is the context of the sequence
itself. Second is the context of each element of the sequence. The first item
in this example shows marking sequences containing objects. With the addition
of the at each operator (@&), detailed later, the context
for a subsequent operator is made to be each element rather than the
sequence as a whole. Glee takes the sequence apart and
delivers each element to the operator, collecting the results in a sequence which it
returns. When this result is expected to be homogeneous, you will likely then disclose
it to get a base object (e.g. a bit vector).
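The two contexts can be illustrated with a Python analogy. It is not Glee code: at_each and contains are hypothetical helpers, Python's "in" test stands in for the contains operator, and a list of booleans stands in for the bit vector.

```python
# Python analogy only; not Glee code.

def at_each(operator, seq):
    """Deliver each element to operator and collect the results in a list."""
    return [operator(element) for element in seq]

def contains(needle):
    """Build a predicate that tests whether an object contains needle."""
    return lambda obj: needle in obj

lines = ["to be", "or not", "to be"]

whole = contains("to be")(lines)        # context: the sequence itself
each = at_each(contains("be"), lines)   # context: each element in turn

print(whole)   # True
print(each)    # [True, False, True]
```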