Glee Programming Language: Sequence Operator Comments

Sequence Operator Commentary

General: In Glee, sequences are a way of collecting dissimilar things into an ordered object. A sequence is a vector of pointers to Glee objects. At any point in time a sequence is either homogeneous (all elements are the same Glee object type, as in all numerics) or heterogeneous (at least one element differs from the others). Homogeneous sequences can be disclosed to the elements of their objects. For example, a sequence containing two numeric vectors can be disclosed to a single numeric vector containing the elements of the two.

Catenate Column: Glee really doesn't have columns but it does have sequences of sequences. The catenate column operator helps you form these constructs.

Depth ( ** ): Depth is not a difficult concept. These examples illustrate it clearly.

Disclose ( < ): The first three example pairs just repeat the previous example, showing how disclose (<) undoes the work of enclose (>). The example pair (1 'a' > < $; 1 'a' > < < < < $;) shows that no matter how hard we try, we can't disclose away heterogeneity. However, we can disclose away mere depth or homogeneity, as in ( (1 2 3>)(4 5 6>) $; (1 2 3>)(4 5 6>)< $; ), where we do obtain a base object with combined elements (Num[ I719R1C6:I] 1 2 3 4 5 6).
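For readers who do not have Glee at hand, the behaviour described above can be approximated in Python. This is only a toy model and not Glee code: it assumes a sequence can be represented as a Python list, a numeric vector as a tuple, and a string as a str. The function names enclose and disclose are illustrative, and only numeric vectors are merged in this sketch.

```python
def enclose(obj):
    """Wrap any object in a one-element sequence (modelled here as a list)."""
    return [obj]


def disclose(obj):
    """Undo one enclosure, or merge a homogeneous sequence of numeric vectors."""
    if not isinstance(obj, list) or not obj:
        return obj                      # base object or empty list: nothing to open
    if len(obj) == 1:
        return obj[0]                   # undo a single enclose
    if all(isinstance(x, tuple) for x in obj):
        # Homogeneous sequence of numeric vectors: combine their elements.
        return tuple(n for vec in obj for n in vec)
    return obj                          # heterogeneous: left unchanged


mixed = [(1,), "a"]                     # stands in for the Glee sequence 1 'a'
print(disclose(enclose((1, 2, 3))))     # back to the plain vector
print(disclose(disclose(disclose(enclose(mixed)))))  # still heterogeneous
print(disclose([(1, 2, 3), (4, 5, 6)])) # one combined vector
```

In this model, repeated disclosure of the mixed sequence stops changing anything once the enclosure is gone, which mirrors the point that heterogeneity cannot be disclosed away.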

Disclose Deep ( <** ): Sequences can get built up in many implicit ways. Often you just want to display output, and the depth and complexity of the sequences are not a concern. The disclose deep operator (<**) is formed by combining the glyphs for disclose "<" and depth "**". It peels away depth and homogeneity, bringing the sequence to as shallow a level as possible or even reducing it to a single object combining vector elements.

Dyadic Segment ( string \| delimiters ): With the dyadic segmentation operator, you have the freedom to choose your own delimiters. They are given as a string of delimiting characters in the right argument. This operator glyph is made up of the segmenting glyph "\" and the any glyph "|". So it reads, segment-on-any delimiters.

Dyadic Segment and eat ( string \|~ delimiters ): Extending the previous example and staying with the philosophy, adding the without glyph "~" to our operator yields a result that has eaten the delimiters it finds.
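The difference between segmenting and segmenting with eating can be sketched in Python. This is an analogue, not Glee: the function name segment_any and the eat flag are hypothetical, and the sketch assumes that, when delimiters are kept, each one stays attached to the end of the piece it terminates.

```python
def segment_any(text, delimiters, eat=False):
    """Break text after every character that appears in `delimiters`.

    With eat=False the delimiter stays at the end of its piece (an assumption
    of this sketch); with eat=True the delimiter is discarded.
    """
    pieces, current = [], ""
    for ch in text:
        if ch in delimiters:
            if not eat:
                current += ch           # keep the delimiter with its piece
            pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)          # trailing text with no delimiter
    return pieces


print(segment_any("a,b;c", ",;"))            # delimiters kept
print(segment_any("a,b;c", ",;", eat=True))  # delimiters eaten
```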

Enclose ( > ): The enclose operator symbol " > " was chosen for the way it visually depicts pinching or closing off what is to its left. The disclose " < " operator opens up or discloses what is to its left. Now in this example, the expression (1 2 3 $ ;) is a simple numeric vector whose verbose display is (Num[ I227R1C3:I] 1 2 3). Using the enclose (>) operator in (1 2 3 > $;) makes a one element sequence. Its verbose display is (Seq[ I246R1C1T:P]: [1]Num[ I242R1C3:I] 1 2 3). The example for the string ('abc' $; 'abc' > $;) does the same. The next example starts with a heterogeneous sequence (1 'a' $;). It contains a numeric object in the first element and a string object in the second element. The verbose display is (Seq[ I270R1C2T:P]: [1]Num[ I261R1C1:I]1 [2]String[ I263R2C1:C]a). Enclosing this (1 'a' > $;) yields a single element sequence containing the two element heterogeneous sequence (Seq[ I288R1C1T:P]: [1]Seq[ I284R1C2T:P]: *[1]Num[ I275R1C1:I]1 *[2]String[ I277R2C1:C]a). The stars (e.g. *[1]Num) show depth.

Expose with explicit separator (Dyadic) ( obj ,, separators ): This example repeats the previous example except that I explicitly specify the separators (e.g. '<1>' and '<2>'). The first separator is used at the deepest points, with subsequent separators at lesser depths. The last separator is used repeatedly if there is more depth than separators. If we replace the '<1>' separator with space and the '<2>' separator with newline, we have the monadic form of this operator.
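The separator rule (first separator at the deepest level, last separator reused when the nesting is deeper than the separator list) can be illustrated with a small Python sketch. It is not Glee: nested lists of strings stand in for deep sequences, and the names height and expose are hypothetical.

```python
def height(obj):
    """Nesting height: 0 for a string, 1 for a flat list of strings, and so on."""
    if isinstance(obj, str):
        return 0
    return 1 + max((height(x) for x in obj), default=0)


def expose(obj, separators=(" ", "\n")):
    """Join a nested list of strings, picking the separator by nesting height.

    separators[0] joins the deepest lists; later separators join shallower
    levels; the last separator is reused if the nesting is deeper still.
    """
    if isinstance(obj, str):
        return obj
    level = min(height(obj), len(separators)) - 1
    return separators[level].join(expose(x, separators) for x in obj)


table = [["a", "b"], ["c", "d"]]
print(expose(table, ("<1>", "<2>")))    # explicit separators
print(expose(table))                    # space and newline, as in the monadic form
```

The default argument mirrors the description of the monadic form: space at the deepest level and newline everywhere else.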

Expose with default separator (Monadic) ( ,, ): Glee is unusual in the way it implicitly forms sequences. It is common in Glee to take advantage of this to assemble output. If you just display deep sequences, the element displays are catenated with no separation. It is common to want space separation at the deepest level and newline separation at all other levels. This example shows displays with and without monadic expose. In the next example you can see more clearly what is really happening.

Expose with default separator (Monadic) ( ,,\ ): This is a variation on the ,, theme. Rather than separating by space-lf, this operator separates by lf-lf. It is commonly used when listing words, lines, or filenames down the page rather than across it.

Make Column (Monadic) ( ,| ): Glee really doesn't have columns but it does have sequences of sequences. The make column operator helps you form these constructs.

Monadic Segmentation ( \ ): This example makes use of Glee's "raw capture" ( $R ... $r ... ) facility to produce a source object for demonstration. The expression " $R*** newline $rNow ... " does raw capturing. The leading "$R" switches the parser into capture mode. It begins by looking for a string given between "$R" and "$r" (in this case "***newline"). Armed with that string, it then searches the remaining text for the string (finding it at "country.***newline"). It then peels off this "delimiter" string ("*** newline") and assigns its catch ("=>text;"). Text now contains the raw data. This example and those that follow will use this technique to make up objects for demonstration. So you see, I have a string object with embedded newline characters (i.e. 13 10 #asc). The monadic segmentation operator "\" will take a string and produce a sequence. Each item of the sequence is a string broken where newlines occur. The facility is robust. It looks for combinations of return and newline characters and makes logical breaks. This is commonly needed to break text or program code into lines for subsequent comparison and analysis.

Note: In this example and those that follow, I have abbreviated the verbose output (e.g. "[1]String[ I526R1C17:C]" to "[1]Str[]") to make the results more readable and to take up less space.

Monadic Segment and Eat ( \~ ): Often when segmenting text, you are only interested in material between the delimiters; the delimiters themselves you want to discard. This example shows the monadic version of the "segment and eat" operator ( \~ ). It removes the newline type whitespace. The genesis of this operator is the combination of the glyph for segmentation "\" and the glyph for without "~" yielding "\~". You see, there is some method in the choice of operator symbols.
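Python's str.splitlines offers a close analogue of line segmentation with and without eating. This is a comparison, not a description of Glee's implementation: the wrapper name segment_lines is hypothetical, the sketch assumes that kept line endings stay at the end of each piece, and splitlines also breaks on a few other boundary characters (such as form feed) that Glee may treat differently.

```python
def segment_lines(text, eat=False):
    """Break text into lines, handling \\r\\n, \\n and \\r endings alike.

    eat=False keeps each line ending on its line; eat=True drops it.
    """
    return text.splitlines(keepends=not eat)


sample = "Now is\r\nthe time\nfor all"
print(segment_lines(sample))            # line endings kept
print(segment_lines(sample, eat=True))  # line endings eaten
```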

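For comparison, the same analysis can be sketched in Python. This is an analogue rather than a translation of the Glee program: the function name word_report is hypothetical, words are taken to be runs of ASCII letters, and matching is case-sensitive (so "To" and "to" count as different words), which may or may not match Glee's behaviour.

```python
import re
from collections import Counter


def word_report(text):
    """Return report lines: totals, unique count, singletons, then each word."""
    words = re.findall(r"[A-Za-z]+", text)      # runs of letters form words
    counts = Counter(words)                     # occurrences per distinct word
    used_once = sum(1 for n in counts.values() if n == 1)
    lines = [
        f"{len(words)} total words",
        f"{len(counts)} unique words",
        f"Words Used Once: {used_once}",
    ]
    # most_common() lists words in decreasing order of occurrence.
    for word, n in counts.most_common():
        lines.append(f"{word}: {n} time{'s' if n > 1 else ''}")
    return lines


for line in word_report("To be or not to be; that is the question."):
    print(line)
```

The conditional suffix in the f-string plays the same role as ('s'[n>1]) in the Glee version: the "s" appears only when the count exceeds one.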

Ravel ( , ): There are four example pairs here to illustrate the behavior of the ravel operator. I use the %** operator to display the object's contents as a verbose string. The first pair <1a,1b> shows ravel removing depth. It differs from the disclose deep operator (<**) in that it always returns a sequence. The disclose deep operator can return a simple object if the sequence is homogeneous. <2a,2b> and <3a,3b> illustrate creating a sequence from a simple object. <4a,4b> shows that ravel has no effect on depth 1 sequences.
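The three behaviours listed for ravel (removing depth, wrapping a simple object, leaving a depth 1 sequence alone) can be mimicked in Python. This is a toy model, not Glee: lists stand in for sequences, tuples and strings for simple objects, and the sketch assumes ravel flattens all the way down to depth 1.

```python
def ravel(obj):
    """Always return a flat (depth 1) list, whatever the input looks like."""
    if not isinstance(obj, list):
        return [obj]                    # simple object becomes a one-element sequence
    flat = []
    for item in obj:
        if isinstance(item, list):
            flat.extend(ravel(item))    # nested sequence: splice in its elements
        else:
            flat.append(item)
    return flat


print(ravel([[(1, 2), ["a"]], "b"]))    # depth removed
print(ravel((1, 2, 3)))                 # simple object wrapped
print(ravel([(1, 2), "a"]))             # depth 1 sequence unchanged
```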

Segment on index: This example is very simple. You would typically find white space between the words. So that you can see what is going on, I use "^" with the characters in this and the next example. I save a string in text. I mark in text where a character is "^". This produces a bit vector, or "marks". I then use those marks to segment the text. With marking, the marked character is considered to be at the end of the field. The second part of this example assumes a known record layout. It uses integers defining the beginning of fields to do the segmentation.
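Both forms of index-based segmentation can be sketched in Python. This is an analogue, not Glee: the function names are hypothetical, field starts are 0-based here as is usual in Python, and the eat flag corresponds to the "Segment on index and eat" entry in this commentary.

```python
def segment_on_marks(text, marks, eat=False):
    """Cut text after each marked position; the marked character ends its field.

    With eat=True the marked character itself is dropped.
    """
    fields, start = [], 0
    for i, marked in enumerate(marks):
        if marked:
            end = i if eat else i + 1
            fields.append(text[start:end])
            start = i + 1
    if start < len(text):
        fields.append(text[start:])     # text after the last mark
    return fields


def segment_on_starts(text, starts):
    """Cut text into fields that begin at the given 0-based positions."""
    bounds = list(starts) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


text = "one^two^three"
marks = [ch == "^" for ch in text]      # the bit vector of marks
print(segment_on_marks(text, marks))
print(segment_on_marks(text, marks, eat=True))
print(segment_on_starts("JohnSmith42", [0, 4, 9]))
```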

Segment on Field: Often you have a string you wish to segment into fields defined by certain positions. This example illustrates a simple way of doing this.

Segment on index and eat: This example is the same as the previous one except it eats the characters at the marked positions.

Set Depth ( obj ** depth): As the previous example showed, you can achieve depth through repeated enclosure. However, the dyadic use of the depth (**) operator allows you to add or reduce depth in a single step. It is limited by the maximum depth allowed and by the minimum depth achievable. This operator may be used when indexing into a sequence where elements are to each have the same depth.

Contains ( ^& ): When dealing with sequences there are two contexts. First is the context of the sequence itself. Second is the context of each element of the sequence. The first item in this example shows marking sequences containing objects. With the addition of the at each operator ( @& ), detailed later, the context for a subsequent operator is made to be each element rather than the sequence as a whole. Glee takes the sequence apart and delivers each element to the operator, collecting the results in a sequence which it returns. When this is expected to be homogeneous, you will then likely disclose the result to get a base object (e.g. a bit vector).