Case Study CS00004: Display an article out of a text file

The problem: Locate text in a file using table of contents information and then display the text.

The solution:

  1. Read the contents of the file
  2. Build a sequence of indices where articles begin
  3. Segment the text into a sequence of articles by breaking on the indices
  4. Locate an article in the sequence by searching for its title
  5. Display the article by indexing the matching item out of the sequence
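Steps 1 to 3 above can be sketched in Python for readers more at home in a mainstream language. This is a rough analogue under stated assumptions, not a description of how Glee works internally:

- The function names `read_text`, `article_starts` and `split_articles` are hypothetical.
- The marker defaults to 'FABLE:' as in this example.
- The file is assumed to be UTF-8 encoded.

```python
from pathlib import Path


def read_text(path: str) -> str:
    # Step 1: read the complete contents of the file (the role of Glee's f[]).
    # UTF-8 is an assumption about the file's encoding.
    return Path(path).read_text(encoding="utf-8")


def article_starts(text: str, marker: str = "FABLE:") -> list[int]:
    # Step 2: collect every index at which the marker occurs.
    # str.find is case-sensitive, so this is an exact match.
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    return starts


def split_articles(text: str, starts: list[int]) -> list[str]:
    # Step 3: cut the text at each start index. Slicing from the first
    # start onward skips the leading table of contents, which is what
    # dropping the first segment achieves in the Glee version.
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]
```

Consider sample = "CONTENTS\nFABLE: The Fox\nText A\nFABLE: The Crow\nText B\n":

- article_starts(sample) returns [9, 31].
- split_articles(sample, [9, 31]) returns the two fables, with the leading "CONTENTS" line dropped.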

Note: You can cut and paste these code fragments into the code pane of the Glee interpreter and experiment as you go along to see the actual operations live.

The Glee code:

'The Peacock and Juno'=> title;
'c:\glee\website\casestudies\' #fc => path;
'aesopsfables.txt'#file path => f;
f[]=>t; t @== ``& 'FABLE:' => i;
t \ i <-_1 => t;
t[t @& ^& title < ]

The Output:
FABLE: The Peacock and Juno

A Peacock once placed a petition before Juno desiring to have the voice of a nightingale in addition to his other attractions; but Juno refused his request. When he persisted, and pointed out that he was her favourite bird, she said:

"Be content with your lot; one cannot be first in everything."

The play-by-play:

  1. 'The Peacock and Juno'=> title;
    Save the title in a variable so it is easily changed.
  2. 'c:\glee\website\casestudies\' #fc => path;
    The path to the file (the file context) is defined separately, so only this line needs to change if the file is moved.
  3. 'aesopsfables.txt' #file path => f;
    The file is defined by its path and name.
  4. f[]=>t;
    f[]: indexing into a file object with an empty index reads the complete contents of the file, which we assign (=>) to t.
  5. t @== ``&'FABLE:' => i;
    t @== sets a switch on the t object requesting an exact compare (same case, with all symbols significant). In ``&, "``" means "indices of" and "&" means "all matching", so the operator returns every index in t at which the string 'FABLE:' occurs. Each article begins with this string, so we get back the index of the beginning of each article. We assign (=>) the result to i. If you inspect the contents of i you will see 2939 3483 4401 ... 64832 65674 66114. These are the indices at which the articles begin.
  6. t \ i <-_1 => t;
    The "\" operator means Segment. When its right operand (here i) is numeric, it creates a sequence of items by breaking t at each index position. "<-_1" then drops the first item, which contains the table of contents, and "=> t" replaces t with the result.
  7. t[t @& ^& title < ]
    Using the title string, we find the item in the sequence and index the article out. We read "t @&" as "at each item of t" and "^& title" as "mark whether the item contains the string title". If you examined this result you would see ...0000000010000000 ..., with the "1" marking the item that contains the string. On closer inspection you would see that this is a sequence of bit items rather than a bit vector. We obtain the bit vector needed for indexing with the < (expose) operator. It removes one level of depth from a sequence when the result will be homogeneous (i.e. all items are of the same type; here they are all bits). Indexing t with this bit vector "indexes out" the items of interest directly, which returns our article.
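Step 7 (mark the matching item, then index it out) can be sketched in Python as well. This is an analogy only, with these assumptions:

- `title_mask` and `find_article` are hypothetical names.
- The containment test is assumed to be case-sensitive.

A Python list comprehension already yields a flat list of booleans, so nothing in the sketch corresponds to the < expose step.

```python
from itertools import compress


def title_mask(articles: list[str], title: str) -> list[bool]:
    # One mark per article: True where the article contains the title.
    # Plays the role of the 0/1 marks from "t @& ^& title" (analogy only).
    return [title in article for article in articles]


def find_article(articles: list[str], title: str) -> list[str]:
    # "Index out" the marked items: compress keeps each article whose
    # mask entry is True. A list is returned because, as in the Glee
    # version, more than one item could contain the title.
    return list(compress(articles, title_mask(articles, title)))
```

Selecting with a mask keeps the lookup in two visible stages, mirroring the Glee expression. A plain `next(a for a in articles if title in a)` would be a simpler choice if only the first match is wanted.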

This completes the example. To better understand these operators and the other things you can do with them, consult the operator pages for the type of data being operated on.