Case Study CS00004: Display article out of text file
The problem: Locate text in a file using table
of contents information and then display the text.
The solution:
- Read the contents of the file
- Build a sequence of indices where articles begin
- Segment the text into a sequence of articles by breaking on the indices
- Locate an article in the sequence by searching for its title
- Display the article by indexing the item out of the sequence
Note: You can cut and paste these code fragments into the code pane of
the Glee interpreter and experiment as you go along to see the actual
operations live.
The Glee code:
'The Peacock and Juno'=> title;
'c:\glee\website\casestudies\' #fc => path;
'aesopsfables.txt'#file path => f;
f[]=>t; t @== ``& 'FABLE:' => i;
t \ i <-_1 => t;
t[t @& ^& title < ]
The Output:
FABLE: The Peacock and Juno
A Peacock once placed a petition before Juno desiring to have the voice of a
nightingale in addition to his other attractions; but Juno refused his request.
When he persisted, and pointed out that he was her favourite bird, she said:
"Be content with your lot; one cannot be first in
everything."
The play-by-play:
- 'The Peacock and Juno'=> title;
Save the title in a variable so it is easily changed.
- 'c:\glee\website\casestudies\' #fc => path;
The path to the file (the file context) is defined separately in case the file
gets moved.
- 'aesopsfables.txt' #file path => f;
The file is defined by its path and name.
- f[]=>t;
Indexing into a file object with an empty index, f[], reads the complete
contents of the file, which we assign (=>) to t.
- t @== ``&'FABLE:' => i;
t @== sets a switch in the t object saying we want to use an exact compare
(i.e. same case and all symbols used). In ``& the "``" means Indices of and
the "&" means All matching, so the operator returns all the indices of the
locations in t of the string 'FABLE:'. Each article begins with this string, so
we get back the indices of the beginning of each article. We assign (=>) the
result to i. If you inspect the contents of i you will see
2939 3483 4401 ... 64832 65674 66114.
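For readers who do not know Glee, the idea behind this step can be sketched in
Python. This is only a rough analogue, not Glee and not how the interpreter
works internally: the helper name find_all and the sample text are hypothetical,
and Python's zero-based offsets may not match the numbering Glee reports.

```python
def find_all(text: str, marker: str) -> list[int]:
    """Return the start offset of every case-sensitive occurrence of marker."""
    indices = []
    start = text.find(marker)
    while start != -1:
        indices.append(start)
        # Resume the search just past the match we already recorded.
        start = text.find(marker, start + 1)
    return indices


# Hypothetical stand-in for the contents of the fables file.
sample = (
    "CONTENTS\n"
    "FABLE: One\nbody one\n"
    "FABLE: Two\nbody two\n"
    "fable: not a marker\n"
)

starts = find_all(sample, "FABLE:")
print(starts)
```

str.find compares characters exactly, so the lowercase "fable:" in the sample
is skipped, which mirrors the exact compare requested with @==.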
- t \ i <-_1 => t;
The " \ " means Segment. When the right operand (here i) is numeric, it means
create a sequence of items by breaking at each index position. <-_1 drops the
first item (it contains the table of contents), and => t; replaces t with the
result.
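The segment-and-drop step can be pictured with a short Python sketch. It is an
illustration under assumptions rather than a description of Glee itself: the
function name segment, the sample text, and the break offsets are all
hypothetical.

```python
def segment(text: str, breaks: list[int]) -> list[str]:
    """Split text into pieces, starting a new piece at each break offset."""
    bounds = [0] + list(breaks) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical stand-in for the file contents and the marker offsets.
sample = "CONTENTS\nFABLE: One\nbody one\nFABLE: Two\nbody two\n"
breaks = [9, 29]

pieces = segment(sample, breaks)
articles = pieces[1:]  # drop the leading piece (the table of contents)
print(articles)
```

Everything before the first break lands in the first piece, which is why
dropping one item is enough to discard the table of contents.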
- t[t @& ^& title < ]
Using the title string we find the item in the sequence and index the article
out. In "t @& ^& title" we read "t @&" as "at each item of t", and "^& title"
marks where the item contains the string title. If you examined the result you
would see: ...0000000010000000 ... with the "1" marking the item containing the
string. On closer inspection you would see that this is a sequence of bit items
rather than a bit vector. We can obtain the bit vector we need for indexing
with the < (expose) operator. It removes one level of depth in a sequence if
the result will be homogeneous (i.e. all the same type; here we have all bits).
With this bit vector we can "index out" the items of interest directly, which
returns our article.
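The three moves in this step (test each item, flatten the nested result, select
by mask) can be sketched in Python. This is a loose analogue only: Python has
no bit vectors or expose operator, so a list of one-element lists stands in for
the "sequence of bit items", and the article bodies are placeholders.

```python
# Hypothetical articles; only the titles matter for the illustration.
articles = [
    "FABLE: First placeholder title\n(body)",
    "FABLE: The Peacock and Juno\n(body)",
    "FABLE: Third placeholder title\n(body)",
]
title = "The Peacock and Juno"

# One test per item, each wrapped in its own list (a nested result).
nested = [[title in article] for article in articles]

# Remove one level of nesting to get a flat mask, one flag per article.
mask = [flag for item in nested for flag in item]

# Keep only the articles whose flag is set.
selected = [article for article, keep in zip(articles, mask) if keep]
print(selected[0])
```

The substring test `in` is case-sensitive, so it plays the role of the exact
compare set earlier on t.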
This completes the example. To better understand these operators and the other
things you can do with them, consult the operator pages for the type of data
being operated on.