General: Work focused on
bringing GREP (General Regular Expression Processor) to GLEE. This should
please new GLEE users coming from PERL (Practical Extraction and Report Language, or
Pathologically Eclectic Rubbish Lister?). The Borland C++ Builder has PCRE
(the PERL-Compatible Regular Expressions
project) in the runtime. Thus, I
have added GREP capability with virtually no size cost to GLEE. I suspect there
will be considerable refactoring. However, the examples here at least
illustrate how you could use GLEE to learn GREP. I had the option of deciding
what combination of the 4 mode switches would constitute GLEE's default. Since I had
virtually no knowledge of GREP before this exercise, my choices may need
adjustment later. The defaults I chose are (m):MULTILINE, (i):CASELESS,
(s):DOTALL and (x):EXTENDED. A GREP expression can use (?-misx) to switch all
four off, or something like (?mi-sx) to set Multiline and Caseless but not Dotall
or Extended.
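For readers who want to experiment with these switches outside GLEE, here is a minimal sketch using Python's `re` module as a stand-in for PCRE. The name `GLEE_LIKE_DEFAULTS` and the sample text are assumptions made for illustration. As far as I know, Python only accepts the scoped form `(?-i:...)` for switching a flag off, not the bare `(?-misx)` form that PCRE allows.

```python
import re

# Assumed stand-in for the defaults described above: m, i, s and x all on.
GLEE_LIKE_DEFAULTS = re.MULTILINE | re.IGNORECASE | re.DOTALL | re.VERBOSE

text = "Cat\ncat"

# With all four switches on, ^cat matches at the start of every line,
# regardless of case.
with_defaults = [m.span() for m in re.finditer(r"^cat", text, GLEE_LIKE_DEFAULTS)]

# The scoped group (?-i:...) turns CASELESS off inside the group only,
# so just the lowercase "cat" on the second line matches.
case_sensitive = [m.span() for m in re.finditer(r"^(?-i:cat)", text, GLEE_LIKE_DEFAULTS)]

print(with_defaults)
print(case_sensitive)
```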
The #grep type: The
GREP facility is wrapped by GLEE as a new type, the GREP type. As a type, it
has methods and properties which can be explored with the : query
operator. Right now it only displays the Regular Expression (:RE). In
this example I give the GREP Regular Expression pattern when I create the
object "p". It's the left argument string. The next example
shows changing the RE. Regardless, when the RE is set, GLEE
compiles the pattern and saves the compiled result. The GREP object can then be
used for matching against any string. In this case I'm using the ``
(indices of) operator. With a GREP object right argument and string left
argument, the pattern is run against the text. Where the pattern matches, a two
element numeric is created. The first element is the index in the string of the
beginning of the match. The second element is the end of the match plus 1.
These can be used for displaying in the vicinity of the match as illustrated.
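The compile-once, match-many idea and the (start, end+1) pairs can be sketched in Python, whose `re` module reports spans the same way (though counting from 0). This is an analogy, not GLEE code; `match_indices`, `context`, and the sample text are hypothetical names chosen for the sketch.

```python
import re

# Compile the pattern once; the compiled object can be reused on any string.
pattern = re.compile(r"c.t", re.IGNORECASE)

def match_indices(text, compiled):
    """Return a (start, end + 1) pair for every match, 0-based."""
    return [m.span() for m in compiled.finditer(text)]

def context(text, span, width=5):
    """Return the match plus up to `width` characters on either side."""
    start, end = span
    return text[max(0, start - width):min(len(text), end + width)]

text = "The cat sat on a cot."
spans = match_indices(text, pattern)
for span in spans:
    print(span, repr(context(text, span)))
```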
Comparing Glee with
GREP: In this example I show finding the word "ram" in all of
Aesop's fables. I first read the file into "t". I then
use Glee's word search ``& to get the word indices
and display the word in context. I do the same thing using a GREP pattern. For
some reason, GREP does not find the last occurrence.
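One way to cross-check two search techniques is to run both on a small text whose answer is known, including a match at the very end. The sketch below does that in Python; it is not the Glee word search, and `word_indices_plain`, `word_indices_regex`, and the sample sentence are assumptions made for illustration.

```python
import re

def word_indices_plain(text, word):
    """Start indices of whole-word, case-insensitive hits, without regex."""
    lowered, target = text.lower(), word.lower()
    hits, pos = [], lowered.find(target)
    while pos >= 0:
        end = pos + len(target)
        before_ok = pos == 0 or not lowered[pos - 1].isalnum()
        after_ok = end == len(lowered) or not lowered[end].isalnum()
        if before_ok and after_ok:
            hits.append(pos)
        pos = lowered.find(target, pos + 1)
    return hits

def word_indices_regex(text, word):
    """The same search expressed as a regular expression with word boundaries."""
    compiled = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() for m in compiled.finditer(text)]

sample = "A ram met a rambling tramp. The Ram ran; ram"
plain_hits = word_indices_plain(sample, "ram")
regex_hits = word_indices_regex(sample, "ram")
print(plain_hits, regex_hits)
```

If the two lists ever differ on real data, the differing index points straight at the occurrence worth inspecting.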
Relative Timing: This
example runs each technique 1,000 times. It reveals that GREP is much faster
than Glee in this example. However, finding all occurrences of a
word in a 66K file in 7ms is acceptable for most applications. And where it is
not, there is GREP for those who understand how to use it.
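A repeat-and-average timing harness of the kind described can be sketched as follows. This is Python rather than Glee, the text is a small synthetic stand-in for the fables file, and `time_per_call` is a hypothetical helper; the numbers it prints say nothing about Glee's own timings.

```python
import re
import time

def time_per_call(fn, repeats=1000):
    """Average wall-clock seconds per call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Synthetic text: 200 lines, each containing the word "ram" once.
text = "the ram and the lamb\n" * 200
word_re = re.compile(r"\bram\b", re.IGNORECASE)

def regex_count():
    return len(word_re.findall(text))

def split_count():
    # Plain tokenising search: split on whitespace and compare each word.
    return sum(1 for w in text.split() if w.lower() == "ram")

regex_seconds = time_per_call(regex_count)
split_seconds = time_per_call(split_count)
print(f"regex: {regex_seconds * 1e6:.1f} us, split: {split_seconds * 1e6:.1f} us")
```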
Why is GREP missing a
match?: I still don't know why GREP misses one occurrence when working
in bulk. In this example, I run GREP line by line (which is typical of how we
see it used) and it finds all the occurrences.
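A simple diagnostic for this kind of discrepancy is to run the same compiled pattern in bulk and line by line, convert the line-local indices back to whole-text indices, and compare. The Python sketch below shows the bookkeeping; the helper names and sample text are assumptions, and it does not reproduce the GLEE behaviour described above.

```python
import re

def bulk_starts(text, compiled):
    """Start index of every match when the whole text is searched at once."""
    return [m.start() for m in compiled.finditer(text)]

def line_by_line_starts(text, compiled):
    """Search each line separately, then shift hits by the line's offset."""
    hits, offset = [], 0
    for line in text.splitlines(keepends=True):
        hits.extend(offset + m.start() for m in compiled.finditer(line))
        offset += len(line)  # keepends=True keeps the offsets exact
    return hits

word_re = re.compile(r"\bram\b", re.IGNORECASE)
text = "A ram.\nNo match here.\nThe last RAM"

bulk = bulk_starts(text, word_re)
per_line = line_by_line_starts(text, word_re)
print(bulk, per_line)
```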
Line by Line processing and
timing. This illustrates that the Glee word search is
nearly twice as fast as GREP on the line by line basis. Until I find out why
GREP isn't finding all occurrences in bulk, I will have to recommend this
approach. In the meantime I think I'll have to favor Glee for
doing bulk searches.
Patterns and subpatterns:
I'm going to have to get together with a GREP jock before settling on exactly
how Glee will ultimately support GREP. In the meantime I've
chosen to use nesting to collect the GREP results for patterns and
subpatterns. There are three examples here. All three match on "cat"
(1-3 ... remember the 2nd index is the beginning of the next character). The
first example then matches "aract" (4-8). The second example matches
"erpillar" (4-11). The last one matches nothing more (4-4 is a 0
length field). In these cases I may want that to show 0 0 or _1 _1. I just
don't know yet. If someone does, please message me at
<feedback@WithGLEE.com>.
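For comparison, here is how Python's `re` module reports nested results for the same three words. The patterns `(cat)(\w*)` and `(cat)(\w+)?` are guesses made for this sketch, not the patterns used in the examples above, and Python counts from 0 rather than 1. Note that Python distinguishes a subpattern that matched an empty string (start equals end) from one that did not take part in the match at all, which it reports as (-1, -1).

```python
import re

def nested_spans(text, compiled):
    """Return [whole-match span, [span of each subpattern]] or None."""
    m = compiled.search(text)
    if m is None:
        return None
    return [m.span(0), [m.span(i) for i in range(1, compiled.groups + 1)]]

pattern = re.compile(r"(cat)(\w*)")    # second group may match zero characters
optional = re.compile(r"(cat)(\w+)?")  # second group may be skipped entirely

results = {word: nested_spans(word, pattern)
           for word in ("cataract", "caterpillar", "cat")}
skipped = nested_spans("cat", optional)

for word, spans in results.items():
    print(word, spans)
print("cat (optional group)", skipped)
```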
Parsing: In those cases
where you want to parse the text based on matches, Glee, as usual,
gives you several choices. This example shows the indirect method of securing
field start and end (+1) indices and using the field start for the cut point. I
implemented a more direct method for GREP types that allows you to use the
pattern to do the cutting directly. Again, please grant me that this is an
experimental work in progress. It's going to take me a while to determine if I
like GREP's matching powers better than Glee's. That can't happen
until I really master GREP and then know how to knit it into Glee
philosophically. One thing has come out of this little diversion: when
someone quips that APL, J, K, and now Glee are write-only
languages, I can drag out some GREP patterns.
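The two parsing routes, indirect (collect field indices, then cut at the starts) and direct (hand the pattern to the cutter), can be sketched in Python as below. `cut_at`, `cut_with_pattern`, the field pattern, and the sample record are all hypothetical; Glee's own operators for this are not shown here.

```python
import re

def cut_at(text, starts):
    """Cut `text` into pieces that begin at each index in `starts`."""
    bounds = sorted(set(starts) | {0, len(text)})
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]

def cut_with_pattern(text, compiled):
    """Direct route: use the pattern itself to choose the cut points."""
    return cut_at(text, [m.start() for m in compiled.finditer(text)])

field = re.compile(r"[a-z]=")
record = "a=1;b=22;c=333"

# Indirect route: first secure the (start, end + 1) pairs, then cut at the starts.
spans = [m.span() for m in field.finditer(record)]
indirect = cut_at(record, [start for start, _ in spans])

direct = cut_with_pattern(record, field)
print(indirect, direct)
```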
Indices Bracketing:
The ``& with a #GREP right argument naturally produces
indices for beginning and ending+1 of words. However, for a simple string right
argument ``& just produces the indices at the beginning of the
word. A new operator ``&<> produces the #GREP style
result for such an argument.
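The difference between start-only indices and bracketed (start, end+1) indices for a plain string argument can be illustrated in Python. This is a sketch under assumptions: `word_starts` and `word_brackets` are made-up names, the search is a simple substring search, and indices are 0-based.

```python
def word_starts(text, word):
    """Start index of every occurrence of `word` (plain substring search)."""
    hits, pos = [], text.find(word)
    while pos >= 0:
        hits.append(pos)
        pos = text.find(word, pos + 1)
    return hits

def word_brackets(text, word):
    """The same hits as (start, end + 1) pairs, like a regex span."""
    return [(s, s + len(word)) for s in word_starts(text, word)]

text = "ram, lamb, ram"
starts = word_starts(text, "ram")
brackets = word_brackets(text, "ram")
print(starts, brackets)
```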
Marking Beginning and Ending of
words. With Glee's flexible matching capabilities, the length
of the match varies with each match. Thus you need to be able to mark the
beginning as well as the ending of matches. With all the operators that do
matches explicitly (*&) or implicitly (->>&) I
needed a way to signal the operator. I used the same technique I used for
signaling exact (@==) and Glee (@=) sorting.
For beginning I use (@<) before the operator. For ending I use
(@>) to signal I'm interested in the end of the matched string.
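Why both markers are needed when match lengths vary can be seen in a small Python sketch. The `which` argument plays the role of the beginning/ending signal; the function name `mark`, the pattern, and the text are assumptions for illustration only.

```python
import re

def mark(text, compiled, which="begin"):
    """Return the beginning or the end (+1) index of every match."""
    if which not in ("begin", "end"):
        raise ValueError("which must be 'begin' or 'end'")
    return [m.start() if which == "begin" else m.end()
            for m in compiled.finditer(text)]

pattern = re.compile(r"cat\w*")  # matches of varying length
text = "cat cataract caterpillar"

begins = mark(text, pattern, "begin")
ends = mark(text, pattern, "end")
print(begins, ends)
```

Because the three matches are 3, 8 and 11 characters long, the end indices cannot be derived from the beginnings by adding a fixed length.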
Query. The choice of
@> to signal ending forced me to find another glyph set for the End Of String
query. The ? is the natural query glyph, so I am now employing
it wherever the operator is about making a query. So @> now becomes
?@> (are we at the end) and @< becomes ?@<
(are we at the beginning). The index of the previous cursor position is
?`@<<-; the current cursor position is ?`@.