Case Study CS00001: Comparing two lists of e-mail addresses

The problem: Given two lists of e-mail addresses, report which addresses each list is missing relative to the other, and create a combined list.

The solution:

  1. From the raw data, extract sequences of e-mail addresses
  2. Show list A without the entries of list B
  3. Show list B without the entries of list A
  4. Combine the lists and remove duplicates

Note: You can copy and paste these code fragments into the code pane of the Glee interpreter and experiment as you go along to watch each operation live.

The Glee code:

$* Capture "List A" raw data removing returns and newline chars *$
$R__$r
"A:Name 1" <name-1@addr1.com>
"A:Name 2" <name-2@addr2.com>
"A:Name 3" <name-3@addr3.com>
__ ~(13 10#asc)=> A;

$* Capture "List B" raw data removing returns and newline chars *$
$R__$r
"B:Name 2" <Name-2@Addr2.com>
"B:Name 3" <Name-3@Addr3.com>
"B:Name 4" <Name-4@Addr4.com>
__ ~(13 10#asc) => B;

$$ Extract e-mail address part and segment
"--- aa -----------------" $;
A[A *<>'<>'] %\ \|'>' => aa %**$;
"--- bb -----------------" $;
B[B *<>'<>'] %\ \|'>' => bb %**$;

$$ Show each list without the other
"--- aa ~ bb -----------"$;
aa ~ bb %**$;
"--- bb ~ aa -----------"$;
bb ~ aa %**$;

$$ Combine the lists and return unique elements
"--- aa,bb @== & ----------" $;
aa,bb @== & => cc%**$;

The Output:
--- aa -----------------
Seq[I497R1C3T:K]:
[1]String[I472R2C18:C]<name-1@addr1.com>
[2]String[I473R2C18:C]<name-2@addr2.com>
[3]String[I474R2C18:C]<name-3@addr3.com>
--- bb -----------------
Seq[I579R1C3T:K]:
[1]String[I554R2C18:C]<name-2@addr2.com>
[2]String[I555R2C18:C]<name-3@addr3.com>
[3]String[I556R2C18:C]<name-4@addr4.com>
--- aa ~ bb -----------
Seq[I618R1C1T:K]:
[1]String[I472R2C18:C]<name-1@addr1.com>
--- bb ~ aa -----------
Seq[I659R1C1T:K]:
[1]String[I556R2C18:C]<name-4@addr4.com>
--- aa,bb @== & ----------
Seq[I730R1C4T:K]:
[1]String[I472R3C18=C]<name-1@addr1.com>
[2]String[I554R3C18=C]<name-2@addr2.com>
[3]String[I555R3C18=C]<name-3@addr3.com>
[4]String[I556R3C18=C]<name-4@addr4.com>

The play-by-play:

$* Capture "List A" raw data removing returns and newline chars *$
$R__$r
"A:Name 1" <name-1@addr1.com>
"A:Name 2" <name-2@addr2.com>
"A:Name 3" <name-3@addr3.com>
__ ~(13 10#asc)=> A;

  1. The $* ... *$ span is Glee's way of entering a block (bulk) comment; everything between $* and *$ is ignored.
  2. $R__$r ... __ captures everything between $r and the closing __ delimiter as raw text. Note: you choose the delimiter yourself (here __), so pick a string you can be sure will not occur in the actual raw data.
  3. ~(13 10#asc) means "~ without" the characters whose "#asc ASCII" codes are 13 and 10 (i.e. carriage return and line feed). Note: the parentheses are required for precedence. Without them the expression would read "without the numbers 13 and 10", which mixes characters and numbers; that earns you a diagnostic message and terminates the program.
  4. => A; "=> assigns" the result to a new variable A, and "; ends the line" without displaying anything.
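For readers more at home in a general-purpose language, here is a rough Python analogue of this capture-and-strip step. It is only a sketch: a list of lines joined with CR LF stands in for the $R...$r raw capture, and the names raw_a and without_codes are illustrative, not part of Glee.

```python
# Rough Python analogue of the Glee capture step:
#   $R__$r ... __ ~(13 10#asc) => A;
# The CR LF-joined lines below are sample data standing in for the raw text.
raw_a = "\r\n".join([
    '"A:Name 1" <name-1@addr1.com>',
    '"A:Name 2" <name-2@addr2.com>',
    '"A:Name 3" <name-3@addr3.com>',
]) + "\r\n"


def without_codes(text, codes):
    """Drop every character whose code point is in `codes` (cf. ~(13 10#asc))."""
    return "".join(ch for ch in text if ord(ch) not in codes)


A = without_codes(raw_a, {13, 10})  # 13 = carriage return, 10 = line feed
print(A)
```

Printing A shows the three entries run together on a single line, which is the form the extraction step works on.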

$$ Extract e-mail address part and segment
"--- aa -----------------" $;
A[A *<>'<>'] %\ \|'>' => aa %**$;

  1. $$ starts Glee's end-of-line comment, so "$$ Extract e-mail address part and segment" is ignored.
  2. "--- aa -----------------" is just a string. $; means "$ display" and "; end the line".
  3. A[ ... ] means "index out of variable A", i.e. select the parts of A picked out by the expression inside the brackets.
  4. *<> means "* mark" "<> bracketing". So A *<>'<>' marks the spans of A that open with the left bracket '<' and close with the right bracket '>', brackets included.
    For example: 'Chunk <one> and <two>' *<> '<>'
    returns: 00000011 11100000 11111 (the digits are grouped in eights, not by word). Each 0 or 1 marks a character of the left operand to be ignored or selected, respectively (i.e. this is Glee's way of indexing out with booleans).
  5. So: 'Chunk <one> and <two>' => x; x[x *<> '<>']
    results in: <one><two>
  6. %\ means "% convert" "\ down", which for characters means converting to lower case; for numbers it means taking the lower integer of a float. Lower-casing matters here because list B is written in mixed case (<Name-2@Addr2.com>); after conversion it matches the same address in list A.
  7. \| means "\ segment" by "| any" listed character. Segmenting creates a sequence of items. Thus \|'>' makes a sequence of items by breaking after each '>' character, which stays at the end of its item, as the output shows.
  8. This results in a sequence of strings:
    Seq[I222R1C3T:K]:
    [1]String[I198R2C18:C]<name-1@addr1.com>
    [2]String[I199R2C18:C]<name-2@addr2.com>
    [3]String[I200R2C18:C]<name-3@addr3.com>
    The %** ("% convert to" "** verbose string") conversion produces the Seq[I222R1C3T:K]: and [1]String[I198R2C18:C]... formatting shown above, so we can see each element of the sequence.
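A Python sketch of steps 3 to 7 may help readers who know boolean-mask indexing from other languages. It is an approximation under stated assumptions: bracket_mask, select, segment_after and extract are hypothetical helper names, nested brackets are not handled, and we assume (from the output above) that the '>' delimiter stays at the end of each segment.

```python
def bracket_mask(text, left="<", right=">"):
    """0/1 mask marking each left...right span, brackets included.
    Our reading of Glee's *<> '<>'; nested brackets are not handled."""
    mask, inside = [], False
    for ch in text:
        if ch == left:
            inside = True
        mask.append(1 if inside else 0)
        if ch == right:
            inside = False
    return mask


def select(text, mask):
    """Boolean indexing: keep the characters whose mask bit is 1."""
    return "".join(ch for ch, bit in zip(text, mask) if bit)


def segment_after(text, delim=">"):
    """Split into items, ending each item at (and keeping) the delimiter."""
    items, current = [], ""
    for ch in text:
        current += ch
        if ch == delim:
            items.append(current)
            current = ""
    if current:  # trailing text with no closing delimiter
        items.append(current)
    return items


def extract(raw):
    """Analogue of A[A *<>'<>'] %\\ \\|'>' : mark, select, lower-case, segment."""
    return segment_after(select(raw, bracket_mask(raw)).lower())


sample = "Chunk <one> and <two>"
print("".join(map(str, bracket_mask(sample))))  # 000000111110000011111
print(select(sample, bracket_mask(sample)))      # <one><two>

# List B after the CR/LF-stripping step (sample data).
raw_b = '"B:Name 2" <Name-2@Addr2.com>"B:Name 3" <Name-3@Addr3.com>"B:Name 4" <Name-4@Addr4.com>'
print(extract(raw_b))
# ['<name-2@addr2.com>', '<name-3@addr3.com>', '<name-4@addr4.com>']
```

Keeping the mask as an explicit list makes the 0/1 selection visible; in practice a regular expression such as <[^>]*> would do the marking and selecting in one step.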

$$ Show each list without the other
aa ~ bb %**$;

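  1. aa ~ bb means the sequence aa "~ without" the elements found in sequence bb, i.e. the addresses in list A that are missing from list B. bb ~ aa works the same way in the other direction.
  2. %** formats the result verbosely as before, and $; means "$ display" before "; end the line".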
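The "without" operation corresponds to an order-preserving set difference. The Python sketch below is a hypothetical analogue (the helper name without is ours), assuming Glee's ~ keeps the left operand's order; it also keeps any repeated entries in the left sequence, and we have not checked how Glee treats those.

```python
def without(left, right):
    """Elements of `left` not found in `right`, in left's order (cf. Glee's aa ~ bb)."""
    exclude = set(right)  # set gives fast membership tests
    return [item for item in left if item not in exclude]


aa = ["<name-1@addr1.com>", "<name-2@addr2.com>", "<name-3@addr3.com>"]
bb = ["<name-2@addr2.com>", "<name-3@addr3.com>", "<name-4@addr4.com>"]
print(without(aa, bb))  # ['<name-1@addr1.com>']
print(without(bb, aa))  # ['<name-4@addr4.com>']
```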

$$ Combine the lists and return unique elements
aa,bb @== & => cc%**$;

  1. aa,bb means ", catenate" sequences aa and bb to form a single sequence.
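  2. @== & means "@ using" "== exact" comparison, return "& unique elements". This eliminates duplicates in the catenated sequence. Because the comparison is exact, it relies on the earlier %\ step having put both lists in lower case. The result is assigned to cc and then displayed.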
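As a rough Python analogue of aa,bb @== &, the sketch below catenates the sequences and then de-duplicates them while preserving order. The helper name catenate_unique is hypothetical. It keeps the first occurrence of each item; since duplicates compare exactly equal, which copy survives does not change the resulting strings.

```python
def catenate_unique(*seqs):
    """Catenate the sequences, then keep one copy of each exactly-equal item."""
    combined = [item for seq in seqs for item in seq]  # like aa,bb
    # dict preserves insertion order (Python 3.7+), so this de-dupes in order.
    return list(dict.fromkeys(combined))


aa = ["<name-1@addr1.com>", "<name-2@addr2.com>", "<name-3@addr3.com>"]
bb = ["<name-2@addr2.com>", "<name-3@addr3.com>", "<name-4@addr4.com>"]
cc = catenate_unique(aa, bb)
print(cc)
# ['<name-1@addr1.com>', '<name-2@addr2.com>', '<name-3@addr3.com>', '<name-4@addr4.com>']
```

Python's == on strings is case-sensitive, so without the earlier lower-casing step '<Name-2@Addr2.com>' and '<name-2@addr2.com>' would both survive.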

This completes the example. To better understand these operators and the other things you can do with them, consult the operator pages for the type of data being operated on.