A simplified regular expression module for the slope programming language
sloum 6a521be640 Bumps version for namespace support 3 months ago

re

re is a regular expression module written entirely in Slope. Slope has built-in regular expressions, which makes this module mostly redundant. However, it can serve as a base for implementing custom match functions or as a learning tool for Slope code. If you find another cool reason to use it, e-mail me and let me know!

The matching engine

This module supports only a subset of the regular expression sequences found elsewhere, so it will not have as much utility as a full engine such as PCRE or the like.

Quantifiers

Three quantifiers are available. re does not support the regular expression format for exact count matches ({3}, {3,5}, or the like). Quantifiers can be escaped as described in the next section in order to match a literal ?, *, or +.

Token Effect
? Matches the previous character zero or one times
* Matches the previous character zero or more times
+ Matches the previous character one or more times
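
As a rough illustration of the three quantifiers, the calls below use re::match? (described under Procedures). The patterns and texts are made up for this example. In every call but the last, the whole text fits the pattern, so by the table above each should report a match; the last text contains no "b" at all, so it should not. These results follow from the table and have not been run against the module.

```slope
; ? : the "u" may appear zero or one times
(re::match? "colou?r" "color")
(re::match? "colou?r" "colour")

; * : the "b" may appear zero or more times
(re::match? "ab*c" "ac")
(re::match? "ab*c" "abbbc")

; + : the "b" must appear at least once
(re::match? "ab+c" "abc")

; + again: there is no "b" in the text, so the pattern cannot be satisfied
(re::match? "ab+c" "ac")
```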

Tokens / Meta Sequences

Any token outside of the ones below matches itself, including space, non-printable characters, newline, etc. Any character can be escaped with a percent character to make it the literal character, including percent itself ("%%"). For example, "%." will match a period, while "." will match any character.

Token Effect
. Matches any character
%s Matches any whitespace character
%S Matches any non-whitespace character
%p Matches any punctuation character (non-digit/non-alpha/non-control)
%P Matches any non-punctuation character
%w Matches any word character (A-Z, a-z, 0-9, _, -, ')
%W Matches any non-word character
%d Matches any digit character (0-9)
%D Matches any non-digit character
%l Matches any lowercase ascii character (a-z)
%L Matches any non-lowercase ascii character
%n Matches any ascii control character (non-printing/non-space)
%N Matches any non-control character
%u Matches any uppercase ascii character (A-Z)
%U Matches any non-uppercase ascii character
%c Matches a custom match function (set at call time)
%x Matches the given character exactly (x here stands for any character not listed above)
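
The sketch below shows how a few of these tokens might be combined, using re::match? (described under Procedures). The patterns and texts are invented for illustration. Each text is chosen so that the whole string fits the pattern, so going by the table each call should report a match; this has not been checked against the module itself.

```slope
; %d matches a digit, so three of them fit "123"
(re::match? "%d%d%d" "123")

; "%." is a literal period; an unescaped "." would match any character
(re::match? "3%.14" "3.14")

; "%%" is a literal percent sign
(re::match? "100%%" "100%")

; one uppercase letter followed by one or more lowercase letters
(re::match? "%u%l+" "Slope")

; a whitespace character between two runs of word characters
(re::match? "%w+%s%w+" "hello world")
```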

Procedures

At present re does not return match strings and can only report whether or not a match was found. To do so, use the re::match? procedure:

; (re::match? [pattern: string] [text: string] [[custom-matcher: lambda]]) => bool

(re::match? "a.%d?c+" "afcc") ; => #t
(re::match? "%c+" "afcc"
         (lambda (ch#)
                 (member? [(string->rune "x") (string->rune "y") (string->rune "z")] ch#))) ; => #f

Since re does not support character groupings, a custom match procedure can optionally be passed to re::match?. It should accept one argument: a number representing a character's code point (a rune). A rune can be acquired from a regular string via the string->rune builtin. When no custom matcher is passed, %c acts the same as .: it matches anything. This is a small workaround for the lack of the more complex and capable grouping constructs found in most regular expression engines.
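
For comparison with the example above, here is a sketch of a custom matcher that stands in for a character group such as [aeiou] in other engines. The vowel list and the test string are assumptions made for this example, and it reuses only the procedures already shown (lambda, member?, string->rune). Since "beet" is a "b", then two vowels, then a "t", the call should report a match, though that has not been run against the module.

```slope
; Hypothetical matcher: %c accepts only the runes for a, e, i, o, and u
(re::match? "b%c+t" "beet"
         (lambda (ch#)
                 (member? [(string->rune "a") (string->rune "e") (string->rune "i") (string->rune "o") (string->rune "u")] ch#)))
```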

Utility procedures

There are also a few utility procedures that could be useful in other places:

  • re::is-space?, takes a rune number and checks if it represents a whitespace character
  • re::is-word-char?, takes a rune number and checks if it represents a word character (A-Z, a-z, 0-9, _, -)
  • re::is-digit?, takes a rune number and checks if it represents a digit character (0-9)
  • re::is-lower-a-z?, takes a rune number and checks if it represents a lowercase character in the ascii range (a-z)
  • re::is-upper-a-z?, takes a rune number and checks if it represents an uppercase character in the ascii range (A-Z)
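
A minimal sketch of calling these procedures directly, assuming the module is already loaded. Each one takes a rune, so the examples convert a one-character string with string->rune first. The sample characters are arbitrary. The last call assumes a utility procedure can be passed as the custom matcher, since it has the same one-rune-in shape; the README does not confirm this.

```slope
; Each call passes a rune obtained from a one-character string
(re::is-space? (string->rune " "))
(re::is-word-char? (string->rune "_"))
(re::is-digit? (string->rune "7"))
(re::is-lower-a-z? (string->rune "q"))
(re::is-upper-a-z? (string->rune "Q"))

; Assumption: a utility procedure can double as the custom matcher for %c
(re::match? "%c+" "2024" re::is-digit?)
```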

Usage

Usage information is provided. Once the module has been loaded with load-mod, you can run:

(usage re::)          ; Get a list of procedures in the re module
(usage re::match?)    ; Get a procedure definition/description from the re module