Wildcard
--------
Search strings using wildcard patterns

This command is a further development of my WILD% command. To
differentiate, I have called it WIRX%, based on the name I gave to the
pattern "language".

    Wirx Isn't RegeX (but it WIRX!)

I'm not proud of the code, but if I were to redo all my stumbling efforts
over the past 35 years, I'd be exactly nowhere. Instead we have
Knoware! It is patched ad hoc to comply with evolving needs over a long
period of time. It is presented here so it can be adapted, fixed,
developed, and/or improved - until something better comes along. The result
is here to be USED: for command line operations or in applications.

Usage:
------
    true% = WIRX%(pattern$, string$)

where
    true%     is +ve => match, or 0 => no match
    pattern$  is a pre-parsed pattern (see below)
    string$   is the string to be searched

    ppat$ = WPARSE$(pattern$)

where
    ppat$     is a parsed version of pattern$
and
    pattern$  is a pattern string, more of which below

The reason for this division of labour is to take the effort of parsing out
of the loop: however many strings you want to scan, the pattern is parsed
only once. This is great if you want to scan many lines, e.g. a
directory tree. For short searches use something like:

    PRINT WIRX%(WPARSE$(pattern$), string$)

or, for simple patterns, use WILD% instead.
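
As a minimal sketch of the parse-once idea, the loop below scans a text file
line by line with a single pre-parsed pattern. The file name, channel number
and pattern are made up for illustration, and the listing assumes an SBASIC
that accepts programs without line numbers (add them on older systems):

    ppat$ = WPARSE$("*_bas;*_txt")        : REMark parse once, outside the loop
    OPEN_IN #3, "ram1_filelist_txt"       : REMark hypothetical list of names
    REPeat scan
      IF EOF(#3) THEN EXIT scan
      INPUT #3, text$
      IF WIRX%(ppat$, text$) THEN PRINT text$
    END REPeat scan
    CLOSE #3

Only WIRX% runs inside the loop; WPARSE$ is not called again.
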
WIRX syntax:
------------
A wildcard pattern is a string containing a mixture of characters and
wildcard codes, together with some help codes/commands. A pattern can contain
any ascii code (0..255) and can match (almost) any ascii code, not only
printable ones. Matches may be done in case-sensitive ("A" = "A") or
case-insensitive mode ("A" = "A" and "A" = "a"). The QL foreign character
set is supported with the correct interpretation of case (one hopes). The
following wildcard characters may be used:

    '*' will match any number of characters, including none at all
    '?' will match any one character (0..255)
    '%' will match any one (decimal) digit (ascii 48..57)
    '$' will match any one hex digit (always case-insensitive)
    '#' will match one or more decimal digits (i.e. an integer)

In addition there are some help codes:

    '-' is NOT => the result is flipped. Any match returns FALSE while a
        mismatch returns TRUE. '-' needs to be the very first character of a
        pattern string, i.e. before '!' (see next). Subsequent uses of this
        character in the pattern string are considered literal, so there is
        no need to cancel them.
    '!' as the first pattern character => case-sensitive match.
        There is no need to cancel subsequent uses of this character
        within the same section (see cancel and section, below).
    '/' is the cancel character. It converts the characters above, including
        itself, to their literal equivalents.
    '&' + code inserts the character corresponding to the code into the
        pattern string: '&65' => 'A', or '&$41' => 'A'.
        Note: These characters will be converted to lowercase in
        case-insensitive comparisons!
        Note: The current version does not differentiate between codes < 9,
        e.g. chr(5) will match chrs(0..8)
    ';' splits the pattern into two or more sections, each forming an
        alternative pattern. The comparison will return with the first
        section that matches, or when the list is exhausted.
        Note: you cannot have two consecutive ';'s

No syntax errors are reported, so Rubbish In => Rubbish Out, or RIRO.
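
A few direct calls may help to show how the prefixes and sections combine.
The expected results in the remarks are read off the rules above and the
examples table; they are not captured output:

    PRINT WIRX%(WPARSE$("!end!"), "end!")    : REMark expect +ve, second ! is literal
    PRINT WIRX%(WPARSE$("-abc"), "abc")      : REMark expect 0, the match is flipped
    PRINT WIRX%(WPARSE$("-abc"), "cde")      : REMark expect +ve, the mismatch is flipped
    PRINT WIRX%(WPARSE$("!abc;efg"), "EFG")  : REMark expect +ve, ! covers section 1 only
    PRINT WIRX%(WPARSE$("!abc;efg"), "ABC")  : REMark expect 0
    PRINT WIRX%(WPARSE$("&65bc"), "abc")     : REMark expect +ve, &65 = "A", case ignored
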
Examples: (All quotes are for illustration only)

    pattern       matches
    -------       -------
    "*"           anything and everything
    "????"        any four-character string
    "????*"       any string with at least four characters
    "abc"         "abc" or "ABC"
    "abc*"        any string starting with "abc" or "ABC"
    "*ing"        any string ending in "ing"
    "*test*"      "Test", "test", "TESTING", "untested", ..
    "fr??nd"      Ideal if you can't spell "friend"
    "!"           matches nothing at all
    "!ABC"        "ABC" but not "abc"
    "!A*"         any string starting with capital "A"
    "!end!"       Matches "end!" but not "END!"
    "!end/!"      Matches "end!" but not "END!"
    "/!end!"      Matches "!end!", "!END!", ..
    "!/Abc"       "Abc" not "ABC" nor "abc" (syntactically incorrect)
    "/!"          "!"
    "+//-"        "+/-"
    "+/-"         "+-"
    "Test/?"      "test?", "TEST?", "Test?", ..
    "%%%%"        any four digits (ascii 48..57)
    "$$$$"        any four hex digits (ascii 48..57, 65..70, 97..102)
    "!$$$$"       as above
    "#"           any (positive) integer
    "-#"          any negative integer
    "*#"          any string ending in a positive or negative integer
    "*#*"         any string including positive or negative integers
    "abc,#"       "abc,123", "ABC,1" but not "abc,-1"
    "abc,/$$$"    "abc,$1F", "ABC,$C0", ..
    "!abc,/$$$"   "abc,$1F", "abc,$6b", .. but not "ABC,$1F"
    "&65bc"       "abc", "ABC", etc
    "!&$41"       "ABC", "Abc" but not "abc"
    "ab&993"      "abc3",..
    "ab&00099"    "abc" !
    "ab&000990"   "abc0" !
    "ab&0/99"     "'ab',0,'99'" = $6162003939
    "ab&$FFF"     "'ab',255,'f'" = $6162FF66
    "ab&$f/ff"    "'ab',15,'ff'" = $61620F6666
    "abc&"        "abc" ! - & ignored
    "a&bc"        "abc" ! - & ignored
    "&$abc"       "Œc" = $AB63 and "œc" = $8B63
    "&4"          '0' and '1' and '2' .. '8'! This is a feature!
    "abc;efg"     "abc", "efg"
    "abc;"        "" (invalid syntax)
    "a;b;c"       "a" or "b" or "c" or "A" or "B" or "C"
    "!abc;efg"    "abc" or "efg" or "EFG" but not "ABC"
    "abc;!EFG"    "abc" or "ABC" or "EFG"
    ";abc;def"    "" or "abc" or "def"..
    "abc;def;"    "" or "abc" or "def"..
    "abc;;def"    "abc" or "def" but not ""!
    "-abc"        matches "cde" and "efg" but not "abc"
    "-"           not nothing - matches anything bar nul
    "-*"          doesn't match anything at all!
    "-?*"         matches anything that is not nothing
    "/-#"         matches any negative integer
Programming notes:
------------------
For special or "foreign" character case translation needs, simply supply a
different translation table than tab_lower. (Then the reference to
lib_sbu_str in wirx_link can be omitted to avoid confusion.)
Different parts of the utility have different version numbers. The
combined toolkit is now designated:
V0.24, pjw, May 27th 2019
Conditions of use and DISCLAIMER as per Knoware.no
