-DiskState-

Regular Expressions

DiskState uses regular expressions (also called regexp or regex). They may seem difficult at first, but they prove highly powerful once you learn to use them. Regular expressions are commonly used in languages such as Perl and Tcl (from the Unix world) and Python.

We will describe the special characters of regexp and conclude with a few examples.

New to this?

If you are not used to regular expressions, the first thing you will probably notice is the use of .* instead of the usual wildcard *. The dot matches any single character, and the asterisk repeats whatever precedes it zero or more times, so .* means "match any character zero or more times". In the same way, s* matches zero or more of the letter s. It might feel a bit unusual at first, but regular expressions are very powerful.
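
The difference between .* and s* can be tried out with a short sketch. Python is used here purely for illustration because its re module follows the same basic syntax; DiskState's own engine may differ in details, and the file names are made up.

```python
import re

# ".*" means "any character, zero or more times". It is the regex
# counterpart of the plain wildcard "*".
assert re.fullmatch(r".*", "anything at all") is not None
assert re.fullmatch(r".*", "") is not None  # zero characters also match

# "s*" means "zero or more of the letter s" and nothing else.
assert re.fullmatch(r"s*", "") is not None
assert re.fullmatch(r"s*", "sss") is not None
assert re.fullmatch(r"s*", "sas") is None  # 'a' is not an 's'

# A wildcard such as "*.txt" therefore becomes ".*\.txt" as a regex.
wildcard_as_regex = r".*\.txt"
print(bool(re.fullmatch(wildcard_as_regex, "notes.txt")))  # True
print(bool(re.fullmatch(wildcard_as_regex, "notes.doc")))  # False
```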

Many tutorials on regular expressions are available on the Internet. Search for "regular expression tutorial" to learn more.

Want to play with DiskState's Regular Expression Tester?

You can test regular expressions on an arbitrary string by bringing up the regular expressions help window from within DiskState. This is especially helpful if you are new to the wonderful world of regular expressions (regex). Before you know it, you could be expanding the DiskClean rules with your

The quickest way to bring up the Regex Tester is to right-click the DiskState tray icon. You can also reach it by right-clicking in the DiskState main window, selecting the menu item "Regular expressions...", and clicking the "Regular Expression Tester" button in the left corner of the help window that opens.

OS Specific folders and environment variables

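OS specific folders and system environment variables can be expanded in regular expressions. An OS specific folder is written as, for example, "%!Local Appdata%\\MyCompany" (without the quotes). An environment variable is written as, for example, "%APPDATA%\\MyCompany" (without the quotes). The doubled backslash is an escaped backslash, so it matches a single literal backslash in the path.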
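
How DiskState performs this expansion internally is not described on this page. The following Python sketch shows one plausible approach under stated assumptions: the helper expand_variables, the variables table, and the sample path are all hypothetical and exist only for illustration. Each %NAME% is replaced by its value, and the value is escaped so that backslashes and dots in the folder path are taken literally.

```python
import re

def expand_variables(pattern, variables):
    """Replace each %NAME% in pattern with its regex-escaped value.

    Hypothetical helper: DiskState's real expansion logic is not shown
    in this help page. Unknown names are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        # Escape the value so backslashes and dots in the path are literal.
        return re.escape(variables[name])
    return re.sub(r"%([^%]+)%", substitute, pattern)

# Made-up value for illustration; a real program would read os.environ.
variables = {"APPDATA": r"C:\Users\me\AppData\Roaming"}

expanded = expand_variables(r"%APPDATA%\\MyCompany", variables)
path = r"C:\Users\me\AppData\Roaming\MyCompany\cache.tmp"
print(bool(re.search(expanded, path)))  # True
```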

Regular Expressions

Characters Description

 ^
Matches beginning of string. For example, "^F", will match an "F" only at the beginning of the string.

 ^
A caret "^" immediately following a left-bracket ( [ ) excludes the remaining characters within the brackets. [^0-9] matches non-digits only.

 $
Dollar sign ($) matches the end of the string. "abc$" will match sub-string "abc" only if it is at the end of the string.

 |
Alternation character ( | ) allows the expression on either side of it to match the target string. "a|b" will match "a" as well as "b".

 .
The dot ( . ) matches any character.

 *
Asterisk (*) means that the character to the left should match 0 or more times.

 +
Similar to the asterisk (*), but requires at least one match (1 or more times).

 ?
Question mark (?) matches the left character 0 or 1 times.

 [ ]
Brackets enclosing a set of characters indicate that any one of the enclosed characters may match the target character. We call this a character set.

 \
Escapes a special character so that it is matched literally, such as "\." for a dot, "\?" for a question mark, "\[" for a bracket, and so on. It is a good rule to escape special regular expression characters, also known as metacharacters, whenever you mean them literally. Example: to match the text "a+b", as in "a+b=c", where '+' should be the literal ASCII character '+', escape it as a\+b. Otherwise, a+b would match one or more 'a' characters followed immediately by one 'b' character, which does not occur in "a+b=c".

 \d
Matches a single character that is a digit.

 \w
Matches a single character that is a word character (an alphanumeric character or the underscore _ character).

 \s
Matches a white space character.

 d(?!s)
A negative lookahead. Matches a 'd' not followed by an 's'. You can use any literals here instead of 'd' and 's' as shown in this example. Useful if you want to match something not followed by something else. For instance, it will match "dt_file" but not "ds_file".

 d(?=s)
A positive lookahead. Matches 'd' followed by an 's'. You can use any literals here instead of 'd' and 's'. Useful for matching something A followed by something B without making that something B part of the match. For instance, it will match "ds_file" but not "dt_file".

 (?<!a)b
A negative lookbehind. Matches a 'b' that is not preceded by an 'a'. You can use any literals here instead of the 'a' and 'b' shown in the example. It works the same way as the lookahead, just backwards. Internally, the regexp implementation steps backwards in the string to check whether the text inside the lookbehind can be matched there. If it cannot, we have a hit. Example: it will match debtfile.txt but not cabfile.txt.

 (?<=a)b
A positive lookbehind. Matches a 'b' that is preceded by an 'a'. You can use any literals here instead of the 'a' and 'b' shown in the example. Upon reaching the 'b', the engine takes a step back in the string to check whether the text inside the lookbehind matches there. If it does, we have a hit. Example: it will match cabfile.txt but not debtfile.txt.

 (?# comment)s
A comment by itself. The comment is just for explaining the expression in more detail and does not affect the matching. A comment starts with (?# and ends at the next closing parenthesis.
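
The entries in the table above can be checked with a small table-driven sketch. It uses Python's re module for illustration only; DiskState's engine may behave differently in corner cases. The sample strings are taken from the descriptions where possible and are otherwise made up. The positive lookbehind is written as (?<=a)b.

```python
import re

# (pattern, text, whether a match is expected)
cases = [
    (r"^F",         "File1.tmp",    True),   # F at the start
    (r"^F",         "MyFile.tmp",   False),  # F is not the first character
    (r"[^0-9]",     "12345",        False),  # only digits, nothing to match
    (r"abc$",       "xyzabc",       True),   # abc at the end
    (r"a|b",        "b",            True),   # alternation
    (r"a\+b",       "a+b=c",        True),   # escaped, literal plus
    (r"a+b",        "a+b=c",        False),  # unescaped: needs a's, then b
    (r"\d\w\s",     "7_ ",          True),   # digit, word char, white space
    (r"d(?!s)",     "dt_file",      True),   # d not followed by s
    (r"d(?!s)",     "ds_file",      False),
    (r"d(?=s)",     "ds_file",      True),   # d followed by s
    (r"d(?=s)",     "dt_file",      False),
    (r"(?<!a)b",    "debtfile.txt", True),   # b not preceded by a
    (r"(?<!a)b",    "cabfile.txt",  False),
    (r"(?<=a)b",    "cabfile.txt",  True),   # b preceded by a
    (r"(?<=a)b",    "debtfile.txt", False),
    (r"(?# note)s", "files",        True),   # the comment is ignored
]

for pattern, text, expected in cases:
    found = re.search(pattern, text) is not None
    assert found == expected, (pattern, text)

# The lookahead text is checked but is not part of the match itself.
print(re.search(r"d(?=s)", "ds_file").group())  # d
```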

Popular regular expressions in DiskState

Regular Expression Meaning
  
.*backup.* Finds any text with the substring backup in it. Example: SrcBackupFile.tmp
  
File[0-9]+\.tmp$ Looks for the four letters File, followed by at least one digit, and ending with the file extension tmp. Example: File123.tmp
  
.*\.exe$ The regular expression counterpart of the familiar *.exe wildcard search. Note the backslash that escapes the special character dot, as well as the $-sign at the end to match the end of the string (extensions are always at the end).
  
.*gr[ae]ydoc\.txt$ To match an 'a' or an 'e' for gray and grey, we use a character set. The order inside the set does not matter. E.g. gr[ae]y and gr[ea]y will both match gray and grey, but not graey. So this example will match files like mygraydoc.txt or greydoc.txt.
  
Look above for examples of lookahead and lookbehind.
  
.*program\s*files.* Will match any string that includes "program" followed by zero or more white space characters, followed by "files" and then any characters. For instance "my program     files store".
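
The expressions in this table can be tried against a few sample names with the sketch below, again in Python for illustration. The helper name matches and the sample file names are made up. Case-insensitive matching is an assumption, made because the first example pairs the pattern "backup" with the name SrcBackupFile.tmp; this page does not state how DiskState treats letter case.

```python
import re

def matches(pattern, name):
    # IGNORECASE is an assumption about DiskState's behaviour, see above.
    return re.search(pattern, name, re.IGNORECASE) is not None

print(matches(r".*backup.*", "SrcBackupFile.tmp"))     # True
print(matches(r"File[0-9]+\.tmp$", "File123.tmp"))     # True
print(matches(r"File[0-9]+\.tmp$", "File.tmp"))        # False, no digit
print(matches(r".*\.exe$", "setup.exe"))               # True
print(matches(r".*\.exe$", "setup.exe.bak"))           # False, not at the end
print(matches(r".*gr[ae]ydoc\.txt$", "mygreydoc.txt")) # True
print(matches(r".*gr[ae]ydoc\.txt$", "graeydoc.txt"))  # False
print(matches(r".*program\s*files.*", "C:\\Program Files\\App"))  # True
print(matches(r".*program\s*files.*", "programfiles")) # True, \s* allows zero
```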