An Introduction to Regular Expressions in C#
Regular expressions (also called "regexes") are a powerful way to find pattern matches in text and optionally replace matched text. They provide a more versatile approach than a literal find and replace in text.
I have used regular expressions in C# professionally for over a decade to convert text from one format to another and to extract data from text files. Regexes can accomplish in one line of code what would otherwise require several lines of plain C# code.
In this article, I describe the features that I find most useful in the C# Regex class. I assume you have at least a basic understanding of regex pattern syntax and C# syntax.
Regex class
The Regex C# class provides methods to match text with a regex pattern. There are also methods that accept C# code to replace the matched text, letting you leverage regex and C# logic to perform find and replace operations.
The class is in the System.Text.RegularExpressions namespace, which ships with all versions of .NET. If you have .NET installed on your machine, you can use the class in your C# project by adding a using directive for that namespace.
Additionally, the Regex class has static and instance versions of most methods. I focus on the static versions in this article because I use them most of the time; they are more convenient. (The instance versions require creating a Regex object before calling them.)
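To make the difference between the two styles concrete, here is a minimal sketch that runs the same check both ways. The variable names are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

// Static version: the pattern is passed on every call.
bool staticResult = Regex.IsMatch("test", "[a-z]");

// Instance version: create the Regex object once, then call methods on it.
var lowercaseLetter = new Regex("[a-z]");
bool instanceResult = lowercaseLetter.IsMatch("test");

Console.WriteLine($"{staticResult} {instanceResult}"); // prints "True True"
```

Both calls return the same result; the instance form mainly pays off when the same pattern is reused many times.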
IsMatch() method
Let's start with the most basic method: IsMatch(). This method checks if a regex pattern string matches anywhere in the given input string and returns true or false. Think of this method as a more advanced version of string.Contains().
Here is a simple example that checks if the text variable contains a lowercase letter:
var text = "test";
var hasLetter = Regex.IsMatch(text, "[a-z]");
In this example, the Boolean variable hasLetter is set to true because the regex pattern [a-z] will match the first lowercase letter in text: t.
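As a rough comparison with string.Contains(), the following sketch checks the same string with a literal substring and with two patterns. The sample string and variable names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var text = "Order 66";

// Literal check: only finds this exact substring.
bool containsLiteral = text.Contains("66");

// Pattern check: finds any digit, wherever it appears.
bool containsAnyDigit = Regex.IsMatch(text, @"\d");

// Anchored pattern: true only if the whole string is lowercase letters.
bool onlyLowercase = Regex.IsMatch(text, "^[a-z]+$");

Console.WriteLine($"{containsLiteral} {containsAnyDigit} {onlyLowercase}"); // prints "True True False"
```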
Match() method
The Match() method builds on IsMatch() by returning a Match object that contains information about the first text segment that the regex pattern matches. The Match object includes the matched text segment (Value property) and the start index of that segment in the input text (Index property). This method is useful for extracting text from a string.
Here is an example that will get the first lowercase letter from a string:
var text = "Sample text";
var match = Regex.Match(text, "[a-z]");
var firstLowercaseLetter = match.Value;
In this example, the Match() method uses the regex pattern [a-z] to match the first lowercase letter in text: a. The Match() method returns the Match object, then firstLowercaseLetter is assigned the matched text via the Value property.
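When the pattern might not match at all, it helps to check the Success property of the returned Match before using Value. Here is a small sketch that also reads the Index property; the second pattern is only there to show the no-match case:

```csharp
using System;
using System.Text.RegularExpressions;

var text = "Sample text";

var match = Regex.Match(text, "[a-z]");
if (match.Success)
{
    // Value is the matched text; Index is its position in the input.
    Console.WriteLine($"Found '{match.Value}' at index {match.Index}"); // prints "Found 'a' at index 1"
}

// No digits in the input, so this match fails.
var noMatch = Regex.Match(text, "[0-9]");
Console.WriteLine(noMatch.Success); // prints "False"
```

A failed match still returns a Match object; its Success property is false and its Value is an empty string.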
This method only finds the first occurrence that the pattern matches. If you expect a regex pattern to have more than one match in a string, then you can use the Matches() method to get a MatchCollection (a collection of Match objects). Here is an example:
var text = "Sample text";
var matches = Regex.Matches(text, "[a-z]");
foreach (Match match in matches)
{
var letter = match.Value;
// Do something with letter string
}
In this example, the Matches() method finds every occurrence of a lowercase letter in text and returns a MatchCollection representing the matches. The variable matches is assigned the MatchCollection, and the foreach loop iterates over it to process each match.
Replace() method
The Replace() method takes functionality to the next level. You can use this method to match text like with the Match() method above, but then you can use a replacement string to replace the matched text. The method returns a copy of the input text modified with the replaced text in place of the matched text segments.
The static version of this method replaces all match occurrences; the instance version allows you to set a limit to how many occurrences are replaced. In my experience, I have almost always wanted to replace all occurrences.
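For the rare case where a limit is useful, here is a minimal sketch contrasting the static call with the instance overload that takes a count. The input string is invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var text = "a-b-c-d";

// Static version: replaces every dash.
string all = Regex.Replace(text, "-", "+");

// Instance version with a count: replaces only the first two dashes.
var dash = new Regex("-");
string firstTwo = dash.Replace(text, "+", 2);

Console.WriteLine(all);      // prints "a+b+c+d"
Console.WriteLine(firstTwo); // prints "a+b+c-d"
```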
Here is an example that finds a number surrounded by an HTML bold tag and replaces it with an italic tag:
var text = "<b>1.</b> First point\n<b>2.</b> Second point";
text = Regex.Replace(text, @"<b>(\d+)\.</b>", "<i>$1.</i>");
In this example, the Replace() method searches text for all occurrences of bold tags with a number and period in them. Then it replaces each occurrence with an italic tag, leaving the number and period intact, as defined by the replacement string. The variable text is updated to be:
"<i>1.</i> First point\n<i>2.</i> Second point"
Notice that the match pattern uses the capture group (\d+) to store the number in a variable that the replacement string references via $1. Parentheses in a regex pattern are used to indicate a capture group that can be referenced via a variable in the replacement string. The first capture group in the pattern is referenced with the variable $1, the second with $2, etc. In this example, using the capture group is required to retain the number in the replacement string.
Also, the regex pattern is a C# verbatim string defined by the @ character immediately before the beginning quote of the string. The C# verbatim string eliminates the need to perform a C# escape on each backslash (\) character (accomplished using another backslash), making the regex pattern shorter and easier to read.
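To show more than one capture group at work, here is a short sketch that swaps two words using $1 and $2. The name and pattern are illustrative only, not taken from a real data set:

```csharp
using System;
using System.Text.RegularExpressions;

var text = "Doe, Jane";

// (\w+) appears twice: $1 holds the first word, $2 holds the second.
var swapped = Regex.Replace(text, @"(\w+), (\w+)", "$2 $1");

Console.WriteLine(swapped); // prints "Jane Doe"
```

Without the verbatim string, the same pattern would have to be written as "(\\w+), (\\w+)".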
Evaluated search and replace
An overload of the Replace() method accepts a MatchEvaluator parameter instead of a replacement string. MatchEvaluator is a delegate that is called for each match occurrence to produce the replacement text. This lets you use regular expressions for matching without pushing them too far by creating an overly complicated pattern. In other words, you can utilize the best of regular expressions and the best of C#.
Building on the previous example, suppose you want to replace the matched number with the spelled out number, like replacing 1 with One. Here is how you can use a MatchEvaluator to process the matched text with C# code:
var dictionary = new Dictionary<int, string>
{
{ 1, "One" },
{ 2, "Two" }
};
var text = "<b>1.</b> First point\n<b>2.</b> Second point";
text = Regex.Replace(text, @"<b>(\d+)\.</b>", (Match m) =>
{
var number = int.Parse(m.Groups[1].Value);
var numberAsWord = dictionary.GetValueOrDefault(number);
return $"<i>{numberAsWord}.</i>";
});
In this example, an anonymous function is passed in as the MatchEvaluator parameter of the Replace() method. In the anonymous function, the Match parameter m is used to obtain the matched number from the first capture group. Next, the number text value is parsed as an integer value and the integer value is used to look up the spelling of the number in the dictionary. Ultimately, the anonymous function returns the replacement string which includes the spelling of the number.
I want to unpack capture groups in the Match class. The Match class has a Groups property that is a collection of Group objects, one for each capture group in the match. Capture groups are accessed via an index that corresponds to their order in the regex pattern. So the first capture group would be accessed with an index of 1, the second with 2, etc. Index 0 in the Groups collection is different: its value is the entire match text (equivalent to the Match.Value property). In this example, the first capture group is accessed using m.Groups[1].Value.
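The indexing described above can be checked with a short sketch that reuses the pattern from the earlier example on a single line of input:

```csharp
using System;
using System.Text.RegularExpressions;

var m = Regex.Match("<b>1.</b> First point", @"<b>(\d+)\.</b>");

// Index 0 is the entire match; index 1 is the first capture group.
string wholeMatch = m.Groups[0].Value;
string firstGroup = m.Groups[1].Value;

Console.WriteLine(wholeMatch);           // prints "<b>1.</b>"
Console.WriteLine(firstGroup);           // prints "1"
Console.WriteLine(wholeMatch == m.Value); // prints "True"
```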
Escape() method
The Escape() method will escape metacharacters in a string so they are treated as literal characters in the regular expression pattern. Examples of metacharacters are the dot (.) and star (*) characters. It's particularly valuable when you're composing a regex pattern dynamically at run time, as opposed to a constant string pattern as in all the previous code examples. When you're dynamically composing a regex pattern, you could be including metacharacters that you would want to match as literal characters. Escaping metacharacters in the text prevents creating an invalid regex pattern or a regex pattern that won't match what you expect it to match.
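Suppose you need to get text from a file and then search for the line that text is on in a string:
// The string to search; assumed here to come from another file (hypothetical file name).
var text = File.ReadAllText("input.txt");
var textFromFile = File.ReadAllText("file.txt");
// .* (rather than .+) also matches when the text sits at the start or end of its line.
var pattern = $".*{Regex.Escape(textFromFile)}.*";
var match = Regex.Match(text, pattern);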
In this example, text from the file is loaded into the textFromFile variable. Next, textFromFile is escaped and used to build a pattern that gets a line with the text in it. If a metacharacter such as the dot (.) character happens to be in textFromFile, it will be modified to "\." by the Escape() method.
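To see what escaping changes in practice, here is a minimal sketch using a hard-coded string in place of file contents. The sample values are invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var literal = "3.14";
var escaped = Regex.Escape(literal);

Console.WriteLine(escaped); // prints "3\.14"

// Unescaped, the dot matches any character, so "3x14" is a false positive.
bool unescapedMatchesWrongText = Regex.IsMatch("3x14", literal);

// Escaped, the dot only matches a literal dot.
bool escapedMatchesWrongText = Regex.IsMatch("3x14", escaped);
bool escapedMatchesRightText = Regex.IsMatch("pi is 3.14", escaped);

Console.WriteLine($"{unescapedMatchesWrongText} {escapedMatchesWrongText} {escapedMatchesRightText}"); // prints "True False True"
```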
RegexOptions Enum
All of the methods described above (with the exception of the Escape() method) have an overload that accepts a RegexOptions enumerated type parameter that lets you set various regex options. These are the options that I find most valuable:
IgnoreCase: This option makes the pattern matching case-insensitive. (Pattern matching is case-sensitive by default.)
Singleline: This option changes the dot metacharacter (.) in the pattern string to match every character, including the newline character. This is useful when a pattern match will need to span across multiple lines in the input text.
Compiled: This option compiles the regular expression to IL code instead of interpreting it, so the regex executes faster at the cost of a longer startup time. I've used this sparingly. You may want to use this option when the input text you're searching is very large, or when each match could span multiple lines in the input text. If you think you need this option, first check that your regex pattern is written correctly. You may also be able to split the input text into smaller pieces and run your regex on those.
This enumerated type lets you set more than one option at a time. It is designed as a set of flags (i.e. a bit field). Here is an example of how to set more than one option:
var text = "<b>1.</b> First point\n<b>2.</b> Second point";
text = Regex.Replace(text, @"<b>(\d+)\.</b>", "<i>$1.</i>",
RegexOptions.Compiled | RegexOptions.IgnoreCase);
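To show how the options change what matches, here is a small sketch that runs the same pattern with no options, with IgnoreCase alone, and with IgnoreCase combined with Singleline. The input string is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Uppercase tags, with content that spans two lines.
var text = "<B>first\nsecond</B>";
var pattern = "<b>.+</b>";

// No options: fails because the tag case differs.
bool plain = Regex.IsMatch(text, pattern);

// IgnoreCase only: still fails because the dot stops at the newline.
bool ignoreCaseOnly = Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);

// Both options: the dot now crosses the newline, so the pattern matches.
bool both = Regex.IsMatch(text, pattern,
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Console.WriteLine($"{plain} {ignoreCaseOnly} {both}"); // prints "False False True"
```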
Resources
For more information on the Regex C# class, check out this documentation.
For a convenient way to develop and test a regular expression pattern, go to RegExr. RegExr will analyze your regular expression and explain what each element in the regex matches. It also has very helpful regex reference documentation.
In this article, I described the features in the Regex C# class that I find the most valuable. I also described how to use regexes and C# code together to perform more complex find and replace tasks. I hope this article helps you get started in using regular expressions in C#.