Splitting Strings with Regex in Managed C++ Applications

Tuesday Feb 22nd 2005 by Tom Archer

Learn how to use this simple yet useful feature of the Regex class to delimit strings in your Managed C++ applications.

People who are familiar with regular expressions tend to think of them only in conjunction with searching a string for specific literals or patterns (such as email address formats). However, one very nice feature that they often overlook is the ability to split strings into substrings based on defined delimiters or tokens.

Before I started using regular expressions, I—like many programmers who started in C—used the strtok function to delimit strings. The following is an example of using the strtok function to split a comma-delimited string into its various tokens:

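  #include <string.h>   // strtok

  void strtoktest()
  {
    // strtok needs a narrow, writable char buffer, so plain literals are used here.
    char input[] = "Tom,Archer,Programmer/Trainer,CodeGuru";
    char delimiters[] = ",";

    // The first call passes the buffer; later calls pass NULL to continue scanning it.
    for (char* token = strtok(input, delimiters);
         token != NULL;
         token = strtok(NULL, delimiters))
    {
      // Wrap the native char* in a managed String for output.
      Console::WriteLine(new String(token));
    }
  }

The output is as follows:

  Tom
  Archer
  Programmer/Trainer
  CodeGuru
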
As you can see, strtok is pretty basic and easy to use. In fact, the strtok function is at the heart of a popular comma-delimited file class that I use much more than one would assume in this day and age. However, let's face facts. The function really is a hold-over from the C days; it's not object-oriented and certainly not very intuitive (I've used it for years and have to look up the syntax every single time I need it). A more modern approach is to use the .NET Regex class and its Split method. Using the same input as the previous example, here's how you would delimit the string using the Regex class:

  using namespace System::Text::RegularExpressions;
  ...
  void regextest()
  {
    String* input = S"Tom,Archer,Programmer/Trainer,CodeGuru";

    // The pattern passed to the constructor describes the delimiter.
    Regex* regex = new Regex(S",");

    // Split returns every token in a single managed array.
    String* tokens[] = regex->Split(input);
    for (int i = 0; i < tokens->Length; i++)
    {
      Console::WriteLine(tokens[i]);
    }
  }

As you can see, the split itself takes only two lines of code: instantiating a Regex object (passing the delimiter pattern) and calling the Split method. One more advantage of using Regex::Split instead of strtok is that a single call to Split returns an array of all the strings (tokens). Obviously, it's not that much work to write the code to stuff the strings into an array yourself, but this is one less step if your delimiting function is called by another function that needs all of the strings returned in an array.
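
For comparison, the same "split once, get every token back in a container" idea can be sketched in standard C++ with the std::regex library. This is not the .NET Regex class used in this article, only a rough analogue; the helper name splitWithRegex is hypothetical and was chosen for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the article or of .NET): splits `input`
// wherever `delimiterPattern` matches and returns the pieces in order.
std::vector<std::string> splitWithRegex(const std::string& input,
                                        const std::string& delimiterPattern)
{
    const std::regex delimiter(delimiterPattern);

    // A submatch index of -1 asks the iterator for the text between
    // matches (the tokens) rather than the matches themselves.
    std::sregex_token_iterator first(input.begin(), input.end(), delimiter, -1);
    std::sregex_token_iterator last;

    // Each sub_match converts to std::string as the vector is filled.
    return std::vector<std::string>(first, last);
}
```

Calling splitWithRegex("Tom,Archer,Programmer/Trainer,CodeGuru", ",") should return the same four tokens as the examples above. Because the delimiter is a pattern rather than a fixed character, passing "[,/]" instead would also split Programmer/Trainer into two separate tokens.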

Looking Ahead

For various reasons (probably inertia more than any other), I never really got into using regular expressions until I started programming with .NET back in 2000. However, regular expressions really do make a lot of basic chores so much easier. I sometimes kick myself for not having used them much sooner. In future articles, I'll cover more aspects of the Regex class, such as using the Match and MatchCollection classes, how to properly use captures and groups, and searching for complex patterns such as email addresses.

About the Author

Tom Archer owns his own training company, Archer Consulting Group, which specializes in educating and mentoring .NET programmers and providing project management consulting. If you would like to find out how the Archer Consulting Group can help you reduce development costs, get your software to market faster, and increase product revenue, contact Tom through his Web site.
