From: herbs@cntc.com (Herb Sutter)
Subject: Guru of the Week #29: Solution
Date: 22 Jan 1998 00:00:00 GMT
Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu>
Newsgroups: comp.lang.c++.moderated

.--------------------------------------------------------------------.
| Guru of the Week problems and solutions are posted regularly on    |
| news:comp.lang.c++.moderated. For past problems and solutions      |
| see the GotW archive at http://www.cntc.com.                       |
| Is there a topic you'd like to see covered? mailto:herbs@cntc.com  |
`--------------------------------------------------------------------'

_______________________________________________________

GotW #29:   Strings

Difficulty: 7 / 10
_______________________________________________________

>Write a ci_string class which is identical to the
>standard 'string' class, but is case-insensitive in the
>same way as the C function stricmp():

The "how can I make a case-insensitive string?" question is so common that it probably deserves its own FAQ -- hence this issue of GotW.

Note 1: The stricmp() case-insensitive string comparison function is not part of the C standard, but it is a common extension on many C compilers.

Note 2: What "case insensitive" actually means depends entirely on your application and language. For example, many languages do not have "cases" at all, and for languages that do you have to decide whether you want accented characters to compare equal to unaccented characters, and so on. This GotW provides guidance on how to implement case-insensitivity for standard strings in whatever sense applies to your particular situation.

Here's what we want to achieve:

>    ci_string s( "AbCdE" );
>
>    // case insensitive
>    assert( s == "abcde" );
>    assert( s == "ABCDE" );
>
>    // still case-preserving, of course
>    assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
>    assert( strcmp( s.c_str(), "abcde" ) != 0 );

The key here is to understand what a "string" actually is in standard C++.
If you look in your trusty string header, you'll see something like this:

    typedef basic_string<char> string;

So string isn't really a class... it's a typedef of a template. In turn, the basic_string<> template is declared as follows, in all its glory:

    template<class charT,
             class traits = char_traits<charT>,
             class Allocator = allocator<charT> >
    class basic_string;

So "string" really means "basic_string<char, char_traits<char>, allocator<char> >". We don't need to worry about the allocator part, but the key here is the char_traits part because char_traits defines how characters interact and compare(!).

basic_string supplies useful comparison functions that let you test whether a string is equal to another, less than another, and so on. These string comparison functions are built on top of character comparison functions supplied in the char_traits template. In particular, the char_traits template supplies character comparison functions named eq(), ne(), and lt() for equality, inequality, and less-than comparisons, and compare() and find() functions to compare and search sequences of characters.

If we want these to behave differently, all we have to do is provide a different char_traits template!
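To make the default behaviour concrete, here is a small sketch (assuming a C++11 or later compiler; the helper names default_eq and default_compare are made up for illustration and are not part of any library). It checks that string really is the fully spelled-out basic_string, and calls the default traits directly:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// "string" is just a typedef for basic_string with the default traits
// and allocator filled in.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char,
                                   std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "string is basic_string<char, char_traits<char>, allocator<char> >");

// Hypothetical helper: asks the default traits whether two characters
// are equal. The default traits are case-sensitive.
inline bool default_eq(char a, char b) {
    return std::char_traits<char>::eq(a, b);
}

// Hypothetical helper: compares the first n characters of two sequences
// using the default traits (negative, zero, or positive result).
inline int default_compare(const char* s1, const char* s2, std::size_t n) {
    return std::char_traits<char>::compare(s1, s2, n);
}
```

With the default traits, default_eq('a', 'A') is false and string("AbCdE") == "abcde" is false as well, which is exactly the behaviour we want to change.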
Here's the easiest way to supply different traits:

    struct ci_char_traits : public char_traits<char>
                  // just inherit all the other functions
                  //  that we don't need to override
    {
      static bool eq( char c1, char c2 )
        { return tolower(c1) == tolower(c2); }

      static bool ne( char c1, char c2 )
        { return tolower(c1) != tolower(c2); }

      static bool lt( char c1, char c2 )
        { return tolower(c1) <  tolower(c2); }

      static int compare( const char* s1,
                          const char* s2,
                          size_t n ) {
        return strnicmp( s1, s2, n );
               // if available on your compiler,
               //  otherwise you can roll your own
      }

      static const char*
      find( const char* s, size_t n, char a ) {
        // return a pointer to the first match,
        //  or 0 if there is no match in [s, s+n)
        for( ; n > 0; --n, ++s ) {
          if( tolower(*s) == tolower(a) ) { return s; }
        }
        return 0;
      }
    };

And finally, the key that brings it all together:

    typedef basic_string<char, ci_char_traits> ci_string;

All we've done is create a typedef named "ci_string" which operates exactly like the standard "string", except that it uses ci_char_traits instead of char_traits<char> to get its character comparison rules. Since we've handily made the ci_char_traits rules case-insensitive, we've made ci_string itself case-insensitive without any further surgery -- that is, we have a case-insensitive string without having touched basic_string at all!

This GotW should give you a flavour of how the basic_string template works and how flexible it is in practice. If you want different comparisons than the ones stricmp() and tolower() give you, just replace the five functions shown above with your own code that performs character comparisons the way that's appropriate in your particular application.

Exercise for the reader: Is it safe to inherit ci_char_traits from char_traits<char> this way? Why or why not?
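If your compiler doesn't have strnicmp(), the "roll your own" option might look something like the sketch below. This is one possible variant, not the only way to do it: the names ci_lower, portable_ci_traits and portable_ci_string are invented here for illustration, and "case-insensitive" is assumed to mean whatever tolower() does in the current C locale. The sketch casts to unsigned char before calling tolower() (passing a negative char value is undefined behaviour), and it leaves out ne(), which is not a member of std::char_traits in the published standard.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): lower-cases one
// character, going through unsigned char so tolower() is well defined.
inline int ci_lower(char c) {
    return std::tolower(static_cast<unsigned char>(c));
}

// Same idea as ci_char_traits, using only standard library functions.
struct portable_ci_traits : public std::char_traits<char> {
    static bool eq(char c1, char c2) { return ci_lower(c1) == ci_lower(c2); }
    static bool lt(char c1, char c2) { return ci_lower(c1) <  ci_lower(c2); }

    // Hand-rolled stand-in for strnicmp(): compares exactly n characters
    // and does not stop at embedded null characters.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            const int a = ci_lower(s1[i]);
            const int b = ci_lower(s2[i]);
            if (a != b) { return a < b ? -1 : 1; }
        }
        return 0;
    }

    // Returns a pointer to the first case-insensitive match in [s, s+n),
    // or a null pointer if there is none.
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (ci_lower(s[i]) == ci_lower(a)) { return s + i; }
        }
        return 0;
    }
};

typedef std::basic_string<char, portable_ci_traits> portable_ci_string;
```

Used the same way as ci_string, a portable_ci_string built from "AbCdE" should compare equal to both "abcde" and "ABCDE" while c_str() still hands back the original "AbCdE".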