Date: Mon, 31 Oct 94 13:59:04 JST
From: Stephen Turnbull
To: dj AT stealth DOT ctron DOT com
Cc: tony AT nt DOT tuwien DOT ac DOT at, djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: NULL pointers in (ANSI) string functions [was: strcat() ?]

> [Anton.Helm] would like to know what
>
>     strcat(a, b);
>
> should do when b happens to be NULL.

DJ says:

    It *should* cause a code fault, stack trace, and immediate exit
    to DOS, since you're accessing memory that is off limits.  You
    won't get this under DPMI though because of the way DPMI memory
    is set up.  Passing NULL to strcat is illegal.  I was through
    this once before, and passing cp to *any* string function, where
    char *cp = 0, is illegal.

I'm not sure that it isn't, strictly speaking, undefined behavior under
the ANSI standard, and that implementations *are allowed* to check for
NULL pointers and turn them into pointers to `""'.

I assume this dangerous interaction, between the general rule that "a
NULL pointer is allowed anywhere that a legal non-NULL pointer of that
type is" (as far as the compiler is concerned) and the undefined
behavior of the string functions when passed NULL pointers, is
permitted for efficiency. It's hard to imagine a more elegant
implementation than

    char *strcat (char *d, const char *s)
    {
      register char *td = d;
      register const char *ts = s;

      while (*td) td++;            /* find the end of d */
      while ((*td++ = *ts++));     /* copy s, terminator included */
      return d;
    }

especially if provision is made for inlining.

If ANSI makes it illegal (not undefined), then the following discussion
doesn't belong here (and I apologize), since GCC attempts to provide
the option of ANSI conformance. If it's undefined, then theoretically
DJ could provide more robust libraries, but this is dangerous, as we'll
see.

The problem is that not everyone will agree, at least in some cases, on
what should be done with NULL pointers.

For s = NULL, the obvious (and inexpensive) answer is "do nothing". I
can't see a problem with that, ever. (I'm not very good at predicting
bugs though; anybody else see a problem with it?
See below for the closely related consideration of associativity,
however.)

For d = NULL, all of the possibilities are a problem.

Returning NULL is easy, but who wants that?

Returning s is easy, but since s is declared const char *, overwriting
the return value (e.g., with a later strcat) would be seriously
impolite, and the resulting bug would be impossible to find in the
programmer's own source; he'd have to look at the implementation in the
library. (Note that s is not necessarily known by the compiler to be
constant; it may be a variable whose value is unchanged for a greater
scope than that of d.)

Returning a pointer to a static buffer (a) means that the return value
of strcat will sometimes get overwritten (yuck) and (b) can't
necessarily hold the return value, since the buffer is of fixed size
(shades of the DOS command line!).

Returning a string allocated on the heap is a serious memory leak,
since you don't know (in general) whether a return value from strcat
will need to be free'd or not, especially since lazy programmers will
take great advantage of initializing all otherwise uninitialized
strings to NULL.

The argument d to strcat is required to point to enough preallocated
space to contain the result, again for efficiency reasons, so assigning
a pointer to "" to d won't work.

Having different rules for different arguments, even if obvious, would
still lead to problems. E.g., string concatenation is associative in
value (but not in side effects, of course). Why should the programmer
have to worry about the difference between strcat(strcat(a,b),c) and
strcat(a,strcat(b,c))? The former would be legal when a is non-NULL,
the latter would be legal only if a and b are non-NULL. (Some people
might consider this to be the problem I wanted above.) And what about
assignments in the argument expressions (which might later be used as
string arguments), or the various associations possible for
strcat(a,strcat(b,strcat(c,d)))?
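To make the asymmetry concrete, here is a minimal sketch in C. It assumes a hypothetical lenient_strcat (the name and its rules are made up for illustration, and are not part of any real library) that combines two of the options discussed above: it does nothing when s is NULL and returns NULL when d is NULL. The helpers cat_left and cat_right are likewise hypothetical and only spell out the two groupings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical NULL-tolerant strcat; the name and the rules are assumptions
   for illustration only, not part of any real C library.
   Rule 1: a NULL destination cannot be written to, so return NULL.
   Rule 2: a NULL source appends nothing. */
char *lenient_strcat(char *d, const char *s)
{
    if (d == NULL)
        return NULL;
    if (s == NULL)
        return d;
    return strcat(d, s);
}

/* (a + b) + c: buf holds a on entry and must have room for the result. */
char *cat_left(char *buf, const char *b, const char *c)
{
    return lenient_strcat(lenient_strcat(buf, b), c);
}

/* a + (b + c): b is the destination of the inner call, so it must be
   either NULL or a writable buffer with room for b + c. */
char *cat_right(char *buf, char *b, const char *c)
{
    return lenient_strcat(buf, lenient_strcat(b, c));
}
```

With a = "x", b = NULL and c = "z", cat_left leaves "xz" in the buffer, while cat_right leaves just "x": the inner call returns NULL, the outer call then appends nothing, and c is silently dropped. Same three strings, two different answers, and no diagnostic either way.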
In these cases the programmer must either ensure that all strings are
initialized or trap for NULL pointers in her own code.

And to give you some idea of just how ugly it could get, suppose the
programmer in C++ did

    #include <stdlib.h>
    #include <string.h>

    class String {
    private:
      char *s;
      String (char *initializer) { s = strdup(initializer); }
    public:
      String operator+ (String source)
      {
        /* room for both strings plus the terminator */
        char *init = (char *) malloc(strlen(s)+strlen(source.s)+1);
        (void) strcpy(init,s);     /* init must start as a valid string */
        (void) strcat(init,source.s);
        String newString(init);
        free(init);
        return newString;
      }
    };

The C and C++ compilers do not define the order in which associative
operations take place; the implementation is permitted to optimize.
Oops.

I trust I've made my point.

+-----------------------------------------------------------------------+
|                           Stephen Turnbull                            |
|      University of Tsukuba, Institute of Socio-Economic Planning      |
|           Tennodai 1-chome 1--1, Tsukuba, Ibaraki 305 JAPAN           |
|          Phone: +81 (298) 53-5091     Fax: +81 (298) 55-3849          |
|       Email: turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp       |
|                                                                       |
|                Founder and CEO, Skinny Boy Associates                 |
|                Mechanism Design and Social Engineering                |
| REAL solutions to REAL problems of REAL people in REAL time!  REALLY. |
|                       Phone: +81 (298) 56-2703                        |
+-----------------------------------------------------------------------+