From: kdh5j AT weyl DOT math DOT virginia DOT edu (Kirk Hilliard)
Subject: Bison for NT
Date: 18 May 1997 08:46:01 -0700
Message-ID: <199705181456.KAA29831.cygnus.gnu-win32@weyl.math.Virginia.EDU>
Original-To: gnu-win32 AT cygnus DOT com

A parser generated by the cygwin-32 byacc behaves differently from those generated by bison or by the vendor-supplied yaccs on five UNIX platforms (IBM AIX, HP HP-UX, SGI IRIX, DEC OSF1, and SUN SunOS). With my grammar, the byacc-generated parser reduces a rule later than the other parsers do, and so it reports a parse error instead of returning through a YYACCEPT action.

Has anyone here ported bison to NT? If so, I would appreciate the details. If I succeed, I will post instructions here and also put them up on a web site. While I would be happy to use the cygwin-32 package to build bison, I need to be able to compile the parsers it generates with MSVC.

Kirk Hilliard

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The discussion that follows is supplied for those from Missouri (geographically or philosophically) who, having read my message, would like a concrete example.

My grammar is for a program that intercepts statements in an enhanced SQL language on their way to a data server, passes the standard SQL statements through, and acts on the enhanced statements. Here is a simplified grammar:

------------------------------ 8< ------------------------------
sql: set_command
   | update_command
   ;

set_command: 'S' { YYACCEPT; }
           | 'S' 'A' 'X'
           ;

update_command: 'U'
              | 'U' 'T' 'S' 'G'
              ;
------------------------------ 8< ------------------------------

(I have included a complete .y file using this grammar at the end of this message. In it, each rule has an additional action that prints a message when the rule is reduced.)
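The difference described at the top of this message can be made concrete with a toy model. The sketch below is a hand-coded LR automaton for the simplified grammar above; it is not bison or byacc source, and the names toy_parse, DEFAULT_REDUCE, and CHECK_LOOKAHEAD are hypothetical. It assumes, as the traces in this message suggest, that the parsers differ only in states that can either shift or reduce (the states right after 'S' and after 'U'): under DEFAULT_REDUCE such a state reduces on any token it cannot shift, while under CHECK_LOOKAHEAD it reduces only when the next token may legally follow, which for this grammar means end of input. States holding a single reduction reduce at once under both policies. The function returns what yyparse() would and counts the reductions, that is, how many rule actions would have run.

```c
/* Toy model (hypothetical names): a hand-coded LR automaton for the
   simplified grammar, with two assumed reduction policies. */
enum policy {
    DEFAULT_REDUCE,  /* no shift applies: reduce, whatever the lookahead */
    CHECK_LOOKAHEAD  /* in a state that can also shift: reduce only if the
                        lookahead may follow the rule (here, end of input) */
};

/* Rules:  1 sql: set_command        2 sql: update_command
           3 set_command: 'S'        4 set_command: 'S' 'A' 'X'
           5 update_command: 'U'     6 update_command: 'U' 'T' 'S' 'G' */
static const int rule_len[7] = {0, 1, 1, 1, 3, 1, 4};
/* Every reduction here exposes state 0, so one goto row suffices:
   the state entered after reducing each rule. */
static const int goto_state[7] = {0, 3, 3, 4, 4, 5, 5};

/* Returns 0 on accept or YYACCEPT and 1 on a parse error, like yyparse().
   '\0' stands for yylex() returning 0.  *reductions counts rule actions run. */
int toy_parse(const char *in, enum policy pol, int *reductions)
{
    int stack[8] = {0};  /* state stack, state 0 at the bottom */
    int top = 0;

    *reductions = 0;
    for (;;) {
        char la = *in;   /* peek at the lookahead without consuming it */
        int next = -1;   /* state to shift to, or -1 for none */
        int rule = 0;    /* rule to reduce by, or 0 for none */
        int mixed = 0;   /* state offers both a shift and a reduction */

        switch (stack[top]) {
        case 0:  next = la == 'S' ? 1 : la == 'U' ? 2 : -1; break;
        case 1:  next = la == 'A' ? 6 : -1; rule = 3; mixed = 1; break;
        case 2:  next = la == 'T' ? 7 : -1; rule = 5; mixed = 1; break;
        case 3:  return la == '\0' ? 0 : 1;   /* $accept: sql . $end */
        case 4:  rule = 1; break;
        case 5:  rule = 2; break;
        case 6:  next = la == 'X' ? 8 : -1; break;
        case 7:  next = la == 'S' ? 9 : -1; break;
        case 8:  rule = 4; break;
        case 9:  next = la == 'G' ? 10 : -1; break;
        case 10: rule = 6; break;
        }
        if (next >= 0) {             /* shift the lookahead */
            stack[++top] = next;
            in++;
            continue;
        }
        if (rule == 0 || (pol == CHECK_LOOKAHEAD && mixed && la != '\0'))
            return 1;                /* parse error */
        ++*reductions;               /* the rule's action would run here */
        if (rule == 3)
            return 0;                /* set_command: 'S' { YYACCEPT; } */
        top -= rule_len[rule];       /* pop the right-hand side */
        stack[++top] = goto_state[rule];
    }
}
```

Under this model, "SQ" returns 0 after one reduction with DEFAULT_REDUCE (the YYACCEPT action fires), but returns 1 with no reductions under CHECK_LOOKAHEAD. "UQ" returns 1 under both policies, but DEFAULT_REDUCE first reduces update_command and then sql, as the bison trace shows.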
With this grammar, my aim is to act on the special statement "S A X" but to pass through every other statement that starts with 'S' and is followed by any token other than 'A'. When the parser is generated by bison, I get the desired behavior:

SAX
token: S
token: A
token: X
reduced: set_command <-> 'S' 'A' 'X'
reduced: sql <-> set_command
yyparse returned 0

SAY
token: S
token: A
token: Y
parse error
yyparse returned 1

SQ
token: S
token: Q
reduced: set_command <-> 'S'
 YYACCEPT
yyparse returned 0

When the parser is generated by the cygwin-32 byacc, however, I get this behavior:

SQ
token: S
token: Q
parse error
yyparse returned 1

The byacc-generated parser checks whether the 'Q' token can legally come next before it reduces the 'S'. This could be considered a feature, since it catches some syntax errors before reducing rules and executing their actions, whereas the bison-generated parser reduces the rules and executes their actions before it detects the syntax error. First the bison-generated parser:

UQ
token: U
token: Q
reduced: update_command <-> 'U'
reduced: sql <-> update_command
parse error
yyparse returned 1

And then the byacc-generated parser:

UQ
token: U
token: Q
parse error
yyparse returned 1

Note that the behavior of the bison-generated parser is identical to that of the parsers generated by the vendor-supplied yaccs on the five UNIX platforms mentioned above. (The only other NT yacc I have tried is PCYACC, from a demo disk from ABRAXAS. From my simplified grammar it generates a parser that behaves the same as bison's, but it chokes on my actual grammar (too many ), and nothing indicates that the demo is crippled in this respect. Even if it could handle my complete grammar, I can't see spending US$995 on a yacc!)

Not a great deal is said about the use of YYACCEPT in the books on yacc that I have read.
ORA's "lex and yacc" says, "Since the parser may have a one-token lookahead, the rule action containing the YYACCEPT may not be reduced until the parser has read another token." This makes sense (even bison does it), but I don't want the parser jumping to conclusions about a parse error before I get a chance to return early.

One solution would be to add rules for all the 'S' commands in the standard grammar, but I would rather not have to do this. I realize that I don't need to add the entire grammar for these commands, just the token following the 'S', for example:

------------------------------ 8< ------------------------------
set_command: 'S' pass_through_set_clause { YYACCEPT; }
           | 'S' 'A' 'X'
           ;

pass_through_set_clause: 'Q'
                       | 'W'
                       | 'E'
                       ;
------------------------------ 8< ------------------------------

Now both "S Q" and "S W J" work, even with the byacc-generated parser, but since bison and the other yaccs already give me what I want, I would rather simply get bison running on NT.

Here is a complete .y file containing my simple grammar:

------------------------------ 8< ------------------------------
%{
#include <stdio.h>

int yylex();
int yyerror( char *s );
%}

%%

sql: set_command
        { printf("reduced: sql <-> set_command\n"); }
   | update_command
        { printf("reduced: sql <-> update_command\n"); }
   ;

set_command: 'S'
        {
            printf("reduced: set_command <-> 'S'\n");
            printf(" YYACCEPT\n");
            YYACCEPT;
        }
   | 'S' 'A' 'X'
        { printf("reduced: set_command <-> 'S' 'A' 'X'\n"); }
   ;

update_command: 'U'
        { printf("reduced: update_command <-> 'U'\n"); }
   | 'U' 'T' 'S' 'G'
        { printf("reduced: update_command <-> 'U' 'T' 'S' 'G'\n"); }
   ;

%%

/* Each non-blank character is a token; end of line or EOF ends the input. */
int yylex ()
{
    int c;

    while ( ( c = getchar() ) == ' ' || c == '\t' )
        ;   /* skip blanks and tabs between tokens */
    if ( c == '\n' ) {
        ungetc( c, stdin );   /* so main knows we made it to end of line */
        return 0;
    }
    if ( c == EOF )
        return 0;
    printf( "token: %c\n", c );
    return c;
} /* yylex */

int yyerror( char *s )
{
    printf ("%s\n", s);
    return 0;
} /* yyerror */

/* Calls yyparse() once per input line and reports its return value. */
int main()
{
    int c;

    while (1) {
        if( ( c = getchar() ) == EOF )
            break;
        ungetc( c, stdin );
        printf( "yyparse returned %d\n", yyparse() );
        while ( ( c = getchar() ) != '\n' && c != EOF )
            ;   /* in case yyparse() returned midline */
    }
    return 0;
} /* main */
------------------------------ 8< ------------------------------
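The pass_through_set_clause workaround can be checked the same way. A minimal sketch, assuming the stricter byacc-like behavior (a state that can also shift reduces only on a token that may legally follow, while a state holding a single reduction reduces without examining the next token): the hypothetical function workaround_parse below walks the parser states of the workaround's 'S' branch by hand and leaves update_command out. Because the state after 'S' can now only shift, there is no reduce-on-lookahead decision to make, so "S Q" and "S W J" both reach the YYACCEPT action and the trailing 'J' is never examined.

```c
/* Toy model (hypothetical name) of the 'S' branch of the workaround grammar
       set_command: 'S' pass_through_set_clause { YYACCEPT; }
                  | 'S' 'A' 'X'
                  ;
       pass_through_set_clause: 'Q' | 'W' | 'E' ;
   Returns 0 on accept or YYACCEPT and 1 on a parse error, like yyparse();
   '\0' stands for end of input.  update_command is not modeled. */
int workaround_parse(const char *in)
{
    /* Start state: the only modeled move is shifting 'S'. */
    if (*in++ != 'S')
        return 1;

    /* State after 'S': shift 'A', or shift 'Q', 'W' or 'E'.  No reduction
       is possible here, so no reduce-on-lookahead decision arises. */
    switch (*in++) {
    case 'Q':
    case 'W':
    case 'E':
        /* Per the assumption above, each of the next two states holds a
           single reduction made without examining the following token:
             pass_through_set_clause: 'Q' | 'W' | 'E'
             set_command: 'S' pass_through_set_clause   (runs YYACCEPT) */
        return 0;
    case 'A':
        if (*in++ != 'X')
            return 1;
        /* set_command: 'S' 'A' 'X' and sql: set_command are reduced,
           after which only end of input is acceptable. */
        return *in == '\0' ? 0 : 1;
    default:
        return 1;    /* includes end of input right after 'S' */
    }
}
```

One side effect worth knowing: a bare "S", which the original grammar accepts, becomes a parse error here, because every set_command now needs a second token.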