Fatal error start symbol start does not derive any sentence - Исправление ошибок и поиск оптимальных решений проблем

Iam trying to develop a program for a calculator using lex and yacc. I keeping getting the following error:

calc.y: warning: 5 nonterminals useless in grammar [-Wother]
calc.y: warning: 8 rules useless in grammar [-Wother]
calc.y:8.1: fatal error: start symbol I does not derive any sentence
 I : E 'n' {printf("%dn",$1);}

I looked at similiar problems but they had infinite recursions but this one doesn’t have .

calc.l

%{
#include"y.tab.h"
%}

digits [0-9]*

%%
{digits} {return DIGITS}
%%

int yywrap()
{
}

calc.y

%{
    #include<stdio.h>   
%}

%token DIGITS

%%
I : E 'n' {printf("%dn",$1);}
  ;
E : E '+' F {$$ = $1 + $3;}
  | E '-' F {$$ = $1 - $3;}
  ;
F : F '*' G {$$ = $1 * $3;}
  | F '/' G {$$ = $1 / $3;}
G :'('E')' 
  | DIGITS 
  ;
%%

int main()
{
    yyparse();
}
int yyerror()
{
}

asked Jan 9, 2017 at 7:53

I don’t know yacc, but:

To build an I, you need an E:
```
I : E 'n'
```
To build an E, you need an E:
```
E : E '+' F
  | E '-' F
```

Since there is no way to build an E if you don’t already have one (and in the beginning you don’t have anything), there is no way to build an I either.

Or looking at it from the other side: E is infinitely recursive because it always refers back to itself.

If we start with the lexer, we get DIGITS first.

DIGITS can be used to build a G.

But there’s nothing we can do with a G because the only rules that use it (F '*' G and F '/' G) also require an F to proceed, and we don’t have an F. So we’re stuck.

answered Jan 9, 2017 at 7:59

melpomenemelpomene

83.2k8 gold badges81 silver badges145 bronze badges

Источник

I am trying to develop a mini C compiler, and here is the code of the yacc program , kindly assist me where am I wrong and how do I improvise it.

The grammar must be able to recognize : The C program structure ie the preprocessor directives, the main function , the variable declarations, expressions, while loop , if and if else statements.

    %{
    #include<stdio.h>
    #include<stdlib.h>
    #include<string.h>
    
    void yyerror(char* s);
    int yylex();
    extern int yylineno;
    %}
    
    %token T_INT T_DOUBLE T_FLOAT T_CHAR T_WHILE T_IF T_ELSE T_DO T_INCLUDE T_MAIN T_STRLITERAL T_EQCOMP T_NOTEQUAL T_GREATEREQ T_LESSEREQ T_NUM T_HEADER T_ID
    
    %start START
    
    %%
    START   :   PROG {printf("Valid syntaxn"); YYACCEPT;}
        ;
    PROG    :   T_INCLUDE'<'T_HEADER'>'PROG
        |   MAIN PROG
        |   DECLR';'PROG
        |   ASSGN';'PROG
        ;
    DECLR   :   TYPE LISTVAR
        ;
    LISTVAR :   LISTVAR','T_ID
        |   T_ID
        ;
    TYPE    :   T_INT
        |   T_FLOAT
        |   T_DOUBLE
        |   T_CHAR
        ;
    ASSGN   :   T_ID'='EXPR
        ;
    EXPR    :   EXPR REL_OP E
        |   E
        ;
    REL_OP  :   T_LESSEREQ
        |   T_GREATEREQ
        |   '<'
        |   '>'
        |   T_EQCOMP
        |   T_NOTEQUAL
        ;
    E   :   E'+'T
        |   E'-'T
        |   T
        ;
    T   :   T'*'F
        |   T'/'F
        |   F
        ;
    F   :   '('EXPR')'
        |   T_ID
        |   T_NUM
        ;
    
    MAIN    :   TYPE T_MAIN'('EMPTY_LISTVAR')''{'STMT'}'
            ;
    EMPTY_LISTVAR   :   LISTVAR
            |
            ;
    STMT    :   STMT_NO_BLOCK STMT
        |   BLOCK STMT
        |
        ;
    
    
    %nonassoc T_IFX;
    %nonassoc T_ELSE;
    
    STMT_NO_BLOCK   :   DECLR';'
            |   ASSGN';'
            |   T_IF COND STMT %prec T_IFX 
            |   T_IF COND STMT T_ELSE STMT
            |   WHILE
            ;
    
    BLOCK   :   '{'STMT'}';
    WHILE   :   T_WHILE'('COND')' WHILE_2;
    COND    :   EXPR
        |   ASSGN
        ;
    WHILE_2 :   '{'STMT'}'
        |';'
        ;
    %%
    void yyerror(char *s)
    {
        printf("Error : %s at %dn",s,yylineno);
    }
    
    main(int argc,char* argv[])
    {
        yyparse();
        return 0;
    }

These are the errors I am getting :

48 rules never reduced
parser.y: warning: 20 useless nonterminals and 48 useless rules
parser.y:13.8-12: fatal error: start symbol START does not derive any sentence

Every production for PROG is recursive. So the recursion can never end, and PROG cannot derive a finite sentence.

Since the only production for START is PROG, that also applies to START.

Источник

13 Years Ago

I have attached the latest listing of the program in question. It is actually parse.y but had to be uploaded as parse.txt. When I run
Bison now, then the messages I get from that are…
parse.y: warning: 8 useless nonterminals and 30 useless rules
parse.y:52.1-7: fatal error: start symbol grammar does not derive any
sentence. The 2nd message is the most crucial at the moment.

I believe ( could be wrong) that The particular code it refers to is…
%%
grammar: domains
constants
predicates
;

My intention is that it reads a set of domains…
typically each one is to a line of a format…
D0 11 0 10 // domainName, number_of_values, floor_value, ceil_value
…
Then reads a set of constants…
typically each one to a line of a format…
X0 D0 9

Then reads and processes a set of predicates, each one to a line, which
may appear as…
or(lt(mul(X0,X1),X2),ge(abs(sub(X3,X4)),X5))

However, I have no precedent or pro-forma for such as ‘grammar’ above. I
think that the domains constants predicates sequence I’ve written will
read all the
domains, then all the constants then all the predicates without allowing
any regression through any of them (eg a domain in the constants).

That the start symbol ‘grammar’ cannot derive any sentence only suggests
to me that the grammar high level nonTerminal cannot access it’s
subsidiary terminals and non-terminals, but the exact cause is not yet
clear to me.

The classic example such as p75 of lex & yacc…[1995 Levine et al]
statement_list: statement ‘n’
statement_list statement ‘n’
;
…doesn’t shed much light on my codes attached below, where I must read a
group of domains, followed by a group of constants, followed by a group of
predicates.

13 Years Ago

I’m not exactly sure what the error means, but I believe you mean

grammar:  
	  domains	
	  constants 	
	  predicates
	;

grammar:  
	  domains	
	 | grammar constants 	
	 | grammar predicates
	;

depending upon if you want just one set of constants followed by one set of predicates, or just one set of domains followed by any number of constants and predicates.

13 Years Ago

I’m not exactly sure what the error means, but I believe you mean
grammar:  
	  domains	
	  constants 	
	  predicates
	;
or
grammar:  
	  domains	
	 | grammar constants 	
	 | grammar predicates
	;
depending upon if you want just one set of constants followed by one set of predicates, or just one set of domains followed by any number of constants and predicates.

Thanks for answering Marc,
Firstly a list of uninterrupted domain data are read,
then a list of uninterrupted constants are read,
then a list of uninterrupted predicates are read,
the formats for each above are as described in the initial message.
None of the domains or constants or predicates can appear out of place in the described sequence described in this message above.

I understand the Yacc code you wrote, but am surprised that the Bison code described after could be different. The Yacc code FORCES sequence, the Bison code would appear to not do so because of the |. It looks to me as though the Bison code could be any sequence of- all domains or all constants or all predicates with no forced sequence.

The requirement I am trying to fulfill is just one set of domains, then THEY’RE finished, then one set of constants, then THEY’RE finished, then the set of predicates, at completion of which is the end of the program.

Thanks, Joe, Cork, Ireland

13 Years Ago

The yacc/bison language varianced was just testing if daniweb put any syntax highlighting for either language.

Yes, according to your requirements you want the «yacc» one.

Reply to this topic

Be a part of the DaniWeb community

We’re a friendly, industry-focused community of developers, IT pros, digital marketers,
and technology enthusiasts meeting, networking, learning, and sharing knowledge.

Источник

%token ENTIER REEL VECTOR INTEGER FLOAT CONST READ DISPLAY IF ELSE FOR END AND OR NOT G L GE LE EQ DI ACOUV AT
%token ACFER AFFEC PLUS MOIN MUL DIV CROOUV CROFER PAROUV PARFER SEP VERG PVERG DEUXPT ESPACE ID CAR CHCAR STRING
%start S
%%
S: ID ACOUV DEC CODE ACFER;
J'ai ce message qui apparait lorse que je fait bison -d prog.y :
fatal error: start symbol S does not derive any sentence

bison -d on your input gives me:
test.y:6.17-20: symbol CODE is used, but is not defined as a token and has no rules
test.y:6.13-15: symbol DEC is used, but is not defined as a token and has no rules
which tells you exactly what the problem is -- you're using CODE and DEC and not defining them. Add them to one of the %token lines and it works fine...
edit
The error 'start symbol S does not derive any sentence' is telling you that you have unbounded recursion in your grammar, so that no (finite) input can match the start symbol. In you case, S must include a CODE, which must include a command (directly or via a listcommand), which must contain a boucle, which must in turn contain another listcommand. So you end up with an infinite expansion chain of listcommands -> command -> boucle -> listcommands.
The problem is probably your rule
listcommands: command | listcommands ;
which matches exactly one command, plus a useless (and ambiguous) unbounded noop-expansion of that command. You probably want
listcommands: /*epsilon*/ | listcommands command ;
which matches 0 or more commands in sequence. Making this change fixes the fatal error, but still leaves a bunch of shift-reduce conflicts, along with the useless rule dectype: dectype.
To track down and fix shift/reduce conflicts, use bison -v to produce a .output file listing the grammar, states and conflicts in detail. Most of yours come from not having a precedence for NOT, and the other two come from the ambiguous dectype and CODE rules.

this is my code :
%{#include<stdio.h>
extern FILE* yyin;
extern tab;
%}
%union
{char *name;
int num;
char car;
}
%token ENTIER REEL VECTOR INTEGER FLOAT CONST CHAR READ DISPLAY IF ELSE FOR END AND OR NOT G L GE LE EQ DI ACOUV AT
%token ACFER AFFEC PLUS MOIN MUL DIV CROOUV CROFER PAROUV PARFER SEP VERG PVERG DEUXPT ESPACE ID CAR CHCAR STRING
%left OR
%left AND
%left G GE EQ DI LE L
%left PLUS MOIN
%left MUL DIV
%start S
%%
S: ID ACOUV DEC CODE ACFER {printf("Chaine correcte !");YYACCEPT;};
DEC: ACOUV dectype ACFER;
dectype: dectype | type DEUXPT listidf PVERG | VECTOR DEUXPT type DEUXPT ID CROOUV ENTIER VERG ENTIER CROFER PVERG;
listidf: ID SEP listidf | ID;
type: INTEGER STRING CHAR FLOAT CONST ;
CODE:ACOUV command ACFER | ACOUV listcommands ACFER;
listcommands: command | listcommands;
command: affichage lecture affectation condition boucle;
affichage: DISPLAY PAROUV CHCAR DEUXPT ID PARFER PVERG;
lecture: READ PAROUV CHCAR DEUXPT AT ID PARFER PVERG;
affectation: ID AFFEC expression PVERG;
expression: expression1 | expressioncond | ID |CONST | PAROUV expression PARFER;
expressioncond: expression G expression | expression L expression | expression GE expression | expression EQ expression | expression LE expression | expression DI expression |NOT expression | expression OR expression | expression AND expression;
expression1: expression MOIN expression | expression PLUS expression | expression DIV expression | expression MUL expression ;
condition: IF PAROUV expressioncond PARFER listcommands ELSE listcommands END | IF PAROUV expressioncond PARFER listcommands END | IF PAROUV expressioncond PARFER condition END;
boucle: FOR PAROUV ID DEUXPT ENTIER conditiondarret PARFER listcommands END;
conditiondarret:ID;
%%
int yyerror(char* msg)
{printf("%s",msg);
return 1;}
int main(int argc,char *argv[]) {
yyin=fopen(argv[1],"r");
yypase();
affiche();
return 0;
}

Related

Bison shift/reduce conflicts

I'm trying describe this syntax on Bison input file:
https://courses.engr.illinois.edu/cs421/sp2011/mps/mp2/minijavasyntax.pdf
this is my input:
%start Program
%token KW_CLASS KW_EXTENDS KW_PUBLIC KW_STATIC KW_BOOLEAN KW_STRING KW_FLOAT KW_INT EOF
%token KW_IF KW_WHILE KW_BREAK KW_CONTINUE KW_SWITCH KW_CASE KW_DEFAULT KW_RETURN
%token KW_NEW KW_THIS KW_NULL KW_TRUE KW_FALSE KW_PRINTLN
%token IDENT INT_LITERAL FLOAT_LITERAL STRING_LITERAL
%nonassoc "THEN"
%nonassoc KW_ELSE
%right "STATEMENTS"
%right OP_ASSIGN
%left OP_OR
%left OP_AND
%nonassoc CMP_EQ CMP_NEQ
%nonassoc CMP_GT CMP_LT CMP_GTE CMP_LTE
%left OP_ADD OP_MINUS
%left OP_MULT OP_DIV OP_MOD
%right OP_NOT OP_UNARY "NEW"
%left "FUNCALL" "SUBSCRIPT" '.'
%nonassoc '('
%nonassoc ')'
%%
Program: ClassDeclp EOF
;
ClassDeclp: ClassDecl
| ClassDeclp ClassDecl
;
ClassDecl: KW_CLASS IDENT ExtendsFrom
'{' VarDecls MethodDecls '}'
;
ExtendsFrom: /*empty*/
| KW_EXTENDS IDENT
;
VarDecls: /*empty*/
| VarDecls VarDecl
;
VarDecl: Type IDENT ';'
| KW_STATIC Type IDENT ';' /*Co the sua thanh AcessModifier Type IDENT*/
;
MethodDecls: /*empty*/
| MethodDecls MethodDecl
;
MethodDecl: KW_PUBLIC Type IDENT
'('MethodParams')'
'{'VarDecls Statements KW_RETURN Expression ';' '}'
;
MethodParams: /*empty*/
| MethodParams ',' MethodParam
;
MethodParam: Type IDENT;
Type : Type '['']'
| KW_BOOLEAN
| KW_STRING
| KW_FLOAT
| KW_INT
| IDENT
;
Statements: Statements Statement %prec "STATEMENTS"
| /*empty*/ %prec "STATEMENT"
;
Statementp: Statements Statement %prec "STATEMENTS"
;
Statement: '{'Statements'}'
| KW_IF '(' Expression ')' Statement %prec "THEN"
| KW_IF '(' Expression ')' Statement KW_ELSE Statement
| KW_WHILE '(' Expression ')'Statement
| KW_PRINTLN '(' Expression ')' ';'
| IDENT OP_ASSIGN Expression ';'
| KW_BREAK ';'
| KW_CONTINUE ';'
| IDENT %prec "SUBSCRIPT" '['Expression']' '=' Expression ';'
| KW_SWITCH '(' Expression ')' '{'
Cases
KW_DEFAULT ':' Statementp '}'
;
Cases: Cases Case
| /*empty*/
;
Case: KW_CASE INT_LITERAL ':' Statementp
;
Expression: Expression OP_OR Expression
| Expression OP_AND Expression
| Expression CMP_EQ Expression
| Expression CMP_NEQ Expression
| Expression CMP_GT Expression
| Expression CMP_GTE Expression
| Expression CMP_LT Expression
| Expression CMP_LTE Expression
| Expression OP_ADD Expression
| Expression OP_MINUS Expression
| Expression OP_MULT Expression
| Expression OP_DIV Expression
| Expression OP_MOD Expression
| '-' Expression %prec OP_UNARY
| OP_NOT Expression
| Expression %prec "SUBSCRIPT" '['Expression']'
| Expression '.'"length"
| Expression '.' IDENT %prec "FUNCALL" '(' ParamList ')'
| INT_LITERAL
| FLOAT_LITERAL
| STRING_LITERAL
| KW_NULL
| KW_TRUE
| KW_FALSE
| IDENT
| KW_THIS
| KW_NEW Type '[' Expression ']' %prec "NEW"
| KW_NEW IDENT '('')' %prec "NEW"
| '(' Expression ')'
;
ParamList: /*empty*/
| ParamList ',' Expression
| Expression
;
%%
main(int argc, char** argv[])
{
extern FILE *yyin;
++argv; --argc;
yyin = fopen(argv[0], "r");
yydebug = 1;
errors = 0;
yyparse();
}
yyerror(char *s)
{
printf("%sn", s);
}
/* Co 3 conflict RR can xu ly khi bien thuoc kieu bool
giua BoolExpr va Expresstion */
I got two 16 conflicts when compile.
One of conflicts, I ran Bison with --report=lookahead:
OP_NOT Expression . [OP_OR, OP_AND, CMP_EQ, CMP_NEQ, CMP_GT, CMP_LT, CMP_GTE, CMP_LTE, OP_ADD, OP_MINUS, OP_MULT, OP_DIV, OP_MOD, ')', ';', ',', '[', ']']
What I expected is '[' not in OP_NOT lookahead token, because SUBSCRIPT precedence must higher than Operator !.
Others conflicts are like this. How can I solve it.
Tks

That's not how precedence works.
EDIT: If you find the following description confusing or you don't want to wade through so much English text, you could take my usual advice: Don't use precedence. You can almost always write an unambiguous grammar which doesn't require precedence declarations. And if you do that, you don't need to understand precedence. (Although, honestly, it is not that complicated if you understand how LR parsing works.) /EDIT
Precedence always compares a possible reduction (i.e. a production whose rightt-hand side matches the top of the current parser stack) and the lookahead symbol.
At the point:
Expression : Expression · '[' Expression ']'
the only possible parser action is a shift, because a reduction can only happen when the point is at the end of the right-hand side.
However, in one of the states in which that point occurs, there is another production:
Expression : OP_NOT Expression ·
This one can be reduced, since the point is at the end.
Since both these points are in the same state, they must both be valid. That means that we are looking at:
OP_NOT Expression · '[' Expression ']'
And we are trying to figure out what to do. We could reduce OP_NOT Expression to Expression, at which point we would have:
Expression · '[' Expression ']'
Or we could shift the '[', leaving us with
OP_NOT Expression '[' · Expression ']'
Since both of these are possible, there is a shift/reduce conflict. Yacc/Bison will try to resolve that conflict using a precedence rule if one exists. In particular, it needs to compare the precedence of the production which might be reduced:
Expression : OP_NOT Expression
and the symbol which might be shifted: '['.
However, a thorough look through the precedence declarations shows that there is no precedence assigned to '['. So yacc/bison cannot test it against the production (whose precedence is defined by the last terminal in the right-hand side, OP_NOT, since there is no %prec declaration.
If you want the postfix subscript operator ([ Expression ']') to have higher precedence than the prefix operator OP_NOT, you have to declare a precedence for [ which is higher than OP_NOT.
By the way, I don't see the point of the inconsistency here. You could have used ! for OP_NOT (and - for OP_MINUS and so on), which would be easier to read and less work.
You seem to think that the %prec declaration in
Expression %prec "SUBSCRIPT" '['Expression']'
is relevant. It is not. It would only be applicable if the parser could possible reduce Expression '[' Expression ']'. But it is also pointless, because you don't need to create a fake terminal to hold the precedence of that production; it's precedence is defined by the last terminal on the right-hand side, ']', so you could just declare a precedence for that terminal.
The fake token in the declaration Expression : OP_MINUS Expression %prec OP_UNARY is needed because '-' has two different precedences, or more accurately because OP_MINUS Expression has different precedence from Expresson OP_MINUS Expression. You didn't actually need to invent a fake token, though; you could use any token with the correct precedence, such as OP_NOT or OP_NEW.
In case that wasn't enough words, I have tried to explain this in several different SO answers. Here's one and here's another one and here's another one. Also, here's one by Chris Dodd and here's the documentation in the bison manual. If you're lucky, you might find a description in your own language, by using whatever internet search engine works best for you or by talking with your professor if you have one.
By the way, the lookahead report tells you which symbols might follow a given production. Precedence has no effect on this. [ can definitely follow !a. Precedence tells the parser whether to reduce or shift, but the token which might trigger the reduce or shift is definitely in the lookahead set.

Bison: If/Else reduce/reduce conflict

I'm new in Bison. I'm getting reduce/reduce conflict error but not getting where it is happening .The error message is "conflicts: 1 reduce/reduce". This is my grammar rule .
%token INT FLOAT CHAR EXIT V_MAIN BS BE NL EQU CM ADD SUB MUL DIV LT GT LP RP PRINT IF ELSE THN HH
%token <VarNm> V_NM
%token <num> NUMBER
%token <flt> R_NUM
%type <num> EXP TERM FACTOR CON STATEMENTS
%type <VarNm> X
%nonassoc THN
%nonassoc ELSE
%start STRT
%left LT GT
%left PLUS MINUS
%left MULT DIV
%%
STRT : V_MAIN BS CODE BE {printf("Compilation complete. :)n");}
| EXIT {exit(EXIT_SUCCESS);}
;
CODE : /* empty */
| CODE STATEMENTS NL {printf("Statements completen");}
| CODE DECLARATION NL {printf("Declaration completen");}
| STMNT
;
DECLARATION : TYPE V {printf("Dn");}
;
TYPE : INT {printf("In");}
| FLOAT {printf("Fn");}
| CHAR
;
V : V CM V_NM {AddNewVar($3);printf("Vn");}
| V_NM {AddNewVar($1);printf("Vn %sn",$1);}
| /* empty */ {printf("En");}
;
STATEMENTS : { $$ = 0; }
| EXP EQU X {AsgnVal($3,$1);}
| PRINT EXP {printf("Output: %dn",$2);}
| EXP { $$ = $1 ;}
;
STMNT : MIF NL
| UIF NL
;
MIF : IF CON THN HH MIF HH ELSE HH MIF HH {printf("MIF1n");}
| CODE STATEMENTS NL
;
UIF : IF CON THN HH STMNT HH {printf("UIF1n");}
| IF CON THN HH MIF HH ELSE HH UIF HH {printf("UIF2n");}
;
CON : EXP GT EXP { $$ = $1 > $3? 1: 0 ; }
| EXP LT EXP { $$ = $1 < $3? 1: 0 ; }
| EXP EQU EXP { $$ = $1 == $3? 1: 0 ; }
;
X : V_NM { $$=$1;CheckIfFound($1);}
;
EXP : TERM
| EXP ADD TERM { $$ = $1 + $3; }
| EXP SUB TERM { $$ = $1 - $3; }
;
TERM : TERM MUL FACTOR { $$ = $1 * $3; }
| TERM DIV FACTOR { if($3){$$ = $1 / $3;}
else {printf("Division by zeron");}
}
| FACTOR { $$ = $1; }
| X { $$=VarVal($1); }
;
FACTOR : NUMBER { $$ = $1; }
| LP EXP RP { $$ = $2; }
;
The conflict error started when I inserted the grammar for IF/ELSE.
Excluding that portion my code works just fine . I would also like to know if there is any way to detect where is this conflict happening using command .

The problem is indeed triggered by your if productions. It goes like this (leaving out other irrelevant productions):
MIF : CODE STATEMENTS NL
CODE : STMNT
| CODE STATEMENTS NL
STATEMENTS: %empty
STMNT: MIF NL
Note that NL is always a possible lookahead for CODE because CODE: CODE STATEMENTS NL and STATEMENTS: %empty, which means that CODE can derive CODE NL.)
CODE NL⇒ CODE STATEMENTS NL NL (CODE: CODE STATEMENTS NL)
⇒ STMNT STATEMENTS NL NL (CODE: STMNT)
⇒ MIF NL STATEMENTS NL NL (STMNT: MIF NL)
⇒ MIF NL STATEMENTS NL NL (MIF: CODE STATEMENTS NL)
⇒ CODE STATEMENTS NL NL STATEMENTS NL NL
That's certainly a reduce-reduce conflict. When the parser has found CODE STATEMENTS NL and sees NL as the lookahead, it could use either the reduction CODE: CODE STATEMENTS NL or MIF: CODE STATEMENTS NL.
If you were using the Wikipedia dangling-else page as a guide (or even if you weren't), take a careful look at the production closed_statement: non_if_statement. (closed_statement is the equivalent of your MIF).
Although there is no magic command where_is_my_error, it's pretty easy to see that problem using the report file which bison will produce if requested (with the -v or --report options). See a fully worked out example in the bison manual debugging chapter. Also super-useful is bison's trace facility, which will save you from having to scatter printf statements throughout your parser (and later remove them).
Your grammar would be a lot more readable (to people other than you) if you conformed to the usual style guidelines:
Use UPPER_CASE for tokens and lower_case or camelCase for non-terminals.
Write words out in full. Rmvng vwls frm dntfrs mks yr cd nrdbl.
Use bison's alias feature to write keywords as the keyword itself (in quotes):
%token T_THEN "then" T_IF "if" T_ELSE "else"
%%
matchedIf: "if" condition "then" HH matchedIf HH "else" HH matchedIf HH
See the bison manual chapter on Symbols
You can also use single-quoted character tokens to simplify your grammar. These don't even need to be declared:
term: term '*' factor
factor: '(' expression ')'

Shift Reduce Conflict

I'm having trouble fixing a shift reduce conflict in my grammar. I tried to add -v to read the output of the issue and it guides me towards State 0 and mentions that my INT and FLOAT is reduced to variable_definitions by rule 9. I cannot see the conflict and I'm having trouble finding a solution.
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token INT FLOAT
%token ADDOP MULOP INCOP
%token WHILE IF ELSE RETURN
%token NUM ID
%token INCLUDE
%token STREAMIN ENDL STREAMOUT
%token CIN COUT
%token NOT
%token FLT_LITERAL INT_LITERAL STR_LITERAL
%right ASSIGNOP
%left AND OR
%left RELOP
%%
program: variable_definitions
| function_definitions
;
function_definitions: function_head block
| function_definitions function_head block
;
identifier_list: ID
| ID '[' INT_LITERAL ']'
| identifier_list ',' ID
| identifier_list ',' ID '[' INT_LITERAL ']'
;
variable_definitions:
| variable_definitions type identifier_list ';'
;
type: INT
| FLOAT
;
function_head: type ID arguments
;
arguments: '('parameter_list')'
;
parameter_list:
|parameters
;
parameters: type ID
| type ID '['']'
| parameters ',' type ID
| parameters ',' type ID '['']'
;
block: '{'variable_definitions statements'}'
;
statements:
| statements statement
;
statement: expression ';'
| compound_statement
| RETURN expression ';'
| IF '('bool_expression')' statement ELSE statement
| WHILE '('bool_expression')' statement
| input_statement ';'
| output_statement ';'
;
input_statement: CIN
| input_statement STREAMIN variable
;
output_statement: COUT
| output_statement STREAMOUT expression
| output_statement STREAMOUT STR_LITERAL
| output_statement STREAMOUT ENDL
;
compound_statement: '{'statements'}'
;
variable: ID
| ID '['expression']'
;
expression_list:
| expressions
;
expressions: expression
| expressions ',' expression
;
expression: variable ASSIGNOP expression
| variable INCOP expression
| simple_expression
;
simple_expression: term
| ADDOP term
| simple_expression ADDOP term
;
term: factor
| term MULOP factor
;
factor: ID
| ID '('expression_list')'
| literal
| '('expression')'
| ID '['expression']'
;
literal: INT_LITERAL
| FLT_LITERAL
;
bool_expression: bool_term
| bool_expression OR bool_term
;
bool_term: bool_factor
| bool_term AND bool_factor
;
bool_factor: NOT bool_factor
| '('bool_expression')'
| simple_expression RELOP simple_expression
;
%%

Your definition of a program is that it is either a list of variable definitions or a list of function definitions (program: variable_definitions | function_definitions;). That seems a bit odd to me. What if I want to define both a function and a variable? Do I have to write two programs and somehow link them together?
This is not the cause of your problem, but fixing it would probably fix the problem as well. The immediate cause is that function_definitions is one or more function definition while variable_definitions is zero or more variable definitions. In other words, the base case of the function_definitions recursion is a function definition, while the base case of variable_definitions is the empty sequence. So a list of variable definitions starts with an empty sequence.
But both function definitions and variable definitions start with a type. So if the first token of a program is int, it could be the start of a function definition with return type int or a variable definition of type int. In the former case, the parser should shift the int in order to produce the function_definitions base case:; in the latter case, it should immediately reduce an empty variable_definitions base case.
If you really wanted a program to be either function definitions or variable definitions, but not both. you would need to make variable_definitions have the same form as function_definitions, by changing the base case from empty to type identifier_list ';'. Then you could add an empty production to program so that the parser could recognize empty inputs.
But as I said at the beginning, you probably want a program to be a sequence of definitions, each of which could either be a variable or a function:
program: %empty
| program type identifier_list ';'
| program function_head block
By the way, you are misreading the output file produced by -v. It shows the following actions for State 0:
INT shift, and go to state 1
FLOAT shift, and go to state 2
INT [reduce using rule 9 (variable_definitions)]
FLOAT [reduce using rule 9 (variable_definitions)]
Here, INT and FLOAT are possible lookaheads. So the interpretation of the line INT [reduce using rule 9 (variable_definitions)] is "if the lookahead is INT, immediately reduce using production 9". Production 9 produces the empty sequence, so the reduction reduces zero tokens at the top of the parser stack into a variable_definitions. Reductions do not use the lookahead token, so after the reduction, the lookahead token is still INT.
However, the parser doesn't actually do that because it has a different action for INT, which is to shift it and go to state 1. as indicated by the first line start INT. The brackets [...] indicate that this action is not taken because it is a conflict and the resolution of the conflict was some other action. So the more accurate interpretation of that line is "if it weren't for the preceding action on INT, the lookahead INT would cause a reduction using rule 9."

Cannot find “2 rules never reduce” and “conflicts: 2 shift/reduce” in yacc file

I cannot seem to find the above errors when I am trying to "compile" my yacc file. I am hoping someone can point out the lines and let me know what I would need to do to correct the problem(s). I have listed my file:
%token Identifier Int Real_Num And Array
%token Begin Boolean Div Do Else
%token End False For Goto If
%token Imply Integer Label Not Or
%token Own Procedure Real Step String
%token Then True Until Value While
%token Plus Minus Mult Divide Less
%token LessEq Great GreatEq Eq NotEq
%token Comma Colon Semi LParan RParan
%token LBrak RBrak Assign
%{
%}
%start program
%%
program
: block
;
block
: Begin optdecls stmts End
;
optdecls
: /* empty */
| decl Comma optdecls
;
decl
: vardecl
| arraydecl
;
vardecl
: type idlist
;
type
: Real
| Integer
| Boolean
;
idlist
: Identifier
| Identifier Comma idlist
;
arraydecl
: Array arraylist
| type Array arraylist
;
arraylist
: arrayseg
| arrayseg Comma arraylist
;
arrayseg
: Identifier LBrak a_expr Colon a_expr RBrak
stmts
: stmt
| stmt Semi stmts
;
stmt
: u_stmt
| if_stmt
| for_stmt
;
u_stmt
: assign
| dummy
| block
;
assign
: var Assign expr
| var Assign assign
;
dummy
: /* empty */
;
for_stmt
: For var Assign a_expr Step a_expr Until a_expr Do stmt
;
if_stmt
: If expr Then u_stmt
| If expr Then u_stmt Else stmt
| If expr Then for_stmt
;
var
: Identifier
| Identifier LBrak a_expr RBrak
;
factor
: Int
| Real_Num
| var
| LParan expr RParan
;
term
: factor
| term Mult factor
| term Divide factor
| term Div factor
;
sum
: term
| Plus term
| Minus term
| sum Plus term
| sum Minus term
;
brel
: sum
| True
| False
| sum relation sum
;
relation
: Less
| LessEq
| Great
| GreatEq
| Eq
| NotEq
;
bsecond
: brel
| Not brel
;
bfactor
: bsecond
| bfactor And bsecond
;
bterm
: bfactor
| bterm Or bfactor
;
expr
: bterm
| If expr Then bterm Else expr
;
a_expr
: sum
| If expr Then sum Else a_expr
;
%%
Sorry for the long post but I believe the coded all of the code is relevant. Thank you for you help.

Use the -v option to bison, which will generate a .output file with information on all the states and conflicts. With you grammar, this gives conflicts in states 16 and 30.
State 16 is:
state 16
15 arraylist: arrayseg .
16 | arrayseg . Comma arraylist
Comma shift, and go to state 34
Comma [reduce using rule 15 (arraylist)]
Which is telling you that it doesn't know whether to shift the Comma after seeing an arrayseg in order to parse more arraylist, or if it should reduce the arraylist and allow the Comma to be part of some other rule (such as optdecls) that allows for a Comma after an arraylist.
State 30 is:
state 30
11 idlist: Identifier .
12 | Identifier . Comma idlist
Comma shift, and go to state 59
Comma [reduce using rule 11 (idlist)]
which is basically the same thing.
The essential problem is that your grammar describes comma-separated lists of things that contain comma separated lists, so it needs more lookahead to determine if the commas belong to the inner list or the outer list.
Perhaps you should have Semi rather than Comma between decls?

How to write flex and bison file parsing this language?

Let's define a language:
VAR := [0-9A-Za-z_]+
Exp := VAR
| VAR,'=',VAR
| '(', Exp, ')'
| Exp, '&', Exp
| Exp ,'|', Exp
eg: "( a = b ) & ( c | (d=e) ) "is legal
I've read the YASS & Lex manual, but I'm totally confused , I just want the compiler that can parse this language
Can you tell me how to write the flex&bison configure file for this language?
I've done so far:
file a.l:
%{
#include <string.h>
#include "stdlib.h"
#include "stdio.h"
#include "y.tab.h"
%}
%%
("&"|"and"|"AND") { return AND; }
("|"|"or"|"OR") { return OR; }
("="|"eq"|"EQ") { return EQ; }
([A-Za-z0-9_]+) { return VAR;}
("(") { return LB ;}
(")") { return RB ;}
("n") { return LN ;}
%%
int main(void)
{
yyparse();
return 0;
}
int yywrap(void)
{
return 0;
}
int yyerror(void)
{
printf("Errorn");
exit(1);
}
file a.y
%{
#include <stdio.h>
%}
%token AND OR EQ VAR LB RB LN
%left AND OR
%left EQ
%%
line :
| exp LN{ printf("LN: %s",$1);}
;
exp: VAR { printf("var:%s",$1);}
| VAR EQ VAR { printf("var=:%s %s %s",$1,$2,$3);}
| exp AND exp { printf("and :%s %s %s",$1,$2,$3);}
| exp OR exp { printf("or :%s %s %s",$1,$2,$3);}
| LB exp RB { printf("abstract :%s %s %s",$1,$2,$3);}
;
Now I edited file as Chris Dodd guided,it seems much better(at least the lex worked fine),but I get output like this:
disk_path>myprogram
a=b
var=:(null) (null) (null)LN: (null)ab=b
Error
So, why the function printf output null? and after input the second ,it prompt Error and exit the program?

First write a lex file to tokenize input (and print out what it sees)
You want to introduce the terminals:
[0-9A-Za-z_]+ --> VAR
( --> LPAREN and ) --> RPAREN
& --> AND
| --> OR
= --> EQUAL
and just print out a word for each. For your example
( a = b ) & ( c | (d=e) ) --> LPAREN VAR EQUAL VAR RPAREN AND LPAREN VAR OR LPAREN VAR EQUAL VAR RPAREN RPAREN
This is doable in pure lex. When you do this, update your response and we can talk about the next step

Your lex rule ("[0-9A-Za-z_]+") will match (only) the literal string [0-9A-Za-z_]+ -- get rid of the " characters to have it be a pattern to match any identifier or number.
Your yacc code does not match your lex code for punctuation -- the lex code returns AND for & while the yacc code is expecting an & -- so either change the lex code to return '&' or change the yacc code to use the token AND, and similarly for |, (, and ). You might also want to ignore spaces in the lex code (rather than treating them as errors). You also have no lex rule to match and return 'n', even though you use that in your yacc grammar.
Your yacc code is otherwise correct, but is ambiguous, thus giving you shift/reduce conflicts. That's because your grammar is ambiguous -- an input like a&b|c can be parsed as either (a&b)|c or a&(b|c). You need to decide how that ambiguity should be resolved and reflect that in your grammar -- either by using more non-terminals, or by using yacc's built-in precedence support for resolving this kind of ambiguity. If you stick the declarations:
%left '|'
%left '&'
in the top of your yacc file, that will resolve the ambiguity by making both & and | left associative, and & higher precedence than |, which would be the normal interpretation.
Edit
The problem you have now is that you never define YYSTYPE (either directly or with %union) in your .y file and you never set yylval in your .l file. The first problem means that $1 etc are just ints, not pointers (so it makes no sense to try to print them with %s -- you should be getting a warning from your C compiler over that). The second problem means that they never have a value anyways, so its just always the default 0 value of an uninitialized global variable
The easiest fix would be to add
%union {
const char *name;
}
%token <name> VAR LB RB LN
%left <name> AND OR
%left <name> EQ
%type <name> expr
to the top of the yacc file. Then change the all the lex rules to be something like
([A-Za-z0-9_]+) { yylval.name = strdup(yytext); return VAR;}
Finally, you also need to change the bison actions for expr to set $$, eg:
| LB exp RB { asprintf(&$$, "%s %s %s",$1,$2,$3); printf("abstract: %sn", $$); }
This will at least work, though it will leak lots of memory for the allocated strings.
The last problem you have is that your line rule only matches a single line, so a second line of input causes an error. You need a recursive rule like:
line: /* empty */
| line exp LN { printf....

Источник

Related

Bison shift/reduce conflicts

Bison: If/Else reduce/reduce conflict

Shift Reduce Conflict

Cannot find “2 rules never reduce” and “conflicts: 2 shift/reduce” in yacc file

How to write flex and bison file parsing this language?

Читайте также: