lex(1) 맨 페이지 - 윈디하나의 솔라나라

개요

섹션
맨 페이지 이름
검색(S)

lex(1)

lex(1)                           User Commands                          lex(1)



NAME
       lex - generate programs for lexical tasks

SYNOPSIS
       lex [-cntvV] [-e | -w] [-V -Q [y | n]] [file]...

DESCRIPTION
       The  lex  utility generates C programs to be used in lexical processing
       of character input, and that can be used as an interface to yacc. The C
       programs  are  generated  from lex source code and conform to the ISO C
       standard. Usually, the lex utility writes the program it  generates  to
       the  file  lex.yy.c. The state of this file is unspecified if lex exits
       with a non-zero exit status. See EXTENDED DESCRIPTION  for  a  complete
       description of the lex input language.

OPTIONS
       The following options are supported:

       -c           Indicates C-language action (default option).


       -e           Generates a program that can handle EUC characters (cannot
                    be used with the -w option). yytext[] is of type  unsigned
                    char[].


       -n           Suppresses  the summary of statistics usually written with
                    the -v option. If no table sizes are specified in the  lex
                    source code and the -v option is not specified, then -n is
                    implied.


       -t           Writes the resulting program to standard output instead of
                    lex.yy.c.


       -v           Writes  a summary of lex statistics to the standard error.
                    (See the discussion of lex table sizes under  the  heading
                    Definitions  in  lex.) If table sizes are specified in the
                    lex source code, and if the -n option  is  not  specified,
                    the -v option can be enabled.


       -w           Generates a program that can handle EUC characters (cannot
                    be used  with  the  -e  option).  Unlike  the  -e  option,
                    yytext[] is of type wchar_t[].


       -V           Print version information.
       --version


       -Q[y|n]      Prints  out version information to output file lex.yy.c by
                    using -Qy. The -Qn  option  does  not  print  out  version
                    information and is the default.


       -?           Print usage message and immediately exit.
       --help


OPERANDS
       The following operand is supported:

       file    A  pathname  of  an  input  file. If more than one such file is
               specified, all files is concatenated to produce  a  single  lex
               program.  If no file operands are specified, or if a file oper‐
               and is −, the standard input is used.


OUTPUT
       The lex output files are described below.

   Stdout
       If the -t option is specified, the text file of C source code output of
       lex is written to standard output.

   Stderr
       If the -t option is specified informational, error and warning messages
       concerning the contents of lex source code  input  is  written  to  the
       standard error.


       If the -t option is not specified:

           1.     Informational error and warning messages concerning the con‐
                  tents of lex source code input  is  written  to  either  the
                  standard output or standard error.


           2.     If the -v option is specified and the -n option is not spec‐
                  ified, lex statistics is also  written  to  standard  error.
                  These  statistics  can  also be generated if table sizes are
                  specified with a % operator in the Definitions  in  lex sec‐
                  tion (see EXTENDED DESCRIPTION), as long as the -n option is
                  not specified.



   Output Files
       A text file containing C source code is written to lex.yy.c, or to  the
       standard output if the -t option is present.

EXTENDED DESCRIPTION
       Each  input  file contains lex source code, which is a table of regular
       expressions with corresponding actions in the form of C  program  frag‐
       ments.


       When lex.yy.c is compiled and linked with the lex library (using the -l
       l operand with c89 or cc), the resulting program reads character  input
       from  the  standard input and partitions it into strings that match the
       given expressions.


       When an expression is matched, these actions occur:

           o      The input string that was matched is left  in  yytext  as  a
                  null-terminated string; yytext is either an external charac‐
                  ter array or a pointer to a character string.  As  explained
                  in  Definitions  in lex, the type can be explicitly selected
                  using the %array or %pointer declarations, but  the  default
                  is %array.


           o      The  external int  yyleng is set to the length of the match‐
                  ing string.


           o      The expression's corresponding program fragment, or  action,
                  is executed.



       During  pattern matching, lex searches the set of patterns for the sin‐
       gle longest possible match. Among rules that match the same  number  of
       characters, the rule given first is chosen.


       The general format of lex source is:

         Definitions
         %%
         Rules
         %%
         User Subroutines



       The  first  %%  is required to mark the beginning of the rules (regular
       expressions and actions); the second %% is required only if  user  sub‐
       routines follow.


       Any  line  in  the  Definitions  in  lex section beginning with a blank
       character is assumed to be a C program fragment and is  copied  to  the
       external  definition  area of the lex.yy.c file. Similarly, anything in
       the Definitions  in  lex section included between delimiter lines  con‐
       taining only %{ and %} is also copied unchanged to the external defini‐
       tion area of the lex.yy.c file.


       Any such input (beginning with a blank character or within  %{  and  %}
       delimiter lines) appearing at the beginning of the Rules section before
       any rules are specified is written to lex.yy.c after  the  declarations
       of  variables  for the yylex function and before the first line of code
       in yylex. Thus, user variables local to yylex can be declared here,  as
       well as application code to execute upon entry to yylex.


       The  action  taken  by lex when encountering any input beginning with a
       blank character or within %{ and %} delimiter lines  appearing  in  the
       Rules  section  but  coming  after  one or more rules is undefined. The
       presence of such input can result in an  erroneous  definition  of  the
       yylex function.

   Definitions in lex
       Definitions   in  lex appear before the first %% delimiter. Any line in
       this section not contained between %{ and %} lines  and  not  beginning
       with  a blank character is assumed to define a lex substitution string.
       The format of these lines is:

         name   substitute



       If a name does not meet the requirements for identifiers in the  ISO  C
       standard,  the  result is undefined. The string substitute replaces the
       string {  name  } when it is used in a rule. The name string is  recog‐
       nized  in  this  context  only when the braces are provided and when it
       does not appear within a bracket expression or within double-quotes.


       In the Definitions  in  lex section, any line beginning with a %  (per‐
       cent  sign)  character  and  followed by an alphanumeric word beginning
       with either s or S defines a set of start conditions. Any  line  begin‐
       ning with a % followed by a word beginning with either x or X defines a
       set of exclusive start conditions. When the generated scanner is  in  a
       %s  state, patterns with no state specified also active; in a %x state,
       such patterns are not active. The rest of the  line,  after  the  first
       word,  is  considered to be one or more blank-character-separated names
       of start conditions. Start condition names are constructed in the  same
       way  as  definition names. Start conditions can be used to restrict the
       matching of regular expressions to one or more states as  described  in
       Regular expressions in lex.


       Implementations  accept  either of the following two mutually exclusive
       declarations in the Definitions  in  lex section:

       %array      Declare the type of yytext to be a null-terminated  charac‐
                   ter array.


       %pointer    Declare the type of yytext to be a pointer to a null-termi‐
                   nated character string.



       When using the %pointer option, you cannot also use the yyless function
       to alter yytext.


       %array  is  the  default. If %array is specified (or neither %array nor
       %pointer is specified), then the correct way to make an external refer‐
       ence to yyext is with a declaration of the form:


       extern char  yytext[]


       If %pointer is specified, then the correct external reference is of the
       form:


       extern char *yytext;


       lex accepts declarations in the Definitions in lex section for  setting
       certain internal table sizes. The declarations are shown in the follow‐
       ing table.


       Table  Size  Declaration  in  lex


       tab() box; cw(1.28i) cw(2.94i) cw(1.28i) cw(1.28i) lw(2.94i) lw(1.28i)


       DeclarationDescriptionDefault _ %pnNumber of positions2500 %nnNumber of
       states500   %anNumber   of  transitions2000  %enNumber  of  parse  tree
       nodes1000 %knNumber of packed character  classes10000  %onSize  of  the
       output array3000



       Programs  generated  by  lex  need either the -e or -w option to handle
       input that contains EUC characters from supplementary codesets. If nei‐
       ther  of  these options is specified, yytext is of the type char[], and
       the generated program can handle only ASCII characters.


       When the -e option is used, yytext is of the type unsigned  char[]  and
       yyleng gives the total number of bytes in the matched string. With this
       option, the macros input(), unput(c), and output(c) should do  a  byte-
       based  I/O  in  the  same  way as with the regular ASCII  lex. Two more
       variables are available with the -e option, yywtext and yywleng,  which
       behave the same as yytext and yyleng would under the -w option.


       When  the -w option is used, yytext is of the type wchar_t[] and yyleng
       gives the total number of characters in the matched string. If you sup‐
       ply  your  own input(), unput(c), or output(c) macros with this option,
       they must return or accept EUC characters in the form of wide character
       (wchar_t).  This  allows a different interface between your program and
       the lex internals, to expedite some programs.

   Rules in lex
       The Rules  in  lex source files are a table in which  the  left  column
       contains  regular  expressions and the right column contains actions (C
       program fragments) to be executed when the expressions are recognized.

         ERE action
         ERE action
         ...



       The extended regular expression (ERE) portion of  a  row  is  separated
       from  action by one or more blank characters. A regular expression con‐
       taining blank characters is recognized under one of the following  con‐
       ditions:

           o      The entire expression appears within double-quotes.


           o      The  blank  characters appear within double-quotes or square
                  brackets.


           o      Each blank character is preceded by a backslash character.


   User Subroutines in lex
       Anything in the user subroutines section is copied to lex.yy.c  follow‐
       ing yylex.

   Regular Expressions in lex
       The lex utility supports the set of Extended Regular Expressions (EREs)
       described on regex(7) with the following additions  and  exceptions  to
       the syntax:

       ...          Any  string enclosed in double-quotes represents the char‐
                    acters within the double-quotes as themselves, except that
                    backslash  escapes  (which  appear in the following table)
                    are recognized. Any backslash-escape  sequence  is  termi‐
                    nated  by  the closing quote. For example, "\01""1" repre‐
                    sents a single string: the octal value 1 followed  by  the
                    character 1.



       <state>r

       <state1, state2, ...>r

           The regular expression r is matched only when the program is in one
           of the start conditions indicated by state, state1, and  so  forth.
           For  more  information,  see Actions in lex. As an exception to the
           typographical conventions of the rest of  this  document,  in  this
           case  <state>  does  not  represent a metavariable, but the literal
           angle-bracket characters surrounding a symbol. The start  condition
           is  recognized  as  such only at the beginning of a regular expres‐
           sion.


       r/x

           The regular expression r is matched only if it is  followed  by  an
           occurrence of regular expression x. The token returned in yytext is
           only matched r. If the trailing portion of r matches the  beginning
           of  x,  the  result is unspecified. The r expression cannot include
           further trailing context or the $ (match-end-of-line)  operator;  x
           cannot include the ^ (match-beginning-of-line) operator, nor trail‐
           ing context, nor the $ operator. That is, only  one  occurrence  of
           trailing  context is allowed in a lex regular expression, and the ^
           operator only can be used at the beginning of such an expression. A
           further restriction is that the trailing-context operator / (slash)
           cannot be grouped within parentheses.


       {name}

           When name is one of the substitution symbols from  the  Definitions
           section, the string, including the enclosing braces, is replaced by
           the substitute value.  The  substitute  value  is  treated  in  the
           extended  regular expression as if it were enclosed in parentheses.
           No substitution occurs if {name} occurs within a bracket expression
           or within double-quotes.



       Within  an  ERE, a backslash character (\\, \a, \b, \f, \n, \r, \t, \v)
       is considered to begin an escape  sequence.  In  addition,  the  escape
       sequences in the following table is recognized.


       A  literal  newline  character  cannot  occur within an ERE; the escape
       sequence \n can be used to represent a  newline  character.  A  newline
       character cannot be matched by a period operator.


       Escape Sequences in lex


       tab()  box; cw(1.22i) lw(1.22i) lw(2.92i) lw(1.36i) lw(1.22i) lw(2.92i)
       lw(1.36i)


       Escape Sequences in lex _ Escape SequenceDescription  Meaning  _  \dig‐
       itsT{  A  backslash  character followed by the longest sequence of one,
       two or three octal-digit characters (01234567). Ifall of the digits are
       0,  (that  is,  representation  of  the NUL character), the behavior is
       undefined.  T}T{ The character whose encoding  is  represented  by  the
       one-,  two- or three-digit octal integer. Multi-byte characters require
       multiple, concatenated escape sequences of  this  type,  including  the
       leading  \  for  each byte.  T} _ \xdigitsT{ A backslash character fol‐
       lowed  by  the  longest  sequence   of   hexadecimal-digit   characters
       (01234567abcdefABCDEF). If all of the digits are 0, (that is, represen‐
       tation of the NUL character), the  behavior  is  undefined.   T}T{  The
       character whose encoding is represented by the hexadecimal integer.  T}
       _ \cT{ A backslash character followed by any character not described in
       this  table.  (\\,  \a,  \b,  \f, \en, \r, \t, \v).  T}The character c,
       unchanged.



       The order of precedence given to extended regular expressions  for  lex
       is as shown in the following table, from high to low.


       The escaped characters entry is not meant to imply that these are oper‐
       ators, but they are included in the table to show  their  relationships
       to  the  true  operators.  The  start  condition,  trailing context and
       anchoring notations have been omitted from the  table  because  of  the
       placement  restrictions described in this section; they can only appear
       at the beginning or ending of an ERE.


       tab() box; cw(2.75i) lw(2.75i) lw(2.75i)


       ERE Precedence in lex _ collation-related bracket symbols[= =] [: :] [.
       .]   escaped characters\<special character> bracket expression[ ] quot‐
       ing"..."  grouping() definition{name} single-character RE  duplication*
       + ?  concatenation interval expression{m,n} alternation|



       The  ERE anchoring operators (^ and $) do not appear in the table. With
       lex regular expressions, these operators are restricted in  their  use:
       the  ^  operator can only be used at the beginning of an entire regular
       expression, and the $ operator only at the end. The operators apply  to
       the   entire   regular  expression.  Thus,  for  example,  the  pattern
       (^abc)|(def$) is undefined; it can instead be written as  two  separate
       rules,  one  with  the regular expression ^abc and one with def$, which
       share a common action via the special | action (see below). If the pat‐
       tern  were  written ^abc|def$, it would match either of abc or def on a
       line by itself.


       Unlike the general ERE rules, embedded anchoring is not allowed by most
       historical  lex implementations. An example of embedded anchoring would
       be for patterns such as (^)foo($) to match foo when it exists as a com‐
       plete  word. This functionality can be obtained using existing lex fea‐
       tures:

         ^foo/[ \n]|
         " foo"/[ \n]    /* found foo as a separate word */



       Notice also that $ is a form of trailing context (it is  equivalent  to
       /\n  and  as  such  cannot  be used with regular expressions containing
       another instance of the  operator  (see  the  preceding  discussion  of
       trailing context).


       The  additional regular expressions trailing-context operator / (slash)
       can be used as an ordinary character if presented within double-quotes,
       "/";  preceded by a backslash, \/; or within a bracket expression, [/].
       The start-condition < and > operators are special only in a start  con‐
       dition at the beginning of a regular expression; elsewhere in the regu‐
       lar expression they are treated as ordinary characters.


       The following examples clarify  the  differences  between  lex  regular
       expressions  and  regular expressions appearing elsewhere in this docu‐
       ment. For regular expressions of the form r/x, the string matching r is
       always  returned;  confusion  can arise when the beginning of x matches
       the trailing portion of r. For example, given  the  regular  expression
       a*b/cc  and  the  input aaabcc, yytext would contain the string aaab on
       this match. But given the regular expression x*/xy and the input  xxxy,
       the  token xxx, not xx, is returned by some implementations because xxx
       matches x*.


       In the rule ab*/bc, the b* at the end of r extends r's match  into  the
       beginning  of  the  trailing  context, so the result is unspecified. If
       this rule were ab/bc, however, the rule matches the text ab when it  is
       followed  by the text bc. In this latter case, the matching of r cannot
       extend into the beginning of x, so the result is specified.

   Actions in lex
       The action to be taken when an ERE is matched can be a C program  frag‐
       ment  or  the special actions described below; the program fragment can
       contain one or more C statements, and can also include special actions.
       The  empty  C statement ; is a valid action; any string in the lex.yy.c
       input that matches the pattern portion of such a  rule  is  effectively
       ignored or skipped. However, the absence of an action is not valid, and
       the action lex takes in such a condition is undefined.


       The specification for an action, including  C  statements  and  special
       actions, can extend across several lines if enclosed in braces:

         ERE <one or more blanks> { program statement
         program statement }



       The  default action when a string in the input to a lex.yy.c program is
       not matched by any expression is to copy  the  string  to  the  output.
       Because  the  default behavior of a program generated by lex is to read
       the input and copy it to the output, a minimal lex source program  that
       has  just  %% generates a C program that simply copies the input to the
       output unchanged.


       Four special actions are available:

         |       ECHO;      REJECT;      BEGIN


       |          The action | means that the action for the next rule is  the
                  action for this rule. Unlike the other three actions, | can‐
                  not be enclosed in braces  or  be  semicolon-terminated.  It
                  must be specified alone, with no other actions.


       ECHO;      Writes the contents of the string yytext on the output.


       REJECT;    Usually  only  a  single  expression  is  matched by a given
                  string in the input.  REJECT  means  continue  to  the  next
                  expression  that matches the current input, and causes what‐
                  ever rule was the second choice after the current rule to be
                  executed  for  the  same  input. Thus, multiple rules can be
                  matched and executed for one  input  string  or  overlapping
                  input  strings.  For  example, given the regular expressions
                  xyz and xy and the  input  xyz,  usually  only  the  regular
                  expression  xyz  would match. The next attempted match would
                  start after z. If the last action in the xyz rule is  REJECT
                  ,  both  this  rule  and  the xy rule would be executed. The
                  REJECT action can be implemented in such a fashion that flow
                  of control does not continue after it, as if it were equiva‐
                  lent to a goto to another part of yylex. The use  of  REJECT
                  can result in somewhat larger and slower scanners.


       BEGIN      The action:

                  BEGIN  newstate;

                  switches  the  state  (start  condition) to newstate. If the
                  string newstate has not been declared previously as a  start
                  condition  in  the Definitions  in  lex section, the results
                  are unspecified. The initial state is indicated by the digit
                  0 or the token INITIAL.



       The  functions  or  macros  described below are accessible to user code
       included in the lex input. It is unspecified whether they appear in the
       C  code  output of lex, or are accessible only through the -l l operand
       to c89 or cc (the lex library).

       int yylex(void)     Performs lexical analysis on the input; this is the
                           primary  function generated by the lex utility. The
                           function returns zero when  the  end  of  input  is
                           reached;   otherwise  it  returns  non-zero  values
                           (tokens)  determined  by  the  actions   that   are
                           selected.


       int yymore(void)    When  called,  indicates  that  when the next input
                           string is recognized, it is to be appended  to  the
                           current  value  of yytext rather than replacing it;
                           the value in yyleng is adjusted accordingly.


       intyyless(int n)    Retains n initial characters in yytext,  NUL-termi‐
                           nated,  and  treats  the remaining characters as if
                           they had not been read;  the  value  in  yyleng  is
                           adjusted accordingly.


       int input(void)     Returns  the next character from the input, or zero
                           on end-of-file. It obtains input  from  the  stream
                           pointer yyin, although possibly via an intermediate
                           buffer. Thus, once scanning has begun,  the  effect
                           of  altering  the  value  of yyin is undefined. The
                           character read is removed from the input stream  of
                           the scanner without any processing by the scanner.


       int unput(int c)    Returns  the  character  c to the input; yytext and
                           yyleng are undefined until the next  expression  is
                           matched. The result of using unput for more charac‐
                           ters than have been input is unspecified.



       The following functions appear  only  in  the  lex  library  accessible
       through the -l l operand; they can therefore be redefined by a portable
       application:

       int yywrap(void)

           Called by yylex at end-of-file; the default yywrap  always  returns
           1.  If  the  application requires yylex to continue processing with
           another source of input, then the application can include  a  func‐
           tion  yywrap, which associates another file with the external vari‐
           able FILE *yyin and returns a value of zero.


       int main(int argc, char *argv[])

           Calls yylex to perform lexical analysis, then exits. The user  code
           can  contain main to perform application-specific operations, call‐
           ing yylex as applicable.



       The reason for breaking these functions into two  lists  is  that  only
       those  functions  in  libl.a  can  be  reliably redefined by a portable
       application.


       Except for input, unput and main, all external and static names  gener‐
       ated by lex begin with the prefix yy or YY.

USAGE
       Portable  applications  are warned that in the Rules in lex section, an
       ERE without an action is not acceptable, but need not  be  detected  as
       erroneous by lex. This can result in compilation or run-time errors.


       The  purpose  of  input  is to take characters off the input stream and
       discard them as far as the lexical analysis is concerned. A common  use
       is  to discard the body of a comment once the beginning of a comment is
       recognized.


       The lex utility is not fully internationalized in its treatment of reg‐
       ular  expressions in the lex source code or generated lexical analyzer.
       It would seem desirable to have the lexical analyzer interpret the reg‐
       ular  expressions  given in the lex source according to the environment
       specified when the lexical analyzer is executed, but this is not possi‐
       ble  with  the  current lex technology. Furthermore, the very nature of
       the lexical analyzers produced by lex must be closely tied to the lexi‐
       cal  requirements  of the input language being described, which is fre‐
       quently locale-specific anyway. (For example, writing an analyzer  that
       is  used  for French text is not automatically be useful for processing
       other languages.)

EXAMPLES
       Example 1 Using lex



       The following is an example of a lex program that implements a rudimen‐
       tary scanner for a Pascal-like syntax:




         %{
         /* need this for the call to atof() below */
         #include <math.h>
         /* need this for printf(), fopen() and stdin below */
         #include <stdio.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*
         %%

         {DIGIT}+  {
                                printf("An integer: %s (%d)\n", yytext,
                                atoi(yytext));
                                }

         {DIGIT}+"."{DIGIT}*    {
                                printf("A float: %s (%g)\n", yytext,
                                atof(yytext));
                                }

         if|then|begin|end|procedure|function        {
                                printf("A keyword: %s\n", yytext);
                                }

         {ID}                   printf("An identifier: %s\n", yytext);

         "+"|"-"|"*"|"/"        printf("An operator: %s\n", yytext);

         "{"[^}\n]*"}"         /* eat up one-line comments */

         [ \t\n]+               /* eat up white space */

         .                      printf("Unrecognized character: %s\n", yytext);

         %%

         int main(int argc, char *argv[])
         {
                               ++argv, --argc;  /* skip over program name */
                               if (argc > 0)
                                     yyin = fopen(argv[0], "r");
                               else
                               yyin = stdin;

                               yylex();
         }


ENVIRONMENT VARIABLES
       See  environ(7) for descriptions of the following environment variables
       that affect the execution of lex: LANG, LC_ALL,  LC_COLLATE,  LC_CTYPE,
       LC_MESSAGES, and NLSPATH.

EXIT STATUS
       The following exit values are returned:

       0     Successful completion.


       >0    An error occurred.


ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:


       tab()  box; cw(2.75i) |cw(2.75i) lw(2.75i) |lw(2.75i) ATTRIBUTE TYPEAT‐
       TRIBUTE VALUE _ Availabilitydeveloper/base-developer-utilities _ Inter‐
       face StabilityCommitted _ StandardSee standards(7).


SEE ALSO
       yacc(1), attributes(7), environ(7), regex(7), standards(7)

NOTES
       If  routines such as yyback(), yywrap(), and yylock() in .l (ell) files
       are to be external C functions, the command line to compile a C++  pro‐
       gram must define the __EXTERN_C__ macro. For example:

         example%  CC -D__EXTERN_C__ ... file




Oracle Solaris 11.4             30 August 2017                          lex(1)
맨 페이지 내용의 저작권은 맨 페이지 작성자에게 있습니다.
RSS ATOM XHTML 5 CSS3