lex(1) man page - WindyHana's Solanara

lex(1)                           User Commands                          lex(1)



NAME
       lex - generate programs for lexical tasks

SYNOPSIS
       lex [-cntvV] [-e | -w] [-V -Q [y | n]] [file]...

DESCRIPTION
       The  lex  utility generates C programs to be used in lexical processing
       of character input, and that can be used as an interface to yacc. The C
       programs  are  generated  from lex source code and conform to the ISO C
       standard. Usually, the lex utility writes the program it  generates  to
       the  file  lex.yy.c. The state of this file is unspecified if lex exits
       with a non-zero exit status. See EXTENDED DESCRIPTION  for  a  complete
       description of the lex input language.

OPTIONS
       The following options are supported:

       -c           Indicates C-language action (default option).


       -e           Generates a program that can handle EUC characters (cannot
                    be used with the -w option). yytext[] is of type  unsigned
                    char[].


       -n           Suppresses  the summary of statistics usually written with
                    the -v option. If no table sizes are specified in the  lex
                    source code and the -v option is not specified, then -n is
                    implied.


       -t           Writes the resulting program to standard output instead of
                    lex.yy.c.


       -v           Writes  a summary of lex statistics to the standard error.
                    (See the discussion of lex table sizes under  the  heading
                    Definitions  in  lex.) If table sizes are specified in the
                    lex source code, and if the -n option  is  not  specified,
                    the -v option can be enabled.


       -w           Generates a program that can handle EUC characters (cannot
                    be used  with  the  -e  option).  Unlike  the  -e  option,
                    yytext[] is of type wchar_t[].


       -V           Print version information.
       --version


       -Q[y|n]      Prints  out version information to output file lex.yy.c by
                    using -Qy. The -Qn  option  does  not  print  out  version
                    information and is the default.


       -?           Print usage message and immediately exit.
       --help


OPERANDS
       The following operand is supported:

       file    A pathname of an input file. If more than one such file is
               specified, all files are concatenated to produce a single lex
               program. If no file operands are specified, or if a file
               operand is -, the standard input is used.


OUTPUT
       The lex output files are described below.

   Stdout
       If the -t option is specified, the text file of C source code output of
       lex is written to standard output.

   Stderr
       If the -t option is specified, informational, error, and warning
       messages concerning the contents of lex source code input are written
       to the standard error.


       If the -t option is not specified:

           1.     Informational, error, and warning messages concerning the
                  contents of lex source code input are written to either the
                  standard output or standard error.


           2.     If the -v option is specified and the -n option is not
                  specified, lex statistics are also written to standard
                  error. These statistics can also be generated if table
                  sizes are specified with a % operator in the Definitions in
                  lex section (see EXTENDED DESCRIPTION), as long as the -n
                  option is not specified.



   Output Files
       A text file containing C source code is written to lex.yy.c, or to  the
       standard output if the -t option is present.

EXTENDED DESCRIPTION
       Each  input  file contains lex source code, which is a table of regular
       expressions with corresponding actions in the form of C  program  frag‐
       ments.


       When lex.yy.c is compiled and linked with the lex library (using the -l
       l operand with c89 or cc), the resulting program reads character  input
       from  the  standard input and partitions it into strings that match the
       given expressions.


       When an expression is matched, these actions occur:

           o      The input string that was matched is left  in  yytext  as  a
                  null-terminated string; yytext is either an external charac‐
                  ter array or a pointer to a character string.  As  explained
                  in  Definitions  in lex, the type can be explicitly selected
                  using the %array or %pointer declarations, but  the  default
                  is %array.


           o      The  external int  yyleng is set to the length of the match‐
                  ing string.


           o      The expression's corresponding program fragment, or  action,
                  is executed.



       During  pattern matching, lex searches the set of patterns for the sin‐
       gle longest possible match. Among rules that match the same  number  of
       characters, the rule given first is chosen.
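
       The two matching rules above can be illustrated with a small sketch.
       It is an illustration only, not part of the original page; the rule
       set is hypothetical. For the input if, both rules match two
       characters, so the rule listed first is chosen. For the input iffy,
       the second rule matches four characters and is chosen because it is
       the longer match.

```lex
%{
#include <stdio.h>
%}
%%
if          { printf("keyword\n"); }
[a-z]+      { printf("identifier: %s\n", yytext); }
.|\n        ;
```

       The sketch relies on the default main and yywrap from the lex
       library.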


       The general format of lex source is:

         Definitions
         %%
         Rules
         %%
         User Subroutines



       The  first  %%  is required to mark the beginning of the rules (regular
       expressions and actions); the second %% is required only if  user  sub‐
       routines follow.


       Any line in the Definitions in lex section beginning with a blank char‐
       acter is assumed to be a C program fragment and is copied to the exter‐
       nal  definition  area  of the lex.yy.c file. Similarly, anything in the
       Definitions in lex section included between delimiter lines  containing
       only %{ and %} is also copied unchanged to the external definition area
       of the lex.yy.c file.


       Any such input (beginning with a blank character or within  %{  and  %}
       delimiter lines) appearing at the beginning of the Rules section before
       any rules are specified is written to lex.yy.c after  the  declarations
       of  variables  for the yylex function and before the first line of code
       in yylex. Thus, user variables local to yylex can be declared here,  as
       well as application code to execute upon entry to yylex.
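
       A minimal sketch of the two copy locations described above, assuming
       the default %array setting and the yywrap supplied by the lex
       library. The variable name blanks_skipped is hypothetical. The first
       %{ ... %} block is copied to the external definition area; the second
       one, placed before the first rule, becomes code at the top of yylex.

```lex
%{
/* Definitions section: copied to the external definition area. */
#include <stdio.h>
%}
%%
%{
    /* Before the first rule: local to yylex, run on every entry. */
    int blanks_skipped = 0;   /* hypothetical local variable */
%}
[ \t]+      { blanks_skipped++; }
[a-z]+      {
                printf("%s (after %d blank runs)\n", yytext, blanks_skipped);
                return 1;
            }
.|\n        ;
%%
int main(void)
{
    /* Each call re-enters yylex, so blanks_skipped starts at 0 again. */
    while (yylex() != 0)
        ;
    return 0;
}
```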


       The  action  taken  by lex when encountering any input beginning with a
       blank character or within %{ and %} delimiter lines  appearing  in  the
       Rules  section  but  coming  after  one or more rules is undefined. The
       presence of such input can result in an  erroneous  definition  of  the
       yylex function.

   Definitions in lex
       Definitions  in  lex  appear before the first %% delimiter. Any line in
       this section not contained between %{ and %} lines  and  not  beginning
       with  a blank character is assumed to define a lex substitution string.
       The format of these lines is:

         name   substitute



       If a name does not meet the requirements for identifiers in the  ISO  C
       standard,  the  result is undefined. The string substitute replaces the
       string {name} when it is used in a rule. The name string is recognized
       in this context only when the braces are provided and when it does not
       appear within a bracket expression or within double-quotes.
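
       As a hedged illustration of substitution strings (the name DIGIT is an
       arbitrary choice), the first rule below uses {DIGIT} as a
       substitution, while the second rule places the same text inside
       double-quotes, where no substitution takes place and the seven
       characters are matched literally.

```lex
%{
#include <stdio.h>
%}
DIGIT   [0-9]
%%
{DIGIT}+        { printf("number: %s\n", yytext); }
"{DIGIT}"       { printf("literal brace text: %s\n", yytext); }
.|\n            ;
```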


       In the Definitions in lex section, any line beginning with a % (percent
       sign)  character  and  followed  by an alphanumeric word beginning with
       either s or S defines a set of start  conditions.  Any  line  beginning
       with  a % followed by a word beginning with either x or X defines a set
       of exclusive start conditions. When the generated scanner is  in  a  %s
       state, patterns with no state specified are also active; in a %x state,
       such patterns are not active. The rest of the  line,  after  the  first
       word,  is  considered to be one or more blank-character-separated names
       of start conditions. Start condition names are constructed in the  same
       way  as  definition names. Start conditions can be used to restrict the
       matching of regular expressions to one or more states as  described  in
       Regular expressions in lex.
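
       The following sketch shows one possible use of an exclusive start
       condition; the state name COMMENT is hypothetical. Because COMMENT is
       declared with %x, the unprefixed rule is inactive while the scanner is
       in that state, so text inside a C-style comment is seen only by the
       <COMMENT> rules. Had the state been declared with %s, the unprefixed
       rule would have stayed active inside the comment as well.

```lex
%x COMMENT
%%
"/*"            BEGIN COMMENT;
<COMMENT>"*/"   BEGIN INITIAL;
<COMMENT>\n     ;
<COMMENT>.      ;
```

       Text outside comments is not matched by any rule here and is
       therefore copied to the output by the default action.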


       Implementations  accept  either of the following two mutually exclusive
       declarations in the Definitions in lex section:

       %array      Declare the type of yytext to be a null-terminated  charac‐
                   ter array.


       %pointer    Declare the type of yytext to be a pointer to a null-termi‐
                   nated character string.



       When using the %pointer option, you cannot also use the yyless function
       to alter yytext.


       %array is the default. If %array is specified (or neither %array nor
       %pointer is specified), then the correct way to make an external
       reference to yytext is with a declaration of the form:


       extern char yytext[];


       If %pointer is specified, then the correct external reference is of the
       form:


       extern char *yytext;


       lex accepts declarations in the Definitions in lex section for  setting
       certain internal table sizes. The declarations are shown in the follow‐
       ing table.

       Table 1 Table Size Declaration in lex


       Declaration   Description                          Default
       -----------   ----------------------------------   -------
       %p n          Number of positions                  2500
       %n n          Number of states                     500
       %a n          Number of transitions                2000
       %e n          Number of parse tree nodes           1000
       %k n          Number of packed character classes   10000
       %o n          Size of the output array             3000
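
       A sketch of how such declarations might look in a source file. The
       sizes below are arbitrary values chosen for illustration, not
       recommendations. Because table sizes are specified, running lex on
       this file without -n can also produce the statistics summary.

```lex
%p 5000
%n 1000
%a 4000
%%
[a-z]+      ;
.|\n        ;
```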



       Programs generated by lex need either the -e or  -w  option  to  handle
       input that contains EUC characters from supplementary codesets. If nei‐
       ther of these options is specified, yytext is of the type  char[],  and
       the generated program can handle only ASCII characters.


       When  the -e option is used, yytext is of the type unsigned  char[] and
       yyleng gives the total number of bytes in the matched string. With this
       option,  the  macros input(), unput(c), and output(c) should do a byte-
       based I/O in the same way as with the  regular  ASCII   lex.  Two  more
       variables  are available with the -e option, yywtext and yywleng, which
       behave the same as yytext and yyleng would under the -w option.


       When the -w option is used, yytext is of the type wchar_t[] and  yyleng
       gives the total number of characters in the matched string. If you sup‐
       ply your own input(), unput(c), or output(c) macros with  this  option,
       they must return or accept EUC characters in the form of wide character
       (wchar_t). This allows a different interface between your  program  and
       the lex internals, to expedite some programs.

   Rules in lex
       The Rules in lex source files are a table in which the left column con‐
       tains regular expressions and the right column contains actions (C pro‐
       gram fragments) to be executed when the expressions are recognized.

         ERE action
         ERE action
         ...



       The  extended  regular  expression  (ERE) portion of a row is separated
       from action by one or more blank characters. A regular expression  con‐
       taining  blank characters is recognized under one of the following con‐
       ditions:

           o      The entire expression appears within double-quotes.


           o      The blank characters appear within double-quotes  or  square
                  brackets.


           o      Each blank character is preceded by a backslash character.
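
       The three conditions can be sketched with one rule each. The patterns
       are hypothetical; each contains a blank that is part of the regular
       expression rather than the separator before the action.

```lex
%{
#include <stdio.h>
%}
%%
"end if"        { puts("blank inside double-quotes"); }
end[ \t]while   { puts("blank inside square brackets"); }
end\ case       { puts("blank preceded by a backslash"); }
.|\n            ;
```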


   User Subroutines in lex
       Anything  in the user subroutines section is copied to lex.yy.c follow‐
       ing yylex.

   Regular Expressions in lex
       The lex utility supports the set of Extended Regular Expressions (EREs)
       described in regex(7) with the following additions and exceptions to
       the syntax:

       "..."

           Any string enclosed  in  double-quotes  represents  the  characters
           within  the  double-quotes  as  themselves,  except  that backslash
           escapes (which appear in the following table) are  recognized.  Any
           backslash-escape  sequence  is terminated by the closing quote. For
           example, "\01""1" represents a single string:  the  octal  value  1
           followed by the character 1.


       <state>r
       <state1, state2, ...>r

           The regular expression r is matched only when the program is in one
           of the start conditions indicated by state, state1, and  so  forth.
           For  more  information,  see Actions in lex. As an exception to the
           typographical conventions of the rest of  this  document,  in  this
           case  <state>  does  not  represent a metavariable, but the literal
           angle-bracket characters surrounding a symbol. The start  condition
           is  recognized  as  such only at the beginning of a regular expres‐
           sion.



       r/x

           The regular expression r is matched only if it is  followed  by  an
           occurrence of regular expression x. The token returned in yytext
           matches only r. If the trailing portion of r matches the beginning
           of  x,  the  result is unspecified. The r expression cannot include
           further trailing context or the $ (match-end-of-line)  operator;  x
           cannot include the ^ (match-beginning-of-line) operator, nor trail‐
           ing context, nor the $ operator. That is, only  one  occurrence  of
           trailing  context is allowed in a lex regular expression, and the ^
           operator can be used only at the beginning of such an expression. A
           further restriction is that the trailing-context operator / (slash)
           cannot be grouped within parentheses.


       {name}

           When name is one of the substitution symbols from  the  Definitions
           section, the string, including the enclosing braces, is replaced by
           the substitute value.  The  substitute  value  is  treated  in  the
           extended  regular expression as if it were enclosed in parentheses.
           No substitution occurs if {name} occurs within a bracket expression
           or within double-quotes.



       Within an ERE, a backslash character is considered to begin an escape
       sequence (\\, \a, \b, \f, \n, \r, \t, \v). In addition, the escape
       sequences in the following table are recognized.


       A  literal  newline  character  cannot  occur within an ERE; the escape
       sequence \n can be used to represent a  newline  character.  A  newline
       character cannot be matched by a period operator.

       Table 2 Escape Sequences in lex


       Sequence   Description                    Meaning
       ---------  -----------------------------  -----------------------------
       \digits    A backslash character          The character whose encoding
                  followed by the longest        is represented by the one-,
                  sequence of one, two or        two- or three-digit octal
                  three octal-digit characters   integer. Multibyte characters
                  (01234567). If all of the      require multiple,
                  digits are 0 (that is,         concatenated escape sequences
                  representation of the NUL      of this type, including the
                  character), the behavior is    leading \ for each byte.
                  undefined.

       \xdigits   A backslash character          The character whose encoding
                  followed by the longest        is represented by the
                  sequence of hexadecimal-digit  hexadecimal integer.
                  characters
                  (01234567abcdefABCDEF). If
                  all of the digits are 0 (that
                  is, representation of the NUL
                  character), the behavior is
                  undefined.

       \c         A backslash character          The character c, unchanged.
                  followed by any character
                  not described in this table
                  or among the escapes \\, \a,
                  \b, \f, \n, \r, \t, \v.



       The  order  of precedence given to extended regular expressions for lex
       is as shown in the following table, from high to low.


       The escaped characters entry is not meant to imply that these are oper‐
       ators,  but  they are included in the table to show their relationships
       to the true  operators.  The  start  condition,  trailing  context  and
       anchoring  notations  have  been  omitted from the table because of the
       placement restrictions described in this section; they can only  appear
       at the beginning or ending of an ERE.

       Table 3 ERE Precedence in lex


       collation-related bracket symbols   [= =]  [: :]  [. .]
       escaped characters                  \<special character>
       bracket expression                  [ ]
       quoting                             "..."
       grouping                            ( )
       definition                          {name}
       single-character RE duplication     * + ?
       concatenation
       interval expression                 {m,n}
       alternation                         |



       The ERE anchoring operators (^ and $) do not appear in the table.  With
       lex  regular  expressions, these operators are restricted in their use:
       the ^ operator can only be used at the beginning of an  entire  regular
       expression,  and the $ operator only at the end. The operators apply to
       the  entire  regular  expression.  Thus,  for  example,   the   pattern
       (^abc)|(def$)  is  undefined; it can instead be written as two separate
       rules, one with the regular expression ^abc and one  with  def$,  which
       share a common action via the special | action (see below). If the pat‐
       tern were written ^abc|def$, it would match either of abc or def  on  a
       line by itself.
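
       The two-rule form described above can be sketched as follows. The
       printf action is a placeholder chosen for illustration. The first
       rule uses the special | action, so it shares the action of the rule
       that follows it.

```lex
%{
#include <stdio.h>
%}
%%
^abc    |
def$    { printf("anchored match: %s\n", yytext); }
.|\n    ;
```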


       Unlike the general ERE rules, embedded anchoring is not allowed by most
       historical lex implementations. An example of embedded anchoring  would
       be for patterns such as (^)foo($) to match foo when it exists as a com‐
       plete word. This functionality can be obtained using existing lex  fea‐
       tures:

         ^foo/[ \n]      |
         " foo"/[ \n]    /* found foo as a separate word */



       Notice also that $ is a form of trailing context (it is equivalent to
       /\n) and as such cannot be used with regular expressions containing
       another instance of the operator (see the preceding discussion of
       trailing context).


       The additional regular expressions trailing-context operator /  (slash)
       can be used as an ordinary character if presented within double-quotes,
       "/"; preceded by a backslash, \/; or within a bracket expression,  [/].
       The  start-condition < and > operators are special only in a start con‐
       dition at the beginning of a regular expression; elsewhere in the regu‐
       lar expression they are treated as ordinary characters.


       The  following  examples  clarify  the  differences between lex regular
       expressions and regular expressions appearing elsewhere in  this  docu‐
       ment. For regular expressions of the form r/x, the string matching r is
       always returned; confusion can arise when the beginning  of  x  matches
       the  trailing  portion  of r. For example, given the regular expression
       a*b/cc and the input aaabcc, yytext would contain the  string  aaab  on
       this  match. But given the regular expression x*/xy and the input xxxy,
       the token xxx, not xx, is returned by some implementations because  xxx
       matches x*.


       In  the  rule ab*/bc, the b* at the end of r extends r's match into the
       beginning of the trailing context, so the  result  is  unspecified.  If
       this  rule were ab/bc, however, the rule matches the text ab when it is
       followed by the text bc. In this latter case, the matching of r  cannot
       extend into the beginning of x, so the result is specified.
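
       A sketch of a trailing-context rule whose result is specified,
       because the end of r ([0-9]+) cannot match the beginning of x (the
       literal px). The patterns are hypothetical. When the first rule
       matches, yytext holds only the digits; the px text is left in the
       input and is consumed by the third rule.

```lex
%{
#include <stdio.h>
%}
%%
[0-9]+/"px"     { printf("pixel count: %s\n", yytext); }
[0-9]+          { printf("plain number: %s\n", yytext); }
px              ;
.|\n            ;
```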

   Actions in lex
       The  action to be taken when an ERE is matched can be a C program frag‐
       ment or the special actions described below; the program  fragment  can
       contain one or more C statements, and can also include special actions.
       The empty C statement ; is a valid action; any string in  the  lex.yy.c
       input  that  matches  the pattern portion of such a rule is effectively
       ignored or skipped. However, the absence of an action is not valid, and
       the action lex takes in such a condition is undefined.


       The  specification  for  an  action, including C statements and special
       actions, can extend across several lines if enclosed in braces:

         ERE <one or more blanks> { program statement
         program statement }



       The default action when a string in the input to a lex.yy.c program  is
       not  matched  by  any  expression  is to copy the string to the output.
       Because the default behavior of a program generated by lex is  to  read
       the  input and copy it to the output, a minimal lex source program that
       has just %% generates a C program that simply copies the input  to  the
       output unchanged.


       Four special actions are available:

         |       ECHO;      REJECT;      BEGIN


       |

           The  action | means that the action for the next rule is the action
           for this rule. Unlike the other three actions, | cannot be enclosed
           in  braces  or be semicolon-terminated. It must be specified alone,
           with no other actions.


       ECHO;

           Writes the contents of the string yytext on the output.


       REJECT;

           Usually only a single expression is matched by a  given  string  in
           the  input.  REJECT  means  continue  to  the  next expression that
           matches the current input, and causes whatever rule was the  second
           choice  after  the  current rule to be executed for the same input.
           Thus, multiple rules can be matched  and  executed  for  one  input
           string or overlapping input strings. For example, given the regular
           expressions xyz and xy and the input xyz, usually only the  regular
           expression  xyz  would  match. The next attempted match would start
           after z. If the last action in the xyz rule is  REJECT,  both  this
           rule  and  the  xy rule would be executed. The REJECT action can be
           implemented in such a fashion that flow of control  does  not  con‐
           tinue  after it, as if it were equivalent to a goto to another part
           of yylex. The use of REJECT  can  result  in  somewhat  larger  and
           slower scanners.


       BEGIN

           The action:

           BEGIN  newstate;

           switches  the  state  (start  condition) to newstate. If the string
           newstate has not been declared previously as a start  condition  in
           the  Definitions  in  lex section, the results are unspecified. The
           initial state is indicated by the digit 0 or the token INITIAL.
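
       The REJECT behavior described above for the expressions xyz and xy
       can be sketched as follows; the printf calls are placeholders added
       for illustration. For the input xyz, the first rule runs and ends
       with REJECT, so the xy rule is then executed for the same input.

```lex
%{
#include <stdio.h>
%}
%%
xyz     { printf("matched xyz\n"); REJECT; }
xy      { printf("matched xy\n"); }
.|\n    ;
```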



       The functions or macros described below are  accessible  to  user  code
       included in the lex input. It is unspecified whether they appear in the
       C code output of lex, or are accessible only through the -l  l  operand
       to c89 or cc (the lex library).

       int yylex(void)

           Performs  lexical  analysis on the input; this is the primary func‐
           tion generated by the lex utility. The function returns  zero  when
           the  end  of input is reached; otherwise it returns non-zero values
           (tokens) determined by the actions that are selected.


       int yymore(void)

           When called, indicates that when the next input  string  is  recog‐
           nized,  it  is to be appended to the current value of yytext rather
           than replacing it; the value in yyleng is adjusted accordingly.


       int yyless(int n)

           Retains n initial characters in yytext, NUL-terminated, and  treats
           the remaining characters as if they had not been read; the value in
           yyleng is adjusted accordingly.


       int input(void)

           Returns the next character from the input, or zero on  end-of-file.
           It  obtains  input  from the stream pointer yyin, although possibly
           via an intermediate buffer. Thus,  once  scanning  has  begun,  the
           effect  of  altering  the value of yyin is undefined. The character
           read is removed from the input stream of the  scanner  without  any
           processing by the scanner.


       int unput(int c)

           Returns  the  character c to the input; yytext and yyleng are unde‐
           fined until the next expression is matched.  The  result  of  using
           unput for more characters than have been input is unspecified.
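
       A small sketch of yyless, modeled on a commonly used illustration and
       assuming the default %array setting. For the input foobar, the first
       rule echoes the whole match and then retains only the first three
       characters, so bar is scanned again and echoed by the second rule.

```lex
%%
foobar      { ECHO; yyless(3); }
[a-z]+      ECHO;
```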



       The  following  functions  appear  only  in  the lex library accessible
       through the -l l operand; they can therefore be redefined by a portable
       application:

       int yywrap(void)

           Called  by  yylex at end-of-file; the default yywrap always returns
           1. If the application requires yylex to  continue  processing  with
           another  source  of input, then the application can include a func‐
           tion yywrap, which associates another file with the external  vari‐
           able FILE *yyin and returns a value of zero.


       int main(int argc, char *argv[])

           Calls  yylex to perform lexical analysis, then exits. The user code
           can contain main to perform application-specific operations,  call‐
           ing yylex as applicable.



       The  reason  for  breaking  these functions into two lists is that only
       those functions in libl.a can  be  reliably  redefined  by  a  portable
       application.
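
       The following is a minimal sketch of an application-supplied yywrap
       and main, assuming the file names are given as command-line
       arguments. The names files and lines are hypothetical. For brevity
       the sketch does not close a stream after it has been read; a fuller
       version would do so.

```lex
%{
#include <stdio.h>
static char **files;     /* hypothetical: remaining input file names */
static long lines = 0;   /* hypothetical: total newline count */
%}
%%
\n      { lines++; }
.       ;
%%
/* Point yyin at the next file that can be opened.
   Return 0 to keep scanning, 1 when no input remains. */
int yywrap(void)
{
    while (*files != NULL) {
        FILE *fp = fopen(*files++, "r");
        if (fp != NULL) {
            yyin = fp;
            return 0;
        }
    }
    return 1;
}

int main(int argc, char *argv[])
{
    (void)argc;
    files = argv + 1;    /* argv ends with a null pointer */
    if (yywrap() == 0)   /* opens the first readable file, if any */
        yylex();
    printf("%ld\n", lines);
    return 0;
}
```

       Here yywrap is called once before scanning begins to select the first
       file, and again by yylex at each end-of-file.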


       Except  for input, unput and main, all external and static names gener‐
       ated by lex begin with the prefix yy or YY.

USAGE
       Portable applications are warned that in the Rules in lex  section,  an
       ERE  without  an  action is not acceptable, but need not be detected as
       erroneous by lex. This can result in compilation or runtime errors.


       The purpose of input is to take characters off  the  input  stream  and
       discard  them as far as the lexical analysis is concerned. A common use
       is to discard the body of a comment once the beginning of a comment  is
       recognized.
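
       A sketch of that use, assuming C-style comment delimiters. The loop
       stops on a zero return, as documented above for end-of-file, and also
       on a negative value in case an implementation returns EOF instead.

```lex
%%
"/*"    {
            int c, prev = 0;
            /* Discard input up to and including the comment terminator. */
            while ((c = input()) > 0) {
                if (prev == '*' && c == '/')
                    break;
                prev = c;
            }
        }
```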


       The lex utility is not fully internationalized in its treatment of reg‐
       ular expressions in the lex source code or generated lexical  analyzer.
       It would seem desirable to have the lexical analyzer interpret the reg‐
       ular expressions given in the lex source according to  the  environment
       specified when the lexical analyzer is executed, but this is not possi‐
       ble with the current lex technology. Furthermore, the  very  nature  of
       the lexical analyzers produced by lex must be closely tied to the
       lexical requirements of the input language being described, which is
       frequently locale-specific anyway. (For example, an analyzer written
       for French text would not automatically be useful for processing
       other languages.)

EXAMPLES
       Example 1 Using lex



       The following is an example of a lex program that implements a rudimen‐
       tary scanner for a Pascal-like syntax:


         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         /* need this for printf(), fopen() and stdin below */
         #include <stdio.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*
         %%

         {DIGIT}+               {
                                    printf("An integer: %s (%d)\n", yytext,
                                        atoi(yytext));
                                }

         {DIGIT}+"."{DIGIT}*    {
                                    printf("A float: %s (%g)\n", yytext,
                                        atof(yytext));
                                }

         if|then|begin|end|procedure|function        {
                                    printf("A keyword: %s\n", yytext);
                                }

         {ID}                   printf("An identifier: %s\n", yytext);

         "+"|"-"|"*"|"/"        printf("An operator: %s\n", yytext);

         "{"[^}\n]*"}"          ;  /* eat up one-line comments */

         [ \t\n]+               ;  /* eat up white space */

         .                      printf("Unrecognized character: %s\n", yytext);

         %%

         int main(int argc, char *argv[])
         {
                 ++argv, --argc;  /* skip over program name */
                 if (argc > 0)
                         yyin = fopen(argv[0], "r");
                 else
                         yyin = stdin;

                 yylex();
                 return 0;
         }


ENVIRONMENT VARIABLES
       See environ(7) for descriptions of the following environment  variables
       that  affect  the execution of lex: LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
       LC_MESSAGES, and NLSPATH.

EXIT STATUS
       The following exit values are returned:

       0     Successful completion.


       >0    An error occurred.


ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:


       ATTRIBUTE TYPE        ATTRIBUTE VALUE
       -------------------   ----------------------------------
       Availability          developer/base-developer-utilities
       Interface Stability   Committed
       Standard              See standards(7).


SEE ALSO
       yacc(1), attributes(7), environ(7), regex(7), standards(7)

NOTES
       If routines such as yyback(), yywrap(), and yylock() in .l (ell)  files
       are  to be external C functions, the command line to compile a C++ pro‐
       gram must define the __EXTERN_C__ macro. For example:

         example%  CC -D__EXTERN_C__ ... file




Oracle Solaris 11.4               11 May 2021                           lex(1)
The copyright of the man page content belongs to the man page's author.