lex(1)                           User Commands                          lex(1)



NAME
       lex - generate programs for lexical tasks

SYNOPSIS
       lex [-cntv] [-e | -w] [-V -Q [y | n]] [file]...


DESCRIPTION
       The  lex  utility generates C programs to be used in lexical processing
       of character input, and that can be used as an interface to yacc. The C
       programs  are  generated  from lex source code and conform to the ISO C
       standard. Usually, the lex utility writes the program it  generates  to
       the  file  lex.yy.c. The state of this file is unspecified if lex exits
       with a non-zero exit status. See EXTENDED DESCRIPTION  for  a  complete
       description of the lex input language.

OPTIONS
       The following options are supported:

       -c         Indicates C-language action (default option).


       -e         Generates  a  program that can handle EUC characters (cannot
                  be used with the -w option). yytext[] is  of  type  unsigned
                  char[].


       -n         Suppresses  the  summary  of statistics usually written with
                  the -v option. If no table sizes are specified  in  the  lex
                  source  code  and the -v option is not specified, then -n is
                  implied.


       -t         Writes the resulting program to standard output  instead  of
                  lex.yy.c.


       -v         Writes  a  summary  of lex statistics to the standard error.
                  (See the discussion of lex table  sizes  under  the  heading
                  Definitions in lex.) If table sizes are specified in the lex
                  source code, and if the -n option is not specified,  the  -v
                  option can be enabled.


       -w         Generates  a  program that can handle EUC characters (cannot
                  be used with the -e option). Unlike the -e option,  yytext[]
                  is of type wchar_t[].


       -V         Prints out version information on standard error.


       -Q[y|n]    Prints  out  version  information to output file lex.yy.c by
                  using -Qy. The -Qn option does not print out version  infor‐
                  mation and is the default.


OPERANDS
       The following operand is supported:

       file    A pathname of an input file. If more than one such file is
               specified, all files are concatenated to produce a single lex
               program. If no file operands are specified, or if a file
               operand is -, the standard input is used.


OUTPUT
       The lex output files are described below.

   Stdout
       If the -t option is specified, the text file of C source code output of
       lex is written to standard output.

   Stderr
       If the -t option is specified, informational, error, and warning
       messages concerning the contents of lex source code input are written
       to the standard error.


       If the -t option is not specified:

           1.     Informational, error, and warning messages concerning the
                  contents of lex source code input are written to either the
                  standard output or standard error.

           2.     If the -v option is specified and the -n option is not
                  specified, lex statistics are also written to standard
                  error. These statistics can also be generated if table sizes
                  are specified with a % operator in the Definitions in lex
                  section (see EXTENDED DESCRIPTION), as long as the -n option
                  is not specified.

   Output Files
       A text file containing C source code is written to lex.yy.c, or to  the
       standard output if the -t option is present.

EXTENDED DESCRIPTION
       Each  input  file contains lex source code, which is a table of regular
       expressions with corresponding actions in the form of C  program  frag‐
       ments.


       When lex.yy.c is compiled and linked with the lex library (using the -l
       l operand with c89 or cc), the resulting program reads character  input
       from  the  standard input and partitions it into strings that match the
       given expressions.


       When an expression is matched, these actions occur:

           o      The input string that was matched is left  in  yytext  as  a
                  null-terminated string; yytext is either an external charac‐
                  ter array or a pointer to a character string.  As  explained
                  in  Definitions  in lex, the type can be explicitly selected
                  using the %array or %pointer declarations, but  the  default
                  is %array.

           o      The external int yyleng is set to the length of the matching
                  string.

           o      The expression's corresponding program fragment, or  action,
                  is executed.


       During  pattern matching, lex searches the set of patterns for the sin‐
       gle longest possible match. Among rules that match the same  number  of
       characters, the rule given first is chosen.
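
       The selection rule above can be sketched in C. This is only an
       illustration under simplifying assumptions: the patterns are literal
       strings rather than EREs, and pick_rule is a hypothetical helper that
       is not part of lex or of the generated scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of lex: given literal patterns listed in
 * rule order, return the index of the rule a lex-style scanner would pick
 * at the start of input, or -1 if none matches. The longest match wins;
 * among equal lengths, the earliest rule wins.
 */
int pick_rule(const char *const patterns[], size_t npatterns,
              const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;

    assert(patterns != NULL && input != NULL && match_len != NULL);

    for (size_t i = 0; i < npatterns; i++) {
        size_t len = strlen(patterns[i]);

        /* Strictly greater: a later rule never displaces an equal match. */
        if (len > best_len && strncmp(input, patterns[i], len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

       With the rules if, ifx, i, and if (in that order) and the input
       ifx+1, the sketch picks the second rule with a match length of 3;
       with the input if+1 it picks the first rule, not the identical
       fourth one.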


       The general format of lex source is:

         Definitions
         %%
         Rules
         %%
         User Subroutines



       The  first  %%  is required to mark the beginning of the rules (regular
       expressions and actions); the second %% is required only if  user  sub‐
       routines follow.


       Any line in the Definitions in lex section beginning with a blank char‐
       acter is assumed to be a C program fragment and is copied to the exter‐
       nal  definition  area  of the lex.yy.c file. Similarly, anything in the
       Definitions in lex section included between delimiter lines  containing
       only %{ and %} is also copied unchanged to the external definition area
       of the lex.yy.c file.


       Any such input (beginning with a blank character or within  %{  and  %}
       delimiter lines) appearing at the beginning of the Rules section before
       any rules are specified is written to lex.yy.c after  the  declarations
       of  variables  for the yylex function and before the first line of code
       in yylex. Thus, user variables local to yylex can be declared here,  as
       well as application code to execute upon entry to yylex.


       The  action  taken  by lex when encountering any input beginning with a
       blank character or within %{ and %} delimiter lines  appearing  in  the
       Rules  section  but  coming  after  one or more rules is undefined. The
       presence of such input can result in an  erroneous  definition  of  the
       yylex function.

   Definitions in lex
       Definitions  in  lex  appear before the first %% delimiter. Any line in
       this section not contained between %{ and %} lines  and  not  beginning
       with  a blank character is assumed to define a lex substitution string.
       The format of these lines is:

         name   substitute




       If a name does not meet the requirements for identifiers in the ISO C
       standard, the result is undefined. The string substitute replaces the
       string {name} when it is used in a rule. The name string is recognized
       in this context only when the braces are provided and when it does not
       appear within a bracket expression or within double-quotes.


       In the Definitions in lex section, any line beginning with a % (percent
       sign)  character  and  followed  by an alphanumeric word beginning with
       either s or S defines a set of start  conditions.  Any  line  beginning
       with  a % followed by a word beginning with either x or X defines a set
       of exclusive start conditions. When the generated scanner is  in  a  %s
       state, patterns with no state specified are also active; in a %x state,
       such patterns are not active. The rest of the  line,  after  the  first
       word,  is  considered to be one or more blank-character-separated names
       of start conditions. Start condition names are constructed in the  same
       way  as  definition names. Start conditions can be used to restrict the
       matching of regular expressions to one or more states as  described  in
       Regular Expressions in lex.
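
       The difference between %s and %x conditions can be sketched as a small
       predicate in C. The struct and function names are hypothetical and
       exist only for this illustration; the sketch also assumes that the
       initial state behaves like a %s condition, so that rules without a
       start-condition prefix are active in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a declared start condition; not a lex data type. */
struct start_cond {
    const char *name;
    bool exclusive;      /* true for %x, false for %s (and for INITIAL) */
};

/*
 * Return true if a rule is active in the current start condition.
 * rule_conds lists the names inside the rule's <...> prefix;
 * nconds == 0 means the rule has no prefix.
 */
bool rule_is_active(const struct start_cond *current,
                    const char *const rule_conds[], size_t nconds)
{
    assert(current != NULL);

    if (nconds == 0)
        return !current->exclusive;   /* unprefixed rules: off in %x states */

    for (size_t i = 0; i < nconds; i++)
        if (strcmp(rule_conds[i], current->name) == 0)
            return true;
    return false;
}
```

       Under these assumptions, a rule with no prefix is active in a %s
       condition but not in a %x condition, while a rule prefixed with a
       condition name is active only while the scanner is in one of the
       conditions it names.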


       Implementations  accept  either of the following two mutually exclusive
       declarations in the Definitions in lex section:

       %array      Declare the type of yytext to be a null-terminated  charac‐
                   ter array.


       %pointer    Declare the type of yytext to be a pointer to a null-termi‐
                   nated character string.



       When using the %pointer option, you cannot also use the yyless function
       to alter yytext.


       %array is the default. If %array is specified (or neither %array nor
       %pointer is specified), then the correct way to make an external
       reference to yytext is with a declaration of the form:


       extern char yytext[];


       If %pointer is specified, then the correct external reference is of the
       form:


       extern char *yytext;


       lex accepts declarations in the Definitions in lex section for  setting
       certain internal table sizes. The declarations are shown in the follow‐
       ing table.


       Table Size Declaration in lex

       Declaration    Description                           Default
       ------------------------------------------------------------
       %p n           Number of positions                   2500
       %n n           Number of states                      500
       %a n           Number of transitions                 2000
       %e n           Number of parse tree nodes            1000
       %k n           Number of packed character classes    10000
       %o n           Size of the output array              3000



       Programs generated by lex need either the -e or  -w  option  to  handle
       input that contains EUC characters from supplementary codesets. If nei‐
       ther of these options is specified, yytext is of the type  char[],  and
       the generated program can handle only ASCII characters.


       When  the  -e option is used, yytext is of the type unsigned char[] and
       yyleng gives the total number of bytes in the matched string. With this
       option,  the  macros input(), unput(c), and output(c) should do a byte-
       based I/O in the same way as with the regular ASCII lex. Two more vari‐
       ables  are  available  with  the  -e option, yywtext and yywleng, which
       behave the same as yytext and yyleng would under the -w option.


       When the -w option is used, yytext is of the type wchar_t[] and  yyleng
       gives  the  total  number  of characters in the matched string.  If you
       supply your own  input(),  unput(c),  or  output(c)  macros  with  this
       option,  they  must return or accept EUC characters in the form of wide
       character (wchar_t). This allows a  different  interface  between  your
       program and the lex internals, to expedite some programs.

   Rules in lex
       The Rules in lex source files are a table in which the left column con‐
       tains regular expressions and the right column contains actions (C pro‐
       gram fragments) to be executed when the expressions are recognized.

         ERE action
         ERE action
         ...



       The  extended  regular  expression  (ERE) portion of a row is separated
       from action by one or more blank characters. A regular expression  con‐
       taining  blank characters is recognized under one of the following con‐
       ditions:

           o      The entire expression appears within double-quotes.

           o      The blank characters appear within double-quotes  or  square
                  brackets.

           o      Each blank character is preceded by a backslash character.

   User Subroutines in lex
       Anything  in the user subroutines section is copied to lex.yy.c follow‐
       ing yylex.

   Regular Expressions in lex
       The lex utility supports the set of Extended Regular Expressions (EREs)
       described  on  regex(5)  with the following additions and exceptions to
       the syntax:

       "..."        Any string enclosed in double-quotes represents the
                    characters within the double-quotes as themselves, except
                    that backslash escapes (which appear in the following
                    table) are recognized. Any backslash-escape sequence is
                    terminated by the closing quote. For example, "\01""1"
                    represents a single string: the octal value 1 followed by
                    the character 1.



       <state>r

       <state1, state2, ...>r

           The regular expression r is matched only when the program is in one
           of  the  start conditions indicated by state, state1, and so forth.
           For more information, see Actions in lex. As an  exception  to  the
           typographical  conventions  of  the  rest of this document, in this
           case <state> does not represent a  metavariable,  but  the  literal
           angle-bracket  characters surrounding a symbol. The start condition
           is recognized as such only at the beginning of  a  regular  expres‐
           sion.


       r/x

           The  regular  expression  r is matched only if it is followed by an
           occurrence of regular expression x. The token returned in yytext
           matches only r. If the trailing portion of r matches the beginning
           of x, the result is unspecified. The r  expression  cannot  include
           further  trailing  context or the $ (match-end-of-line) operator; x
           cannot include the ^ (match-beginning-of-line) operator, nor trail‐
           ing  context,  nor  the $ operator. That is, only one occurrence of
           trailing context is allowed in a lex regular expression, and the  ^
           operator can be used only at the beginning of such an expression. A
           further restriction is that the trailing-context operator / (slash)
           cannot be grouped within parentheses.


       {name}

           When  name  is one of the substitution symbols from the Definitions
           section, the string, including the enclosing braces, is replaced by
           the  substitute  value.  The  substitute  value  is  treated in the
           extended regular expression as if it were enclosed in  parentheses.
           No substitution occurs if {name} occurs within a bracket expression
           or within double-quotes.



       Within an ERE, a backslash character (\\, \a, \b, \f, \n, \r,  \t,  \v)
       is  considered  to  begin  an  escape sequence. In addition, the escape
       sequences in the following table are recognized.


       A literal newline character cannot occur  within  an  ERE;  the  escape
       sequence  \n  can  be  used to represent a newline character. A newline
       character cannot be matched by a period operator.


       Escape Sequences in lex

       \digits
           Description:  A backslash character followed by the longest
                         sequence of one, two or three octal-digit characters
                         (01234567). If all of the digits are 0 (that is,
                         representation of the NUL character), the behavior
                         is undefined.
           Meaning:      The character whose encoding is represented by the
                         one-, two- or three-digit octal integer. Multi-byte
                         characters require multiple, concatenated escape
                         sequences of this type, including the leading \ for
                         each byte.

       \xdigits
           Description:  A backslash character followed by the longest
                         sequence of hexadecimal-digit characters
                         (01234567abcdefABCDEF). If all of the digits are 0
                         (that is, representation of the NUL character), the
                         behavior is undefined.
           Meaning:      The character whose encoding is represented by the
                         hexadecimal integer.

       \c
           Description:  A backslash character followed by any character not
                         described in this table and not one of the standard
                         escape sequences (\\, \a, \b, \f, \n, \r, \t, \v).
           Meaning:      The character c, unchanged.
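
       The "longest sequence of one, two or three octal digits" rule for
       \digits can be sketched in C as follows. decode_octal_escape is a
       hypothetical helper written for this illustration, not something lex
       provides, and the sketch does not treat the all-zero case (for which
       the behavior is undefined) specially.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of lex: decode one \digits escape at s.
 * Consumes the backslash and the longest run of one to three octal
 * digits. Stores the decoded value in *value and returns the number of
 * source characters consumed, or 0 if s does not start such an escape.
 */
size_t decode_octal_escape(const char *s, unsigned *value)
{
    size_t i = 1;
    unsigned v = 0;

    assert(s != NULL && value != NULL);

    if (s[0] != '\\')
        return 0;
    while (i <= 3 && s[i] >= '0' && s[i] <= '7') {
        v = v * 8 + (unsigned)(s[i] - '0');
        i++;
    }
    if (i == 1)
        return 0;           /* backslash not followed by an octal digit */
    *value = v;
    return i;
}
```

       This is why the earlier example is written "\01""1": the sketch
       decodes \01 as the value 1, whereas \011 would be consumed as a single
       three-digit escape with the value 9.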



       The  order  of precedence given to extended regular expressions for lex
       is as shown in the following table, from high to low.


       The escaped characters entry is not meant to imply that these are oper‐
       ators,  but  they are included in the table to show their relationships
       to the true  operators.  The  start  condition,  trailing  context  and
       anchoring  notations  have  been  omitted from the table because of the
       placement restrictions described in this section; they can only  appear
       at the beginning or ending of an ERE.




       ERE Precedence in lex

       collation-related bracket symbols    [= =]  [: :]  [. .]
       escaped characters                   \<special character>
       bracket expression                   [ ]
       quoting                              "..."
       grouping                             ( )
       definition                           {name}
       single-character RE duplication      * + ?
       concatenation
       interval expression                  {m,n}
       alternation                          |



       The  ERE anchoring operators (^ and $) do not appear in the table. With
       lex regular expressions, these operators are restricted in  their  use:
       the  ^  operator can only be used at the beginning of an entire regular
       expression, and the $ operator only at the end. The operators apply  to
       the   entire   regular  expression.  Thus,  for  example,  the  pattern
       (^abc)|(def$) is undefined; it can instead be written as  two  separate
       rules,  one  with  the regular expression ^abc and one with def$, which
       share a common action via the special | action (see below). If the pat‐
       tern  were  written ^abc|def$, it would match either of abc or def on a
       line by itself.


       Unlike the general ERE rules, embedded anchoring is not allowed by most
       historical  lex implementations. An example of embedded anchoring would
       be for patterns such as (^)foo($) to match foo when it exists as a com‐
       plete  word. This functionality can be obtained using existing lex fea‐
       tures:

         ^foo/[ \n]      |
         " foo"/[ \n]    /* found foo as a separate word */



       Notice also that $ is a form of trailing context (it is equivalent to
       /\n) and as such cannot be used with regular expressions containing
       another instance of the operator (see the preceding discussion of
       trailing context).


       The  additional regular expressions trailing-context operator / (slash)
       can be used as an ordinary character if presented within double-quotes,
       "/";  preceded by a backslash, \/; or within a bracket expression, [/].
       The start-condition < and > operators are special only in a start  con‐
       dition at the beginning of a regular expression; elsewhere in the regu‐
       lar expression they are treated as ordinary characters.


       The following examples clarify  the  differences  between  lex  regular
       expressions  and  regular expressions appearing elsewhere in this docu‐
       ment. For regular expressions of the form r/x, the string matching r is
       always  returned;  confusion  can arise when the beginning of x matches
       the trailing portion of r. For example, given  the  regular  expression
       a*b/cc  and  the  input aaabcc, yytext would contain the string aaab on
       this match. But given the regular expression x*/xy and the input  xxxy,
       the  token xxx, not xx, is returned by some implementations because xxx
       matches x*.


       In the rule ab*/bc, the b* at the end of r extends r's match  into  the
       beginning  of  the  trailing  context, so the result is unspecified. If
       this rule were ab/bc, however, the rule matches the text ab when it  is
       followed  by the text bc. In this latter case, the matching of r cannot
       extend into the beginning of x, so the result is specified.
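
       The a*b/cc example can be sketched as a hand-written matcher in C. The
       function below is hypothetical and covers only that one rule; a
       scanner generated by lex is table-driven and is not built this way.
       The point is that the trailing context is checked but not consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hand-written matcher for the single rule a*b/cc.
 * Returns the length of the text matched by r (what yyleng would
 * report), or 0 if the rule does not match at the start of input.
 * The trailing context cc must be present but is not counted, so it
 * remains available for the next match.
 */
size_t match_astar_b_before_cc(const char *input)
{
    size_t n = 0;

    assert(input != NULL);

    while (input[n] == 'a')                  /* a* */
        n++;
    if (input[n] != 'b')                     /* b */
        return 0;
    n++;
    if (strncmp(input + n, "cc", 2) != 0)    /* /cc: look ahead only */
        return 0;
    return n;
}
```

       For the input aaabcc the sketch returns 4, corresponding to the token
       aaab; for aaabc it returns 0 because the trailing context is
       incomplete.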

   Actions in lex
       The action to be taken when an ERE is matched can be a C program  frag‐
       ment  or  the special actions described below; the program fragment can
       contain one or more C statements, and can also include special actions.
       The empty C statement ; is a valid action; any string in the input to
       the lex.yy.c program that matches the pattern portion of such a rule
       is effectively ignored or skipped. However, the absence of an action
       is not valid, and the action lex takes in such a condition is
       undefined.


       The specification for an action, including  C  statements  and  special
       actions, can extend across several lines if enclosed in braces:

         ERE <one or more blanks> { program statement
         program statement }




       The  default action when a string in the input to a lex.yy.c program is
       not matched by any expression is to copy  the  string  to  the  output.
       Because  the  default behavior of a program generated by lex is to read
       the input and copy it to the output, a minimal lex source program  that
       has  just  %% generates a C program that simply copies the input to the
       output unchanged.


       Four special actions are available:

         |       ECHO;      REJECT;      BEGIN



       |          The action | means that the action for the next rule is  the
                  action for this rule. Unlike the other three actions, | can‐
                  not be enclosed in braces  or  be  semicolon-terminated.  It
                  must be specified alone, with no other actions.


       ECHO;      Writes the contents of the string yytext on the output.


       REJECT;    Usually  only  a  single  expression  is  matched by a given
                  string in the input.  REJECT  means  continue  to  the  next
                  expression  that matches the current input, and causes what‐
                  ever rule was the second choice after the current rule to be
                  executed  for  the  same  input. Thus, multiple rules can be
                  matched and executed for one  input  string  or  overlapping
                  input  strings.  For  example, given the regular expressions
                  xyz and xy and the  input  xyz,  usually  only  the  regular
                  expression  xyz  would match. The next attempted match would
                  start after z. If the last action in the xyz rule is REJECT,
                  both this rule and the xy rule would be executed. The
                  REJECT action can be implemented in such a fashion that flow
                  of control does not continue after it, as if it were equiva‐
                  lent to a goto to another part of yylex. The use  of  REJECT
                  can result in somewhat larger and slower scanners.


       BEGIN      The action:

                  BEGIN newstate;

                  switches  the  state  (start  condition) to newstate. If the
                  string newstate has not been declared previously as a  start
                  condition in the Definitions in lex section, the results are
                  unspecified. The initial state is indicated by the  digit  0
                  or the token INITIAL.
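
       The "second choice" ordering that REJECT relies on can be sketched in
       C. This is a simplified model with hypothetical helper names: the
       patterns are literal strings, so each rule has at most one possible
       match length, and choices are ranked by longer match first and then
       by earlier rule. How a real implementation ranks the alternatives is
       not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of literal pattern p if it prefixes input, else 0. */
static size_t prefix_len(const char *p, const char *input)
{
    size_t len = strlen(p);
    return (len > 0 && strncmp(input, p, len) == 0) ? len : 0;
}

/*
 * Hypothetical helper, not part of lex: return the index of the rule
 * that is the k-th choice for input (k = 0 is the normal match, k = 1 is
 * what would run after one REJECT, and so on), or -1 if there is none.
 */
int nth_choice(const char *const patterns[], size_t npatterns,
               const char *input, size_t k)
{
    assert(patterns != NULL && input != NULL);

    for (size_t i = 0; i < npatterns; i++) {
        size_t len = prefix_len(patterns[i], input);
        size_t better = 0;

        if (len == 0)
            continue;
        /* Count the rules ranked ahead of rule i. */
        for (size_t j = 0; j < npatterns; j++) {
            size_t other = prefix_len(patterns[j], input);
            if (other > len || (other == len && j < i))
                better++;
        }
        if (better == k)
            return (int)i;
    }
    return -1;
}
```

       With the rules xy and xyz and the input xyz, choice 0 is the xyz rule
       and choice 1 is the xy rule, which matches the example given for
       REJECT above.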



       The  functions  or  macros  described below are accessible to user code
       included in the lex input. It is unspecified whether they appear in the
       C  code  output of lex, or are accessible only through the -l l operand
       to c89 or cc (the lex library).

       int yylex(void)     Performs lexical analysis on the input; this is the
                           primary  function generated by the lex utility. The
                           function returns zero when  the  end  of  input  is
                           reached;   otherwise  it  returns  non-zero  values
                           (tokens)  determined  by  the  actions   that   are
                           selected.


       int yymore(void)    When  called,  indicates  that  when the next input
                           string is recognized, it is to be appended  to  the
                           current  value  of yytext rather than replacing it;
                           the value in yyleng is adjusted accordingly.


       int yyless(int n)   Retains n initial characters in yytext,
                           NUL-terminated, and treats the remaining characters
                           as if they had not been read; the value in yyleng
                           is adjusted accordingly.


       int input(void)     Returns  the next character from the input, or zero
                           on end-of-file. It obtains input  from  the  stream
                           pointer yyin, although possibly via an intermediate
                           buffer. Thus, once scanning has begun,  the  effect
                           of  altering  the  value  of yyin is undefined. The
                           character read is removed from the input stream  of
                           the scanner without any processing by the scanner.


       int unput(int c)    Returns  the  character  c to the input; yytext and
                           yyleng are undefined until the next  expression  is
                           matched. The result of using unput for more charac‐
                           ters than have been input is unspecified.
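
       The effect of yymore and yyless on yytext and yyleng can be modeled
       with a small C sketch. The struct and the sketch_ functions are
       hypothetical stand-ins for the scanner state; they are not the
       generated code, and the fixed 64-byte buffer is an arbitrary choice
       for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for scanner state; real lex keeps yytext/yyleng. */
struct sketch_scanner {
    const char *input;   /* whole input, NUL-terminated */
    size_t pos;          /* next unread character */
    char text[64];       /* plays the role of yytext */
    size_t leng;         /* plays the role of yyleng */
    int more;            /* set by sketch_more(), cleared by the next match */
};

/* Record a match of n characters at the current position. */
void sketch_match(struct sketch_scanner *s, size_t n)
{
    size_t keep = s->more ? s->leng : 0;   /* yymore: append, do not replace */

    assert(keep + n < sizeof s->text);
    assert(n <= strlen(s->input + s->pos));
    memcpy(s->text + keep, s->input + s->pos, n);
    s->leng = keep + n;
    s->text[s->leng] = '\0';
    s->pos += n;
    s->more = 0;
}

/* Like yymore(): the next match is appended to the current text. */
void sketch_more(struct sketch_scanner *s)
{
    s->more = 1;
}

/* Like yyless(n): keep n characters, treat the rest as unread. */
void sketch_less(struct sketch_scanner *s, size_t n)
{
    assert(n <= s->leng);
    s->pos -= s->leng - n;
    s->leng = n;
    s->text[n] = '\0';
}
```

       For the input abcdef, matching four characters and then calling
       sketch_less with 1 leaves the text as a and makes bcd readable again;
       a following sketch_more and a two-character match then yield abc.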



       The following functions appear  only  in  the  lex  library  accessible
       through the -l l operand; they can therefore be redefined by a portable
       application:

       int yywrap(void)

           Called by yylex at end-of-file; the default yywrap  always  returns
           1.  If  the  application requires yylex to continue processing with
           another source of input, then the application can include  a  func‐
           tion  yywrap, which associates another file with the external vari‐
           able FILE *yyin and returns a value of zero.


       int main(int argc, char *argv[])

           Calls yylex to perform lexical analysis, then exits. The user  code
           can  contain main to perform application-specific operations, call‐
           ing yylex as applicable.
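
       The yywrap contract (return zero after arranging more input, non-zero
       when there is none) can be sketched in C without a generated scanner.
       Everything below is a hypothetical model: in-memory strings stand in
       for files, and sketch_next_char stands in for the scanner's read
       loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input queue standing in for a list of files to scan. */
struct sketch_inputs {
    const char *const *sources;  /* NUL-terminated strings, in order */
    size_t nsources;
    size_t current;              /* index of the source being read */
    size_t offset;               /* read position within that source */
};

/*
 * Plays the role of an application-supplied yywrap: returns 0 after
 * switching to another source, or 1 when there is nothing left.
 */
int sketch_wrap(struct sketch_inputs *in)
{
    if (in->current + 1 >= in->nsources)
        return 1;
    in->current++;
    in->offset = 0;
    return 0;
}

/*
 * Plays the role of the scanner's read loop: returns the next character,
 * or 0 once the wrap function reports that all input is exhausted.
 */
int sketch_next_char(struct sketch_inputs *in)
{
    assert(in != NULL && in->nsources > 0);

    while (in->sources[in->current][in->offset] == '\0')
        if (sketch_wrap(in) != 0)
            return 0;
    return (unsigned char)in->sources[in->current][in->offset++];
}
```

       In a real application the wrap function would instead assign a newly
       opened stream to yyin before returning zero, as described above.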



       The reason for breaking these functions into two  lists  is  that  only
       those  functions  in  libl.a  can  be  reliably redefined by a portable
       application.


       Except for input, unput and main, all external and static names  gener‐
       ated by lex begin with the prefix yy or YY.

USAGE
       Portable  applications  are warned that in the Rules in lex section, an
       ERE without an action is not acceptable, but need not  be  detected  as
       erroneous by lex. This can result in compilation or run-time errors.


       The  purpose  of  input  is to take characters off the input stream and
       discard them as far as the lexical analysis is concerned. A common  use
       is  to discard the body of a comment once the beginning of a comment is
       recognized.
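
       The comment-skipping use of input can be sketched in C. The stream
       struct and both functions are hypothetical stand-ins used only for
       this illustration; in a real lex action the loop would call input()
       directly after a rule has matched the start of a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scanner's input stream. */
struct sketch_stream {
    const char *text;
    size_t pos;
};

/* Like input(): next character, or 0 at end of input. */
int sketch_input(struct sketch_stream *s)
{
    assert(s != NULL && s->text != NULL);
    return s->text[s->pos] ? (unsigned char)s->text[s->pos++] : 0;
}

/*
 * What the action for a comment-opening pattern might do: discard
 * characters up to and including the closing star-slash. Returns 1 if
 * the comment was closed, 0 if end of input was reached first.
 */
int skip_comment_body(struct sketch_stream *s)
{
    int c, prev = 0;

    while ((c = sketch_input(s)) != 0) {
        if (prev == '*' && c == '/')
            return 1;
        prev = c;
    }
    return 0;
}
```

       The discarded characters never reach the pattern matcher, so scanning
       resumes with whatever follows the end of the comment.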


       The lex utility is not fully internationalized in its treatment of reg‐
       ular  expressions in the lex source code or generated lexical analyzer.
       It would seem desirable to have the lexical analyzer interpret the reg‐
       ular  expressions  given in the lex source according to the environment
       specified when the lexical analyzer is executed, but this is not possi‐
       ble  with  the  current lex technology. Furthermore, the very nature of
       the lexical analyzers produced by lex must be closely tied to the lexi‐
       cal  requirements  of the input language being described, which is fre‐
       quently locale-specific anyway. (For example, writing an analyzer  that
       is used for French text is not automatically useful for processing
       other languages.)

EXAMPLES
       Example 1 Using lex


       The following is an example of a lex program that implements a rudimen‐
       tary scanner for a Pascal-like syntax:


         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         /* need this for printf(), fopen() and stdin below */
         #include <stdio.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*
         %%

         {DIGIT}+  {
                                printf("An integer: %s (%d)\n", yytext,
                                atoi(yytext));
                                }

         {DIGIT}+"."{DIGIT}*    {
                                printf("A float: %s (%g)\n", yytext,
                                atof(yytext));
                                }

         if|then|begin|end|procedure|function        {
                                printf("A keyword: %s\n", yytext);
                                }

         {ID}                   printf("An identifier: %s\n", yytext);

         "+"|"-"|"*"|"/"        printf("An operator: %s\n", yytext);

         "{"[^}\n]*"}"          ;  /* eat up one-line comments */

         [ \t\n]+               ;  /* eat up white space */

         .                      printf("Unrecognized character: %s\n", yytext);

         %%

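         int main(int argc, char *argv[])
         {
             ++argv, --argc;  /* skip over program name */
             if (argc > 0) {
                 yyin = fopen(argv[0], "r");
                 if (yyin == NULL) {
                     perror(argv[0]);  /* report why the file is unreadable */
                     return 1;
                 }
             } else {
                 yyin = stdin;
             }

             yylex();
             return 0;
         }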



ENVIRONMENT VARIABLES
       See  environ(5) for descriptions of the following environment variables
       that affect the execution of lex: LANG, LC_ALL,  LC_COLLATE,  LC_CTYPE,
       LC_MESSAGES, and NLSPATH.

EXIT STATUS
       The following exit values are returned:

       0     Successful completion.


       >0    An error occurred.


ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:




       ATTRIBUTE TYPE         ATTRIBUTE VALUE
       ----------------------------------------------------------
       Availability           developer/base-developer-utilities
       Interface Stability    Committed
       Standard               See standards(5).


SEE ALSO
       yacc(1), attributes(5), environ(5), regex(5), standards(5)

NOTES
       If  routines such as yyback(), yywrap(), and yylock() in .l (ell) files
       are to be external C functions, the command line to compile a C++  pro‐
       gram must define the __EXTERN_C__ macro. For example:

         example%  CC -D__EXTERN_C__ ... file





SunOS 5.11                        8 Jun 2011                            lex(1)