Standards, Environments, Macros, Character Sets, and miscellany
                                                                    charmap(7)



NAME
       charmap - character set description file

DESCRIPTION
       A character set description file or charmap defines characteristics for
       a coded character set. Other information about the coded character  set
       may  also  be  in  the  file.  Coded character set character values are
       defined using symbolic character names followed by  character  encoding
       values.


       The character set description file provides:

           o      The capability to describe character set attributes (such as
                  collation order or character classes) independent of charac‐
                  ter  set encoding, and using only the characters in the por‐
                  table character  set.  This  makes  it  possible  to  create
                  generic  localedef(1)  source  files  for  all codesets that
                  share the portable character set.


           o      Standardized symbolic names for all characters in the porta‐
                  ble  character  set, making it possible to refer to any such
                  character regardless of encoding.


   Symbolic Names
       Each symbolic name is included in the file and is mapped  to  a  unique
       encoding  value  (except  for  those symbolic names that are shown with
       identical glyphs). If the control characters commonly  associated  with
       the  symbolic  names in the following table are supported by the imple‐
       mentation, the symbolic names and their corresponding  encoding  values
       are  included  in  the  file. Some of the encodings associated with the
       symbolic names in this table may be the same as characters in the  por‐
       table character set table.


       <ACK>     <DC2>     <ENQ>     <FS>      <IS4>     <SOH>
       <BEL>     <DC3>     <EOT>     <GS>      <LF>      <STX>
       <BS>      <DC4>     <ESC>     <HT>      <NAK>     <SUB>
       <CAN>     <DEL>     <ETB>     <IS1>     <RS>      <SYN>
       <CR>      <DLE>     <ETX>     <IS2>     <SI>      <US>
       <DC1>     <EM>      <FF>      <IS3>     <SO>      <VT>


   Declarations
       The following declarations can precede the character definitions.  Each
       must  consist  of  the  symbol shown in the following list, starting in
       column 1, including the surrounding brackets, followed by one  or  more
       blank characters, followed by the value to be assigned to the symbol.

       <code_set_name>    The  name  of  the coded character set for which the
                          character set description file is defined.


       <mb_cur_max>       The maximum number of bytes in a  multibyte  charac‐
                          ter. This defaults to 1.


       <mb_cur_min>       An  unsigned positive integer value that defines the
                          minimum number of  bytes  in  a  character  for  the
                          encoded character set.


       <escape_char>      The escape character used to indicate that the char‐
                          acters following will be interpreted  in  a  special
                          way, as defined later in this section. This defaults
                          to backslash ('\'), which  is  the  character  glyph
                          used  in all the following text and examples, unless
                          otherwise noted.


       <comment_char>     The character that, when placed in column 1 of a
                          charmap line, indicates that the line is to be
                          ignored. The default character is the number
                          sign (#).
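
       The declarations above can be collected with a few lines of code. The
       following Python sketch is illustrative only and is not part of any
       system utility: the function name parse_declarations is hypothetical,
       and the sketch assumes that the declarations appear before the CHARMAP
       line and that each value is a single token. Only the defaults stated in
       the list above are applied.

```python
def parse_declarations(lines):
    """Collect '<symbol> value' declarations that precede the CHARMAP line."""
    # Defaults stated in the text above. No default is stated there for
    # <code_set_name> or <mb_cur_min>, so they are absent unless declared.
    decls = {"mb_cur_max": 1, "escape_char": "\\", "comment_char": "#"}
    known = {"code_set_name", "mb_cur_max", "mb_cur_min",
             "escape_char", "comment_char"}
    for line in lines:
        if line.startswith("CHARMAP"):
            break                       # character definitions start here
        if not line.startswith("<"):
            continue                    # declarations start in column 1
        symbol, _, rest = line.partition(">")
        symbol = symbol[1:]             # drop the leading '<'
        # The symbol must be followed by one or more blank characters.
        if symbol not in known or rest[:1] not in (" ", "\t"):
            continue
        value = rest.strip()
        if not value:
            continue
        if symbol in ("mb_cur_max", "mb_cur_min"):
            decls[symbol] = int(value)
        else:
            decls[symbol] = value
    return decls
```

       Unknown symbols are skipped silently here; a stricter reader might
       report them instead.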


   Format
       The character set mapping definitions will be all the lines immediately
       following an identifier line containing the string CHARMAP starting  in
       column  1,  and  preceding  a  trailer  line  containing the string END
       CHARMAP starting in column 1. Empty lines and lines containing a  <com‐
       ment_char>  in  the first column will be ignored. Each non-comment line
       of the character set mapping definition (that is, between the  CHARMAP
       and END CHARMAP lines of the file) must be in either of two forms:

         "%s %s %s\n",<symbolic-name>,<encoding>,<comments>



       or

         "%s...%s %s %s\n",<symbolic-name>,<symbolic-name>, <encoding>,\
                  <comments>



       In  the  first format, the line in the character set mapping definition
       defines a single symbolic name and a corresponding encoding. A  charac‐
       ter  following  an escape character is interpreted as itself; for exam‐
       ple, the sequence "<\\\>>" represents the symbolic name  "\>"  enclosed
       between angle brackets.
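
       The escape rule for symbolic names can be illustrated with a short
       Python sketch. The function name read_symbolic_name is hypothetical,
       and the sketch assumes that the line begins with the opening angle
       bracket of the name. It returns the name with escapes removed, together
       with the rest of the line.

```python
def read_symbolic_name(line, escape_char="\\"):
    """Return (name, rest) for a mapping line that starts with <name>."""
    if not line.startswith("<"):
        raise ValueError("line does not start with a symbolic name")
    name = []
    i = 1
    while i < len(line):
        ch = line[i]
        if ch == escape_char:
            # The character after an escape character stands for itself.
            if i + 1 >= len(line):
                raise ValueError("escape character at end of line")
            name.append(line[i + 1])
            i += 2
        elif ch == ">":
            # First unescaped '>' closes the symbolic name.
            return "".join(name), line[i + 1:]
        else:
            name.append(ch)
            i += 1
    raise ValueError("unterminated symbolic name")
```

       Applied to the sequence from the example above, the sketch yields the
       two-character name consisting of a backslash followed by a greater-than
       sign.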


       In  the second format, the line in the character set mapping definition
       defines a range of one or more symbolic names. In this form,  the  sym‐
       bolic  names  must consist of zero or more non-numeric characters, fol‐
       lowed by an integer formed by one or more decimal digits.  The  charac‐
       ters preceding the integer must be identical in the two symbolic names,
       and the integer formed by the digits in the second symbolic  name  must
       be  equal  to  or  greater than the integer formed by the digits in the
       first name. This is interpreted as a series of  symbolic  names  formed
       from the common part and each of the integers between the first and the
       second integer, inclusive. As an example, <j0101>...<j0104>  is  inter‐
       preted as the symbolic names <j0101>, <j0102>, <j0103>, and <j0104>, in
       that order.
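
       The expansion of a range of symbolic names can be sketched in Python as
       follows. The function name expand_symbolic_range is hypothetical. The
       sketch assumes that names contain no escaped characters and that the
       zero padding of the first name is kept for every generated name, as in
       the <j0101> example above.

```python
import re

# Zero or more non-numeric characters followed by one or more decimal digits.
_RANGE_NAME = re.compile(r"<(\D*)(\d+)>")

def expand_symbolic_range(first, last):
    """Expand <prefixN>...<prefixM> into the list of names it denotes."""
    m1, m2 = _RANGE_NAME.fullmatch(first), _RANGE_NAME.fullmatch(last)
    if not m1 or not m2:
        raise ValueError("names must end in one or more decimal digits")
    if m1.group(1) != m2.group(1):
        raise ValueError("characters preceding the integer must be identical")
    low, high = int(m1.group(2)), int(m2.group(2))
    if high < low:
        raise ValueError("second integer must not be less than the first")
    width = len(m1.group(2))   # assumption: keep the first name's padding
    prefix = m1.group(1)
    return [f"<{prefix}{n:0{width}d}>" for n in range(low, high + 1)]
```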


       A character set mapping definition line must  exist  for  all  symbolic
       names and must define the coded character value that corresponds to the
       character glyph indicated in the table, or the  coded  character  value
       that  corresponds with the control character symbolic name. If the con‐
       trol characters commonly associated with the symbolic  names  are  sup‐
       ported  by  the implementation, the symbolic name and the corresponding
       encoding value must be included in the file. Additional unique symbolic
       names  may  be  included. A coded character value can be represented by
       more than one symbolic name.


       The encoding part is expressed as one (for single-byte  character  val‐
       ues)  or  more  concatenated decimal, octal or hexadecimal constants in
       the following formats:

         "%cd%d",<escape_char>,<decimal byte value>

         "%cx%x",<escape_char>,<hexadecimal byte value>

         "%c%o",<escape_char>,<octal byte value>


   Decimal Constants
       Decimal constants must be represented by two or three  decimal  digits,
       preceded  by the escape character and the lowercase letter d; for exam‐
       ple, \d05, \d97, or \d143. Hexadecimal constants must be represented by
       two hexadecimal digits, preceded by the escape character and the lower‐
       case letter x; for example, \x05, \x61, or \x8f. Octal  constants  must
       be  represented  by  two  or three octal digits, preceded by the escape
       character; for example, \05, \141, or \217. In a portable charmap file,
       each  constant must represent an 8-bit byte. Implementations supporting
       other byte sizes may allow constants to represent  values  larger  than
       those  that  can be represented in 8-bit bytes, and may allow additional
       digits in constants. When  constants  are  concatenated  for  multibyte
       character  values,  they  must  be of the same type, and interpreted in
       byte order from first to last with the least significant  byte  of  the
       multibyte character specified by the last constant.
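
       The three constant formats can be decoded with a short Python sketch.
       The function name parse_encoding is hypothetical, and the sketch covers
       only the portable case in which each constant represents an 8-bit byte.
       It rejects encodings that mix constant types, because concatenated
       constants must be of the same type.

```python
import re

def parse_encoding(text, escape_char="\\"):
    """Convert concatenated charmap constants into a bytes object."""
    esc = re.escape(escape_char)
    forms = [
        (rf"{esc}d([0-9]{{2,3}})", 10),      # decimal: two or three digits
        (rf"{esc}x([0-9A-Fa-f]{{2}})", 16),  # hexadecimal: two digits
        (rf"{esc}([0-7]{{2,3}})", 8),        # octal: two or three digits
    ]
    for pattern, base in forms:
        # The whole encoding must consist of constants of one single type.
        if re.fullmatch(rf"(?:{pattern})+", text):
            values = [int(digits, base) for digits in re.findall(pattern, text)]
            if any(value > 255 for value in values):
                raise ValueError("constant does not fit in an 8-bit byte")
            # First constant is the first byte; the last is least significant.
            return bytes(values)
    raise ValueError("not a valid sequence of constants of one type")
```

       For instance, the constants \d97, \x61, and \141 from the text above
       all decode to the same single byte value.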

   Ranges of Symbolic Names
       In  lines  defining  ranges of symbolic names, the encoded value is the
       value for the first symbolic name in the range (the symbolic name  pre‐
       ceding  the  ellipsis).  Subsequent symbolic names defined by the range
       will have encoding values in increasing order.  Bytes  are  treated  as
       unsigned  octets and carry is propagated between the bytes as necessary
       to represent the range. However, because such a carry can produce a
       null byte in the second or subsequent bytes of a character, a range
       that requires one should not be specified. For example, the line

         <j0101>...<j0104>     \d129\d254



       is interpreted as:

         <j0101>                \d129\d254
         <j0102>                \d129\d255
         <j0103>                \d130\d00
         <j0104>                \d130\d01



       The expanded declaration of the symbol <j0103> in the above example  is
       an invalid specification, because it contains a null byte in the second
       byte of a character.
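
       The carry behavior can be reproduced with a short Python sketch. The
       function name expand_range_encodings is hypothetical. The sketch treats
       the encoding as an unsigned big-endian integer (the last byte is the
       least significant), adds one for each subsequent name, and flags any
       result that contains a null byte after the first byte.

```python
def expand_range_encodings(first_encoding, count):
    """Return (encoding, has_null) pairs for `count` names in a range."""
    size = len(first_encoding)
    start = int.from_bytes(first_encoding, "big")
    result = []
    for offset in range(count):
        # to_bytes raises OverflowError if the carry leaves the first byte.
        encoding = (start + offset).to_bytes(size, "big")
        # A null byte in the second or a later byte marks an invalid entry.
        has_null = 0 in encoding[1:]
        result.append((encoding, has_null))
    return result
```

       Called with the bytes 129 and 254 and a count of 4, the sketch returns
       the four encodings listed above and flags only the third, which is the
       one for <j0103>.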


       The comment is optional.

   Width Specification
       The following declarations can follow the character set mapping defini‐
       tions (after the "END CHARMAP" statement). Each consists of the keyword
       shown in the following list, starting in  column  1,  followed  by  the
       value(s) to be associated with the keyword, as defined below.

       WIDTH            A non-negative integer value defining the column width
                        for the printable character in the coded character set
                        mapping  definitions.  Coded  character  set character
                        values are defined using symbolic character names fol‐
                        lowed  by  column  width  values. Defining a character
                        with more than one WIDTH produces  undefined  results.
                        The  END  WIDTH keyword is used to terminate the WIDTH
                        definitions. Specifying the width of  a  non-printable
                        character  in  a  WIDTH declaration produces undefined
                        results.


       WIDTH_DEFAULT    A non-negative integer value defining the default col‐
                        umn  width  for  any printable character not listed by
                        one of the WIDTH keywords. If no WIDTH_DEFAULT keyword
                        is  included  in  the  charmap,  the default character
                        width is 1.



       Example:


       After the "END CHARMAP" statement, a  syntax  for  a  width  definition
       would be:

         WIDTH
         <A>             1
         <B>             1
         <C>...<Z>       1
         ...
         <foo1>...<foon> 2
         ...
         END WIDTH



       In  this  example,  the  numerical code point values represented by the
       symbols <A> and <B> are assigned a width of 1. The code point values
       <C> to <Z> inclusive, that is, <C>, <D>, <E>, and so on, are also
       assigned a width of 1. Using <A>...<Z> would have required fewer
       lines,  but  the  alternative was shown to demonstrate flexibility. The
       keyword WIDTH_DEFAULT could have been added as appropriate.
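
       A width definition of this shape can be read with a short Python
       sketch. The function name parse_width_section is hypothetical. The
       sketch assumes that symbolic names contain no blanks or escaped
       characters, and it keeps ranges as (first, last) pairs instead of
       expanding them, because expanding a range such as <C>...<Z> requires
       the character set mapping definitions. It is more lenient than the
       format described above in that it does not insist that keywords start
       in column 1.

```python
def parse_width_section(lines):
    """Return (entries, default_width) for the lines after END CHARMAP."""
    entries = []         # (first_name, last_name, width) triples
    default_width = 1    # applies when no WIDTH_DEFAULT keyword is present
    in_width = False
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "WIDTH_DEFAULT" and len(fields) == 2:
            default_width = int(fields[1])
        elif fields == ["WIDTH"]:
            in_width = True
        elif fields == ["END", "WIDTH"]:
            in_width = False
        elif in_width and len(fields) == 2 and fields[0].startswith("<"):
            # A single name is stored as a range whose ends are equal.
            first, sep, last = fields[0].partition("...")
            entries.append((first, last if sep else first, int(fields[1])))
    return entries, default_width
```

       Lines that do not have the form of a name followed by a width, such as
       the placeholder lines consisting only of an ellipsis in the example
       above, are skipped.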

SEE ALSO
       locale(1), localedef(1), nl_langinfo(3C), extensions(7), locale(7)



Oracle Solaris 11.4               11 May 2021                       charmap(7)