iconv_unicode(7)
Standards, Environments, Macros, Character Sets, and miscellany
iconv_unicode(7)
NAME
iconv_unicode - codeset conversion for Unicode
DESCRIPTION
The table below lists the names and descriptions of the supported
Unicode encodings or encoding schemes (byte serializations of Unicode
encoding forms) that can be used as fromcode or tocode parameters to
iconv(1), iconv_open(3C), and cconv_open(3C). Aliases such as FSS-UTF
and UTF8 are also supported.
Available iconv and cconv conversions in the current system including
aliases and optional variant levels can be obtained by running the
iconv -l command as described in the iconv(1) manual page.
For additional information on the mappings between canonical names and
supported aliases with optional variant levels, refer to the alias(5)
manual page and also the /usr/lib/iconv/alias file.
Encoding Form         Description
--------------------  ------------------------------------------------
UTF-8                 Multibyte sequences of 1-4 character bytes
UTF-16                Represented in 16-bit entity for U+0000-U+D7FF
                      and U+E000-U+FFFF, and two 16-bit entities for
                      U+10000-U+10FFFF. Is in the platform's default
                      byte ordering and includes the Byte Order Mark
                      (BOM). See below for a description of the BOM.
UTF-16-INTERNAL       UTF-16, without BOM
UTF-16BE              UTF-16 in the big-endian byte ordering, without
                      BOM
UTF-16-BIG-ENDIAN     UTF-16 in the big-endian byte ordering,
                      including BOM
UTF-16LE              UTF-16 in the little-endian byte ordering,
                      without BOM
UTF-16-LITTLE-ENDIAN  UTF-16 in the little-endian byte ordering,
                      including BOM
UTF-16-SWAPPED        UTF-16 with endianness opposite to that of the
                      local platform, without BOM
UTF-32                Represented in 32-bit entity in the platform's
                      default byte ordering and includes the BOM
UTF-32-INTERNAL       UTF-32, without BOM
UTF-32BE              UTF-32 in the big-endian byte ordering, without
                      BOM
UTF-32-BIG-ENDIAN     UTF-32 in the big-endian byte ordering,
                      including BOM
UTF-32-SWAPPED        UTF-32 with endianness opposite to that of the
                      local platform, without BOM
UTF-32LE              UTF-32 in the little-endian byte ordering,
                      without BOM
UTF-32-LITTLE-ENDIAN  UTF-32 in the little-endian byte ordering,
                      including BOM
UCS-2                 Represented in 16-bit entity for U+0000-U+D7FF
                      and U+E000-U+FFFF in the platform's default byte
                      ordering and includes byte order mark (BOM)
UCS-2-INTERNAL        UCS-2, without BOM
UCS-2BE               UCS-2 in the big-endian byte ordering, without
                      BOM
UCS-2-BIG-ENDIAN      UCS-2 in the big-endian byte ordering, including
                      BOM
UCS-2LE               UCS-2 in the little-endian byte ordering,
                      without BOM
UCS-2-LITTLE-ENDIAN   UCS-2 in the little-endian byte ordering,
                      including BOM
UCS-2-SWAPPED         UCS-2 with endianness opposite to that of the
                      local platform, without BOM
UCS-4                 Represented in 32-bit entity in the platform's
                      default byte ordering and includes byte order
                      mark (BOM)
UCS-4-INTERNAL        UCS-4, without BOM
UCS-4BE               UCS-4 in the big-endian byte ordering, without
                      BOM
UCS-4-BIG-ENDIAN      UCS-4 in the big-endian byte ordering, including
                      BOM
UCS-4LE               UCS-4 in the little-endian byte ordering,
                      without BOM
UCS-4-LITTLE-ENDIAN   UCS-4 in the little-endian byte ordering,
                      including BOM
UCS-4-SWAPPED         UCS-4 with endianness opposite to that of the
                      local platform, without BOM
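The difference between the byte serializations listed above can be made
concrete with a short sketch. It uses Python's built-in codecs rather
than the Solaris iconv modules, so the codec names ("utf-16-be" and so
on) are Python's and the labels are only an informal reading of the
table; the helper name serializations is hypothetical.

```python
import codecs


def serializations(ch):
    """Return a few byte serializations of one character, keyed by label."""
    return {
        "UTF-8": ch.encode("utf-8"),
        # Explicit byte order, no BOM (compare UTF-16BE / UTF-16LE).
        "UTF-16, big-endian, no BOM": ch.encode("utf-16-be"),
        "UTF-16, little-endian, no BOM": ch.encode("utf-16-le"),
        # Explicit byte order with a leading BOM (compare UTF-16-BIG-ENDIAN).
        "UTF-16, big-endian, with BOM": codecs.BOM_UTF16_BE + ch.encode("utf-16-be"),
        "UTF-32, big-endian, no BOM": ch.encode("utf-32-be"),
    }


if __name__ == "__main__":
    # U+10000 is the first code point that needs two 16-bit entities in UTF-16.
    for label, data in serializations("\U00010000").items():
        print(f"{label:32} {data.hex(' ')}")
```

A character in the range U+10000-U+10FFFF yields four bytes in each of
the UTF-8, UTF-16, and UTF-32 forms shown here, but with different byte
values in each.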
UCS, or Universal Character Set, refers to the ISO/IEC 10646 family of
standards, whose character set is identical to that of Unicode.
Byte Order Mark, also known as BOM (U+FEFF), is a special character at
the beginning of a file or character stream, denoting the byte order of
the subsequent characters. UCS-2, UTF-16, UTF-32, and UCS-4 files and
character streams usually start with a BOM character to indicate the
byte ordering used in the file or character stream.
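As a rough illustration of how a leading BOM indicates byte order, the
following sketch inspects the first bytes of a buffer. The function name
bom_byte_order and its unit_size parameter are hypothetical; the caller
is assumed to already know whether the data uses 16-bit units (UCS-2,
UTF-16) or 32-bit units (UCS-4, UTF-32).

```python
import codecs

# BOM byte patterns per unit size: (big-endian, little-endian).
_BOMS = {
    2: (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),  # FE FF / FF FE
    4: (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE),  # 00 00 FE FF / FF FE 00 00
}


def bom_byte_order(data, unit_size):
    """Return "big", "little", or None when data has no leading BOM."""
    big, little = _BOMS[unit_size]
    if data.startswith(big):
        return "big"
    if data.startswith(little):
        return "little"
    return None
```

The unit size is passed explicitly because the bytes FF FE 00 00 are
ambiguous on their own: they are a little-endian 32-bit BOM, but also a
little-endian 16-bit BOM followed by U+0000.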
UTF-8 to UTF-8 conversion simply moves bytes from the input buffer to
the output buffer without doing any conversion. During the moves,
illegal character checking is done to screen out any potentially
harmful character bytes. Such illegal characters will cause the
conversion to fail.
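The idea of a validating UTF-8 to UTF-8 copy can be sketched as
follows. This is not the iconv implementation; it relies on Python's
strict UTF-8 decoder as the validity check, and the function name
copy_utf8 is hypothetical.

```python
def copy_utf8(src):
    """Return a copy of src if it is well-formed UTF-8, else raise.

    Python's strict decoder rejects truncated sequences, overlong forms
    such as C0 80, and encoded surrogates such as ED A0 80.
    """
    src.decode("utf-8", errors="strict")  # raises UnicodeDecodeError if illegal
    return bytes(src)  # bytes are passed through unchanged
```

For example, copy_utf8(b"\xc3\xa9") returns the same two bytes, while
copy_utf8(b"\xc0\x80") raises UnicodeDecodeError instead of producing
output.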
UTF-7, a legacy 7-bit Unicode Transformation Format, is only supported
by iconv conversions to and from UTF-8, UCS-2 and UCS-4.
UTF-EBCDIC, a legacy EBCDIC-compatible variant of UTF-8, is only
supported by iconv conversions to and from UTF-8.
NOTES
iconv also supports conversion between Unicode encodings and many
different codesets. The list of such codesets includes, for example,
the ISO 8859 character sets, EBCDIC code pages, EUC (Extended UNIX
Code) and ISO 2022 encodings for Chinese, Japanese, Korean, and many
others (see
iconv_extra(7), iconv_ja(7), iconv_ko(7), iconv_zh(7), iconv_zh_HK(7),
and iconv_zh_TW(7)).
If a source character code value cannot be mapped to a valid character
in the target codeset, it is considered either an illegal or a
non-identical character. In the absence of explicit information about
the source character code value, iconv code conversions use the
following rules to determine whether a character is illegal or
non-identical:
If the source character code value is not within a range defined by the
source codeset standard, it is considered an illegal character. If the
source character code value is within the range(s) defined by the
standard, it is considered non-identical, even if the source character
code value maps to an undefined or a reserved location within the valid
range. A non-identical character maps to ? (0x3f in ASCII-compatible
codesets) if the target codeset is a non-Unicode codeset, or to the
Unicode replacement character (U+FFFD) if the target codeset is a
Unicode codeset.
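The illegal versus non-identical rule above can be sketched as a small
decision function. Everything here is hypothetical and for illustration
only: the names map_code_value and IllegalCharacter, the toy mapping
table, and the representation of a codeset's valid range as a Python
range are assumptions, not part of iconv.

```python
class IllegalCharacter(ValueError):
    """Raised when a code value lies outside the source codeset's range."""


def map_code_value(value, valid_range, table, target_is_unicode):
    """Map one source code value to a target character.

    valid_range: range of code values defined by the source standard.
    table:       dict from source code value to target character.
    """
    if value not in valid_range:
        # Outside the range defined by the standard: illegal character.
        raise IllegalCharacter(hex(value))
    if value in table:
        return table[value]
    # Inside the range but unmappable: non-identical character.
    return "\ufffd" if target_is_unicode else "?"


if __name__ == "__main__":
    toy_range = range(0x00, 0x100)   # a toy single-byte codeset
    toy_table = {0x41: "A"}          # only one defined mapping
    print(map_code_value(0x41, toy_range, toy_table, True))
    print(map_code_value(0x81, toy_range, toy_table, False))
```

With the toy codeset, 0x81 is in range but unmapped, so it becomes ? for
a non-Unicode target and U+FFFD for a Unicode target, whereas 0x100
raises IllegalCharacter.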
When the BOM is present as the first character in an encoding that
supports it, it determines how the following Unicode character
sequences are interpreted. If the BOM is not the first character in
such an encoding, or if the Unicode encoding does not support the BOM,
the BOM character (U+FEFF) is interpreted as a Zero Width No-Break
Space (ZWNBSP) character and does not affect how the Unicode characters
are interpreted in terms of byte ordering.
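Python's codecs show a comparable distinction and can serve as an
analogy, not as a statement about the Solaris modules: the "utf-16"
codec consumes a leading BOM, while "utf-16-be" treats U+FEFF as an
ordinary character. The helper name decode_both_ways is hypothetical.

```python
import codecs


def decode_both_ways(data):
    """Decode big-endian UTF-16 data with a BOM-aware and a BOM-less codec."""
    bom_aware = data.decode("utf-16")      # leading BOM selects byte order
    bom_less = data.decode("utf-16-be")    # U+FEFF is kept as a character
    return bom_aware, bom_less


if __name__ == "__main__":
    # A leading BOM, then "A", an embedded U+FEFF, then "B".
    sample = codecs.BOM_UTF16_BE + "A\ufeffB".encode("utf-16-be")
    aware, less = decode_both_ways(sample)
    print([hex(ord(c)) for c in aware])
    print([hex(ord(c)) for c in less])
```

In the BOM-aware result only the embedded U+FEFF survives, as a
character in the text; in the BOM-less result both occurrences are kept
as characters.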
When the target codeset is one of UCS-2, UTF-16, UTF-32, UCS-4,
UCS-2-BIG-ENDIAN, UCS-2-LITTLE-ENDIAN, UTF-16-BIG-ENDIAN,
UTF-16-LITTLE-ENDIAN, UCS-4-BIG-ENDIAN, UCS-4-LITTLE-ENDIAN,
UTF-32-BIG-ENDIAN, or UTF-32-LITTLE-ENDIAN, expect a BOM character at
the beginning of the iconv code conversion output buffer.
When the source codeset is UCS-2, UTF-16, UTF-32, or UCS-4 and no BOM
is present as the first input character, the byte ordering of the
current system is assumed for the input byte stream given to the iconv
code conversion.
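The fallback to the system byte order can be sketched for the 16-bit
case as follows. This mirrors the rule described above using Python's
sys.byteorder and is not the iconv code itself; the function name
decode_utf16 is hypothetical.

```python
import codecs
import sys


def decode_utf16(data):
    """Decode UTF-16 data, honoring a leading BOM if one is present."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: assume the byte ordering of the current system.
    native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return data.decode(native)
```

Because of the fallback, the same BOM-less byte stream may decode
differently on big-endian and little-endian systems, which is why
streams exchanged between platforms usually carry a BOM or use an
explicitly ordered codeset name.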
EXAMPLES
Example 1 The iconv Library Module Filename
In the conversion library, /usr/lib/iconv (see iconv(3C)), the library
module filename is composed of two symbolic elements separated by the
percent sign (%). The first symbol specifies the source codeset, that
is, the codeset that is being converted; the second symbol specifies
the target codeset, that is, the codeset to which the first one is
being converted.
For example, the library module filename to convert from the legacy
UTF-7 codeset to the UTF-8 codeset is UTF-7%UTF-8.so.
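The naming rule can be expressed as a one-line helper. The function
name iconv_module_path is hypothetical; the directory and the
percent-sign convention come from the description above.

```python
import posixpath

ICONV_DIR = "/usr/lib/iconv"


def iconv_module_path(fromcode, tocode):
    """Build the iconv module path for a source and target codeset."""
    return posixpath.join(ICONV_DIR, f"{fromcode}%{tocode}.so")
```

For example, iconv_module_path("UTF-7", "UTF-8") yields
/usr/lib/iconv/UTF-7%UTF-8.so. Whether a given module actually exists
on a system is a separate question that iconv -l can answer.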
Example 2 The cconv Library Module Filename
For some conversions, iconv(3C) makes a call to the cconv(3C)
interfaces to perform the conversion. The cconv conversion modules are
binary tables with a .bt suffix, generated by geniconvtbl(1) and placed
in the same /usr/lib/iconv library. The cconv library module filename
is composed of the symbolic elements for the source and target codesets
separated by the plus sign (+). The cconv conversion is typically
performed in two steps, with UTF-32 as the intermediate encoding.
For example, the cconv library module filename to convert from the
Japanese EUC codeset to the UTF-32 codeset is eucJP+UTF-32.bt.
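The cconv naming rule and the two-step idea can be sketched together.
The helper names are hypothetical, and the two-step conversion below
uses Python's own codecs ("euc_jp", "utf-32-be") purely to illustrate
pivoting through UTF-32; it does not load or read any .bt table.

```python
def cconv_module_filename(fromcode, tocode):
    """Build a cconv binary table filename: source+target.bt."""
    return f"{fromcode}+{tocode}.bt"


def convert_via_utf32(data, from_codec, to_codec):
    """Convert bytes in two steps, with UTF-32 as the intermediate form."""
    pivot = data.decode(from_codec).encode("utf-32-be")   # step 1: to UTF-32
    return pivot.decode("utf-32-be").encode(to_codec)     # step 2: from UTF-32


if __name__ == "__main__":
    print(cconv_module_filename("eucJP", "UTF-32"))
    euc = "日本語".encode("euc_jp")
    print(convert_via_utf32(euc, "euc_jp", "utf-8").decode("utf-8"))
```

Pivoting through one intermediate encoding means each codeset needs
tables only to and from UTF-32, rather than one table for every pair of
codesets.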
FILES
/usr/lib/iconv/*.so
iconv conversion modules
/usr/lib/iconv/*.bt
cconv code conversion binary tables for iconv(1), cconv(3C), and
iconv(3C)
/usr/lib/iconv/geniconvtbl/binarytables/*.bt
geniconvtbl conversion binary tables
/usr/lib/iconv/alias
Alias table file of codeset names
SEE ALSO
geniconvtbl(1), iconv(1), cconv(3C), cconv_close(3C), cconv_open(3C),
cconvctl(3C), iconv(3C), iconv_close(3C), iconv_open(3C), iconvctl(3C),
alias(5), geniconvtbl-cconv(5), iconv_extra(7), iconv_ja(7),
iconv_ko(7), iconv_zh(7), iconv_zh_HK(7), iconv_zh_TW(7)
The Unicode Consortium. The Unicode Standard, Version 6.2.0. Mountain
View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8.
Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646,
RFC 2044, Alis Technologies, October 1996.
Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo
Institute of Technology, July 1995.
Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel
Almen Planlaegning, June 1992.
Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format
of Unicode, RFC 1642, Taligent, Inc., July 1994.
Oracle Solaris 11.4 11 May 2021 iconv_unicode(7)