
collect(1)                       User Commands                      collect(1)



NAME
       collect - command used to collect program performance data

SYNOPSIS
       collect collect-arguments target target-arguments
       collect
       collect -V

DESCRIPTION
       The  collect  command  runs  the target process and records performance
       data and global data for the process.  Performance  data  is  collected
       using  profiling  or  tracing techniques. The data can be examined with
       the Performance Analyzer graphical tool (analyzer)  or  a  command-line
       program  (er_print).  The  data  collection software run by the collect
       command is referred to here as the Collector.


       The data from a single run of the collect command is called an  experi‐
       ment.  The experiment is represented in the file system as a directory,
       with various files inside that directory.


       The target is the path name of the executable, Java .jar file, or  Java
       .class  file  for  which you want to collect performance data. For more
       information about Java profiling, see JAVA PROFILING, below.


       Executables that are targets for the collect command  can  be  compiled
       with any level of optimization, but must use dynamic linking. If a pro‐
       gram is statically linked, the collect command prints an error message.
       In  order  to  see annotated source using analyzer or er_print, targets
       should be compiled with the -g flag, and should not be stripped.


       The collect command uses the following strategy to find its target:

           o      If a file with the specified target name exists, has execute
                  permission  set,  and is an ELF executable, the collect com‐
                  mand verifies that it can run on  the  current  machine  and
                  then runs it. If the file is not an ELF executable, the col‐
                  lect command assumes it is a script, and runs it.


           o      If a file with the specified target name exists but does not
                  have  execute permission, collect checks whether the file is
                  a Java jar file (target name ends in  .jar)  or  class  file
                  (target  name  ends in .class). If the file is a jar file or
                  class file, collect inserts the Java virtual  machine  (JVM)
                  software  as  the target, with any necessary flags, and col‐
                  lects data on that JVM. See JAVA PROFILING below.


           o      If a file with the specified target name is not found,  col‐
                  lect  searches  your  path to find an executable; if an exe‐
                  cutable file is found,  collect  verifies  it  as  described
                  above.


           o      If a file of the target name is also not found in your path,
                  the command looks for a file with that name and  the  string
                  .class  appended;  if  a  file with the class name is found,
                   collect inserts the JVM with the appropriate flags, as
                   above.


           o      If none of these procedures can find the target, the command
                  fails.


OPTIONS
       If invoked with no arguments, collect prints a usage summary, including
       the default configuration of the experiment.

   Data Specifications
       -p option

           Collect  clock-based  profiling  data. The allowed values of option
           are:


           off          Turns off clock-based profiling.


           lo[w]        Turns on clock-based profiling with a per-thread  rate
                        of approximately 10 samples per second.


           on           Turns  on clock-based profiling with a per-thread rate
                        of approximately 100 samples per second.


           hi[gh]       Turns on clock-based profiling with a per-thread  rate
                        of approximately 1000 samples per second.


           n            Turns  on  clock-based  profiling with a profile timer
                        period of n. The value n can be an integer or a float‐
                        ing-point  number,  with  a  suffix of u for values in
                        microseconds, or m for values in milliseconds.  If  no
                        suffix  is  used,  assume the value to be in millisec‐
                        onds.

                        If the value is smaller than the clock profiling mini‐
                        mum, set it to the minimum; if it is not a multiple of
                        the clock profiling  resolution,  round  down  to  the
                        nearest  multiple  of  the  clock  resolution.  If  it
                        exceeds the clock profiling maximum, report an  error.
                        If it is negative or zero, report an error.

            If no explicit -p argument is given, and neither count data nor
            race-detection or deadlock data is specified, turn on clock-based
           profiling. If -h high or -h low is specified requesting the default
           counter set for that chip at high- or  low-frequency,  the  default
           clock-profiling  will  also  be  set to high or low; an explicit -p
           argument will be respected.

           Clock-profiling-based dataspace and  memoryspace  profiling  is  no
           longer supported; all supported machines have hardware counters for
           memory operations.
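
            For example, to turn on clock-based profiling with a 5-millisecond
            timer period, or at the high rate, one might run (the target a.out
            and the experiment name are illustrative):

              collect -p 5m a.out
              collect -p hi -o test.1.er a.out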


       -h [parameter]

           Hardware counter overflow profiling.



           -h    Shows extended help for  collect  hardware  counter  overflow
                 (HWC) profiling.

                 If  the -h option is specified without a value for parameter,
                 collect prints hardware counter information. If the processor
                 supports  hardware counter overflow profiling, collect prints
                 two lists containing information about hardware counters. The
                 first  list  contains "aliased" hardware counters; the second
                 list contains "raw" hardware counters. The output  also  con‐
                 tains  the  specification  for the default HWC experiment for
                 that processor. For more details, see the  "Hardware  Counter
                 Overflow Profiling" section below.

                  If the processor does not support hardware counter overflow
                  profiling, the output says so.


            The value of parameter can be set to default counters at a specific
            rate, a specific counter, or a set of counters.

           -h {auto | lo | on | hi}

           Turns  on  Hardware  Counter  overflow  (HWC)  profiling data for a
           default set of counters at the specified rate:



           auto

               Matches the rate used by clock-profiling. If clock-profiling is
               disabled,  use the per-thread maximum rate of approximately 100
               samples per second. auto is the default and preferred setting.


           lo|low

               Uses per-thread maximum rate of approximately  10  samples  per
               second.


           on

               Uses  per-thread  maximum rate of approximately 100 samples per
               second.


           hi|high

               Uses per-thread maximum rate of approximately 1000 samples  per
               second.


           Alternatively, you can use specific counters:

           -h  ctr_def[,ctr_def]...

           Collects hardware-counter-overflow profiles using one or more spec‐
           ified counters. The maximum number of counters supported is proces‐
           sor-dependent.  You  can see the maximum number of hardware counter
           definitions for profiling on the current machine, the full list  of
           available hardware counters, and the default counter set by running
           collect -h with no other arguments on the current machine.

           Each counter definition takes the following form:


             [+|-]ctr[~attr=val]...[~attrN=valN][/reg#],[rate]

           The meanings of the counter definition options are as follows:



           +|-

               Optional parameter that can  be  applied  to  precise,  memory-
               related  counters,  which are the counters used for memoryspace
               and dataspace profiling.

               A + is the default and is not needed.

               A - collects only normal hardware-counter information  and  not
               the  extra  information that is used for memoryspace and datas‐
               pace profiling.

               See the section "MEMORYSPACE AND DATASPACE PROFILING" below.


           ctr

               Counter name. You can see the list of counter  names  for  your
               processor  by  running the collect -h command without any other
               command-line arguments. On most  systems,  you  can  specify  a
               counter  using  a numeric value in hexadecimal (such as 0x00c3)
               or decimal even if a counter is not listed in collect  -h  out‐
               put.  The numeric values for counters are specified in the pro‐
               cessor manufacturer's manuals. The name of the relevant  manual
               is  shown  in  the  collect  -h  output. Some counters are only
               described in proprietary vendor  manuals.  On  Oracle  Solaris,
               when  a counter is specified numerically it can help to specify
               the register number also.


           ~attr=val

                One or more optional attribute options. On some processors,
               attribute options can be associated with a hardware counter. If
               the processor supports attribute options, collect -h provides a
               list  of  attribute names to use for attr. The value val can be
               in decimal or hexadecimal format.  Hexadecimal  format  numbers
               are in C program format where the number is prepended by a zero
               and lower-case x (0xhex_number). Multiple attributes  are  con‐
               catenated  to  the counter name. The ~ tilde character in front
               of each attr name is required.


           /reg#

               On Oracle Solaris, hardware register to use for the counter. If
               not  specified,  collect attempts to place the counter into the
               first available register and as a result, might  be  unable  to
               place  subsequent  counters  due  to register conflicts. If you
               specify more than one counter, the counters must use  different
               registers.  You can see a list of allowable register numbers by
               running the collect -h command without any  other  command-line
               arguments. The / character is required if the register is spec‐
               ified.


           rate

               The sampling frequency. Valid values are as follows:



               auto         Matches the rate used by clock profiling. If clock
                            profiling  is disabled, use the per-thread maximum
                            rate of  100  samples  per  second.  auto  is  the
                            default and preferred value.


               lo           Uses  per-thread  maximum rate of approximately 10
                            samples per second.


               on           Uses per-thread maximum rate of approximately  100
                            samples per second.


               hi           Uses per-thread maximum rate of approximately 1000
                            samples per second.


               value        Specifies a fixed event interval value to  trigger
                            a sample, rather than a sampling rate. When speci‐
                            fying value, note that  the  actual  frequency  is
                            dependent  on the selected counter and the program
                            under test.

                            The event interval can be specified in decimal  or
                            hexadecimal  format. Exercise caution in setting a
                            numerical value, especially as setting the  inter‐
                            val  too low can overload your application or even
                            your entire system. As a rule of  thumb,  aim  for
                            fewer  than 1000 events per second per thread. You
                            can use the Performance Analyzer Timeline view  to
                            visually estimate the rate of samples.


               The  rate can be omitted, in which case auto will be used. Even
               when the rate is omitted, the comma in front of it is  required
               (except for the last counter in a -h parameter).


           EXAMPLES: Some valid examples of -h usage:



             -h auto
             -h lo
             -h hi
                Enable the default counters with default, low, or
                high rates, respectively

             -h cycles,,insts,,dcm
             -h cycles -h insts -h dcm
                Both have the same meaning: three counters: cycles, insts
                and D-cache misses.

             -p lo -h cycles,,insts,,dcm
                Select a low rate of profiling for clock and HWC cycles, insts
                and D-cache misses. A low rate of profiling can be used to
                reduce data collection overhead and experiment size when
                dealing with long-running or highly multi-threaded applications.

             -h cycles~system=1
               Count cycles, explicitly including cycles in system mode.

              -h 0xc0/0,10000003
                 On Nehalem, this is equivalent to
                 -h inst_retired.any_p/0,10000003


           Some invalid examples of -h usage:



             -h cycles -h off
               Can't use off with any other -h arguments
             -h cycles,insts
               Missing comma, and "insts" does not parse as a number for
               <interval>


           If the -h argument specifies the use of hardware counters but hard‐
           ware counters are in use by others  at  the  time  the  command  is
           given,  the  collect command will report an error and no experiment
           will be run.

           If no -h argument is given, no HW counter profiling  data  will  be
           collected. An experiment can specify both hardware counter overflow
           profiling and clock-based profiling.  Specifying  hardware  counter
            overflow profiling will not disable clock-profiling, even if it
            is enabled by default.

           For more  information  on  hardware  counters,  see  the  "Hardware
           Counter Overflow Profiling" section below.


       -s option[,scope]

           Collect synchronization tracing data.

            The minimum delay threshold for tracing events is set using
            option, and optionally the scope of APIs traced is set by scope.

           The allowed values of option are:



           on

                Turns on synchronization delay tracing and sets the threshold
                value by calibration at runtime


           calibrate

               Same as on


           off

               Turns off synchronization delay tracing


           n

               Turns  on  synchronization delay tracing with a threshold value
               of n microseconds; if n is zero, trace all events


           all

               Turns on synchronization delay tracing and trace  all  synchro‐
               nization events

            By default, turns off synchronization delay tracing.

            For native API tracing on Oracle Solaris, the following functions
            are traced: mutex_lock(), rw_rdlock(), rw_wrlock(), cond_wait(),
            cond_timedwait(), cond_reltimedwait(), thr_join(), sema_wait(),
            pthread_mutex_lock(), pthread_rwlock_rdlock(),
            pthread_rwlock_wrlock(), pthread_cond_wait(),
            pthread_cond_timedwait(), pthread_cond_reltimedwait_np(),
            pthread_join(), and sem_wait().

            On Linux, the following functions are traced:
            pthread_mutex_lock(), pthread_cond_wait(),
            pthread_cond_timedwait(), pthread_join(), and sem_wait().

            For Java programs, record synchronization events for Java
            monitors in user code.


           The allowed values of scope are:



           n

               Traces native APIs.


           j

               Traces Java APIs


           nj

               Traces native and Java APIs


           By default, trace both native and Java APIs.
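
            For example, to trace synchronization events with a delay
            threshold of 100 microseconds, limited to native APIs (the target
            a.out is illustrative):

              collect -s 100,n a.out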


       -H option

           Collects heap trace data. The allowed values of option are:

            on           Turns on tracing of memory allocation requests


            off          Turns off tracing of memory allocation requests

            By default, turns off heap tracing.

            Heap-tracing events are recorded for any native calls; calls to
            mmap are treated as memory allocations.

            Heap profiling for Java programs traces native allocations only,
            not Java allocations.

            Note that heap tracing might produce very large experiments. Such
            experiments are very slow to load and browse.
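
            For example, a minimal sketch combining heap tracing with the
            default clock profiling (a.out is illustrative):

              collect -H on a.out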



       -i option

           Collects I/O trace data. The allowed values of option are:


           on           Turns on tracing of I/O operations


           off          Turns off tracing of I/O operations

            By default, turns off tracing of I/O operations.

           Note that I/O tracing might produce very  large  experiments.  Such
           experiments are very slow to load and browse.


       -M option

           Specifies  collection  of  an  MPI  experiment. (See MPI PROFILING,
            below.) The target of collect should be mpirun, and its arguments
            should be separated from the user target (that is, the programs
            that are to be run by mpirun) by an inserted -- argument. The
            experiment
           is  named as usual, and is referred to as the "founder experiment";
           its directory contains subexperiments for  each  of  the  MPI  pro‐
           cesses,  named  by  rank.  It  is  recommended that the -- argument
           always be used with mpirun, so that an experiment can be  collected
           by prepending collect and its options to the mpirun command line.

           The allowed values of option are:

           MPI-version

               Turns on collection of an MPI experiment, assuming the MPI ver‐
               sion named. The recognized versions of MPI are printed when you
               type  collect  with no arguments, or in response to an unrecog‐
               nized version specified with -M.


           off

               Turns off collection of an MPI experiment.

               By default, turns off collection of an MPI experiment. When  an
               MPI  experiment  is  turned on, the default setting for -m (see
               below) is changed to on.
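
            For example, to collect an MPI experiment with the Oracle Message
            Passing Toolkit version of MPI (the process count and target
            a.out are illustrative):

              collect -M OMPT mpirun -np 4 -- a.out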



       -m option

           Collect MPI tracing data. (See MPI PROFILING, below.)

           The allowed values of option are:


           on           Turns on MPI tracing information.


           off          Turns off MPI tracing information.

            By default, turns off MPI tracing, unless the -M flag is enabled,
            in which case MPI tracing is turned on by default. Normally, MPI
           experiments are collected with -M, and no user control of MPI trac‐
           ing  is  needed.  If you want to collect an MPI experiment, but not
           collect MPI trace data, you can use the explicit flags:

             -M MPI-version -m off



       -c option

           Collects count data. The allowed values of option are:


           on           Turns on count data.


           static       Turns on simulated count data, based on the assumption
                        that every instruction was executed exactly once.


           off          Turns off count data.

            By default, turns off count data. Count data cannot be collected
            with any other type of data. For count data or simulated count
            data, the executable and any shared objects that are instrumented
            and statically linked are counted; for count data, but not
            simulated count data, dynamically loaded shared objects are also
            instrumented and counted.

           On Oracle Solaris, no special compilation is needed,  although  the
           count option is incompatible with compile flags -p, -pg, -qp, -xpg,
           and -xlinkopt. On Linux, the executable must be compiled  with  the
           -xannotate=yes flag in order to collect count data.


       -I directory

           Specifies a directory for count data instrumentation.


       -N libname

           Specifies  a  library to be excluded from instrumentation for count
           data, whether the library is linked into the executable, or  loaded
           with dlopen(3C). Multiple -N options can be specified.
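
            For example, to collect count data while excluding one library
            from instrumentation (the names libfoo.so and a.out are
            illustrative):

              collect -c on -N libfoo.so a.out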


       -r option

           Collects data for data race detection or deadlock detection for the
           Thread Analyzer.

           The allowed values of option are:


           race

               Collects data for detecting data races.


           deadlock

               Collects data for detecting deadlocks and potential deadlocks.


           all

               Collects data for detecting data races, deadlocks,  and  poten‐
               tial deadlocks. Can also be specified as race,deadlock.


           off

               Turns off data collection for data races, deadlocks, and poten‐
               tial deadlocks.


           on

               Collects data for detecting data races (same as race).


           terminate

               If an unrecoverable error is detected,  terminates  the  target
               process.


           abort

               If  an  unrecoverable  error is detected, terminates the target
               process with a core dump.


           continue

               If an unrecoverable error is detected, enables the  process  to
               continue.

            By default, turns off collection of all Thread Analyzer data.

            The terminate, abort, and continue options can be added to any
            data-collection options, and govern the behavior when an
            unrecoverable error, such as a real (not potential) deadlock, is
            detected. The default behavior is terminate.

           Thread Analyzer data cannot be collected with any tracing data, but
           can  be  collected  in  conjunction with clock- or hardware counter
           profiling data. Thread Analyzer data significantly slows  down  the
           execution  of  the  target, and profiles might not be meaningful as
           applied to the user code.

           Thread Analyzer experiments can be examined with either analyzer or
           with  tha.  The  latter displays a simplified list of default tabs,
           but is otherwise identical.

           In order to enable data-race detection, executables must be instru‐
           mented, either at compile time, or by invoking a post-processor. If
           the target is not instrumented, and none of the shared  objects  on
           its  library  list is instrumented, a warning is displayed, but the
           experiment is run.  Other  Thread  Analyzer  data  do  not  require
           instrumentation.

            See the tha(1) man page or the Oracle Developer Studio 12.6:
            Thread Analyzer User's Guide for more detail.
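
            For example, a sketch of collecting data-race data and examining
            it with tha (the experiment name and target a.out are
            illustrative):

              collect -r race -o race.1.er a.out
              tha race.1.er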


       -S interval

           Periodically  samples  process-wide  resource  utilization  at  the
           interval  specified  (in  seconds).  The allowed values of interval
           are:


           off          Turns off periodic sampling.


           on           Turns on periodic sampling with the  default  sampling
                        interval (1 second).


           n            Turns on periodic sampling with a sampling interval of
                        n in seconds; n must be positive.

            By default, turns on periodic sampling.
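
            For example, to sample process-wide resource utilization every 10
            seconds (a.out is illustrative):

              collect -S 10 a.out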


   Experiment Controls
       -L size

           Limit the amount of profiling and tracing  data  recorded  to  size
           megabytes.  The  limit applies to the sum of all profiling data and
           tracing data, but not to process-wide resource-utilization samples.
            The limit is only approximate, and can be exceeded. When the limit
            is reached, no more profiling or tracing data is recorded, but the
            experiment is kept open, and samples are recorded until the target
            process terminates.
           The allowed values of size are:

           unlimited or none

               Do not impose a size limit on the experiment.


           n

                Imposes a limit of n megabytes. The value of n must be
                greater than zero.

               By default, there is no limit on the amount of data recorded.



       -F option

           Controls  whether  descendant  processes  should  have  their  data
           recorded. (Data is always collected on the founder  process,  inde‐
           pendent of any -F setting.) The allowed values of option are:


           on | all

               Records experiments on all descendant processes.


           off

               Does not record experiments on any descendant processes.


           =<regex>

               Records  experiments  on  those descendant processes whose exe‐
               cutable name (a.out name) matches the regular expression.  Only
               the  basename  of the executable is used, not the full path. If
               the <regex> that you use contains blanks or  characters  inter‐
               preted  by  your  shell,  be  sure to enclose the full =<regex>
               argument in single quotes.

            By default, records experiments on all descendant processes. For
            more details, read the sections "FOLLOWING DESCENDANT PROCESSES"
            and "PROFILING SCRIPTS" below.


       -A option

           Controls whether to perform archiving as part of  data  collection.
           Archiving is required to make an experiment self-contained and por‐
           table. The allowed values of option are:


           on

               Copies load objects (the target and any shared objects it uses)
               into  the  experiment. Also copy any ancillary files (.anc) and
               object files (.o) which have Stabs or DWARF information not  in
               the load object.


           src

               In  addition  to  copying load objects as in -A on, copies into
               the experiment all source files and ancillary files (.anc) that
               can be found.


           usedsrc

               Similar  to  -A  src,  but  only copies source files, ancillary
               files (.anc), and load objects that are  needed  for  analytics
               and can be found. This option might require additional process‐
               ing time, but might result in smaller experiment sizes.


           off

               Does not copy or archive load objects or source files into  the
               experiment.

           Archiving will not be performed in the following circumstances:


               o      A  profiled  process  is terminated before it exits nor‐
                      mally


               o      -A off is specified

           In such cases, you must  run  er_archive  explicitly  on  the  same
           machine where the profiling data was recorded.

           When  many processes are being profiled, enabling archiving as part
           of data collection can be very expensive and might change the  tim‐
           ing  of the application run. With many processes, a better strategy
           is to collect the data with -A off, and later, when  the  profiling
            is complete, archive the experiment using er_archive -s all. In
            this case all binaries and source files will be saved in the
            experiment.

           The minimum archiving required that enables  an  experiment  to  be
           accessed  on another machine is -A on. When using this option, note
           that -A on does not copy any sources or object files (.o's); it  is
           your  responsibility to ensure that those files are accessible from
           the machine where the experiment is being examined, and  that  they
           are not changed or rebuilt after the experiment was recorded.

           The default setting for -A is on.
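
            For example, a minimal sketch of that strategy (the experiment
            name and target a.out are illustrative):

              collect -A off -o test.1.er a.out
              er_archive -s all test.1.er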


       -j option

            Controls Java profiling when the target is a JVM. The allowed
            values of option are:


           on

                Records profiling data for the JVM, recognizes methods
                compiled by the Java HotSpot virtual machine, and records
                Java call stacks. This is the default.


           off

               Does not record Java profiling data. Profiling data for  native
               call stacks is still recorded.


           <path>

                Records profiling data for the JVM, using the JVM installed
                in <path>.

           See the section "JAVA PROFILING", below.


       -J java_arg

           Specifies additional arguments to be passed to  the  JVM  used  for
           profiling.  If  -J  is  specified,  Java  profiling (-j on) will be
           enabled. The java_arg must be surrounded by quotes if  it  contains
           more  than  one argument. It consists of a set of tokens, separated
           by either a blank or a tab; each token  is  passed  as  a  separate
           argument to the JVM. Note that most arguments to the JVM must begin
           with a "-" character.
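
            For example, to pass a maximum-heap-size argument to the JVM used
            for profiling (the JVM argument and jar name are illustrative):

              collect -J "-Xmx512m" myapp.jar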


       -l signal

           Samples process-wide resource-utilization whenever the given signal
           is delivered to the process.

           See the section "DATA COLLECTION AND SIGNALS" below for more infor‐
           mation about choosing a signal.


       -y signal[,r]

           Controls recording of data with signal, referred to as  the  pause-
           resume  signal.  Whenever  the  given  signal  is  delivered to the
           process, switch between paused (no data is  recorded)  and  resumed
           (data  is  recorded)  states.  Start  in  the  resumed state if the
           optional ,r flag is given, otherwise start  in  the  paused  state.
           This option does not affect the recording of process-wide resource-
           utilization samples.

           One use of the pause-resume signal is to  start  a  target  without
           collecting  data,  allowing  it  to  reach  steady-state,  and then
           enabling the data.

           See the section "DATA COLLECTION AND SIGNALS" below for more infor‐
           mation about choosing a signal.
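
            For example, a sketch that starts the target paused and begins
            recording when SIGUSR1 is delivered (the signal spelling and the
            target a.out are illustrative):

              collect -y SIGUSR1 a.out &
              kill -USR1 %1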


   Output Controls
       -o experiment_name

           Uses  experiment_name as the name of the experiment to be recorded.
           The experiment_name must end in the string .er; if  not,  print  an
           error message and do not run the experiment.

           If  -o  is  not  specified,  give the experiment a name of the form
           stem.n.er, where stem is a string, and n is a number.  If  a  group
           name has been specified with -g, set stem to the group name without
           the .erg suffix. If no group name has been specified, set  stem  to
           the string "test".

            If invoked from one of the commands used to run MPI jobs, for
            example, mpirun, but without -M MPI-version, and -o is not
            specified,
           take  the value of n used in the name from the environment variable
           used to define the MPI rank of that process. Otherwise,  set  n  to
           one  greater  than  the  highest integer currently in use. (See MPI
           PROFILING, below.)

           If the name is not specified in the form stem.n.er, and  the  given
           name  is  in use, print an error message and do not run the experi‐
           ment. If the name is of the form stem.n.er and the name supplied is
           in  use,  record  the  experiment under a name corresponding to one
           greater than the highest value of n that is currently in use. Print
           a warning if the name is changed.


       -d directory_name

           Places  the experiment in directory directory_name. If no directory
           is given, place the experiment in the current working directory. If
           a  group is specified (see -g, below), the group file is also writ‐
           ten to the directory named by -d.

           For the lightest-weight data collection, it is best to record  data
           to  a  local  file, with -d used to specify a directory in which to
           put the data. However,  for  MPI  experiments  on  a  cluster,  the
           founder  experiment  must be available at the same path to all pro‐
           cesses to have all data recorded into the founder experiment.

           Experiments written to long-latency  file  systems  are  especially
           problematic, and might progress very slowly.


       -g group_name

           Adds  the  experiment  to  the  experiment  group  group_name.  The
           group_name string must end in the string .erg; if  not,  report  an
           error and do not run the experiment. The first line of a group file
           must contain the string


             #analyzer experiment group

           and each subsequent line is the name of an experiment.
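
            For example, a group file named test.erg might contain (the
            experiment names are illustrative):

              #analyzer experiment group
              test.1.er
              test.2.er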


       -O file

            Appends all output from collect itself to the named file, but
            does not redirect the output from the spawned target, nor from
            dbx (as invoked with the -P argument), nor from the processes
            involved in recording count data (as invoked with the -c
            argument). If file is set to /dev/null, all output from collect
            is suppressed, including any error messages.


       -t duration

           Collects  data for the specified duration. duration can be a single
           number followed by either m to specify minutes,  or  s  to  specify
           seconds  (default),  or  two such numbers separated by a - sign. If
           one number is given, data is collected from the start  of  the  run
           until  the  given time; if two numbers are given, data is collected
           from the first time to the second. If the second time is zero, data
           is  collected until the end of the run. If two non-zero numbers are
           given, the first must be less than the second.

           Although you specify duration in minutes or seconds, the start  and
           end  of  data  collection  is  recognized with greater accuracy. If
           clock profiling is enabled, the accuracy is approximately twice the
           clock  profiling  interval.  If clock profiling is not enabled, the
           accuracy is 200 milliseconds.
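
            For example, to skip the first 30 seconds of the run and stop
            collecting data at 90 seconds (a.out is illustrative):

              collect -t 30-90 a.out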


   Other Arguments
        -C comment

            Puts the comment into the notes file for the experiment. Up to
            ten -C arguments can be supplied.


       -P <pid>

           Write a script for dbx to attach to the process with the given PID,
           and collect data from it, and then invoke  dbx  with  that  script.
           Clock  or  HW  counter profiling data may be specified, but neither
           tracing nor count data are supported. See the collector(1) man page
           for more information.

            When attaching to a process, the directory is created with the
            umask of the user running collect -P, but the experiment is
            written as the user running the process being attached to. If the
            user doing the attach is root, and the umask is not zero, the
            experiment will fail.

            Note -

              On Linux, attaching to a multithreaded process, including Java,
              will not properly collect data. Data for the thread that was
              attached to will be captured, but not data for other threads.

       -n

           Dry  run:  do  not run the target, but print all the details of the
           experiment that would be run. Turn on -v.


       -V

           Prints the current version. Do not examine further arguments and do
           no further processing.


       -v

           Prints  the  current version and further detailed information about
           the experiment being run.


       -x

           Leaves the target process stopped on the exit from the exec  system
           call,  in  order  to  allow a debugger to attach to it. The collect
           command prints a message with the process PID.

           To attach a debugger to the target once it is stopped  by  collect,
           you can follow the procedure below.


               o      Obtain  the  PID of the process from the message printed
                      by the collect -x command


               o      Start the debugger


                o      Configure the debugger to ignore  SIGPROF  and,  if  you
                       choose  to  collect  hardware  counter  data,  SIGEMT on
                       Solaris or SIGIO on Linux


               o      Attach to the process using dbx's attach command.


               o      Set the collector parameters for the experiment you wish
                      to collect


               o      Issue the collector enable command


               o      Issue  the  cont  command to allow the target process to
                      run

           As the process runs under the control of the debugger, the  Collec‐
           tor records an experiment.

           Alternatively, you can attach to the process and collect an experi‐
           ment using the collect -P  PID command.


FOLLOWING DESCENDANT PROCESSES
       Data from the initial process spawned by collect,  called  the  founder
       process, is always collected. Processes can create descendant processes
       by calling system library functions, including the  variants  of  fork,
       exec,  system, etc. If a -F argument is used, the collector can collect
       data for descendant processes, and it opens a new experiment  for  each
       descendant  process inside the parent experiment. These new experiments
       are named with their lineage as follows:

           o      An underscore is appended to the creator's experiment name.


           o      A code letter is added: either "f" for a fork,  or  "x"  for
                  other descendants, including exec. On Linux, "C" is used for
                  a descendant generated by clone(2).


           o      A number is added after the code letter, which is the  index
                  of the descendant.


           o      The experiment suffix, ".er" is appended to the lineage.



       For  example,  if  the  experiment  name  for  the  initial  process is
       "test.1.er", the experiment for the descendant process created  by  its
       third  fork  is  "test.1.er/_f3.er". If that descendant process execs a
       new image, the corresponding experiment name is "test.1.er/_f3_x1.er".


       If the default, -F on, is used, descendant processes initiated by calls
       to  fork(2), fork1(2), fork(3F), vfork(2), and exec(2) and its variants
       are followed. The call to vfork is replaced internally  by  a  call  to
       fork1.  Descendants created by calls to system(3C), system(3F), sh(3F),
       popen(3C) , and similar functions, and their associated descendant pro‐
       cesses,  are  also  followed.  On Linux, descendants created by clone()
       without the CLONE_VM flag are followed by default; descendants  created
       with  the  CLONE_VM flag are treated as threads, rather than processes,
       and are always followed, independent of the -F setting.


       If the -F =<regex> argument is used, all descendants whose name matches
       the  regular  expression  are  followed.  When matching names, only the
       basename of the executable is used, not the  full  path,  and  not  any
       arguments.


       For  example,  to  capture  data on the descendant process of the first
       exec from the first fork from the first call to system in the  founder,
       use:

         collect -F '=_x1_f1_x1'



       To capture data on all the variants of exec, but not fork, use:

         collect -F '=.*_x[0-9]/*'



       To  capture  data  from  a  call  to  system("echo hello") but not sys‐
       tem("goodbye"), use:

         collect -F '=echo hello'



       The Analyzer and er_print automatically read experiments for descendant
       processes  when the founder experiment is read, and the experiments for
       the descendant processes are selected for data display.


       To specifically select the data for  display  from  the  command  line,
       specify  the  path  name explicitly to either er_print or Analyzer. The
       specified path must  include  the  founder  experiment  name,  and  the
       descendant experiment's name inside the founder directory.


       For example, to see the data for the third fork of the test.1.er exper‐
       iment:

         er_print test.1.er/_f3.er
         analyzer test.1.er/_f3.er



       You can prepare an experiment group file with  the  explicit  names  of
       descendant experiments of interest.


       To  examine  descendant  processes  in  the  Analyzer, load the founder
       experiment and choose View > Filter data. The Analyzer displays a  list
       of  experiments  with  only the founder experiment checked. Uncheck the
       founder experiment and check the descendant experiment of interest.

PROFILING SCRIPTS
       By default, collect no longer requires that its target be an  ELF  exe‐
       cutable.  If  collect  is invoked on a script, data is collected on the
       program launched to execute the script,  and  on  all  descendant  pro‐
       cesses.  To collect data only on a specific process, use the -F flag to
       specify the name of the executable to follow.


       For example, to profile the script foo.sh, but collect  data  primarily
       from the executable bar, use the command:

         collect -F =bar foo.sh



       Data  will  be collected on the founder process launched to execute the
       script, and all bar processes spawned from  the  script,  but  not  for
       other processes.

JAVA PROFILING
        Java profiling consists of collecting a performance experiment on the
        JVM as it runs your .class or .jar files. If possible, call
       stacks  are  collected in both the Java model and in the machine model.
       On x86 platforms, if Java applications crash  during  data  collection,
       disabling  capture  of  machine  model  call stacks with the SP_COLLEC‐
       TOR_NATIVE_MAX_STACKDEPTH environment variable might help.  See  "Envi‐
       ronment Variables" below.


       Data  can be shown with view mode set to User, Expert, or Machine. User
       mode shows each method by name, with data for interpreted and  HotSpot-
       compiled  methods aggregated together; it also suppresses data for non-
       user-Java threads. Expert mode separates HotSpot-compiled methods  from
       interpreted  methods,  and  does  not  suppress  non-user Java threads.
        Machine mode shows data for interpreted Java methods against the JVM
        as it does the interpreting, while data for methods compiled with
        the Java HotSpot virtual machine is reported for named methods.
       All  threads  are  shown.  In  all three modes, data is reported in the
       usual way for any non-OpenMP C, C++, or Fortran code called by  a  Java
       target.  Such code corresponds to Java native methods. The Analyzer and
       the er_print utility can switch between the view mode User,  view  mode
       Expert, and view mode Machine, with User being the default.


       Clock-based  profiling and hardware counter overflow profiling are sup‐
       ported. Synchronization tracing collects data only on the Java  monitor
       calls,  and synchronization calls from native code; it does not collect
       data about internal synchronization calls within the JVM.


       Heap tracing is not supported for Java, and generates an error if spec‐
       ified.


        Some Java applications have shared objects contained within a jar
        file. The
       shared objects are extracted to a temporary directory when the applica‐
       tion runs, and are deleted when the application terminates. The shared-
       object names are recorded in the experiment map file, but the jar  file
       name is not. To read such experiments, be sure to add an addpath direc‐
       tive listing the jar file to your .er.rc file, or add the path from the
       Analyzer  GUI,  or with the addpath command in er_print. If the addpath
       directive is in  your  .er.rc  file  at  the  time  the  experiment  is
       archived, the shared objects will be archived.


       When  collect  inserts a target name of java into the argument list, it
       examines environment variables for a path to the java  target,  in  the
       order  JDK_HOME, and then JAVA_PATH. For the first of these environment
       variables that is set, the resultant target is verified as an ELF  exe‐
       cutable.  If  it  is  not, collect fails with an error indicating which
       environment variable was used, and the full path name that was tried.


       If neither of those environment variables is set, the  collect  command
       uses  the version set by your PATH. If there is no java in your PATH, a
       system default of /usr/java/bin/java is tried.
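
        For example, a sketch that names the JVM to profile explicitly (the
        JVM path and jar name are illustrative):

          collect -j /usr/java/bin/java myapp.jar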

JAVA PROFILING WITH A DLOPEN
       Some applications are not pure Java, but are C or C++ applications that
       invoke dlopen to load libjvm.so, and then start the JVM by calling into
       it. The collector sets an environment variable so that  Java  profiling
       is automatically enabled.

SHARED OBJECT HANDLING
       Normally,  the  collect  command  causes  data  to be collected for all
       shared objects in the address space of the target, whether on the  ini‐
       tial library list, or explicitly dlopen'd. However, there are some cir‐
       cumstances under which some shared objects are not profiled.


       One such scenario is when the target program is invoked with lazy-load‐
       ing.  In  such cases, the library is not loaded at startup time, and is
       not loaded by explicitly calling dlopen, so the shared object  name  is
       not  included  in the experiment, and all PCs from it are mapped to the
       <Unknown> function. The workaround is to set LD_BIND_NOW, to force  the
       library to be loaded at startup time.
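
        For example, a sketch of that workaround in a Bourne-type shell
        (a.out is illustrative):

          LD_BIND_NOW=1 collect a.out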


       Another  such  scenario  is  when  the  executable is built with the -B
       direct linking option. In that case the object is dynamically loaded by
       a  call  specifically  to the dynamic linker entry point of dlopen, and
       the libcollector interposition is bypassed. The shared object  name  is
       not  included  in the experiment, and all PCs from it are mapped to the
       <Unknown> function. The workaround is to not use -B direct.

DATA COLLECTION AND SIGNALS
   Profiling Signals
       Signals are used for both clock- and hardware-counter-overflow  profil‐
       ing. SIGPROF is used in data collection for all experiments. The period
       for generating the signal depends on the data being  collected.  SIGEMT
       (Solaris)  or  SIGIO (Linux) is used for hardware counter overflow pro‐
       filing. The overflow interval depends on the user parameter for profil‐
       ing.  Any  user code that uses or manipulates the profiling signals may
       potentially interfere with data collection. When the Collector installs
       its  signal  handler  for a profile signal, it sets a flag that ensures
       that system calls are not interrupted to deliver signals. This  setting
       could  change  the behavior of a target program that uses the profiling
       signals for other purposes.


       When the Collector installs its signal handler for a profile signal, it
       remembers  whether  or not the target had installed its own signal han‐
       dler. The Collector also interposes on  some  signal-handling  routines
       and  does not allow the user to install a signal handler for these sig‐
       nals; it saves the user's handler, just as it does when  the  Collector
       replaces a user handler on starting the experiment.


        Profiling signals are delivered from the profiling timer or hardware-
        counter-overflow handling code in the kernel, or in response to: the
        kill(2), sigsend(2), tkill(2), tgkill(2) or _lwp_kill(2) system
        calls; the raise(3C) or sigqueue(3C) library calls; or the kill(1)
        command. A signal code is delivered with the signal so that the
        Collector can distinguish the origin. If it is delivered for
        profiling, it is processed by the Collector; if it is not delivered
        for profiling, it is delivered to the target signal handler.


       When the Collector is running under dbx, the profiling signal delivered
       occasionally has its signal code corrupted, and a profile signal may be
       treated as if it were generated from a system or library call or a com‐
       mand. In that case, it will be incorrectly delivered to the user's han‐
        dler. If the user handler was set to SIG_DFL, it will cause the
        process to terminate and dump core.


       When  the  Collector is invoked after attaching to a target process, it
       will install its signal handler, but it cannot interpose on the signal-
        handling routines. If the user code installs a signal handler after
        the attach, it will override the Collector's signal handler, and data
        will be lost.


       Note  that  any  signal, including either of the profiling signals, may
       cause premature termination of a system call, and the program  must  be
       prepared to handle that behavior. When libcollector installs the signal
       handlers for data collection,  it  specifies  restarting  those  system
        calls that are restartable, but some, like sleep(3C), will return
        early without reporting an error.

   Process-Wide Sample and Pause-Resume Signals
       Signals can be specified by the user as  a  sample  signal  (-l)  or  a
       pause-resume  signal  (-y). SIGUSR1 or SIGUSR2 are recommended for this
       use, but any signal that is not used by the target can be used.


       The profiling signals can be used if the process does not otherwise use
       them, but they should be used only if no other signal is available. The
       Collector interposes on some  signal-handling  routines  and  does  not
       allow  the user to install a signal handler for these signals; it saves
       the user's handler, just as it does when the Collector replaces a  user
       handler on starting the experiment.


       If  the  Collector  is invoked after attaching to a target process, and
       the user code installs a signal handler for the sample or  pause-resume
       signal, those signals will no longer operate as specified.

OPENMP PROFILING
       Data collection for OpenMP programs collects data that can be displayed
       in any of the three view modes, just as  for  Java  programs.  In  User
       mode,  slave  threads  are shown as if they were really cloned from the
       master thread, and have call stacks  matching  those  from  the  master
       thread.  Frames  in  the call stack coming from the OpenMP runtime code
       (libmtsk.so) are suppressed. In Expert user mode, the master and  slave
       threads  are shown differently, and the explicit functions generated by
       the compiler are visible, and the frames from the OpenMP  runtime  code
       (libmtsk.so) are suppressed. For Machine mode, the actual native stacks
       are shown.


       In User mode, various artificial functions are introduced as  the  leaf
       function of a call stack whenever the runtime library is in one of sev‐
       eral states. These  functions  are  <OMP-overhead>,  <OMP-idle>,  <OMP-
       reduction>,   <OMP-implicit_barrier>,   <OMP-explicit_barrier>,   <OMP-
       lock_wait>,    <OMP-critical_section_wait>,    and    <OMP-ordered_sec‐
       tion_wait>.


       Three  additional  clock-profiling  metrics  are  added to the data for
       clock-profiling experiments:


         OpenMP Work (ompwork)
         OpenMP Wait (ompwait)
         Master Thread Time (masterthread)




       OpenMP Work is counted when the OpenMP runtime thinks the code is doing
       work. It includes time when the process is consuming User-CPU time, but
       it also can include time when the process is consuming System-CPU time,
       waiting  for  page faults, waiting for the CPU, etc. Hence, OpenMP Work
       can exceed User-CPU time. OpenMP Wait is accumulated  when  the  OpenMP
       runtime thinks the process is waiting. OpenMP Wait can include User-CPU
       time for busy-waits (spin-waits), but it also includes Other-Wait  time
       for sleep-waits.


       Master  Thread Time is the total time spent in the master thread. It is
       only available from Oracle Solaris experiments. It corresponds to wall-
       clock time.


       The  inclusive  metrics  are visible by default; the exclusive are not.
       Together, the sum of those two metrics equals  the  Total  Thread  Time
       metric.  These  metrics  are  added for all clock- and hardware counter
       profiling experiments.


       Collecting information for every parallel-region entry in the execution
       of  the  program  can  be very expensive. You can suppress that cost by
       setting  the  environment  variable  SP_COLLECTOR_NO_OMP.  If  you  set
       SP_COLLECTOR_NO_OMP, the program will have substantially less dilation,
        but you will not see the data from slave threads propagate up to the
        caller, and eventually to main(), as you would when the variable is
        not set.
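
        For example, a sketch of suppressing that cost in a Bourne-type shell
        (a.out is illustrative):

          SP_COLLECTOR_NO_OMP=1 collect -p on a.out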


       A collector for OpenMP 3.0 is enabled by default in  this  release.  It
       can  profile  programs  that  use explicit tasking. Programs built with
       earlier compilers can be profiled with the  new  collector  only  if  a
       patched version of libmtsk.so is available. If it is not installed, you
       can switch data collection to use the  old  collector  by  setting  the
       environment variable SP_COLLECTOR_OLDOMP.


       Note  that  the  OpenMP  profiling  functionality is only available for
       applications compiled with the Oracle Developer Studio compilers, since
       it  depends  on  the Oracle Developer Studio compiler runtime. GNU-com‐
       piled code will only see machine-level call stacks.

MEMORYSPACE AND DATASPACE PROFILING
       A memoryspace profile is a profile in which memory-related events  such
       as  cache  misses  are  reported against the physical structures of the
       machine, such as cache-lines, memory-banks, or pages. Memoryspace  pro‐
       filing  is  available  on Oracle SPARC systems and Intel Oracle Solaris
       systems.


       A dataspace profile is a profile in which those  memory-related  events
       are  reported  against  the  data structures whose references cause the
       events rather than  just  the  instructions  where  the  memory-related
       events  occur.  Dataspace  profiling is only available on SPARC systems
       running Oracle Solaris.


       For either memoryspace or dataspace profiling, you must  collect  hard‐
       ware  counter  profiles on an Oracle Solaris system using precise, mem‐
       ory-related counters. Such counters  are  found  in  the  counter  list
       obtained  by  running the collect -h command without any other command-
       line arguments; the counters are annotated memoryspace.


       Further, in order to support dataspace profiling, executables should be
       compiled for a SPARC platform with the -xhwcprof -xdebugformat=dwarf -g
       flags.
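
       As an illustrative sketch (the dcm alias is taken from the sample
       counter list shown below, and counter availability varies by
       processor), a dataspace-capable experiment might be recorded as
       follows:

          % cc -xhwcprof -xdebugformat=dwarf -g -o a.out a.c
          % collect -h dcm,on a.out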


       Memoryspace profiling data can be viewed with er_print commands or Per‐
       formance Analyzer views relating to Memory Objects.


       Dataspace  profiling  data can be viewed with the er_print utility com‐
       mands data_objects, data_single, and data_layout  or  with  Performance
       Analyzer using the data views labeled DataObjects and DataLayout.

MPI PROFILING
       The  collect command can be used for MPI profiling to manage collection
       of the data from the constituent MPI processes, collect MPI trace data,
       and  organize the data into a single "founder" experiment, with "subex‐
       periments" for each MPI process.


       The collect command can be used with MPI by simply prefacing  the  com‐
       mand that starts the MPI job and its arguments with the desired collect
       command and its arguments (assuming you have inserted the  --  argument
       to  indicate  the  end of the mpirun arguments). For example, on an SMP
       machine,

         % mpirun -np 16 -- a.out 3 5



       can be replaced by

         % collect -M OMPT mpirun -np 16 -- a.out 3 5



       This command runs an MPI tracing experiment on each of the 16 MPI  pro‐
       cesses,  collecting  them  all in an MPI experiment, named by the usual
       conventions for naming experiments. It assumes use of the  Oracle  Mes‐
       sage Passing Toolkit (previously known as Sun HPC ClusterTools) version
       of MPI.


       The initial collect process reformats the  mpirun  command  to  specify
       running  collect  with  appropriate arguments on each of the individual
       MPI processes.


       Note that the  --  argument  immediately  before  the  target  name  is
       required for MPI profiling (although it is optional for mpirun itself),
       so that collect can separate the mpirun arguments from the  target  and
       its  arguments.  If  the -- argument is not supplied, collect prints an
       error message, and no experiment is run.


       Furthermore, a -x PATH argument is added to the mpirun arguments by
       collect, so that the remote collect processes can find their targets.
       If any environment variables in your environment begin with "VT_" or
       with "SP_COLLECTOR_", they are passed to the remote collect processes
       with a -x flag for each.


       MIMD MPI runs are supported, with the similar  requirement  that  there
       must  be  a  "--"  argument after each ":" (indicating a new target and
       local mpirun arguments for it). If the --  argument  is  not  supplied,
       collect prints an error message, and no experiment is run.


       Some versions of the Oracle Message Passing Toolkit (or Sun HPC
       ClusterTools) have functionality for MPI State profiling. When clock-
       profiling data is collected on an MPI experiment run with such a
       version of MPI, two additional metrics can be shown:


          MPI Work (mpiwork)
          MPI Wait (mpiwait)




       MPI Work accumulates when the process is inside the MPI  runtime  doing
       work,  such  as  processing  requests or messages; MPI Wait accumulates
       when the process is inside the MPI runtime, but waiting for  an  event,
       buffer, or message.


       On  Oracle  Solaris  systems,  MPI  Wait is accumulated whether the MPI
       library sleeps or spins when waiting. On Linux  systems,  MPI  Wait  is
       accumulated  when the MPI library spins when waiting; it is not accumu‐
       lated if the MPI library sleeps (yields the CPU) when waiting, and will
       be undercounted relative to the real wait time.


       In  the Analyzer, when MPI trace data is collected, two additional tabs
       are shown, MPI Timeline and MPI Chart.


       The technique of using mpirun to spawn explicit collect commands on the
       MPI  processes  is  no  longer supported to collect MPI trace data, and
       should not be used. It can still be used for all other types of data.


       MPI profiling is based on the open source VampirTrace 5.5.3 release. It
       recognizes  several  VampirTrace  environment variables, and a new one,
       VT_STACKS, which controls whether or not call stacks  are  recorded  in
       the  data.  For  further information on the meaning of these variables,
       see the VampirTrace 5.5.3 documentation.


       The default value of the environment variable VT_BUFFER_SIZE limits the
       internal  buffer  of  the  MPI  API  trace  collector to 64 MB, and the
       default value of VT_MAX_FLUSHES limits the number  of  times  that  the
       buffer is flushed to 1. Events that are to be recorded after the limits
       have been reached are no longer written into the trace file. The  envi‐
       ronment  variables  apply  to  every process of a parallel application,
       meaning that applications with n processes will typically create  trace
       files n times the size of a serial application.


       To  remove  the  limit  and get a complete trace of an application, set
       VT_MAX_FLUSHES to 0. This setting causes the MPI API trace collector to
       flush  the  buffer  to  disk whenever the buffer is full. To change the
       size of the buffer, use the environment  variable  VT_BUFFER_SIZE.  The
       optimal  value for this variable depends on the application which is to
       be traced. Setting a small value will increase the memory available  to
       the application but will trigger frequent buffer flushes by the MPI API
       trace collector. These buffer  flushes  can  significantly  change  the
       behavior  of the application. On the other hand, setting a large value,
       like 2G, will minimize buffer flushes by the MPI API  trace  collector,
       but decrease the memory available to the application. If not enough
       memory is available to hold both the buffer and the application data,
       parts of the application might be swapped to disk, likewise leading to
       a significant change in the behavior of the application.
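
       For example, to request a complete trace with a larger buffer, you
       might set (csh syntax; the 256M value is only illustrative):

          % setenv VT_BUFFER_SIZE 256M
          % setenv VT_MAX_FLUSHES 0
          % collect -M OMPT mpirun -np 16 -- a.out 3 5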


       Another important variable is VT_VERBOSE, which turns on various  error
       and  status  messages,  and setting it to 2 or higher is recommended if
       problems arise.


       Normally, MPI trace output data is post-processed when the mpirun  tar‐
       get  exits;  a  processed  data  file is written to the experiment, and
       information about the post-processing time is written into the  experi‐
       ment  header. MPI post-processing is not done if MPI tracing is explic‐
       itly disabled.


       In the event of a failure in post-processing, an error is reported, and
       no MPI Tabs or MPI tracing metrics will be available.


       If  the  mpirun target does not actually invoke MPI, an experiment will
       still be recorded, but no MPI trace data will be produced. The  experi‐
       ment  will  report an MPI post-processing error, and no MPI Tabs or MPI
       tracing metrics will be available.


       If the environment variable VT_UNIFY is set to "0", the post-
       processing routines, er_vtunify and er_mpipp, will not be run by
       collect. They will be run the first time either er_print or analyzer
       is invoked on the experiment.

USING COLLECT WITH PPGSZ
       The  collect command can be used with ppgsz by running the collect com‐
       mand on the ppgsz command, and specifying the -F on flag.  The  founder
       experiment  is  on  the  ppgsz executable and is uninteresting. If your
       path finds the 32-bit version of ppgsz, and the experiment is being run
       on a system that supports 64-bit processes, the first thing the collect
       command does is execute an exec function on its 64-bit version,  creat‐
       ing  _x1.er.  That executable forks, creating _x1_f1.er. The descendant
       process attempts to execute an exec function on the  named  target,  in
       the  first  directory  on  your path, then in the second, and so forth,
       until one of the exec functions succeeds. If, for  example,  the  third
       attempt  succeeds,  the  first  two  descendant  experiments  are named
       _x1_f1_x1.er and _x1_f1_x2.er,  and  both  are  completely  empty.  The
       experiment on the target is the one from the successful exec, the third
       one in the example, and is named _x1_f1_x3.er, stored under the founder
       experiment.  It  can  be processed directly by invoking the Analyzer or
       the er_print utility on test.1.er/_x1_f1_x3.er.
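
       A minimal sketch of such an invocation (the ppgsz options shown are
       only illustrative; see ppgsz(1) for the actual option syntax):

          % collect -F on ppgsz -o heap=4M a.out
          % er_print -functions test.1.er/_x1_f1_x3.er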


       If the 64-bit ppgsz is the initial process run, or if the 32-bit  ppgsz
       is  invoked  on a 32-bit kernel, the fork descendant that executes exec
       on the real target has its data in _f1.er, and the real target's exper‐
       iment  is  in  _f1_x3.er,  assuming  the same path properties as in the
       example above.


       See the section  "FOLLOWING  DESCENDANT  PROCESSES",  above.  For  more
       information  on  hardware  counters, see the "Hardware Counter Overflow
       Profiling" section below.

USING COLLECT ON SETUID/SETGID TARGETS
       The collect command operates by inserting a shared library,  libcollec‐
       tor.so,  into the target's address space (LD_PRELOAD), along with addi‐
       tional shared libraries for specific  tracing  data  collection.  Those
       shared libraries write the files that constitute the experiment.


       Several  problems might arise if collect is invoked on executables that
       call setuid or setgid, or that create descendant  processes  that  call
       setuid  or setgid. If the user running the experiment is not root, col‐
       lection fails because the shared  libraries  are  not  installed  in  a
       trusted directory. The workaround is to run the experiments as root, or
       use crle(1) to grant permission. Users should, of  course,  take  great
       care when circumventing security barriers, and do so at their own risk.


       In addition, the umask for the user running the collect command must be
       set to allow write permission for that  user,  and  for  any  users  or
       groups  that are set by the setuid/setgid attributes of a program being
       exec'd and for any user or group to which that program sets itself.  If
       the  mask  is  not set properly, some files might not be written to the
       experiment, and processing of the experiment might not be possible.  If
       the  log  file can be written, an error is shown when the user attempts
       to process the experiment.


       Note that when attaching as one user to a  process  that  is  owned  by
       another user, umask must be set to allow writing by the user owning the
       process to which you are attaching.


       Other problems can arise if the target itself makes any of the system
       calls to set UID or GID, or if it changes its umask and then forks or
       runs exec on some other process, or if crle(1) was used to configure
       how the runtime linker searches for shared objects.


       If an experiment is started as root on a target that changes its effec‐
       tive GID, the er_archive process that is  automatically  run  when  the
       experiment  terminates fails, because it needs a shared library that is
       not marked as trusted.  In  that  case,  you  can  run  er_archive  (or
       er_print  or  Analyzer) explicitly by hand, on the machine on which the
       experiment was recorded, immediately following the termination  of  the
       experiment.

DATA COLLECTED
       Three  types  of  data are collected: profiling data, tracing data, and
       process-wide resource-utilization data. The data  packets  recorded  in
       profiling  and  tracing  include  the  callstack  of each LWP, the LWP,
       thread, and CPU IDs, and some event-specific  data.  The  data  packets
       recorded  in  process-wide  resource-utilization samples contain global
       data such as execution statistics, but no  program-specific  or  event-
       specific data. All data packets include a timestamp.


       The description of each data type below lists the metrics derived from
       that data, both as a display name and as the keyword string the user
       would use in a metrics command when examining an experiment.
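
       For example, after recording a clock-profiling experiment, the metric
       keywords can be passed to er_print (a sketch; test.1.er is the default
       name for a first experiment):

          % collect -p on a.out
          % er_print -metrics e.user:i.user -functions test.1.er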

       Clock-based Profiling

           The  event-specific  data  recorded  in clock-based profiling is an
           array of counts for  each  accounting  microstate.  The  microstate
           array  is  incremented by the system at a prescribed frequency, and
           is recorded by the Collector when a profiling signal is processed.

           Clock-based profiling can run at a range of frequencies which  must
           be  multiples of the clock resolution used for the profiling timer.
           If you try to do high-resolution profiling on  a  machine  with  an
           operating  system  that  does  not support it, the command prints a
           warning message and uses the highest  resolution  supported.  Simi‐
           larly,  a  custom  setting that is not a multiple of the resolution
           supported by the system is rounded down  to  the  nearest  non-zero
           multiple of that resolution, and a warning message is printed.

           On Oracle Solaris, clock-based profiling data is converted into the
           following metrics:



             Total Thread Time (total) = sum over all ten microstates
             Total CPU Time (totalcpu) = user + system + trap
             User CPU Time (user)
             System CPU Time (system)
             Trap CPU Time (trap)
             User Lock Time (lock)
             Data Page Fault Time (datapfault)
             Text Page Fault Time (textpfault)
             Kernel Page Fault Time (kernelpfault)
             Stopped Time (stop)
             Wait CPU Time (wait)
             Sleep Time (sleep)


           For experiments on multithreaded applications, all of the times are
           summed across all threads in the process. Total Thread Time adds up
           to the real elapsed time,  multiplied  by  the  average  number  of
           threads in the process.

           On Linux, clock-based profiling data produces one metric: Total CPU
           Time (totalcpu).

           If clock-based profiling is performed on an OpenMP  program,  three
           additional metrics are provided:



             OpenMP Work (ompwork)
             OpenMP Wait (ompwait)
             Master Thread Time (masterthread)


           On  Oracle Solaris, OpenMP Work accumulates when work is being done
           in parallel. OpenMP Wait accumulates when  the  OpenMP  runtime  is
           waiting  for  synchronization,  and accumulates whether the wait is
           using CPU time or sleeping, or when work is being done in parallel,
           but the thread is not scheduled on a CPU. Master Thread Time repre‐
           sents time in the master thread only.

           On Linux, OpenMP Work and OpenMP Wait are accumulated only when the
           process  is  active  in either user or system mode. Unless you have
           specified that OpenMP should do a busy wait, OpenMP Wait  on  Linux
           will not be useful. Master Thread Time is not provided on Linux.

           If  clock-based profiling is performed on an MPI program, run under
           Oracle Message Passing Toolkit or Sun HPC ClusterTools release  8.1
           or later, two additional metrics are provided:



             MPI Work (mpiwork)
             MPI Wait (mpiwait)


           On  Oracle  Solaris,  MPI  Work accumulates when the MPI runtime is
           active. MPI Wait accumulates when the MPI runtime  is  waiting  for
           the  send  or  receive  of  a  message,  or when the MPI runtime is
           active, but the thread is not running on a CPU.

           On Linux, MPI Work and MPI  Wait  are  accumulated  only  when  the
           process  is  active  in either user or system mode. Unless you have
           specified that MPI should do a busy wait, MPI Wait  on  Linux  will
           not be useful.


       Hardware Counter Overflow Profiling

           Hardware  counter  overflow  profiling records the number of events
           counted by the hardware counter at the time the overflow signal was
           processed.

           The  counters  available  depend on the specific processor chip and
           operating system. Running the command  collect  -h  with  no  other
           arguments  will  describe the processor, and the number of hardware
           counters available, along with a list of all counters and a default
           hardware-counter  set  for  that  processor.  The counters that are
           aliased to common names are displayed first in the  list,  followed
           by  a  list  of  the raw hardware counters. After the list of known
           counters is printed, the name of the reference manual for the chip,
           and the default counter set defined for that chip is printed.

            If neither the performance counter subsystem nor the collect
            command knows the names for the counters on a specific chip, the
            tables are empty. Even so, the counters can be specified
            numerically as described above.

            The lines of output are formatted similarly to the following:



              Aliases for most useful HW counters:

                  alias     raw name        type         units       regs  description

                  cycles    Cycles_user                  CPU-cycles  0123  CPU Cycles
                  insts     Instr_all                    events      0123  Instructions Executed
                  c_stalls  Commit_0_cyc                 CPU-cycles  0123  Stall Cycles
                  loads     Instr_ld        memoryspace  events      0123  Load Instructions
                  stores    Instr_st        memoryspace  events      0123  Store Instructions
                  dcm       DC_miss_commit  memoryspace  events      0123  L1 D-cache Misses
              ...

              Raw HW counters:

                  name                type  units       regs  description

                  Sel_pipe_drain_cyc        CPU-cycles  0123
                  Sel_0_wait_cyc            CPU-cycles  0123
                  Sel_0_ready_cyc           CPU-cycles  0123
              ...


           The top section labeled Aliases for most useful  HW  counters  con‐
           tains the following columns.



           alias        Gives  a  convenient non-processor-specific alias that
                        can be used in a -h argument.


           raw name     Lists the real  unaliased  processor-specific  counter
                        name.


           type         Lists counter type information, when applicable. Coun‐
                        ters of type memoryspace can be used  for  memoryspace
                        and,  where  available, dataspace profiling. Rarely, a
                        not-program-related type appears indicating a  counter
                        that   captures   events  that  cannot  be  attributed
                        directly to your program. Specifying  such  a  counter
                        produces  a  warning  and  profiling will not record a
                        call stack; time will be attributed to  an  artificial
                        function   called  collector_not_program_related;  and
                        Thread IDs and LWP IDs will be meaningless.


           units        Shows either CPU-cycles  which  can  approximately  be
                        converted to time during analysis, or events which are
                        raw hardware counts.


           regs         Specifies which registers can be used for the counter.


            description  Provides a description of the counter.


           The Raw HW counters section is similar except that no  aliases  are
           listed.  Introductory  paragraphs  describing the counters might be
           available for certain processors.

           If the two aliases cycles and insts are collected,  two  additional
           metrics  are  available,  CPI  (cycles  per  instruction)  and  IPC
           (instructions per cycle). A high CPI ratio or a low IPC ratio indi‐
           cates  code that runs inefficiently in the machine. A low CPI ratio
           or a high IPC ratio indicates code that  runs  efficiently  in  the
           pipeline.

           EXAMPLES:

           Example  1:  Using  the  aliased  counter information listed in the
           above sample output, the following command:


              collect -h cycles,hi

           enables CPU Cycles profiling, with hi chosen  to  generate  a  peak
           event  rate  of  approximately 1000 events/second/thread. Note that
           generating too high an event rate will ultimately distort the  per‐
           formance you are trying to profile.


       Synchronization Delay Tracing

           Synchronization  delay  tracing  records  all  calls to the various
           thread synchronization routines where the real-time  delay  in  the
           call  exceeds a specified threshold. The data packet contains time‐
           stamps for entry and exit  to  the  synchronization  routines,  the
           thread  ID,  and  the  LWP ID at the time the request is initiated.
           Synchronization requests from a thread can be initiated on one LWP,
           but complete on another.

           Synchronization  delay tracing data is converted into the following
           metrics:


             Synchronization Wait Time (sync)
             Synchronization Delay Events (syncn)
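
            A sketch of recording a synchronization delay tracing experiment
            with the -s option (on selects the calibrated default threshold):

              % collect -s on a.out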




       Heap Tracing

           Heap tracing records all calls to malloc, free, realloc,  memalign,
           and  valloc  with the size of the block requested, its address, and
           for realloc, the previous address. Calls to calloc are recorded  on
           Oracle Solaris but not on Linux.

           Heap tracing data is converted into the following metrics:



             Allocations (heapalloccnt)
             Bytes Allocated (heapallocbytes)
             Leaks (heapleakcnt)
             Bytes Leaked (heapleakbytes)


           Leaks  are  defined  as  allocations that are not freed. If a zero-
           length block is allocated, it counts as  an  allocation  with  zero
           bytes  allocated. If a zero-length block is not freed, it counts as
           a leak with zero bytes leaked.

           Heap tracing experiments can be very large, and might  be  slow  to
           process.
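
            A minimal sketch of recording a heap-tracing experiment with the
            -H option:

              % collect -H on a.out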


       IO Tracing

           IO tracing records all calls to the standard IO routines and all IO
           system calls.

           IO tracing data is converted into the following metrics:


             Bytes Read (ioreadbytes)
             Read Count (ioreadcnt)
             Read Time (ioreadtime)
             Bytes Written (iowritebytes)
             Write Count (iowritecnt)
             Write Time (iowritetime)
              Other IO Count (ioothercnt)
              Other IO Time (ioothertime)
              IO Error Count (ioerrorcnt)
              IO Error Time (ioerrortime)
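
            A sketch of recording an IO-tracing experiment with the -i
            option:

              % collect -i on a.out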




       MPI Tracing

            MPI tracing records calls to the MPI library for functions that
            can take a significant amount of time to complete. MPI tracing is
            implemented using the open-source VampirTrace code.

           MPI tracing data is converted into the following metrics:



             MPI Time (mpitime)
             MPI Sends (mpisendcnt)
             MPI Bytes Sent (mpisendbytes)
             MPI Receives (mpirecvcnt)
             MPI Bytes Received (mpirecvbytes)
             Other MPI Events (mpiothercnt)


            MPI Time is the total thread time spent in the MPI function. If
            MPI state times are also collected, MPI Work Time plus MPI Wait
            Time for all MPI functions other than MPI_Init and MPI_Finalize
            should approximately equal MPI Time. On Linux, MPI Wait and MPI
            Work are based on user+system CPU time, while MPI Time is based
            on real time, so the numbers will not match.

            The MPI Bytes Received metric counts the actual number of bytes
            received in all messages. MPI Bytes Sent counts the actual number
            of bytes sent in all messages. MPI Sends counts the number of
            messages sent, and MPI Receives counts the number of messages
            received. MPI_Sendrecv counts as both a send and a receive. Other
            MPI Events counts the events in the trace that are neither sends
            nor receives.


       Count Data

            Count data is recorded by instrumenting the executable and
            counting the number of times each instruction was executed. It
            also counts the number of times the first instruction in a
            function is executed, and reports that count as the function
            execution count. On SPARC systems only, it also counts the number
            of times an instruction in a branch-delay slot is annulled.

           Count data is converted into the following metrics:


             Bit Func Count (bit_fcount)
             Bit Inst Exec (bit_instx)
             Bit Inst Annul (bit_annul) -- SPARC only
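
            A sketch of recording count data with the -c option (see the
            count-data restrictions under RESTRICTIONS, below):

              % collect -c on a.out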




       Data-race Detection Data

            Data-race detection data consists of pairs of race-access events
            that constitute a race. The events are combined into a race, and
            races for which the call stacks for the two accesses are
            identical are merged into a race group.

           Data-race detection data is converted into the following metric:


             Race Accesses (raccess)




       Deadlock Detection Data

           Deadlock detection data consists of pairs of threads with conflict‐
           ing locks.

           Deadlock detection data is converted into the following metric:


             Deadlocks (deadlocks)
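
            Data-race and deadlock detection data are normally recorded with
            the -r option, as used by the Thread Analyzer; a sketch, assuming
            a suitably instrumented target (see tha(1)):

              % collect -r race a.out
              % collect -r deadlock a.out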




       Process-Wide Resource-Utilization Samples

           Process-wide resource utilization can be sampled occasionally.  The
           data  is  attributed  to  the process and does not map to function-
           level metrics.

           Process-wide resource utilization is always sampled  at  the  start
           and  termination  of  the  process.  By default or if a non-zero -S
           argument is specified, samples are taken periodically at the speci‐
           fied  interval. In addition, samples can be taken by using the lib‐
           collector(3) API.

           The data recorded at each sample consists of microstate  accounting
           information  from  the  kernel, along with various other statistics
           maintained within the kernel.
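
            For example, a sketch of sampling resource utilization every 10
            seconds instead of the default interval:

              % collect -S 10 a.out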


ENVIRONMENT VARIABLES
       SP_COLLECTOR_JAVA_MAX_STACKDEPTH

           Set the maximum number of callstack frames captured, or set to  '0'
           to  prevent  capturing  Java callstacks. The default behavior is to
           capture up to 256 frames.


       SP_COLLECTOR_NATIVE_MAX_STACKDEPTH

           Set the maximum number of callstack frames captured, or set to  '0'
           to  prevent capturing native callstacks. The default behavior is to
           capture up to 256 frames. When profiling Java on x86 systems,  set‐
           ting  SP_COLLECTOR_NATIVE_MAX_STACKDEPTH=0 might reduce the risk of
           fatal errors related to native stack unwind. When native callstacks
           are disabled, JNI and assembly stacks will not be captured.


       SP_COLLECTOR_NO_VALIDATE

           Define this variable to disable checking hardware, system, and Java
           versions. The default is to do all checks.  Setting  this  variable
           will significantly speed up the start-up of the collect command.


       SP_COLLECTOR_OUTPUT

            Specify a file name to which the output of the collect command is
            redirected.


       SP_COLLECTOR_SIZE_LIMIT

           When  using  the  -c  on option, enables you to specify the maximum
           size of the experiment in megabytes. For all collect options except
           -c on, you can use -L to specify a maximum experiment size.


       SP_ER_PRINT_ALLOW_COREDUMP

           Define  this  variable  to allow the operating system to generate a
           core file if the analyzer back-end (er_print process) encounters  a
           fatal  error. If not defined, the analyzer back-end will not gener‐
           ate core files, but will instead create an error report located  at
           /tmp/analyzer.process-ID/crash.sigsignal.process-ID  where process-
           ID is the Process ID and signal is the signal number.


       SP_COLLECTOR_HWC_DEFAULT

           Define this variable to turn on profiling with the default hardware
           counters. This is equivalent to using the -h auto option.


       SP_COLLECTOR_NO_OMP

           Define  this variable to suppress tracking of parallel regions. The
           program will have substantially less dilation, but  the  data  from
           slave threads will not propagate to main().


       SP_COLLECTOR_OLDOMP

           Define this variable to profile a program built with compilers from
           Sun Studio 12.0 or earlier versions.



RESTRICTIONS
       Most of the Performance Analyzer binaries depend on  finding  a  shared
       library  from  the installation containing the binaries. Users must not
       set LD_LIBRARY_PATH to include any library directories from a different
       installation  of  the  tools. The binaries might fail to execute if the
       LD_LIBRARY_PATH is set to a different installation.


       By default, the Collector collects stacks that are 256 frames deep.  To
       support   deeper   stacks,  set  the  environment  variable  SP_COLLEC‐
       TOR_NATIVE_MAX_STACKDEPTH to a larger number. If you  are  profiling  a
       Java binary, set the SP_COLLECTOR_JAVA_MAX_STACKDEPTH environment vari‐
       able.
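
       For example (csh syntax; the depth of 1024 is only illustrative):

          % setenv SP_COLLECTOR_NATIVE_MAX_STACKDEPTH 1024
          % collect -p on a.out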


       The Collector interposes on some signal-handling  routines  to  protect
       its use of SIGPROF signals for clock-based profiling and SIGEMT (Oracle
       Solaris) or SIGIO  (Linux)  for  hardware  counter  overflow  profiling
       against disruption by the target program. See the section "DATA COLLEC‐
       TION AND SIGNALS" above.


       The Collector interposes on setitimer(2) for clock profiling,  periodic
       sampling,  and hardware counter checking. Any setitimer calls from tar‐
       get programs will fail.


       On Oracle Solaris, the Collector interposes on functions in  the  hard‐
       ware  counter  library,  libcpc.so,  so  that an application cannot use
       hardware counters while the Collector is collecting  performance  data.
       The interposed functions return a value of -1.


       Dataspace  profiling  is only available on SPARC systems running Oracle
       Solaris.


       For this release, the data from process-wide resource utilization  sam‐
       ples might not be reliable on systems running the Linux OS.


       Hardware  counter overflow profiling cannot be run on an Oracle Solaris
       system where cpustat is running, because cpustat takes control  of  the
       counters, and does not let a user process use them.


       Java Profiling requires Java 2 SDK (JDK) 7 Update 11, or later updates
       of JDK 7.


       collect cannot be used on executables compiled with the -xprofile=tcov
       flag.


       Data is not collected on descendant processes that are created  to  use
       the  setuid  attribute, nor on any descendant processes created with an
       exec call for an executable that is not  dynamically  linked.  Further‐
       more,  subsequent  descendant  processes  might  produce  corrupted  or
       unreadable experiments. The workaround is to ensure that all  processes
       spawned are dynamically-linked and do not have the setuid attribute.


       Applications  that call vfork(2) have these calls replaced by a call to
       fork1(2).


       Count data (collect -c) cannot be collected on Oracle Linux 5  systems;
       count  data cannot be collected for 32-bit binaries on any Linux system
       at all.


       On Linux systems,  data  cannot  be  collected  on  applications  using
       clone(2) with the CLONE_VM flag.

SEE ALSO
       analyzer(1), collector(1), dbx(1), er_archive(1), er_cp(1),
       er_export(1), er_mv(1), er_print(1), er_rm(1), tha(1), libcollector(3)


       Performance Analyzer manual



Studio 12.6                        May 2017                         collect(1)