The timing of a serial (single processor, single threaded) program is usually measured by the kernel of the operating system as the ``real CPU time'', i.e., the time spent executing the instructions of the program. Idle times during which the program is ``sleeping'' while it waits for other processes or I/O operations to complete are therefore not taken into account.
In contrast, a parallel program has to be timed in terms of the elapsed time between its invocation and termination (``wall clock time''). Idle time in one of the instances of the program indicates a load balancing problem, which may lead to less than optimal scaling. Careful profiling therefore requires that all instances have an equal environment on their processors (i.e., no other processes producing a high load should be running) and that all timings are made in terms of ``elapsed wall clock time''.
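The difference between the two measures can be made visible with a minimal sketch in plain C. It is not part of PETSc and uses none of its facilities; it assumes a POSIX system providing \texttt{clock\_gettime} and \texttt{nanosleep}, and the helper names (\texttt{cpu\_seconds}, \texttt{wall\_seconds}, \texttt{idle\_ms}, \texttt{time\_idle\_phase}) are hypothetical names chosen for this illustration. A sleep stands in for an instance that is waiting for another process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* CPU time consumed by this process so far, in seconds (ISO C clock()). */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Elapsed wall clock time in seconds, read from a monotonic POSIX clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Sleep for ms milliseconds; stands in for idle time spent waiting.
   Assumption: the sleep is not interrupted by a signal. */
static void idle_ms(long ms)
{
    struct timespec req;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&req, NULL);
}

/* Time one idle phase with both clocks, store and print the results. */
static void time_idle_phase(long ms, double *cpu, double *wall)
{
    double c0 = cpu_seconds();
    double w0 = wall_seconds();

    idle_ms(ms);

    *cpu = cpu_seconds() - c0;
    *wall = wall_seconds() - w0;
    printf("CPU time: %.3f s, wall clock time: %.3f s\n", *cpu, *wall);
}
```

Calling \texttt{time\_idle\_phase(200, \&cpu, \&wall)} from a \texttt{main} function would typically report a CPU time close to zero and a wall clock time of roughly 0.2 seconds: the idle phase is invisible to the CPU time measurement but contributes fully to the elapsed time.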
PETSc supports accurate profiling of the application through several features, which are activated with command-line arguments to the PETSc application: