Nova exception and log handling policy
Exception and logging policy
=======
The following are the OpenStack exception and logging levels.
CRITICAL
- Data has been destroyed by the system.
- All Nova functions have stopped.
- Each component writes a message to the log.
- The message should specify which component caused the problem.
ex). Nova finds a data conflict.
FATAL
- Invalid configuration value, or abnormal behavior of an external component.
- The daemon stops.
- The component writes a message to the log.
- The message should specify which configuration and which environment caused the problem when the component started to run.
ex). The disk is full.
ERROR
- A limit value is exceeded or the API is violated.
- The daemon should continue to run.
- The component writes a message to the log.
- The message should specify which process caused the exception.
- The message should specify why the exception was raised.
ex). No resources are available when an instance starts to run.
WARN
- The exception is caused by input from outside.
- Validation error.
- The daemon continues to operate.
- The component writes a message to the log.
- The location of the exception can be identified.
- The reason for the exception can be identified.
ex). Execution of an unauthorized API, mismatched input parameters.
DEBUG
- Generate debug logs only if FLAGS.verbose is set; otherwise do not generate them.
- For diagnostic purposes only. Useful to developers during bug root cause analysis (RCA).
- Should be very detailed, containing all inputs to the API, the intermediate data processed by the method, and the output of the API.
- Use debug() to print intermediate values and statuses.
Ex: If an API call returns without error but the behaviour is not as expected, set verbosity to debug. Debug messages should help the developer see the chain of events and identify where the result is unexpected.
INFO
- Use only to convey information.
- The log message string should not contain error or debugging messages.
- Ex: “Started nova-XX service”, “Shutting down instance”, “Instance rebooted successfully”, “live migration complete” etc.
AUDIT
- Call this log method only for truly important events, for tracking purposes. It is generally useful for making billing and accountability easier.
- Ex: “New instance created”, “Instance stopped” etc.
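The level definitions above could map onto Python's standard logging module roughly as follows. This is only a sketch: the numeric value chosen for AUDIT, the logger name, and the start_instance function are assumptions made for illustration and are not taken from Nova's code.

```python
import logging

# Assumption: AUDIT is registered as a custom level just above INFO.
AUDIT = logging.INFO + 1
logging.addLevelName(AUDIT, "AUDIT")

LOG = logging.getLogger("nova.compute.example")  # hypothetical logger name


def configure(verbose=False):
    """Emit DEBUG records only when verbose is set, mirroring FLAGS.verbose."""
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s %(message)s")
    LOG.setLevel(logging.DEBUG if verbose else logging.INFO)


def start_instance(instance_id, free_slots):
    """Hypothetical operation that logs at the level the policy suggests."""
    # DEBUG: record the inputs for later root cause analysis.
    LOG.debug("start_instance called: instance_id=%r free_slots=%r",
              instance_id, free_slots)
    if not isinstance(instance_id, str) or not instance_id:
        # WARN: bad input from outside; the daemon continues.
        LOG.warning("Validation error in start_instance: "
                    "invalid instance_id %r", instance_id)
        return False
    if free_slots <= 0:
        # ERROR: a limit was hit; say which process failed and why.
        LOG.error("start_instance failed for %s: no free resources",
                  instance_id)
        return False
    # INFO: plain information, no error or debugging detail.
    LOG.info("Instance %s started", instance_id)
    # AUDIT: important event kept for tracking and billing.
    LOG.log(AUDIT, "New instance created: %s", instance_id)
    return True
```

Calling configure(verbose=True) before start_instance("i-1", 1) would also emit the DEBUG record; with the default, only INFO and above are written.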
Policy of Exception Class
=======
Do not use exception.Error (this class conveys no information).
AttributeError must be avoided (it is hard to diagnose problems from an AttributeError).
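As an illustration of the two rules above, the sketch below raises a specific exception instead of letting an AttributeError escape. The InstanceNotFound class and the get_host helper are hypothetical names chosen for this example.

```python
class InstanceNotFound(Exception):
    """Hypothetical exception that names the resource that is missing."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        super().__init__("Instance %s could not be found" % instance_id)


def get_host(instances, instance_id):
    """Return the host of an instance from a dict of hypothetical records."""
    instance = instances.get(instance_id)
    if instance is None:
        # Without this check, instance.host would raise
        # "AttributeError: 'NoneType' object has no attribute 'host'",
        # which says nothing about which instance was missing.
        raise InstanceNotFound(instance_id)
    return instance.host
```

The caller can then catch InstanceNotFound and report the instance ID, rather than having to guess the cause from a traceback.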
Policy of Exception Handling
=======
Wrap exceptions with a more informative exception class.
The intermediate state must be cleaned up.
Exceptions must be caught, especially in loops.
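- Bad Example
for resource in [A, B, C]:
    do_something(resource)
- Good Example
for resource in [A, B, C]:
    try:
        do_something(resource)
    except Exception:
        # handle the failure for this resource here
        raise  # if needed
In the bad example, if an exception is raised during do_something(B), nothing is done for C.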
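The three handling rules in this section can be combined in one small sketch. Everything here is hypothetical: ResourceSetupError, set_up, set_up_all, and the simulated failure for resource "B" exist only to show the pattern, under the assumption that a failed resource should be skipped rather than abort the loop.

```python
import logging

LOG = logging.getLogger(__name__)


class ResourceSetupError(Exception):
    """Hypothetical wrapper that records which resource failed."""

    def __init__(self, resource, reason):
        self.resource = resource
        super().__init__("Setting up %s failed: %s" % (resource, reason))


def set_up(resource, allocated):
    """Hypothetical operation: allocate a resource, then configure it."""
    allocated.add(resource)  # intermediate state
    try:
        if resource == "B":  # simulated low-level failure
            raise IOError("disk full")
    except IOError as exc:
        allocated.discard(resource)  # clean up the intermediate state
        # Wrap the low-level error in a more informative class.
        raise ResourceSetupError(resource, exc) from exc


def set_up_all(resources):
    """Process every resource, collecting failures instead of stopping."""
    allocated, failed = set(), []
    for resource in resources:
        try:
            set_up(resource, allocated)
        except ResourceSetupError:
            # Log with traceback and continue with the next resource.
            LOG.exception("Skipping resource %s", resource)
            failed.append(resource)
    return allocated, failed
```

With resources ["A", "B", "C"], the failure of "B" is logged and rolled back, and "C" is still processed.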
Blueprint information
- Status:
- Started
- Approver:
- Nachi Ueno
- Priority:
- Medium
- Drafter:
- Nachi Ueno
- Direction:
- Needs approval
- Assignee:
- Nachi Ueno
- Definition:
- Approved
- Series goal:
- None
- Implementation:
- Good progress
- Milestone target:
- 2012.1
- Started by
- Nachi Ueno
- Completed by
Whiteboard
Discussion about Design of Exception Class
=======
Analysis of the current exception handling scheme in Nova (brief multilevel inheritance snapshot):
Python base class => Nova base class => Nova sub-class (level 1) => Nova sub-class (level 2)
IOError   => ProcessExecutio
Exception => Error         => APIError
                           => DBError
          => NovaException => VirtualInterf
                           => Invalid       => InvalidSignature
                           => Duplicate     => KeyPairExists
                           => VolumeService
                           => ComputeServic
                           => VolumeNotFoun
                           => KernelNotFoun
                           => RamdiskNotFou
                           => NetworkNotFou
Issues and solutions:
This scheme does not let the user know which service raised the exception. There is no service-level categorization of exceptions.
Example: If an rpc.cast or rpc.call from Compute raises an exception, the user cannot tell whether it comes from Compute or from another service.
Persistent error messages? Do we need to store exception messages in a persistent store so that the user can see them later in the dashboard or system logs, something like event logs?
E.g. The user wants to see historically which requests failed and what the invalid parameters were. Is this really useful for the user?
Feature-wise segregation of exceptions. Do we need to divide the exception classes based on Nova features (Security group, Instance, Volume, etc.)? Currently the division is based on the type of error, e.g. NotFound.
Create base classes based on features, and then subclass the different categories of errors such as Invalid, NotFound, Duplicate, etc.
Or, create base classes based on the error categories, and then subclass them based on feature.
Which of the above options is feasible? Can they give the user more clarity than the current scheme?
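To make the two options concrete, here is a small side-by-side sketch. NovaException, NotFound, and Invalid are names from the discussion above; SecurityGroupError and the feature-specific subclasses, as well as the classify helper, are hypothetical and only illustrate the shape of each hierarchy.

```python
class NovaException(Exception):
    """Stand-in for the common Nova base class."""


# Option 1: base classes per feature, subclasses per error category.
class SecurityGroupError(NovaException):
    pass


class SecurityGroupNotFound(SecurityGroupError):
    pass


class SecurityGroupInvalid(SecurityGroupError):
    pass


# Option 2: base classes per error category, subclasses per feature.
class NotFound(NovaException):
    pass


class Invalid(NovaException):
    pass


class InstanceNotFound(NotFound):
    pass


class InstanceInvalid(Invalid):
    pass


def classify(exc):
    """Return the name of the first-level base class below NovaException."""
    for cls in type(exc).__mro__:
        if NovaException in cls.__bases__:
            return cls.__name__
    return None
```

With option 1, a caller can handle everything about one feature with a single "except SecurityGroupError". With option 2, a caller can handle one kind of failure across all features with a single "except NotFound". Which grouping is more useful depends on whether callers more often branch on the feature or on the error category.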