Hook from Python constructor

Using the Hook

class smdebug.core.hook.BaseHook(
    collection_manager: smdebug.core.collection_manager.CollectionManager,
    default_include_collections: List[str],
    profiler_config_parser: smdebug.profiler.profiler_config_parser.ProfilerConfigParser,
    init_step: int = 0,
    out_dir: Optional[str] = None,
    export_tensorboard: bool = False,
    tensorboard_dir: Optional[str] = None,
    dry_run: bool = False,
    reduction_config: Optional[smdebug.core.reduction_config.ReductionConfig] = None,
    save_config: Optional[Union[smdebug.core.save_config.SaveConfig, Dict[smdebug.core.modes.ModeKeys, smdebug.core.save_config.SaveConfigMode]]] = None,
    include_regex: Optional[List[str]] = None,
    include_collections: Optional[List[str]] = None,
    save_all: bool = False,
    include_workers: str = 'one'
)

Bases: object

A class representing the hook that gets attached to the training process. It takes the form appropriate for the framework, such as tf.train.SessionRunHook for TensorFlow or a Keras Callback.
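
In practice the hook is typically created through a framework-specific subclass (for example smdebug.pytorch.Hook, smdebug.tensorflow.KerasHook, or smdebug.mxnet.Hook) rather than BaseHook itself. A minimal sketch, assuming the PyTorch variant and an illustrative output directory:

import torch
import smdebug.pytorch as smd

# The framework hook is a subclass of BaseHook and accepts the same
# out_dir / save_config / include_collections arguments.
hook = smd.Hook(out_dir="/tmp/smdebug_demo", include_collections=["losses"])

model = torch.nn.Linear(4, 2)
hook.register_module(model)  # attach the hook so the model's tensors can be captured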

out_dir

Path into which outputs will be written. The hook raises an error if ‘out_dir’ already exists, because the implementation does not support merging tensors generated in the current job with tensors from a previous job. Hence, ensure that ‘out_dir’ does not exist.

Type

str

dry_run

When dry run is set, the behavior is only described in the log file; tensors are not actually saved.

Type

bool

save_config

Takes a SaveConfig object which is applied as the default for all included tensors. A collection can optionally have its own SaveConfig object which overrides this for its tensors.

Type

SaveConfig object
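
A sketch of combining a default save_config with a per-collection override, assuming the PyTorch hook as above; the collection name "custom" and the intervals are illustrative:

import smdebug.pytorch as smd

hook = smd.Hook(
    out_dir="/tmp/smdebug_save_config",
    save_config=smd.SaveConfig(save_interval=100),   # default: save every 100 steps
    include_collections=["losses", "custom"],
)
# A collection's own SaveConfig overrides the hook-level default for its tensors.
hook.get_collection("custom").save_config = smd.SaveConfig(save_interval=10)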

reduction_config

If passed, this ReductionConfig object is used as the default for all included tensors. A collection can have its own ReductionConfig object which overrides this for its tensors. If this is not passed, tensors are saved in full.

Type

ReductionConfig object
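
Similarly, a sketch of passing a default ReductionConfig so that reductions are stored instead of full tensors (again using the PyTorch hook; the reduction names shown are only illustrative):

import smdebug.pytorch as smd

hook = smd.Hook(
    out_dir="/tmp/smdebug_reductions",
    reduction_config=smd.ReductionConfig(reductions=["mean", "max"]),  # store only these reductions
    include_collections=["weights", "gradients"],
)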

include_regex

Takes a list of strings representing regular expressions. Tensors whose names match these regular expressions will be saved. These tensors will be available as part of the default collection.

Type

list of str

include_collections

Takes the names of the collections which should be saved. If this is empty, it defaults to including all collections from code.

Type

list of str representing collection names

save_all

A shortcut for saving all tensors in the model. They are all saved in the collection all.

Type

bool

include_workers

Controls which workers the hook saves data from in distributed training; takes the value ‘one’ (default, save from a single chosen worker) or ‘all’ (save from all workers).

Type

str

profiler_config_parser

If passed, use this profiler configuration. By default, a new profiler configuration is set up here.

Type

ProfilerConfigParser object

classmethod create_from_json_file(json_file_path=None)

Relies on the existence of a JSON file.

First, check json_file_path. If it is not None:
  If the file exists, use that.
  If the file does not exist, throw an error.
Otherwise, check the filepath set by a SageMaker environment variable.
  If the file exists, use that.
Otherwise, return None.
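
A sketch of how create_from_json_file is typically called from a training script; the explicit path in the second call is only illustrative:

import smdebug.pytorch as smd

# On SageMaker, the configuration path is picked up from the environment,
# so no argument is needed; outside SageMaker this returns None.
hook = smd.Hook.create_from_json_file()

# Or point it at an explicit configuration file.
hook = smd.Hook.create_from_json_file(json_file_path="/path/to/smdebug_config.json")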

has_default_hook_configuration(default_saved_collections=['losses'])
has_default_configuration()
close()
log_outstanding_timeline_metrics()
should_save_tensor_or_collection(**kwargs) → bool
save_tensor(tensor_name, tensor_value, collections_to_write='default')
set_mode(mode)
export_collections()
record_trace_events(**kwargs)

Write trace events to the timeline.

  • training_phase: strings like data_iterating, forward, backward, operations, etc.
  • op_name: more detail about the phase, such as whether it is a dataset or an iterator
  • phase: defaults to ‘X’
  • timestamp: start time for the event (in seconds)
  • duration: any duration manually computed (in seconds)
  • kwargs: can include process id and thread id
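
For illustration, a manually timed trace event built from the parameters above; the phase and operation names are placeholders:

import time

start = time.time()
# ... the work being timed, e.g. one data-loading step ...
hook.record_trace_events(
    training_phase="data_iterating",  # high-level phase name
    op_name="DataLoaderIter",         # more detail about the phase
    phase="X",                        # the default event type
    timestamp=start,                  # start time in seconds
    duration=time.time() - start,     # elapsed time in seconds
)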

save_scalar(name, value, sm_metric=False, timestamp: float = None)

Call save_scalar at any point in the training script to log a scalar value, such as a metric or any other value.

  • name: Name of the scalar. A prefix ‘scalar/’ will be added to it
  • value: Scalar value
  • sm_metric: True/False. If set to True, the scalar value will be written to SageMaker
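
For example, logging a validation metric from the training script; the metric name and value are placeholders:

# The name is stored as 'scalar/val_accuracy'; with sm_metric=True the value
# is also written to SageMaker.
hook.save_scalar("val_accuracy", 0.91, sm_metric=True)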

get_collection(name, create=True)
get_collections()
add_collection(collection)
hook = HookClass(
    out_dir,
    export_tensorboard = False,
    tensorboard_dir = None,
    dry_run = False,
    reduction_config = None,
    save_config = None,
    include_regex = None,
    include_collections = None,
    save_all = False,
    include_workers="one"
)

Parameters:

  • out_dir (str): Path where tensors and metadata will be saved. This is a required argument. Please ensure that ‘out_dir’ does not already exist.

  • export_tensorboard (bool): Whether to export TensorBoard summaries (distributions and histograms for tensors saved, and scalar summaries for scalars saved). Defaults to False. Note that when running on SageMaker this parameter will be ignored. You will need to use the TensorBoardOutputConfig section in the API to enable TensorBoard summaries. Refer to the SageMaker page for an example.

  • tensorboard_dir (str): Path where to save TensorBoard artifacts. If this is not passed and export_tensorboard is True, TensorBoard artifacts are saved in out_dir/tensorboard. Note that when running on SageMaker this parameter will be ignored. You will need to use the TensorBoardOutputConfig section in the API to enable TensorBoard summaries. Refer to the SageMaker page for an example.

  • dry_run (bool): If true, don’t write any files

  • reduction_config: (ReductionConfig object) Specifies the reductions to be applied as default for tensors saved. A collection can have its own ReductionConfig object which overrides this for the tensors which belong to that collection.

  • save_config: (SaveConfig object) Specifies when to save tensors. A collection can have its own SaveConfig object which overrides this for the tensors which belong to that collection.

  • include_regex (list[str]): List of regex patterns which specify the tensors to save. Tensors whose names match these patterns will be saved.

  • include_collections (list[str]): List of which collections to save specified by name

  • save_all (bool): Saves all tensors and collections. Increases the amount of disk space used, and can reduce the performance of the training job significantly, depending on the size of the model.

  • include_workers (str): Used for distributed training. It can take the values ‘one’ or ‘all’. ‘one’ means only the tensors from one chosen worker will be saved; this is the default behavior. ‘all’ means tensors from all workers will be saved.
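
Putting these parameters together, a sketch of constructing a hook directly in Python, using the PyTorch hook class as a stand-in for HookClass; the directory name and regex are illustrative:

import smdebug.pytorch as smd

hook = smd.Hook(
    out_dir="/tmp/run_1",                           # must not already exist
    export_tensorboard=True,
    tensorboard_dir=None,                           # defaults to out_dir/tensorboard
    dry_run=False,
    reduction_config=None,                          # save full tensors
    save_config=smd.SaveConfig(save_interval=500),
    include_regex=["relu.*"],                       # also save tensors matching this pattern
    include_collections=["losses", "gradients"],
    save_all=False,
    include_workers="one",                          # save data from a single worker
)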