Hook from Python constructor
Using the Hook
class smdebug.core.hook.BaseHook(collection_manager: smdebug.core.collection_manager.CollectionManager, default_include_collections: List[str], profiler_config_parser: smdebug.profiler.profiler_config_parser.ProfilerConfigParser, init_step: int = 0, out_dir: Optional[str] = None, export_tensorboard: bool = False, tensorboard_dir: Optional[str] = None, dry_run: bool = False, reduction_config: Optional[smdebug.core.reduction_config.ReductionConfig] = None, save_config: Optional[Union[smdebug.core.save_config.SaveConfig, Dict[smdebug.core.modes.ModeKeys, smdebug.core.save_config.SaveConfigMode]]] = None, include_regex: Optional[List[str]] = None, include_collections: Optional[List[str]] = None, save_all: bool = False, include_workers: str = 'one')

Bases: object
A class used to represent the hook which gets attached to the training process. It takes the form appropriate for the framework, such as tf.train.SessionRunHook for TensorFlow or Callback for Keras.
- out_dir (str): Path into which outputs will be written. The hook raises an error if out_dir already exists, because merging the tensors generated in the current job with tensors from a previous job is not supported. Ensure that out_dir does not already exist.
- dry_run (bool): When set, the intended behavior is only described in the log file; tensors are not actually saved.
- save_config (SaveConfig object): A save config object applied as the default for all included tensors. A collection can optionally have its own SaveConfig object, which overrides this default for its tensors.
- reduction_config (ReductionConfig object): If passed, this reduction config object is used as the default for all included tensors. A collection can have its own ReductionConfig object, which overrides this default for its tensors. If this is not passed, tensors are saved in full.
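The override precedence described for save_config and reduction_config can be sketched as follows; the helper name is hypothetical and not part of smdebug:

```python
# Sketch of the documented precedence: a collection's own
# SaveConfig/ReductionConfig, if set, overrides the hook-level default.
# Hypothetical helper, not the actual smdebug implementation.
def effective_config(hook_default, collection_config=None):
    return collection_config if collection_config is not None else hook_default

print(effective_config("hook-default"))                # hook-default
print(effective_config("hook-default", "collection"))  # collection
```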
- include_regex (list of str): List of strings representing regular expressions. Tensors whose names match these regular expressions will be saved and made available as part of the default collection.
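The matching behavior can be illustrated with a small sketch; matches_any is a hypothetical helper, and the use of search (rather than full-match) semantics here is an assumption:

```python
import re

# Sketch: a tensor is saved if its name matches any include_regex pattern.
# matches_any is hypothetical; re.search semantics are an assumption.
def matches_any(tensor_name, include_regex):
    return any(re.search(pattern, tensor_name) for pattern in include_regex)

patterns = [r"relu", r"^dense_1/"]
print(matches_any("conv1/relu:0", patterns))    # True
print(matches_any("dense_1/kernel", patterns))  # True
print(matches_any("conv2/bias", patterns))      # False
```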
- include_collections (list of str representing collection names): The collections that should be saved. If this is empty, it defaults to including all collections from code.
- save_all (bool): A shortcut for saving all tensors in the model. They are all saved in the collection all.
- include_workers (str): Whether the hook saves data from one worker or from all workers; takes the value 'one' or 'all'.
- profiler_config_parser (ProfilerConfigParser object): If passed, this profiler configuration is used. By default, a new profiler configuration is set up here.
- classmethod create_from_json_file(json_file_path=None): Relies on the existence of a JSON file.
  - First, check json_file_path. If it is not None: if the file exists, use it; if the file does not exist, raise an error.
  - Otherwise, check the file path set by a SageMaker environment variable. If that file exists, use it.
  - Otherwise, return None.
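The lookup order above can be sketched in plain Python. The environment variable name below is an assumption for illustration, and the real classmethod constructs a hook from the file rather than returning its path:

```python
import os

# Sketch of the documented lookup order for create_from_json_file.
# The environment variable name is an assumption; the real classmethod
# builds a hook from the JSON file instead of returning a path.
def resolve_json_config(json_file_path=None):
    if json_file_path is not None:
        if not os.path.exists(json_file_path):
            raise FileNotFoundError(json_file_path)
        return json_file_path
    env_path = os.environ.get("SMDEBUG_CONFIG_FILE_PATH")  # assumed name
    if env_path is not None and os.path.exists(env_path):
        return env_path
    return None
```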
- has_default_hook_configuration(default_saved_collections=['losses'])
- has_default_configuration()
- close()
- log_outstanding_timeline_metrics()
- should_save_tensor_or_collection(**kwargs) → bool
- save_tensor(tensor_name, tensor_value, collections_to_write='default')
- set_mode(mode)
- export_collections()
- record_trace_events(**kwargs): Write trace events to the timeline.
  - training_phase: a string such as data_iterating, forward, backward, or operations
  - op_name: more detail about the phase, such as whether it is a dataset or an iterator
  - phase: defaults to 'X'
  - timestamp: start time for the event (in seconds)
  - duration: any manually computed duration (in seconds)
  - kwargs: can include process id and thread id
- save_scalar(name, value, sm_metric=False, timestamp: float = None): Call save_scalar at any point in the training script to log a scalar value, such as a metric or any other value.
  - name: name of the scalar; a prefix 'scalar/' will be added to it
  - value: scalar value
  - sm_metric: True/False; if set to True, the scalar value will also be written to SageMaker as a metric
  - timestamp: optional timestamp (in seconds) for the scalar value
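A sketch of save_scalar's argument handling based on the description above; normalize_scalar is a hypothetical helper, and defaulting the timestamp to the current time is an assumption rather than confirmed smdebug behavior:

```python
import time

# Sketch of save_scalar's documented argument handling.
# normalize_scalar is hypothetical; defaulting timestamp to time.time()
# is an assumption, not confirmed smdebug behavior.
def normalize_scalar(name, value, sm_metric=False, timestamp=None):
    return {
        "name": "scalar/" + name,  # documented 'scalar/' prefix
        "value": value,
        "sm_metric": sm_metric,
        "timestamp": timestamp if timestamp is not None else time.time(),
    }

record = normalize_scalar("accuracy", 0.91, sm_metric=True)
print(record["name"])  # scalar/accuracy
```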
- get_collection(name, create=True)
- get_collections()
- add_collection(collection)
hook = HookClass(
    out_dir,
    export_tensorboard=False,
    tensorboard_dir=None,
    dry_run=False,
    reduction_config=None,
    save_config=None,
    include_regex=None,
    include_collections=None,
    save_all=False,
    include_workers="one",
)
Parameters:
- out_dir (str): Path where tensors and metadata will be saved. This is a required argument. Ensure that out_dir does not already exist.
- export_tensorboard (bool): Whether to export TensorBoard summaries (distributions and histograms for saved tensors, and scalar summaries for saved scalars). Defaults to False. Note that when running on SageMaker this parameter is ignored; you will need to use the TensorBoardOutputConfig section in the API to enable TensorBoard summaries. Refer to the SageMaker page for an example.
- tensorboard_dir (str): Path where TensorBoard artifacts will be saved. If this is not passed and export_tensorboard is True, then TensorBoard artifacts are saved in out_dir/tensorboard. Note that when running on SageMaker this parameter is ignored; you will need to use the TensorBoardOutputConfig section in the API to enable TensorBoard summaries. Refer to the SageMaker page for an example.
- dry_run (bool): If True, don't write any files.
- reduction_config (ReductionConfig object): Specifies the reductions to be applied by default to saved tensors. A collection can have its own ReductionConfig object, which overrides this for the tensors belonging to that collection.
- save_config (SaveConfig object): Specifies when to save tensors. A collection can have its own SaveConfig object, which overrides this for the tensors belonging to that collection.
- include_regex (list[str]): List of regex patterns specifying which tensors to save. Tensors whose names match these patterns will be saved.
- include_collections (list[str]): List of collections to save, specified by name.
- save_all (bool): Saves all tensors and collections. This increases the amount of disk space used and can significantly reduce the performance of the training job, depending on the size of the model.
- include_workers (str): Used for distributed training. It can take the value 'one' or 'all'. 'one' means only the tensors from one chosen worker will be saved; this is the default behavior. 'all' means tensors from all workers will be saved.
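The include_workers choice can be sketched as a predicate that each worker evaluates; the helper and the idea that rank 0 is the "chosen" worker are assumptions for illustration:

```python
# Sketch of include_workers: with "one", only a single chosen worker
# saves tensors; with "all", every worker saves. Treating rank 0 as
# the chosen worker is an assumption for illustration.
def worker_should_save(include_workers, worker_rank, chosen_rank=0):
    if include_workers == "all":
        return True
    return worker_rank == chosen_rank  # include_workers == "one"

print([worker_should_save("one", r) for r in range(3)])  # [True, False, False]
print([worker_should_save("all", r) for r in range(3)])  # [True, True, True]
```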