Configure Hook using SageMaker Python SDK

After you make the changes to your training script, configure the hook by passing parameters to the SageMaker Debugger API operation, DebuggerHookConfig.

from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

# One CollectionConfig per tensor collection to save
collection_configs=[
    CollectionConfig(name="tensor_collection_1"),
    CollectionConfig(name="tensor_collection_2"),
    # ...
    CollectionConfig(name="tensor_collection_n")
]

hook_config = DebuggerHookConfig(
    s3_output_path='s3://smdebug-dev-demo-pdx/mnist',
    collection_configs=collection_configs,
    hook_parameters={
        "parameter": "value"
    }
)

Path to SMDebug artifacts

To create an SMDebug trial object, you need to know where the SMDebug artifacts are saved.

1. For SageMaker training jobs

When you run a SageMaker training job, SMDebug artifacts are saved to Amazon S3. SageMaker writes the data from your training job to a local path in the training container and uploads it to an S3 bucket in your account. When you start a SageMaker training job with the Python SDK, you can set the path using the s3_output_path parameter of the DebuggerHookConfig object. If you don’t specify the path, SageMaker automatically sets the output path to your default S3 bucket.

Example

from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

collection_configs=[
    CollectionConfig(name="weights"),
    CollectionConfig(name="gradients")
]

debugger_hook_config=DebuggerHookConfig(
    s3_output_path="specify-your-s3-bucket-uri",  # Optional
    collection_configs=collection_configs
)

For more information, see Configure Debugger Hook to Save Tensors in the Amazon SageMaker Developer Guide.

2. For non-SageMaker training jobs

If you are running a training job outside SageMaker, this is the path you pass as out_dir when you create an SMDebug Hook. When creating the hook, you can pass either a local path (for example, /home/ubuntu/smdebug_outputs/) or an S3 bucket path (for example, s3://bucket/prefix).
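The two accepted forms of out_dir can be told apart by their URI scheme. The following is a minimal sketch using only the Python standard library; describe_out_dir is a hypothetical helper written for illustration and is not part of SMDebug.

```python
from urllib.parse import urlparse


def describe_out_dir(out_dir: str) -> dict:
    """Classify an out_dir value as an S3 location or a local path.

    Hypothetical helper for illustration only; not an SMDebug API.
    """
    parsed = urlparse(out_dir)
    if parsed.scheme == "s3":
        # s3://bucket/prefix -> bucket is the netloc, prefix is the path
        return {
            "kind": "s3",
            "bucket": parsed.netloc,
            "prefix": parsed.path.lstrip("/"),
        }
    # Anything without the s3 scheme is treated as a local filesystem path
    return {"kind": "local", "path": out_dir}


print(describe_out_dir("s3://bucket/prefix"))
print(describe_out_dir("/home/ubuntu/smdebug_outputs/"))
```

A check like this can be useful before creating the hook, for example to confirm that an S3 location includes a prefix.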

Hook Configuration Parameter Keys

The available hook_parameters keys are listed below. The sections that follow explain what each parameter means. Note that all parameter values must be strings. For any parameter that accepts a list (such as save_steps, reductions, or include_regex), give the value as a single string with the items separated by commas.

dry_run
save_all
include_workers
include_regex
reductions
save_raw_tensor
save_shape
save_interval
save_steps
start_step
end_step
train.save_interval
train.save_steps
train.start_step
train.end_step
eval.save_interval
eval.save_steps
eval.start_step
eval.end_step
predict.save_interval
predict.save_steps
predict.start_step
predict.end_step
global.save_interval
global.save_steps
global.start_step
global.end_step
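Because every value must be a string, it can help to build the dictionary from native Python values and convert it in one place. The following is a minimal sketch; to_hook_parameters is a hypothetical helper and not part of the SageMaker SDK or SMDebug. It joins list values with commas as described above. Converting scalars with str() is an assumption, so check the expected string form for each key, especially boolean ones.

```python
from typing import Any, Dict


def to_hook_parameters(params: Dict[str, Any]) -> Dict[str, str]:
    """Convert native Python values to the all-string form of hook_parameters.

    Hypothetical helper for illustration only.
    """
    converted = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            # List-valued keys become a single comma-separated string
            converted[key] = ",".join(str(item) for item in value)
        else:
            # Assumption: scalar values are passed through str()
            converted[key] = str(value)
    return converted


hook_parameters = to_hook_parameters({
    "save_steps": [0, 100, 200],
    "include_regex": ["relu", "tanh"],
    "train.save_interval": 50,
})
print(hook_parameters)
# {'save_steps': '0,100,200', 'include_regex': 'relu,tanh', 'train.save_interval': '50'}
```

The resulting dictionary can then be passed as the hook_parameters argument of DebuggerHookConfig.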