Tensor Collections

A Collection groups tensors together. Each Collection is identified by a string name and can hold tensors of a particular kind, such as “losses”, “weights”, “biases”, or “gradients”. A Collection has its own list of tensors, selected by include regex patterns, plus parameters that determine how and when those tensors are saved. Collections let you save different types of tensors at different frequencies and in different forms. They are also available during analysis, so you can query a group of tensors at once.
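To make the grouping idea concrete, here is a minimal, library-free sketch of how include regex patterns could assign tensor names to named collections. The function group_by_regex, the tensor names, and the patterns are all hypothetical. The sketch only models the concept and is not SageMaker Debugger's implementation.

```python
import re


def group_by_regex(tensor_names, include_patterns):
    """Map each collection name to the tensor names matched by its patterns."""
    grouped = {}
    for collection, patterns in include_patterns.items():
        compiled = [re.compile(p) for p in patterns]
        grouped[collection] = [
            name for name in tensor_names
            if any(rx.search(name) for rx in compiled)
        ]
    return grouped


# Hypothetical tensor names and patterns, chosen only for illustration.
tensor_names = ["conv1.weight", "conv1.bias", "fc.weight", "fc.bias", "loss_output"]
include_patterns = {
    "weights": [r"\.weight$"],
    "biases": [r"\.bias$"],
    "losses": [r"^loss"],
}

grouped = group_by_regex(tensor_names, include_patterns)
print(grouped["weights"])
```

Each collection ends up with its own list of tensor names, which is what allows a different save frequency or reduction to be attached to each group.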

There are a number of built-in collections that SageMaker Debugger manages by default. This means that the library takes care of identifying which tensors should be saved as part of each collection. You can also define custom collections, and there are a couple of different ways to do so.

You can specify which of these collections to save in the hook’s include_collections parameter, or through the collection_configs parameter to the DebuggerHookConfig in the SageMaker Python SDK.

Built-in Collections

Below is a comprehensive list of the built-in collections that are managed by SageMaker Debugger. The Hook identifies the tensors that should be saved as part of each collection for the framework in use and saves them if they were requested.

The names of these collections are all lowercase strings.

| Name | Supported by frameworks/hooks | Description |
| --- | --- | --- |
| all | all | Matches all tensors. |
| default | all | The default collection, which matches the regex patterns passed as include_regex to the Hook. |
| weights | TensorFlow, PyTorch, MXNet | Matches all weights of the model. |
| biases | TensorFlow, PyTorch, MXNet | Matches all biases of the model. |
| gradients | TensorFlow, PyTorch, MXNet | Matches all gradients of the model. In TensorFlow, when not using Zero Script Change environments, you must use hook.wrap_optimizer(). |
| losses | TensorFlow, PyTorch, MXNet | Saves the loss for the model. |
| metrics | TensorFlow’s KerasHook, XGBoost | For KerasHook, saves the metrics computed by Keras for the model. For XGBoost, saves the evaluation metrics computed by the algorithm. |
| outputs | TensorFlow’s KerasHook | Matches the outputs of the model. |
| layers | TensorFlow’s KerasHook | Input and output of intermediate convolutional layers. |
| sm_metrics | TensorFlow | You can add scalars that you want to show up in SageMaker Metrics to this collection. SageMaker Debugger saves these scalars both to the out_dir of the hook and to SageMaker Metrics. Note that the scalars passed here will be saved on AWS servers outside of your AWS account. |
| optimizer_variables | TensorFlow’s KerasHook | Matches all optimizer variables. Currently only supported in Keras. |
| hyperparameters | XGBoost | Booster parameters. |
| predictions | XGBoost | Predictions on validation set (if provided). |
| labels | XGBoost | Labels on validation set (if provided). |
| feature_importance | XGBoost | Feature importance given by get_score(). |
| full_shap | XGBoost | A matrix of (nsample, nfeatures + 1) with each record indicating the feature contributions (SHAP values) for that prediction. Computed on training data with predict(). |
| average_shap | XGBoost | The sum of SHAP value magnitudes over all samples. Represents the impact each feature has on the model output. |
| trees | XGBoost | Boosted tree model given by trees_to_dataframe(). |

Default collections saved

The following collections are saved regardless of the hook configuration.

| Framework | Default collections saved |
| --- | --- |
| TensorFlow | METRICS, LOSSES, SM_METRICS |
| PyTorch | LOSSES |
| MXNet | LOSSES |
| XGBoost | METRICS |

If you want to disable saving of these collections, set end_step to 0 in the collection’s SaveConfig. With the SageMaker Python SDK, this looks like:

from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

hook_config = DebuggerHookConfig(
    s3_output_path='s3://smdebug-dev-demo-pdx/mnist',
    collection_configs=[
        CollectionConfig(name="metrics", parameters={"end_step": 0})
    ]
)

When configuring the Collection in your Python script, it would be as follows:

hook.get_collection("metrics").save_config.end_step = 0
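Why an end_step of 0 disables saving can be seen in a small model of a step window. The sketch below assumes that a step is saved when it lies in the half-open range [start_step, end_step) and is a multiple of save_interval. The function should_save_step and its default values are hypothetical stand-ins, not the library's SaveConfig logic.

```python
def should_save_step(step, start_step=0, end_step=None, save_interval=500):
    """Return True if `step` lies in [start_step, end_step) on a save interval."""
    if step < start_step:
        return False
    if end_step is not None and step >= end_step:
        return False
    return step % save_interval == 0


# Open-ended window: step 0 and every 500th step after it qualify.
saved_default = [s for s in range(0, 1501) if should_save_step(s)]

# With end_step=0 the window [0, 0) is empty, so no step qualifies.
saved_disabled = [s for s in range(0, 1501) if should_save_step(s, end_step=0)]

print(saved_default, saved_disabled)
```

Under this assumption, setting end_step to 0 closes the window before the first step, so the collection stays defined but never writes anything.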

Creating or retrieving a Collection

| Function | Behavior |
| --- | --- |
| hook.get_collection(collection_name) | Returns the collection with the given name. Creates the collection with default configuration if it doesn’t already exist. A newly created collection does not match any tensor by default. It is configured to save histograms and distributions along with the tensor if TensorBoard support is enabled, and it uses the reduction configuration and save configuration passed to the hook. |
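The get-or-create behavior described here can be pictured with a toy registry. ToyHook and ToyCollection below are hypothetical classes written only for this illustration. They mimic the described semantics (create on first request, match nothing until patterns are added) and are not the smdebug classes.

```python
class ToyCollection:
    """Hypothetical stand-in for a collection; starts out matching no tensor."""

    def __init__(self, name):
        self.name = name
        self.include_regex = []


class ToyHook:
    """Hypothetical hook that keeps collections in a name-keyed registry."""

    def __init__(self):
        self._collections = {}

    def get_collection(self, name):
        # Create the collection on first request, then return the same object.
        if name not in self._collections:
            self._collections[name] = ToyCollection(name)
        return self._collections[name]


hook = ToyHook()
first = hook.get_collection("custom")
first.include_regex.append("relu")

# A second lookup returns the already configured object, not a fresh one.
second = hook.get_collection("custom")
print(second is first, second.include_regex)
```

Because repeated lookups return the same object, configuration applied through one call remains visible to any later call that uses the same name.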

Properties of a Collection

| Property | Description |
| --- | --- |
| tensor_names | Get or set the list of tensor names as strings. |
| include_regex | Get or set the list of regexes to include. Tensors whose names match these regex patterns are included in the collection. |
| reduction_config | Get or set the ReductionConfig object to be used for tensors that are part of this collection. |
| save_config | Get or set the SaveConfig object to be used for tensors that are part of this collection. |
| save_histogram | Get or set the boolean flag that determines whether to write histograms, which enables histograms and distributions in TensorBoard for tensors that are part of this collection. Only applicable if TensorBoard support is enabled. |

Methods on a Collection

| Method | Behavior |
| --- | --- |
| coll.include(regex) | Takes a regex string or a list of regex strings to match tensors to include in the collection. |
| coll.add(tensor) | (TensorFlow only) Takes an instance, list, or set of tf.Tensor/tf.Variable/tf.MirroredVariable/tf.Operation to add to the collection. |
| coll.add_keras_layer(layer, inputs=False, outputs=True) | (tf.keras only) Takes an instance of a tf.keras layer and logs input/output tensors for that layer. By default, only outputs are saved. |
| coll.add_module_tensors(module, inputs=False, outputs=True) | (PyTorch only) Takes an instance of a PyTorch module and logs input/output tensors for that module. By default, only outputs are saved. |
| coll.add_block_tensors(block, inputs=False, outputs=True) | (MXNet only) Takes an instance of a Gluon block and logs input/output tensors for that block. By default, only outputs are saved. |

Configuring Collection using SageMaker Python SDK

When using the SageMaker Python SDK, pass the parameters that configure a Collection as shown below.

from sagemaker.debugger import CollectionConfig

coll_config = CollectionConfig(
    name="weights",
    parameters={"parameter": "value"},
)

The parameters can be any of the following. Their meaning will become clear as you review the sections of documentation below. Note that all parameter values must be strings, so any parameter that accepts a list (such as save_steps, reductions, or include_regex) must be given as a single string with the items separated by commas.

include_regex
save_histogram
reductions
save_raw_tensor
save_interval
save_steps
start_step
end_step
train.save_interval
train.save_steps
train.start_step
train.end_step
eval.save_interval
eval.save_steps
eval.start_step
eval.end_step
predict.save_interval
predict.save_steps
predict.start_step
predict.end_step
global.save_interval
global.save_steps
global.start_step
global.end_step
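Since every value has to be a string, a small helper can convert ordinary Python values before they are handed to the SDK. The sketch below is one possible way to do that. The function to_collection_parameters and the example values are hypothetical, and only the comma-joined list format comes from the description above.

```python
def to_collection_parameters(options):
    """Convert Python values into the all-string form described above."""
    parameters = {}
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            # Lists become one comma-separated string.
            parameters[key] = ",".join(str(item) for item in value)
        else:
            parameters[key] = str(value)
    return parameters


# Hypothetical settings, including mode-prefixed keys from the list above.
parameters = to_collection_parameters({
    "include_regex": ["relu", "tanh"],
    "save_steps": [0, 10, 20],
    "train.save_interval": 100,
    "eval.end_step": 50,
})
print(parameters)
```

The resulting dictionary could then be passed as the parameters argument of CollectionConfig.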