Tensor Collections
The construct of a Collection groups tensors together. A Collection is identified by a string representing the name of the collection. It can be used to group tensors of a particular kind such as “losses”, “weights”, “biases”, or “gradients”. A Collection has its own list of tensors specified by include regex patterns, and other parameters determining how these tensors should be saved and when. Using collections enables you to save different types of tensors at different frequencies and in different forms. These collections are then also available during analysis so you can query a group of tensors at once.
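The grouping idea can be illustrated with a small, library-independent sketch. It is not SageMaker Debugger's implementation: the tensor names, the patterns, and the helper `group_by_collection` are all hypothetical, and it assumes a tensor belongs to a collection when any of that collection's patterns matches anywhere in the tensor's name.

```python
import re


def group_by_collection(tensor_names, include_regex):
    """Return {collection_name: [tensor names matching any of its patterns]}."""
    grouped = {}
    for collection, patterns in include_regex.items():
        compiled = [re.compile(p) for p in patterns]
        grouped[collection] = [
            name for name in tensor_names
            if any(rx.search(name) for rx in compiled)
        ]
    return grouped


# Hypothetical tensor names, for illustration only.
tensor_names = [
    "dense_1/kernel:0",
    "dense_1/bias:0",
    "gradients/dense_1/MatMul_grad:0",
    "loss:0",
]

# Hypothetical patterns; the built-in collections use framework-specific rules.
include_regex = {
    "biases": [r"bias"],
    "gradients": [r"^gradients/"],
    "losses": [r"loss"],
}

grouped = group_by_collection(tensor_names, include_regex)
```

Each collection ends up with its own list of tensors, which is what allows different save settings to be applied per group.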
There are a number of built-in collections that SageMaker Debugger manages by default. This means that the library takes care of identifying which tensors should be saved as part of each collection. You can also define custom collections in a couple of different ways.
You can specify which of these collections to save in the hook’s include_collections parameter, or through the collection_configs parameter to the DebuggerHookConfig in the SageMaker Python SDK.
Built-in Collections
Below is a comprehensive list of the built-in collections that are managed by SageMaker Debugger. The Hook identifies the tensors that should be saved as part of each collection for the given framework and saves them if they were requested.
The names of these collections are all lower case strings.
| Name | Supported by frameworks/hooks | Description |
|---|---|---|
| | all | Matches all tensors |
| | all | It’s a default collection created, which matches the regex patterns passed as |
| | TensorFlow, PyTorch, MXNet | Matches all weights of the model |
| | TensorFlow, PyTorch, MXNet | Matches all biases of the model |
| | TensorFlow, PyTorch, MXNet | Matches all gradients of the model. In TensorFlow when not using Zero Script Change environments, must use |
| | TensorFlow, PyTorch, MXNet | Saves the loss for the model |
| | TensorFlow’s KerasHook, XGBoost | For KerasHook, saves the metrics computed by Keras for the model. For XGBoost, the evaluation metrics computed by the algorithm. |
| | TensorFlow’s KerasHook | Matches the outputs of the model |
| | TensorFlow’s KerasHook | Input and output of intermediate convolutional layers |
| | TensorFlow | You can add scalars that you want to show up in SageMaker Metrics to this collection. SageMaker Debugger will save these scalars both to the out_dir of the hook and to SageMaker Metrics. Note that the scalars passed here will be saved on AWS servers outside of your AWS account. |
| | TensorFlow’s KerasHook | Matches all optimizer variables, currently only supported in Keras. |
| | XGBoost | |
| | XGBoost | Predictions on validation set (if provided) |
| | XGBoost | Labels on validation set (if provided) |
| | XGBoost | Feature importance given by get_score() |
| | XGBoost | A matrix of (nsample, nfeatures + 1) with each record indicating the feature contributions (SHAP values) for that prediction. Computed on training data with predict() |
| | XGBoost | The sum of SHAP value magnitudes over all samples. Represents the impact each feature has on the model output. |
| | XGBoost | Boosted tree model given by trees_to_dataframe() |
Default collections saved
The following collections are saved regardless of the hook configuration.
| Framework | Default collections saved |
|---|---|
| | METRICS, LOSSES, SM_METRICS |
| | LOSSES |
| | LOSSES |
| | METRICS |
If you want to disable the saving of these collections, you can do so by setting end_step to 0 in the collection’s SaveConfig. When using the SageMaker Python SDK, this looks like:
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

hook_config = DebuggerHookConfig(
    s3_output_path='s3://smdebug-dev-demo-pdx/mnist',
    collection_configs=[
        CollectionConfig(name="metrics", parameters={"end_step": 0})
    ]
)
When configuring the Collection in your Python script, it would be as follows:
hook.get_collection("metrics").save_config.end_step = 0
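Why setting end_step to 0 disables saving can be seen in a toy model of the step check. This is a sketch, not the library's code: it assumes that end_step is an exclusive upper bound, that steps are saved at multiples of save_interval, and it ignores save_steps. The function name `should_save` and the interval of 500 are illustrative choices.

```python
def should_save(step, start_step=0, end_step=None, save_interval=500):
    """Toy model: save every save_interval-th step in [start_step, end_step)."""
    if step < start_step:
        return False
    # Assumption: end_step is exclusive, so end_step=0 excludes every step.
    if end_step is not None and step >= end_step:
        return False
    return step % save_interval == 0


# With no end_step, steps at the interval are saved.
default_steps = [s for s in range(2000) if should_save(s)]

# With end_step=0, no non-negative step passes the check.
disabled_steps = [s for s in range(2000) if should_save(s, end_step=0)]
```

Under these assumptions, no step number is smaller than an end_step of 0, so nothing in the collection is ever written.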
Creating or retrieving a Collection
| Function | Behavior |
|---|---|
| | Returns the collection with the given name. Creates the collection with default configuration if it doesn’t already exist. A new collection created by default does not match any tensor and is configured to save histograms and distributions along with the tensor if TensorBoard support is enabled, and uses the reduction configuration and save configuration passed to the hook. |
Properties of a Collection
| Property | Description |
|---|---|
| | Get or set the list of tensor names as strings |
| | Get or set the list of regexes to include. Tensors whose names match these regex patterns will be included in the collection |
| | Get or set the ReductionConfig object to be used for tensors that are part of this collection |
| | Get or set the SaveConfig object to be used for tensors that are part of this collection |
| | Get or set the boolean flag that determines whether to write histograms, enabling histograms and distributions in TensorBoard for tensors that are part of this collection. Only applicable if TensorBoard support is enabled. |
Methods on a Collection
| Method | Behavior |
|---|---|
| | Takes a regex string or a list of regex strings to match tensors to include in the collection. |
| | (TensorFlow only) Takes an instance, list, or set of tf.Tensor/tf.Variable/tf.MirroredVariable/tf.Operation to add to the collection. |
| | (tf.keras only) Takes an instance of a tf.keras layer and logs input/output tensors for that module. By default, only outputs are saved. |
| | (PyTorch only) Takes an instance of a PyTorch module and logs input/output tensors for that module. By default, only outputs are saved. |
| | (MXNet only) Takes an instance of a Gluon block and logs input/output tensors for that module. By default, only outputs are saved. |
Configuring Collection using SageMaker Python SDK
When using the SageMaker Python SDK, parameters to configure a Collection are passed as follows.
from sagemaker.debugger import CollectionConfig

coll_config = CollectionConfig(
    name="weights",
    parameters={"parameter": "value"})
The parameters can be any of the following. Their meaning will become clear as you review the sections of documentation below. Note that all parameter values must be strings, so any parameter that accepts a list (such as save_steps, reductions, include_regex) must be given as a single string with the items separated by commas.
include_regex
save_histogram
reductions
save_raw_tensor
save_interval
save_steps
start_step
end_step
train.save_interval
train.save_steps
train.start_step
train.end_step
eval.save_interval
eval.save_steps
eval.start_step
eval.end_step
predict.save_interval
predict.save_steps
predict.start_step
predict.end_step
global.save_interval
global.save_steps
global.start_step
global.end_step
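Because every value must be a string, it can help to build the parameters dictionary from ordinary Python values. The helper below is a hypothetical convenience, not part of the SageMaker Python SDK; it joins lists with commas and converts everything else with str(). How booleans should be spelled is an assumption here (Python's "True"/"False"), and a regex that itself contains a comma would not survive this encoding.

```python
def to_collection_parameters(settings):
    """Convert Python values into an all-strings parameters dict."""
    params = {}
    for key, value in settings.items():
        if isinstance(value, (list, tuple)):
            # List-valued parameters become one comma-separated string.
            params[key] = ",".join(str(item) for item in value)
        else:
            params[key] = str(value)
    return params


params = to_collection_parameters({
    "include_regex": ["relu", "tanh"],
    "save_histogram": True,
    "train.save_interval": 100,   # mode-specific key from the list above
    "eval.save_steps": [0, 10, 20],
})
```

The resulting dictionary has the shape expected by the parameters argument of CollectionConfig shown earlier in this section.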