TensorFlow¶
SMDebug for TensorFlow¶
Amazon SageMaker Debugger and the smdebug client library fully support the TensorFlow framework.
Using Debugger, you can access tensors of any kind for TensorFlow models, from the Keras model zoo to your own custom model, and save them using Debugger built-in or custom tensor collections. You can run your training script on the official AWS Deep Learning Containers, where Debugger automatically captures tensors from your training job. Whether your TensorFlow models use the Keras API or the pure TensorFlow API (in eager or non-eager mode), you can run them directly on the AWS Deep Learning Containers.
Debugger and its client library smdebug also support debugging your training job on other AWS training containers and custom containers. In this case, you must manually register the hook in your training script. For a full list of AWS TensorFlow containers that you can use with Debugger, see SageMaker containers to use Debugger with script mode. For a complete guide to using custom containers, see Use Debugger in Custom Training Containers.
Features supported by SMDebug¶
Debug training jobs with the TensorFlow framework or Keras TensorFlow
Debug training jobs with the TensorFlow eager or non-eager mode
Extended built-in tensor collections: inputs, outputs, layers, and gradients
Hook APIs to save model parameters: save_tensors, save_scalar
Using Debugger on AWS Deep Learning Containers with TensorFlow¶
The Debugger built-in rules and hook features are fully integrated with the AWS Deep Learning Containers, so you can run your training script without any changes. When you run training jobs on those Deep Learning Containers, Debugger automatically registers its hooks to your training script in order to retrieve tensors. For a comprehensive guide to using the high-level SageMaker TensorFlow estimator with Debugger, see Amazon SageMaker Debugger with TensorFlow in the Amazon SageMaker Developer Guide.
The following code example provides the base structure for a SageMaker TensorFlow estimator with Debugger.
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import Rule, DebuggerHookConfig, CollectionConfig, rule_configs

tf_estimator = TensorFlow(
    entry_point="tf-train.py",
    role="SageMakerRole",
    # SageMaker Python SDK v2 parameter names (v1 used train_instance_count / train_instance_type)
    instance_count=1,
    instance_type="ml.p2.xlarge",
    framework_version="2.2.0",
    py_version="py37",
    # Debugger-specific parameters
    rules=[
        Rule.sagemaker(rule_configs.vanishing_gradient()),
        Rule.sagemaker(rule_configs.loss_not_decreasing()),
        # ...
    ],
    debugger_hook_config=DebuggerHookConfig(
        # Collections are passed as a list through collection_configs
        collection_configs=[
            CollectionConfig(name="inputs"),
            CollectionConfig(name="outputs"),
            CollectionConfig(name="layers"),
            CollectionConfig(name="gradients"),
            # ...
        ]
    ),
)
tf_estimator.fit("s3://bucket/path/to/training/data")
Note
The SageMaker TensorFlow estimator and the Debugger collections in the example are based on the SageMaker Python SDK v2 and smdebug v0.9.2. We highly recommend upgrading the packages by running the following commands.
pip install -U sagemaker
pip install -U smdebug
If you are using a Jupyter notebook, put an exclamation mark (!) at the front of each command and restart your kernel afterward. For more information about the SageMaker Python SDK, see Use Version 2.x of the SageMaker Python SDK.
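Before upgrading, it can help to check which versions are currently installed. The following is a minimal sketch that uses only the Python standard library (Python 3.8 or later). The helper name installed_version is hypothetical and is not part of sagemaker or smdebug.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages that the note above recommends upgrading.
for name in ("sagemaker", "smdebug"):
    print(name, installed_version(name) or "not installed")
```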
Debugger Built-in Tensor Collections for TensorFlow¶
The following table lists the pre-configured tensor collections for TensorFlow models. You can pick any tensor collection by specifying the name parameter of CollectionConfig(), as shown in the previous base code example. SageMaker Debugger saves these tensors to the default out_dir of the hook.
Name | Description |
---|---|
all | Matches all tensors. |
default | Includes metrics, losses, and sm_metrics. |
metrics | For KerasHook, saves the metrics computed by Keras for the model. |
losses | Saves all losses of the model. |
sm_metrics | Saves scalars that you want to include in the SageMaker metrics collection. |
inputs | Matches all inputs to the model. |
outputs | Matches all outputs of the model, such as predictions (logits) and labels. |
layers | Matches all inputs and outputs of intermediate layers. |
gradients | Matches all gradients of the model. |
weights | Matches all weights of the model. |
biases | Matches all biases of the model. |
optimizer_variables | Matches all optimizer variables. Currently supported only for Keras. |
For more information about adjusting the tensor collection parameters, see Save Tensors Using Debugger Modified Built-in Collections.
For a full list of available tensor collection parameters, see Configuring Collection using SageMaker Python SDK.
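A typo in a collection name is easy to miss, so a small check before building the CollectionConfig list can be useful. The sketch below is a hypothetical helper (check_collection_names is not part of smdebug or the SageMaker Python SDK), and the set of names is an assumption based on the smdebug built-in collections. It deliberately rejects custom collection names, so it applies only when you intend to use built-in collections.

```python
# Assumed built-in collection names for TensorFlow (see the table of built-in collections).
BUILTIN_COLLECTIONS = frozenset({
    "all", "default", "metrics", "losses", "sm_metrics", "inputs", "outputs",
    "layers", "gradients", "weights", "biases", "optimizer_variables",
})


def check_collection_names(names):
    """Return the names as a list if all are built-in; raise ValueError otherwise."""
    unknown = sorted(set(names) - BUILTIN_COLLECTIONS)
    if unknown:
        raise ValueError("Unknown built-in collection(s): " + ", ".join(unknown))
    return list(names)


# The validated names could then be passed one by one to CollectionConfig(name=...).
requested = check_collection_names(["inputs", "outputs", "layers", "gradients"])
print(requested)
```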
Note
The inputs, outputs, gradients, and layers built-in collections are currently available for TensorFlow versions <2.0 and ==2.2.0.
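The version constraint in this note can be expressed as a small check. This is a sketch only: supports_extended_collections is a hypothetical helper, and it assumes a plain numeric version string such as "2.2.0" (no release-candidate or local suffixes).

```python
def supports_extended_collections(tf_version):
    """Return True if tf_version is below 2.0 or exactly 2.2.0."""
    parts = tuple(int(p) for p in tf_version.split(".")[:3])
    # Pad short versions such as "2.2" to three components.
    parts = parts + (0,) * (3 - len(parts))
    return parts < (2, 0, 0) or parts == (2, 2, 0)


for version in ("1.15.3", "2.1.0", "2.2.0"):
    print(version, supports_extended_collections(version))
```

In a training script, the string passed in would typically be tf.__version__.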
Using Debugger on SageMaker Training Containers and Custom Containers¶
To run your training script on containers other than the AWS Deep Learning Containers described in the previous section, use one of the following options:
Option 1 - Use the SageMaker TensorFlow training containers with a modified training script.
Option 2 - Use your custom container with a modified training script and push the container to Amazon ECR.
For both options, you need to manually register the Debugger hook to your training script. Depending on the TensorFlow and Keras API operations used to construct your model, you need to pick the right TensorFlow hook class, register the hook, and then save the tensors.
Step 1: Create a hook¶
To create the hook, add the following code to your training script. This enables the smdebug tools for TensorFlow and creates a TensorFlow hook object. When you run the fit() API for training, specify the smdebug hook in callbacks, as shown in the following subsections.
Depending on the TensorFlow version and the Keras API that you use in your training script, you need to choose the right hook class. The hook constructors for TensorFlow that you can choose from are smd.KerasHook, smd.SessionHook, and smd.EstimatorHook.
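The choice among the three hook classes can be summarized as a lookup from the training API to the class name. The sketch below is illustrative only: the keys "keras", "monitored_session", and "estimator" and the helper hook_class_name are hypothetical labels chosen for this example, not smdebug identifiers.

```python
# Hypothetical labels for the training API, mapped to the smdebug hook class names.
HOOK_FOR_API = {
    "keras": "KerasHook",                 # Keras model.fit() and GradientTape
    "monitored_session": "SessionHook",   # TensorFlow 1.x monitored training session
    "estimator": "EstimatorHook",         # tf.estimator API
}


def hook_class_name(training_api):
    """Return the name of the smdebug hook class for the given training API label."""
    try:
        return HOOK_FOR_API[training_api]
    except KeyError:
        raise ValueError("Unsupported training API: " + repr(training_api)) from None


print(hook_class_name("keras"))
```

With smdebug installed, the returned name could be resolved with getattr(smd, hook_class_name("keras")), where smd is smdebug.tensorflow.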
KerasHook¶
If you use the Keras model zoo and the Keras model.fit() API, use KerasHook. KerasHook is available for Keras models with the TensorFlow backend interface. KerasHook covers the eager execution modes and the gradient tape features that were introduced in TensorFlow 2.0. You can create the smdebug Keras hook by adding the following code to your training script. Place this code before model.compile():
import smdebug.tensorflow as smd
hook = smd.KerasHook.create_from_json_file()
To learn how to fully implement the hook in your training script, see the Keras with the TensorFlow gradient tape and the smdebug hook example scripts.
Note: If you use the AWS Deep Learning Containers for zero script change, Debugger collects most of the tensors through its high-level API, regardless of the eager execution modes.
SessionHook¶
If your model is created in TensorFlow version 1.x with the low-level approach (not using the Keras API), use SessionHook. SessionHook is for the TensorFlow 1.x monitored training session API, tf.train.MonitoredSession(), as shown in the following code:
import smdebug.tensorflow as smd
hook = smd.SessionHook.create_from_json_file()
To learn how to fully implement the hook into your training script, see the TensorFlow monitored training session with the smdebug hook example script.
Note: The official TensorFlow library deprecated the tf.train.MonitoredSession() API in favor of tf.function() in TensorFlow 2.0 and later. You can use SessionHook for tf.function() in TensorFlow 2.0 and later.
EstimatorHook¶
If you have a model that uses the tf.estimator API, use EstimatorHook. EstimatorHook is available for any TensorFlow framework version that supports the tf.estimator API, as shown in the following code:
import smdebug.tensorflow as smd
hook = smd.EstimatorHook.create_from_json_file()
To learn how to fully implement the hook in your training script, see the simple MNIST training script with the TensorFlow estimator.
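Each create_from_json_file() call in this step builds the hook from a JSON configuration file. On SageMaker, that file is generated for the training job. For local experiments, a sketch like the following writes one by hand. The key names (LocalPath, HookParameters, CollectionConfigurations, CollectionName), the save_interval parameter, and the SMDEBUG_CONFIG_FILE_PATH environment variable are assumptions about the smdebug configuration format and are not stated in this page. Verify them against the smdebug API documentation for your version before relying on them.

```python
import json
import os
import tempfile

# Assumed configuration layout; key names are not confirmed by this page.
hook_config = {
    "LocalPath": os.path.join(tempfile.gettempdir(), "smdebug_tensors"),
    "HookParameters": {"save_interval": 100},
    "CollectionConfigurations": [
        {"CollectionName": "losses"},
        {"CollectionName": "gradients"},
    ],
}

# Write the configuration to a temporary file.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "debughookconfig.json")
with open(config_path, "w") as f:
    json.dump(hook_config, f, indent=2)

# Assumed environment variable that create_from_json_file() consults for the file path.
os.environ["SMDEBUG_CONFIG_FILE_PATH"] = config_path
print("Wrote hook configuration to", config_path)
```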
Step 2: Wrap the optimizer and the gradient tape to retrieve gradient tensors¶
The smdebug TensorFlow hook provides tools to manually retrieve gradients tensors specific to the TensorFlow framework.
If you want to save gradients (for example, from the Keras Adam optimizer), wrap the optimizer with the hook as shown in the following code:
optimizer = tf.keras.optimizers.Adam(learning_rate=args.lr)
optimizer = hook.wrap_optimizer(optimizer)
If you want to save gradients and outputs tensors from the TensorFlow GradientTape feature, wrap tf.GradientTape with the smdebug hook.wrap_tape method and save the tensors using the hook.save_tensor function. The input of hook.save_tensor has the format (tensor_name, tensor_value, collections_to_write="default"). For example:
with hook.wrap_tape(tf.GradientTape(persistent=True)) as tape:
    logits = model(data, training=True)
    loss_value = cce(labels, logits)
hook.save_tensor("y_labels", labels, "outputs")
hook.save_tensor("predictions", logits, "outputs")
grads = tape.gradient(loss_value, model.variables)
hook.save_tensor("grads", grads, "gradients")
These smdebug hook wrapper functions capture the gradient tensors without affecting your optimization logic.
For examples of code structures that you can use to apply the hook wrappers, see the Code Examples section.
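To make the save_tensor argument order concrete without requiring TensorFlow or smdebug, the sketch below uses a hypothetical stand-in class, RecordingHook. It only mimics the (tensor_name, tensor_value, collections_to_write="default") call shape described in this step and stores values in a dictionary. It is not the smdebug implementation and writes nothing to disk.

```python
from collections import defaultdict


class RecordingHook:
    """Hypothetical stand-in that mimics only the save_tensor call shape."""

    def __init__(self):
        # Maps collection name -> {tensor name -> value}.
        self.saved = defaultdict(dict)

    def save_tensor(self, tensor_name, tensor_value, collections_to_write="default"):
        # File the value under the requested collection; "default" is used when omitted.
        self.saved[collections_to_write][tensor_name] = tensor_value


hook = RecordingHook()
hook.save_tensor("y_labels", [1, 0, 1], "outputs")
hook.save_tensor("predictions", [0.9, 0.2, 0.7], "outputs")
hook.save_tensor("accuracy", 0.67)  # falls into the "default" collection

print(sorted(hook.saved))
```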
Step 3: Register the hook to model.fit()¶
To collect the tensors from the hooks that you registered, add callbacks=[hook] to the Keras model.fit() API. This passes the SageMaker Debugger hook as a Keras callback. Similarly, add hooks=[hook] to the MonitoredSession(), tf.function(), and tf.estimator() APIs. For example:
model.fit(X_train, Y_train,
          batch_size=batch_size,
          epochs=epoch,
          validation_data=(X_valid, Y_valid),
          shuffle=True,
          # smdebug modification: Pass the hook as a Keras callback
          callbacks=[hook])
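Passing the hook in callbacks works because the training loop calls back into every registered object at fixed points. The following toy sketch illustrates that dispatch pattern in plain Python. toy_fit and EventRecorder are hypothetical names for illustration only. The method names on_train_batch_end and on_epoch_end mirror the Keras callback interface, but nothing here reflects how Keras or smdebug is implemented internally.

```python
class EventRecorder:
    """Hypothetical callback that records every notification it receives."""

    def __init__(self):
        self.events = []

    def on_train_batch_end(self, batch, logs=None):
        self.events.append(("batch_end", batch, dict(logs or {})))

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(("epoch_end", epoch, dict(logs or {})))


def toy_fit(batches_per_epoch, epochs, callbacks):
    """Toy training loop that notifies each callback after every batch and epoch."""
    step = 0
    for epoch in range(epochs):
        logs = {}
        for batch in range(batches_per_epoch):
            step += 1
            logs = {"loss": 1.0 / step}  # placeholder value, not a real loss
            for callback in callbacks:
                callback.on_train_batch_end(batch, logs)
        for callback in callbacks:
            callback.on_epoch_end(epoch, logs)


recorder = EventRecorder()
toy_fit(batches_per_epoch=3, epochs=2, callbacks=[recorder])
print(len(recorder.events), "events recorded")
```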
Step 4: Perform actions using the hook APIs¶
For a full list of actions that the hook APIs offer to construct hooks and save tensors, see Common hook API and TensorFlow specific hook API.
Code Examples¶
The following code examples show the base structures that you can use for hook registration in various TensorFlow training scripts. If you want to use the high-level Debugger features with zero script change on AWS Deep Learning Containers, see Use Debugger in AWS Containers.
Keras API (tf.keras)¶
The following code example shows how to register the smdebug KerasHook for the Keras model.fit(). You can also set the hook mode to track stored tensors in different phases of the training job. For a list of available hook modes, see smdebug modes.
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook.create_from_json_file()

model = tf.keras.models.Sequential([ ... ])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
)
# Add the hook as a callback
# Use hook.set_mode to tag stored tensors by phase of the training job, such as TRAIN and EVAL
hook.set_mode(mode=smd.modes.TRAIN)
model.fit(x_train, y_train, epochs=args.epochs, callbacks=[hook])

hook.set_mode(mode=smd.modes.EVAL)
model.evaluate(x_test, y_test, callbacks=[hook])
Keras GradientTape example for TensorFlow 2.0 and later¶
The following code example shows how to register the smdebug KerasHook by wrapping the TensorFlow GradientTape() with the smdebug hook.wrap_tape() API.
import tensorflow as tf
import smdebug.tensorflow as smd

hook = smd.KerasHook.create_from_json_file()

model = tf.keras.models.Sequential([ ... ])
for epoch in range(n_epochs):
    for data, labels in dataset:
        dataset_labels = labels
        # wrap the tape to capture tensors
        with hook.wrap_tape(tf.GradientTape(persistent=True)) as tape:
            logits = model(data, training=True)  # (32,10)
            loss_value = cce(labels, logits)
        grads = tape.gradient(loss_value, model.variables)
        opt.apply_gradients(zip(grads, model.variables))
        acc = train_acc_metric(dataset_labels, logits)
        # manually save metric values
        hook.save_tensor(tensor_name="accuracy", tensor_value=acc, collections_to_write="default")
Monitored Session (tf.train.MonitoredSession)¶
The following code example shows how to register the smdebug SessionHook.
import smdebug.tensorflow as smd
hook = smd.SessionHook.create_from_json_file()
loss = tf.reduce_mean(tf.matmul(...), name="loss")
optimizer = tf.train.AdamOptimizer(args.lr)
# Wrap the optimizer
optimizer = hook.wrap_optimizer(optimizer)
# Add the hook as a callback
sess = tf.train.MonitoredSession(hooks=[hook])
sess.run([loss, ...])
Estimator (tf.estimator.Estimator)¶
The following code example shows how to register the smdebug EstimatorHook. You can also set the hook mode to track stored tensors in different phases of the training job. For a list of available hook modes, see smdebug modes.
import smdebug.tensorflow as smd
hook = smd.EstimatorHook.create_from_json_file()
train_input_fn, eval_input_fn = ...
estimator = tf.estimator.Estimator(...)
# Set hook.set_mode to set tensors to be stored in different phases of training job, such as TRAIN and EVAL
hook.set_mode(mode=smd.modes.TRAIN)
estimator.train(input_fn=train_input_fn, steps=args.steps, hooks=[hook])
hook.set_mode(mode=smd.modes.EVAL)
estimator.evaluate(input_fn=eval_input_fn, steps=args.steps, hooks=[hook])
References¶
The smdebug API for saving tensors¶
See the API for saving tensors page for details about the Hooks, Collection, SaveConfig, and ReductionConfig. See the Analysis page for details about analyzing a training job.