SMDebug Trial¶
An SMDebug trial is an object which lets you query for tensors for a given training job, specified by the path where SMDebug’s artifacts are saved. Trial is capable of loading new tensors as soon as they become available from the given path, allowing you to do both offline as well as real-time analysis.
Create an SMDebug trial object¶
Depending on the output path, there are two types of trials you can create: LocalTrial or S3Trial. The SMDebug library provides the following wrapper method that automatically creates the right trial.
-
class
smdebug.trials.
create_trial
(path, name=None, profiler=False, output_dir='/opt/ml/processing/outputs/', **kwargs)¶ - Parameters
path (str) – A local path or an S3 path of the form
s3://bucket/prefix
. You should see directories such ascollections
,events
andindex
at this path once the training job starts.name (str) – A name for a trial. It is to help you manage different trials. This is an optional parameter, which defaults to the basename of the path if not passed. Make sure to give it a unique name to prevent duplication.
- Returns
An SMDebug trial instance
- Return type
The following examples show how to create an SMDebug trial object.
Example: Creating an S3 trial
from smdebug.trials import create_trial trial = create_trial( path='s3://smdebug-testing-bucket/outputs/resnet', name='resnet_training_run' )
Example: Creating a local trial
from smdebug.trials import create_trial trial = create_trial( path='/home/ubuntu/smdebug_outputs/resnet', name='resnet_training_run' )
Example: Restricting analysis to a range of steps
You can optionally pass
range_steps
to restrict your analysis to a certain range of steps. Note that if you do so, Trial will not load data from other steps.range_steps=(100, None)
: This will load all steps after 100range_steps=(None, 100)
: This will load all steps before 100range_steps=(100, 200)
: This will load steps between 100 and 200range_steps=None
: This will load all steps
from smdebug.trials import create_trial trial = create_trial( path='s3://smdebug-testing-bucket/outputs/resnet', name='resnet_training', range_steps=(100, 200) )