Python Profiler Analysis API

class smdebug.profiler.analysis.python_profile_analysis.PythonProfileAnalysis(local_profile_dir='/tmp/python_stats', s3_path=None)

Bases: object

Analysis class that takes the path to the profile directory and sets up the Python stats reader, which fetches metadata of the Python profiling done for each step. It also provides functions for analyzing this profiling, such as fetching stats by a specific step or time interval.

If s3_path is provided, S3PythonStatsReader is used, and local_profile_dir is the local directory in which the reader creates the stats directory and to which it then downloads the stats. Otherwise, LocalPythonStatsReader is used, and local_profile_dir is the path to the stats directory, which must already hold the stats.

Parameters
  • python_stats_reader – PythonStatsReader The reader to use for loading the python stats.

  • python_profile_stats – list of StepPythonProfileStats List of stats for each step profiled.

name = ''
fetch_profile_stats_by_step(start_step, end_step=None, mode=<PythonProfileModes.TRAIN: 1>, start_phase=<StepPhase.STEP_START: 'stepstart'>, end_phase=<StepPhase.STEP_END: 'stepend'>, node_id='any', refresh_stats=True)

API function to fetch stats based on step interval.

fetch_profile_stats_by_time(start_time_since_epoch_in_secs, end_time_since_epoch_in_secs, node_id='any', refresh_stats=True)

API function to fetch stats based on time interval.
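The time-interval arguments are named in seconds since the epoch, whereas the columns returned by list_profile_stats are in microseconds. The following is a minimal sketch of preparing such arguments from timezone-aware datetimes. The helper to_epoch_secs and the five-minute window are hypothetical and not part of smdebug; only the final commented call uses the signature documented above.

```python
from datetime import datetime, timezone


def to_epoch_secs(moment):
    """Convert a timezone-aware datetime to seconds since the Unix epoch."""
    if moment.tzinfo is None:
        # timestamp() interprets naive datetimes in local time, which makes
        # the result machine-dependent, so this sketch rejects them.
        raise ValueError("expected a timezone-aware datetime")
    return moment.timestamp()


# Hypothetical five-minute window, chosen only for illustration.
window_start = datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
window_end = datetime(2021, 1, 1, 0, 5, 0, tzinfo=timezone.utc)

start_secs = to_epoch_secs(window_start)
end_secs = to_epoch_secs(window_end)

# With an analysis object in hand, the call would then look like:
# analysis.fetch_profile_stats_by_time(start_secs, end_secs)
```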

fetch_profile_stats_between_modes(start_mode, end_mode, node_id='any', refresh_stats=True)

API function that fetches stats with the provided start and end mode.

fetch_pre_step_zero_profile_stats(node_id='any', refresh_stats=True)

API function that fetches stats from profiling until step 0.

fetch_post_hook_close_profile_stats(node_id='any', refresh_stats=True)

API function that fetches stats from profiling after the hook is closed.

list_profile_stats(refresh_stats=True)

API function that returns a DataFrame of the python profile stats, where each row holds the metadata for each instance of profiling and the corresponding stats file (one per step).

The columns of this DataFrame include:

  • profiler_name: The name of the profiler used to generate this stats file, either cProfile or pyinstrument.

  • framework: The machine learning framework used in training.

  • start_time_since_epoch_in_micros: The UTC time (in microseconds) at which profiling started for this step.

  • end_time_since_epoch_in_micros: The UTC time (in microseconds) at which profiling finished for this step.

  • node_id: The node ID of the node used in the session.

  • start_phase: The phase at which python profiling was started.

  • start_step: The step at which python profiling was started. -1 if before step 0.

  • end_phase: The phase at which python profiling was stopped.

  • end_step: The step at which python profiling was stopped.

  • stats_path: The path to the dumped python stats resulting from profiling this step.
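As a rough illustration of working with these columns, the sketch below uses plain dictionaries as stand-ins for DataFrame rows so that it runs without pandas or smdebug. The row values are made up, and the helper duration_secs is hypothetical. It derives a per-row profiling duration from the two microsecond columns and selects the rows that started before step 0 (start_step equal to -1).

```python
MICROS_PER_SEC = 1_000_000

# Made-up rows for illustration only; real rows come from list_profile_stats().
rows = [
    {
        "node_id": "node-0",
        "start_step": -1,
        "end_step": 0,
        "start_time_since_epoch_in_micros": 1_609_459_200_000_000,
        "end_time_since_epoch_in_micros": 1_609_459_202_500_000,
    },
    {
        "node_id": "node-0",
        "start_step": 0,
        "end_step": 0,
        "start_time_since_epoch_in_micros": 1_609_459_202_500_000,
        "end_time_since_epoch_in_micros": 1_609_459_203_000_000,
    },
]


def duration_secs(row):
    """Profiling duration of one row, converted from microseconds to seconds."""
    elapsed_micros = (
        row["end_time_since_epoch_in_micros"]
        - row["start_time_since_epoch_in_micros"]
    )
    return elapsed_micros / MICROS_PER_SEC


durations = [duration_secs(row) for row in rows]

# Rows whose profiling started before step 0 carry start_step == -1.
pre_step_zero_rows = [row for row in rows if row["start_step"] == -1]
```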

list_available_node_ids(refresh_stats=True)

API function to list the available node IDs we have python profiling stats for.
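One way the node listing might be combined with a step-based fetch is sketched below. FakeAnalysis is a hypothetical stand-in that mimics two of the documented method signatures so the example runs without smdebug installed; its return values are invented, and the sketch assumes list_available_node_ids returns an iterable of node ID strings, which the text above does not state.

```python
class FakeAnalysis:
    """Hypothetical stand-in mimicking two documented method signatures.

    The return values are invented purely so the sketch can run.
    """

    def list_available_node_ids(self, refresh_stats=True):
        return ["node-a", "node-b"]

    def fetch_profile_stats_by_step(
        self, start_step, end_step=None, node_id="any", refresh_stats=True
    ):
        return f"stats for {node_id}, steps {start_step}..{end_step}"


def stats_per_node(analysis, start_step, end_step):
    """Fetch stats for the same step interval once per available node."""
    results = {}
    for node_id in analysis.list_available_node_ids():
        results[node_id] = analysis.fetch_profile_stats_by_step(
            start_step, end_step=end_step, node_id=node_id
        )
    return results


per_node = stats_per_node(FakeAnalysis(), 1, 3)
```

With a real analysis object, the same function would be called with that object in place of FakeAnalysis().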

class smdebug.profiler.analysis.python_profile_analysis.cProfileAnalysis(local_profile_dir='/tmp/python_stats', s3_path=None)

Bases: smdebug.profiler.analysis.python_profile_analysis.PythonProfileAnalysis

Analysis class used specifically for Python profiling with cProfile.

Analysis class that takes the path to the profile directory and sets up the Python stats reader, which fetches metadata of the Python profiling done for each step. It also provides functions for analyzing this profiling, such as fetching stats by a specific step or time interval.

If s3_path is provided, S3PythonStatsReader is used, and local_profile_dir is the local directory in which the reader creates the stats directory and to which it then downloads the stats. Otherwise, LocalPythonStatsReader is used, and local_profile_dir is the path to the stats directory, which must already hold the stats.

Parameters
  • python_stats_reader – PythonStatsReader The reader to use for loading the python stats.

  • python_profile_stats – list of StepPythonProfileStats List of stats for each step profiled.

name = 'cprofile'
fetch_profile_stats_by_training_phase(node_id='any', refresh_stats=True)

API function that fetches and aggregates stats for every possible combination of start and end mode.

For example, if training and validation are done while detailed profiling is enabled, the combinations are:
  • (PRE_STEP_ZERO, TRAIN)

  • (TRAIN, TRAIN)

  • (TRAIN, EVAL)

  • (EVAL, EVAL)

  • (EVAL, POST_HOOK_CLOSE)

All stats files within each of these combinations are aggregated.
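The grouping described above can be pictured as bucketing stats files by their (start mode, end mode) pair. The sketch below is only an illustration under that reading: the records, the stats names, and the helper group_by_mode_pair are hypothetical, and the library's actual implementation may differ.

```python
from collections import defaultdict

# Made-up records of (start_mode, end_mode, stats_name), in profiling order.
records = [
    ("PRE_STEP_ZERO", "TRAIN", "stats_0"),
    ("TRAIN", "TRAIN", "stats_1"),
    ("TRAIN", "TRAIN", "stats_2"),
    ("TRAIN", "EVAL", "stats_3"),
    ("EVAL", "EVAL", "stats_4"),
    ("EVAL", "POST_HOOK_CLOSE", "stats_5"),
]


def group_by_mode_pair(records):
    """Bucket stats names by their (start_mode, end_mode) combination."""
    groups = defaultdict(list)
    for start_mode, end_mode, stats_name in records:
        groups[(start_mode, end_mode)].append(stats_name)
    return dict(groups)


groups = group_by_mode_pair(records)
```

Each bucket would then be aggregated into a single set of stats; here only (TRAIN, TRAIN) holds more than one file.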

fetch_profile_stats_by_job_phase(node_id='any', refresh_stats=True)

API function that fetches and aggregates stats by job phase.

The job phases are:
  • initialization: profiling until step 0 (pre-step zero profiling)

  • training loop: training and validation

  • finalization: profiling after the hook is closed (post-hook-close profiling)
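To give a sense of what aggregating cProfile stats can look like, the sketch below merges two profiles with the standard library's pstats module. It is not taken from smdebug: the helpers profile_call and aggregate are hypothetical, and the profiled calls are arbitrary.

```python
import cProfile
import pstats


def profile_call(func, *args):
    """Run func under cProfile and return the result as pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    return pstats.Stats(profiler)


def aggregate(stats_list):
    """Merge several Stats objects into the first one and return it."""
    combined = stats_list[0]
    for extra in stats_list[1:]:
        # Stats.add accumulates call counts and timings into `combined`.
        combined.add(extra)
    return combined


first = profile_call(sorted, [3, 1, 2])
second = profile_call(sum, [1, 2, 3])

# Record the totals before merging, because aggregate() mutates `first`.
calls_before = first.total_calls + second.total_calls
combined = aggregate([first, second])
```

After the merge, combined.print_stats() would report both profiled calls in one table.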

class smdebug.profiler.analysis.python_profile_analysis.PyinstrumentAnalysis(local_profile_dir='/tmp/python_stats', s3_path=None)

Bases: smdebug.profiler.analysis.python_profile_analysis.PythonProfileAnalysis

Analysis class used specifically for python profiling with pyinstrument.

Analysis class that takes the path to the profile directory and sets up the Python stats reader, which fetches metadata of the Python profiling done for each step. It also provides functions for analyzing this profiling, such as fetching stats by a specific step or time interval.

If s3_path is provided, S3PythonStatsReader is used, and local_profile_dir is the local directory in which the reader creates the stats directory and to which it then downloads the stats. Otherwise, LocalPythonStatsReader is used, and local_profile_dir is the path to the stats directory, which must already hold the stats.

Parameters
  • python_stats_reader – PythonStatsReader The reader to use for loading the python stats.

  • python_profile_stats – list of StepPythonProfileStats List of stats for each step profiled.

name = 'pyinstrument'

Python Stats Reader API

class smdebug.profiler.analysis.python_stats_reader.PythonStatsReader(profile_dir)

Bases: object

Basic framework for a stats reader that retrieves stats from Python profiling.

Parameters

profile_dir – The path to the directory where the python profile stats are.

load_python_profile_stats()

Load the python profile stats. To be implemented in subclass.

class smdebug.profiler.analysis.python_stats_reader.S3PythonStatsReader(profile_dir, s3_path)

Bases: smdebug.profiler.analysis.python_stats_reader.PythonStatsReader

Higher-level stats reader that downloads Python stats from S3.

Parameters
  • profile_dir – The path to the directory where the profile directory is created. The stats will then be downloaded to this newly created directory.

  • s3_path – The path in S3 to the base folder of the logs.

load_python_profile_stats()

Load the stats by creating the profile directory, downloading each stats directory from S3 into it, parsing the metadata from each stats directory name, and creating a StepPythonProfileStats entry for the stats file in each stats directory.

For cProfile, the stats file name is python_stats. For pyinstrument, the stats file name is python_stats.json.

class smdebug.profiler.analysis.python_stats_reader.LocalPythonStatsReader(profile_dir)

Bases: smdebug.profiler.analysis.python_stats_reader.PythonStatsReader

Higher-level stats reader that loads the Python stats locally.

Parameters

profile_dir – The path to the directory where the python profile stats are.

load_python_profile_stats()

Load the stats by scanning each stats directory in the profile directory, parsing the metadata from each stats directory name, and creating a StepPythonProfileStats entry for the stats file in each stats directory.

For cProfile, the stats file name is python_stats. For pyinstrument, the stats file name is python_stats.json or python_stats.html.
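The scan described above can be approximated with the standard library alone. The sketch below is not the library's code: find_stats_files and the directory names stats_dir_a and stats_dir_b are hypothetical, and the metadata encoded in real stats directory names is not parsed because its format is not given here. Only the stats file names and the profiler names 'cprofile' and 'pyinstrument' come from the text.

```python
import os
import tempfile

# Stats file names from the text, mapped to the profiler that writes them.
STATS_FILE_PROFILERS = {
    "python_stats": "cprofile",
    "python_stats.json": "pyinstrument",
    "python_stats.html": "pyinstrument",
}


def find_stats_files(profile_dir):
    """Return (profiler, path) pairs for stats files one level below profile_dir."""
    found = []
    for entry in sorted(os.listdir(profile_dir)):
        stats_dir = os.path.join(profile_dir, entry)
        if not os.path.isdir(stats_dir):
            continue  # only stats directories are scanned
        for file_name in sorted(os.listdir(stats_dir)):
            profiler = STATS_FILE_PROFILERS.get(file_name)
            if profiler is not None:
                found.append((profiler, os.path.join(stats_dir, file_name)))
    return found


# Build a throwaway profile directory with hypothetical stats directory names.
with tempfile.TemporaryDirectory() as profile_dir:
    layout = [("stats_dir_a", "python_stats"), ("stats_dir_b", "python_stats.json")]
    for dir_name, file_name in layout:
        os.makedirs(os.path.join(profile_dir, dir_name))
        with open(os.path.join(profile_dir, dir_name, file_name), "w") as handle:
            handle.write("")
    found = find_stats_files(profile_dir)
    found_summary = [(profiler, os.path.basename(path)) for profiler, path in found]
```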