Stable Audio Tools and Apple Silicon
Stable Audio Tools will default to the CPU if it doesn't detect a CUDA device, but it only takes a few adjustments to the original repo to make it use the Apple Silicon in your machine if you have an M1 or better. Using a finetuned version of the base model, I reduced my generation time from 51 seconds per sample (on cpu) to ~17 seconds per sample (on mps). This model was finetuned to generate samples only 3 seconds long, but I was happy to cut inference time to roughly a third of what it was.
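Before touching the repo, it's worth a quick sanity check that your PyTorch build can see the Metal backend at all. A minimal sketch, assuming a torch install with MPS support:

import torch

# Confirm this torch build can see the Metal backend
print(torch.backends.mps.is_available())  # should print True on an M1 or better

# A trivial matmul placed on the Apple GPU
x = torch.randn(1024, 1024, device="mps")
print((x @ x).device)  # mps:0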
Here are the commands I used to create a Python environment for running Stable Audio inference locally.
Before we get started:
- Operating System
ProductName: macOS
ProductVersion: 14.1
BuildVersion: 23B2073
Kernel: 23.1.0
- Environment
zsh: 5.9 (x86_64-apple-darwin23.0)
Homebrew: 4.3.5
Python: 3.8.19
Note: brew install python@3.8 (the versioned Homebrew formula), as brew is one of the easiest ways to manage multiple Python versions on the same machine.
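To double-check which interpreter brew put on your PATH:

python3.8 --version
# Python 3.8.19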
Environment Commands
cd "$HOME/code/ml/music/generation/"
git clone https://github.com/Stability-AI/stable-audio-tools
cd stable-audio-tools/
python3.8 -m venv venv-sat
venv-sat/bin/python -m pip install --upgrade pip wheel
# Something about setuptools 70 release broke pkg_resources
venv-sat/bin/python -m pip install setuptools==69.5.1
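To confirm the pin took and pkg_resources imports cleanly again (an optional sanity check):

venv-sat/bin/python -c "import setuptools, pkg_resources; print(setuptools.__version__)"
# Expect: 69.5.1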
# Activate the environment
source venv-sat/bin/activate
# Run the remaining commands within the activated environment
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install .
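Despite the "cpu" in that index URL, the macOS arm64 nightlies ship with the Metal backend; you can confirm the install sees it before launching anything heavy:

python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"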
# INFERENCE
# Base Model
python run_gradio.py --ckpt-path models/base/model.ckpt --model-config models/base/model_config.json
# Finetuned Model
python run_gradio.py --ckpt-path models/finetune_banyan/model_banyan.ckpt --model-config models/finetune_banyan/model_config_banyan.json
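One caveat: not every operator has an MPS kernel yet, and PyTorch raises NotImplementedError when it hits a missing one. PyTorch's documented escape hatch is to let those ops silently fall back to the CPU:

PYTORCH_ENABLE_MPS_FALLBACK=1 python run_gradio.py --ckpt-path models/base/model.ckpt --model-config models/base/model_config.json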
Script Changes
These are the adjustments I made to the Python scripts in the repo, as reported by git diff:
diff --git a/stable_audio_tools/inference/generation.py b/stable_audio_tools/inference/generation.py
index 843ab4b..74f4bb9 100644
--- a/stable_audio_tools/inference/generation.py
+++ b/stable_audio_tools/inference/generation.py
@@ -14,7 +14,7 @@ def generate_diffusion_uncond(
     batch_size: int = 1,
     sample_size: int = 2097152,
     seed: int = -1,
-    device: str = "cuda",
+    device: str = "mps",
     init_audio: tp.Optional[tp.Tuple[int, torch.Tensor]] = None,
     init_noise_level: float = 1.0,
     return_latents = False,
@@ -99,7 +99,7 @@ def generate_diffusion_cond(
     sample_size: int = 2097152,
     sample_rate: int = 48000,
     seed: int = -1,
-    device: str = "cuda",
+    device: str = "mps",
     init_audio: tp.Optional[tp.Tuple[int, torch.Tensor]] = None,
     init_noise_level: float = 1.0,
     mask_args: dict = None,
diff --git a/stable_audio_tools/interface/gradio.py b/stable_audio_tools/interface/gradio.py
index b46c8d4..a1b8b10 100644
--- a/stable_audio_tools/interface/gradio.py
+++ b/stable_audio_tools/interface/gradio.py
@@ -665,7 +665,7 @@ def create_ui(model_config_path=None, ckpt_path=None, pretrained_name=None, pretransform_ckpt_path=None, model_half=False):
     else:
         model_config = None

-    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

     _, model_config = load_model(model_config, ckpt_path, pretrained_name=pretrained_name, pretransform_ckpt_path=pretransform_ckpt_path, model_half=model_half, device=device)
     model_type = model_config["model_type"]
In plain English: I changed the default device argument from "cuda" to "mps" in stable_audio_tools/inference/generation.py, and swapped the CUDA availability check for an MPS check in stable_audio_tools/interface/gradio.py.
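If you don't want to hard-code "mps" (so the same checkout keeps working on a CUDA box), a three-way fallback is a small, untested variation on the gradio.py change above:

import torch

# Prefer CUDA, then Apple's Metal backend, then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")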
Installing asitop for monitoring
In my journey to utilize MPS I came across asitop. It lets you monitor, from the CLI, what percentage of your Apple GPU is being used, among other things. After installing asitop into its own Python venv, I created an alias for it in ~/.zprofile to make opening the monitor quicker:
alias asitop="sudo $HOME/code/asito/venv-asi/bin/asitop"
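For reference, the venv setup behind that alias looked roughly like this (paths chosen to match the alias above; asitop is published on PyPI, and the sudo is needed because asitop reads GPU stats via Apple's powermetrics):

python3 -m venv "$HOME/code/asito/venv-asi"
"$HOME/code/asito/venv-asi/bin/pip" install asitop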
Alternatively, brew install asitop probably works just as well.