torch-l1-snr 0.0.2__tar.gz → 0.0.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: torch-l1-snr
-Version: 0.0.2
+Version: 0.0.4
 Summary: L1-SNR loss functions for audio source separation in PyTorch
 Home-page: https://github.com/crlandsc/torch-l1-snr
 Author: Christopher Landscaping
@@ -32,28 +32,29 @@ Dynamic: license-file
 
 [![LICENSE](https://img.shields.io/github/license/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/blob/main/LICENSE) [![GitHub Repo stars](https://img.shields.io/github/stars/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/stargazers)
 
-# torch-l1-snr
-
 A PyTorch implementation of L1-based Signal-to-Noise Ratio (SNR) loss functions for audio source separation. This package provides implementations and novel extensions based on concepts from recent academic papers, offering flexible and robust loss functions that can be easily integrated into any PyTorch-based audio separation pipeline.
 
-The core `L1SNRLoss` is based on the loss function described in [1], while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [2].
+The core `L1SNRLoss` is based on the loss function described in [[1]](https://arxiv.org/abs/2309.02539), while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [[2]](https://arxiv.org/abs/2501.16171).
 
 ## Features
 
-- **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [1].
-- **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [2], plus an optional L1 loss component.
-- **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [2], calculated over multiple STFT resolutions.
+- **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [[1]](https://arxiv.org/abs/2309.02539).
+- **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [[2]](https://arxiv.org/abs/2501.16171), plus an optional L1 loss component.
+- **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [[2]](https://arxiv.org/abs/2501.16171), calculated over multiple STFT resolutions.
 - **Modular Stem-based Loss**: A wrapper that combines time and spectrogram domain losses and can be configured to run on specific stems.
 - **Efficient & Robust**: Includes optimizations for pure L1 loss calculation and robust handling of `NaN`/`inf` values and short audio segments.
 
 ## Installation
 
-<!-- Add PyPI badges once the package is published -->
-<!-- [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
-<!-- [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
-<!-- [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/)
+
+## Install from PyPI
+
+```bash
+pip install torch-l1-snr
+```
 
-You can install the package directly from GitHub:
+## Install from GitHub
 
 ```bash
 pip install git+https://github.com/crlandsc/torch-l1snr.git
@@ -90,9 +91,13 @@ from torch_l1snr import L1SNRDBLoss
 estimates = torch.randn(4, 32000) # Batch of 4, 32000 samples
 actuals = torch.randn(4, 32000)
 
-# Initialize the loss function
-# l1_weight=0.1 blends L1SNR with 10% L1 loss
-loss_fn = L1SNRDBLoss(l1_weight=0.1)
+# Initialize the loss function with regularization enabled
+# l1_weight=0.1 blends L1SNR+Regularization with 10% L1 loss
+loss_fn = L1SNRDBLoss(
+    name="l1_snr_db_loss",
+    use_regularization=True, # Enable adaptive level-matching regularization
+    l1_weight=0.1 # 10% L1 loss, 90% L1SNR + regularization
+)
 
 # Calculate loss
 loss = loss_fn(estimates, actuals)
@@ -112,8 +117,11 @@ estimates = torch.randn(4, 32000)
 actuals = torch.randn(4, 32000)
 
 # Initialize the loss function
-# Uses multiple STFT resolutions by default
-loss_fn = STFTL1SNRDBLoss(l1_weight=0.0) # Pure L1SNR + Regularization
+# Uses multiple STFT resolutions by default: [512, 1024, 2048] FFT sizes
+loss_fn = STFTL1SNRDBLoss(
+    name="stft_l1_snr_db_loss",
+    l1_weight=0.0 # Pure L1SNR (no regularization, no L1)
+)
 
 # Calculate loss
 loss = loss_fn(estimates, actuals)
@@ -137,9 +145,12 @@ actuals = torch.randn(2, 2, 44100)
 
 # --- Configuration ---
 loss_fn = MultiL1SNRDBLoss(
-    weight=1.0, # Overall weight for this loss
-    spec_weight=0.7, # 70% spectrogram loss, 30% time-domain loss
-    l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg
+    name="multi_l1_snr_db_loss",
+    weight=1.0, # Overall weight for this loss
+    spec_weight=0.6, # 60% spectrogram loss, 40% time-domain loss
+    l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg in both domains
+    use_time_regularization=True, # Enable regularization in time domain
+    use_spec_regularization=False # Disable regularization in spec domain
 )
 loss = loss_fn(estimates, actuals)
 print(f"Multi-domain Loss: {loss.item()}")
@@ -155,7 +166,7 @@ The goal of these loss functions is to provide a perceptually-informed and robus
 
 #### Level-Matching Regularization
 
-A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [2]. This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
+A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [[2]](https://arxiv.org/abs/2501.16171). This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
 
 #### Multi-Resolution Spectrogram Analysis
 
@@ -194,8 +205,8 @@ The loss functions implemented here are based on the work of the authors of the
 
 ## References
 
-[1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. (arXiv:2309.02539)
+[1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. [arXiv:2309.02539](https://arxiv.org/abs/2309.02539)
 
-[2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," arXiv:2501.16171.
+[2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," [arXiv:2501.16171](https://arxiv.org/abs/2501.16171).
 
-[3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. (arXiv:2406.18747)
+[3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. [arXiv:2406.18747](https://arxiv.org/abs/2406.18747)
@@ -4,28 +4,29 @@
 
 [![LICENSE](https://img.shields.io/github/license/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/blob/main/LICENSE) [![GitHub Repo stars](https://img.shields.io/github/stars/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/stargazers)
 
-# torch-l1-snr
-
 A PyTorch implementation of L1-based Signal-to-Noise Ratio (SNR) loss functions for audio source separation. This package provides implementations and novel extensions based on concepts from recent academic papers, offering flexible and robust loss functions that can be easily integrated into any PyTorch-based audio separation pipeline.
 
-The core `L1SNRLoss` is based on the loss function described in [1], while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [2].
+The core `L1SNRLoss` is based on the loss function described in [[1]](https://arxiv.org/abs/2309.02539), while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [[2]](https://arxiv.org/abs/2501.16171).
 
 ## Features
 
-- **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [1].
-- **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [2], plus an optional L1 loss component.
-- **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [2], calculated over multiple STFT resolutions.
+- **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [[1]](https://arxiv.org/abs/2309.02539).
+- **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [[2]](https://arxiv.org/abs/2501.16171), plus an optional L1 loss component.
+- **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [[2]](https://arxiv.org/abs/2501.16171), calculated over multiple STFT resolutions.
 - **Modular Stem-based Loss**: A wrapper that combines time and spectrogram domain losses and can be configured to run on specific stems.
 - **Efficient & Robust**: Includes optimizations for pure L1 loss calculation and robust handling of `NaN`/`inf` values and short audio segments.
 
 ## Installation
 
-<!-- Add PyPI badges once the package is published -->
-<!-- [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
-<!-- [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
-<!-- [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/)
+
+## Install from PyPI
+
+```bash
+pip install torch-l1-snr
+```
 
-You can install the package directly from GitHub:
+## Install from GitHub
 
 ```bash
 pip install git+https://github.com/crlandsc/torch-l1snr.git
@@ -62,9 +63,13 @@ from torch_l1snr import L1SNRDBLoss
 estimates = torch.randn(4, 32000) # Batch of 4, 32000 samples
 actuals = torch.randn(4, 32000)
 
-# Initialize the loss function
-# l1_weight=0.1 blends L1SNR with 10% L1 loss
-loss_fn = L1SNRDBLoss(l1_weight=0.1)
+# Initialize the loss function with regularization enabled
+# l1_weight=0.1 blends L1SNR+Regularization with 10% L1 loss
+loss_fn = L1SNRDBLoss(
+    name="l1_snr_db_loss",
+    use_regularization=True, # Enable adaptive level-matching regularization
+    l1_weight=0.1 # 10% L1 loss, 90% L1SNR + regularization
+)
 
 # Calculate loss
 loss = loss_fn(estimates, actuals)
@@ -84,8 +89,11 @@ estimates = torch.randn(4, 32000)
 actuals = torch.randn(4, 32000)
 
 # Initialize the loss function
-# Uses multiple STFT resolutions by default
-loss_fn = STFTL1SNRDBLoss(l1_weight=0.0) # Pure L1SNR + Regularization
+# Uses multiple STFT resolutions by default: [512, 1024, 2048] FFT sizes
+loss_fn = STFTL1SNRDBLoss(
+    name="stft_l1_snr_db_loss",
+    l1_weight=0.0 # Pure L1SNR (no regularization, no L1)
+)
 
 # Calculate loss
 loss = loss_fn(estimates, actuals)
@@ -109,9 +117,12 @@ actuals = torch.randn(2, 2, 44100)
 
 # --- Configuration ---
 loss_fn = MultiL1SNRDBLoss(
-    weight=1.0, # Overall weight for this loss
-    spec_weight=0.7, # 70% spectrogram loss, 30% time-domain loss
-    l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg
+    name="multi_l1_snr_db_loss",
+    weight=1.0, # Overall weight for this loss
+    spec_weight=0.6, # 60% spectrogram loss, 40% time-domain loss
+    l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg in both domains
+    use_time_regularization=True, # Enable regularization in time domain
+    use_spec_regularization=False # Disable regularization in spec domain
 )
 loss = loss_fn(estimates, actuals)
 print(f"Multi-domain Loss: {loss.item()}")
@@ -127,7 +138,7 @@ The goal of these loss functions is to provide a perceptually-informed and robus
 
 #### Level-Matching Regularization
 
-A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [2]. This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
+A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [[2]](https://arxiv.org/abs/2501.16171). This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
 
 #### Multi-Resolution Spectrogram Analysis
 
@@ -166,8 +177,8 @@ The loss functions implemented here are based on the work of the authors of the
 
 ## References
 
-[1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. (arXiv:2309.02539)
+[1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. [arXiv:2309.02539](https://arxiv.org/abs/2309.02539)
 
-[2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," arXiv:2501.16171.
+[2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," [arXiv:2501.16171](https://arxiv.org/abs/2501.16171).
 
-[3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. (arXiv:2406.18747)
+[3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. [arXiv:2406.18747](https://arxiv.org/abs/2406.18747)
@@ -1,6 +1,6 @@
 [metadata]
 name = torch-l1-snr
-version = 0.0.2
+version = 0.0.4
 author = Christopher Landscaping
 author_email = crlandschoot@gmail.com
 description = L1-SNR loss functions for audio source separation in PyTorch
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: torch-l1-snr
3
- Version: 0.0.2
3
+ Version: 0.0.4
4
4
  Summary: L1-SNR loss functions for audio source separation in PyTorch
5
5
  Home-page: https://github.com/crlandsc/torch-l1-snr
6
6
  Author: Christopher Landscaping
@@ -32,28 +32,29 @@ Dynamic: license-file
32
32
 
33
33
  [![LICENSE](https://img.shields.io/github/license/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/blob/main/LICENSE) [![GitHub Repo stars](https://img.shields.io/github/stars/crlandsc/torch-l1snr)](https://github.com/crlandsc/torch-l1snr/stargazers)
34
34
 
35
- # torch-l1-snr
36
-
37
35
  A PyTorch implementation of L1-based Signal-to-Noise Ratio (SNR) loss functions for audio source separation. This package provides implementations and novel extensions based on concepts from recent academic papers, offering flexible and robust loss functions that can be easily integrated into any PyTorch-based audio separation pipeline.
38
36
 
39
- The core `L1SNRLoss` is based on the loss function described in [1], while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [2].
37
+ The core `L1SNRLoss` is based on the loss function described in [[1]](https://arxiv.org/abs/2309.02539), while `L1SNRDBLoss` and `STFTL1SNRDBLoss` are extensions of the adaptive level-matching regularization technique proposed in [[2]](https://arxiv.org/abs/2501.16171).
40
38
 
41
39
  ## Features
42
40
 
43
- - **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [1].
44
- - **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [2], plus an optional L1 loss component.
45
- - **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [2], calculated over multiple STFT resolutions.
41
+ - **Time-Domain L1SNR Loss**: A basic, time-domain L1-SNR loss, based on [[1]](https://arxiv.org/abs/2309.02539).
42
+ - **Regularized Time-Domain L1SNRDBLoss**: An extension of the L1SNR loss with adaptive level-matching regularization from [[2]](https://arxiv.org/abs/2501.16171), plus an optional L1 loss component.
43
+ - **Multi-Resolution STFT L1SNRDBLoss**: A spectrogram-domain version of the loss from [[2]](https://arxiv.org/abs/2501.16171), calculated over multiple STFT resolutions.
46
44
  - **Modular Stem-based Loss**: A wrapper that combines time and spectrogram domain losses and can be configured to run on specific stems.
47
45
  - **Efficient & Robust**: Includes optimizations for pure L1 loss calculation and robust handling of `NaN`/`inf` values and short audio segments.
48
46
 
49
47
  ## Installation
50
48
 
51
- <!-- Add PyPI badges once the package is published -->
52
- <!-- [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
53
- <!-- [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
54
- <!-- [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1snr)](https://pypi.org/project/torch-l1snr/) -->
49
+ [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![PyPI - Version](https://img.shields.io/pypi/v/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/) [![Number of downloads from PyPI per month](https://img.shields.io/pypi/dm/torch-l1-snr)](https://pypi.org/project/torch-l1-snr/)
50
+
51
+ ## Install from PyPI
52
+
53
+ ```bash
54
+ pip install torch-l1-snr
55
+ ```
55
56
 
56
- You can install the package directly from GitHub:
57
+ ## Install from GitHub
57
58
 
58
59
  ```bash
59
60
  pip install git+https://github.com/crlandsc/torch-l1snr.git
@@ -90,9 +91,13 @@ from torch_l1snr import L1SNRDBLoss
90
91
  estimates = torch.randn(4, 32000) # Batch of 4, 32000 samples
91
92
  actuals = torch.randn(4, 32000)
92
93
 
93
- # Initialize the loss function
94
- # l1_weight=0.1 blends L1SNR with 10% L1 loss
95
- loss_fn = L1SNRDBLoss(l1_weight=0.1)
94
+ # Initialize the loss function with regularization enabled
95
+ # l1_weight=0.1 blends L1SNR+Regularization with 10% L1 loss
96
+ loss_fn = L1SNRDBLoss(
97
+ name="l1_snr_db_loss",
98
+ use_regularization=True, # Enable adaptive level-matching regularization
99
+ l1_weight=0.1 # 10% L1 loss, 90% L1SNR + regularization
100
+ )
96
101
 
97
102
  # Calculate loss
98
103
  loss = loss_fn(estimates, actuals)
@@ -112,8 +117,11 @@ estimates = torch.randn(4, 32000)
112
117
  actuals = torch.randn(4, 32000)
113
118
 
114
119
  # Initialize the loss function
115
- # Uses multiple STFT resolutions by default
116
- loss_fn = STFTL1SNRDBLoss(l1_weight=0.0) # Pure L1SNR + Regularization
120
+ # Uses multiple STFT resolutions by default: [512, 1024, 2048] FFT sizes
121
+ loss_fn = STFTL1SNRDBLoss(
122
+ name="stft_l1_snr_db_loss",
123
+ l1_weight=0.0 # Pure L1SNR (no regularization, no L1)
124
+ )
117
125
 
118
126
  # Calculate loss
119
127
  loss = loss_fn(estimates, actuals)
@@ -137,9 +145,12 @@ actuals = torch.randn(2, 2, 44100)
137
145
 
138
146
  # --- Configuration ---
139
147
  loss_fn = MultiL1SNRDBLoss(
140
- weight=1.0, # Overall weight for this loss
141
- spec_weight=0.7, # 70% spectrogram loss, 30% time-domain loss
142
- l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg
148
+ name="multi_l1_snr_db_loss",
149
+ weight=1.0, # Overall weight for this loss
150
+ spec_weight=0.6, # 60% spectrogram loss, 40% time-domain loss
151
+ l1_weight=0.1, # Use 10% L1, 90% L1SNR+Reg in both domains
152
+ use_time_regularization=True, # Enable regularization in time domain
153
+ use_spec_regularization=False # Disable regularization in spec domain
143
154
  )
144
155
  loss = loss_fn(estimates, actuals)
145
156
  print(f"Multi-domain Loss: {loss.item()}")
@@ -155,7 +166,7 @@ The goal of these loss functions is to provide a perceptually-informed and robus
155
166
 
156
167
  #### Level-Matching Regularization
157
168
 
158
- A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [2]. This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
169
+ A key feature of `L1SNRDBLoss` is the adaptive regularization term, as described in [[2]](https://arxiv.org/abs/2501.16171). This component calculates the difference in decibel-scaled root-mean-square (dBRMS) levels between the estimated and actual signals. An adaptive weight (`lambda`) is applied to this difference, which increases when the model incorrectly silences a non-silent target. This encourages the model to learn the correct output level and specifically avoids the model collapsing to a trivial silent solution when uncertain.
159
170
 
160
171
  #### Multi-Resolution Spectrogram Analysis
161
172
 
@@ -194,8 +205,8 @@ The loss functions implemented here are based on the work of the authors of the
194
205
 
195
206
  ## References
196
207
 
197
- [1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. (arXiv:2309.02539)
208
+ [1] K. N. Watcharasupat, C.-W. Wu, Y. Ding, I. Orife, A. J. Hipple, P. A. Williams, S. Kramer, A. Lerch, and W. Wolcott, "A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation," IEEE Open Journal of Signal Processing, 2023. [arXiv:2309.02539](https://arxiv.org/abs/2309.02539)
198
209
 
199
- [2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," arXiv:2501.16171.
210
+ [2] K. N. Watcharasupat and A. Lerch, "Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries," [arXiv:2501.16171](https://arxiv.org/abs/2501.16171).
200
211
 
201
- [3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. (arXiv:2406.18747)
212
+ [3] K. N. Watcharasupat and A. Lerch, "A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems," Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024. [arXiv:2406.18747](https://arxiv.org/abs/2406.18747)
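The diffed README describes `L1SNRDBLoss` as an L1-based SNR expressed in decibels plus a level-matching penalty on the dBRMS gap between estimate and target. The following is a minimal, dependency-free sketch of that idea only: the exact formula, the epsilon, and the fixed `reg_weight` are assumptions, and the package's actual term uses an adaptive `lambda` rather than a constant weight.

```python
import math

EPS = 1e-8  # small constant assumed for numerical stability

def l1_snr_db(est, tgt, eps=EPS):
    # L1-based SNR in dB: L1 energy of the target over L1 energy of the error.
    num = sum(abs(t) for t in tgt)
    den = sum(abs(t - e) for e, t in zip(est, tgt))
    return 10.0 * math.log10((num + eps) / (den + eps))

def dbrms(x, eps=EPS):
    # Decibel-scaled root-mean-square (dBRMS) level of a signal.
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms + eps)

def l1_snr_db_loss(est, tgt, reg_weight=0.1):
    # Negative SNR (so minimizing improves separation) plus a level-matching
    # penalty on the dBRMS gap; a silent estimate of a non-silent target
    # produces a large gap, discouraging collapse to trivial silence.
    return -l1_snr_db(est, tgt) + reg_weight * abs(dbrms(est) - dbrms(tgt))
```

Even with a fixed weight, an all-zero estimate of a non-silent target incurs a far larger loss than an estimate at roughly the right level, which is the behavior the adaptive scheme in the package sharpens further.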