@dust-tt/sparkle 0.4.9 → 0.4.11

@@ -112,10 +112,30 @@ footnote [^1]
 
 
 
- ### Some lateX
+ ### Some LaTeX
 
  $$ \\sigma(z_i) = \\frac{e^{z_{i}}}{\\sum_{j=1}^K e^{z_{j}}} \\ \\ \\ for\\ i=1,2,\\dots,K $$
 
+ ### Some inline LaTeX
+
+ **Example**: Linear attention is a 2-level optimization:
+ - Inner level: Memory matrix $\\mathcal{M}_t = \\mathcal{M}_{t-1} + \\mathbf{v}_t \\mathbf{k}_t^\\top$ (updates every token)
+ - Outer level: Projection matrices $W_k, W_v, W_q$ (updates during pre-training)
+
+ Even **optimizers** are associative memories. Momentum with gradient descent is 2-level:
+ - Momentum $\\mathbf{m}_t$ compresses past gradients
+ - Weights $W_t$ are updated by momentum
+
+ The result is $a=2+t$
+
+ ### Some text with dollars signs:
+
+ One want to import $USER_WORKSPACE but it will cost them $3.5 or $100 $1000
+
+ -> The EF for this code is 0.49059 kgCO2e per $ (2018 USD).
+ -> This code is 0.54895 kgCO2e per $ (2018 USD) more.
+ -> This thing is $5-$10 range.
+
  ### This is a CSV:
 
  \`\`\`csv