@heylemon/lemonade 0.1.6 → 0.1.7
- package/dist/build-info.json +3 -3
- package/dist/canvas-host/a2ui/.bundle.hash +1 -1
- package/package.json +1 -1
- package/skills/docx/SKILL.md +595 -22
- package/skills/docx/references/templates.md +669 -33
- package/skills/docx/scripts/create_doc.py +289 -52
- package/skills/docx/scripts/validate.py +237 -0
- package/skills/docx/scripts/validate_doc.py +103 -22
- package/skills/pptx/SKILL.md +169 -12
- package/skills/pptx/editing.md +270 -0
- package/skills/pptx/pptxgenjs.md +624 -0
- package/skills/pptx/references/spec-format.md +106 -31
- package/skills/pptx/scripts/create_pptx.js +419 -186
- package/skills/xlsx/SKILL.md +502 -14
- package/skills/xlsx/references/spec-format.md +238 -40
- package/skills/xlsx/scripts/create_xlsx.py +130 -54
- package/skills/xlsx/scripts/recalc.py +157 -147
- package/skills/xlsx/scripts/validate_xlsx.py +31 -6
package/skills/xlsx/SKILL.md
CHANGED
|
@@ -1,23 +1,511 @@
|
|
|
1
1
|
---
|
|
2
|
-
name:
|
|
3
|
-
description: "Create professional spreadsheets
|
|
4
|
-
license: Proprietary. LICENSE.txt has complete terms
|
|
2
|
+
name: pro-sheets
|
|
3
|
+
description: "Create professional, formatted spreadsheets — dashboards, data tables, financial models, trackers, and reports. Use this skill whenever the user wants to create or edit a spreadsheet, Excel file, tracker, financial model, budget, forecast, data table, or dashboard. Trigger on 'spreadsheet', 'xlsx', 'excel', 'workbook', 'financial model', 'budget', 'forecast', 'tracker', 'dashboard'."
|
|
5
4
|
---
|
|
6
5
|
|
|
7
|
-
# Pro Sheets
|
|
6
|
+
# Pro Sheets — Professional Excel Development
|
|
8
7
|
|
|
9
|
-
|
|
8
|
+
Build polished, formula-driven Excel workbooks using openpyxl. The AI writes Python code directly each time—no wrapper scripts—tailoring each implementation to the user's request. All formulas are calculated via LibreOffice, ensuring accuracy.
|
|
9
|
+
|
|
10
|
+
## Key Architecture
|
|
11
|
+
|
|
12
|
+
- **Direct openpyxl coding**: Write formulas inline, customize formatting, control layout without abstraction layers
|
|
13
|
+
- **Mandatory LibreOffice recalculation**: All formulas evaluated before delivering the file
|
|
14
|
+
- **Industry-standard financial modeling**: Color coding, number formatting, documentation standards
|
|
15
|
+
- **pandas for data prep**: Load, analyze, reshape before building the spreadsheet
|
|
16
|
+
|
|
17
|
+
## Requirements for Outputs
|
|
18
|
+
|
|
19
|
+
Every Excel file must meet these baseline standards:
|
|
20
|
+
|
|
21
|
+
### All Spreadsheets
|
|
22
|
+
- **Font**: Arial or similar professional font, consistent throughout
|
|
23
|
+
- **Formula integrity**: Zero formula errors—no #REF!, #DIV/0!, #VALUE!, #N/A, #NAME?, #NULL!, #NUM!
|
|
24
|
+
- **Template preservation**: When editing existing files, maintain their structure and style
|
|
25
|
+
|
|
26
|
+
### Financial Models (strict standards)
|
|
27
|
+
|
|
28
|
+
**Color Coding** (Excel RGB values):
|
|
29
|
+
- **Blue text (0,0,255)**: Hardcoded input values—numbers the user will change for scenarios
|
|
30
|
+
- **Black text (0,0,0)**: Formula cells (calculations, SUM, cross-references)
|
|
31
|
+
- **Green text (0,128,0)**: Cross-sheet references (formulas that link between sheets)
|
|
32
|
+
- **Red text (255,0,0)**: External references (data pulled from outside sources)
|
|
33
|
+
- **Yellow background**: Key assumptions—the 5-10 drivers that change model outputs most
|
|
34
|
+
|
|
35
|
+
**Number Formatting**:
|
|
36
|
+
- Currency: `$#,##0` or `$#,##0,"K"` for thousands (e.g., $45K)
|
|
37
|
+
- Percentages: `0.0%` (e.g., 15.2%)
|
|
38
|
+
- Multiples/ratios: `0.0x` (e.g., 3.2x)
|
|
39
|
+
- Years: Text format (2026, not 2,026)
|
|
40
|
+
- Zero values: Display as "–" not 0
|
|
41
|
+
- Negative numbers: Parentheses `($#,##0)` not minus sign
|
|
42
|
+
|
|
43
|
+
**Formula Rules**:
|
|
44
|
+
- Assumptions always in separate cells, never hardcoded in formulas
|
|
45
|
+
- ❌ Bad: `=Revenue * 0.15` (hardcoded margin)
|
|
46
|
+
- ✅ Good: `=Revenue * B5` where B5 contains 0.15 with cell comment "Gross Margin"
|
|
47
|
+
- Cell references, not hardcoded values: formulas link cells, not embed numbers
|
|
48
|
+
- Document data sources: If a cell contains a hardcoded number, add a comment explaining its origin
|
|
49
|
+
|
|
50
|
+
**Documentation**:
|
|
51
|
+
- Complex formulas: Add Excel comments explaining logic
|
|
52
|
+
- Data sources: Cite external sources for hardcoded values
|
|
53
|
+
- Assumptions sheet: Keep all key inputs in one place, clearly labeled
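The financial-model standards above could be applied in openpyxl roughly as follows. This is a minimal sketch, not the skill's own code: the cell layout (B5, B6, B7), the labels, and the three-section number format that combines the parentheses and dash rules are assumptions made for illustration.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment
from openpyxl.styles import Font, PatternFill

wb = Workbook()
ws = wb.active
ws.title = "Assumptions"

# Hypothetical layout: B5 = key assumption, B6 = plain input, B7 = formula.
ws["A5"] = "Gross Margin"
ws["B5"] = 0.15
ws["B5"].font = Font(name="Arial", color="0000FF")        # blue text: hardcoded input
ws["B5"].fill = PatternFill("solid", fgColor="FFFF00")    # yellow fill: key assumption
ws["B5"].number_format = "0.0%"
ws["B5"].comment = Comment("Gross Margin", "Model")       # document the hardcoded value

ws["A6"] = "Revenue"
ws["B6"] = 1000
ws["B6"].font = Font(name="Arial", color="0000FF")        # blue text: input
ws["B6"].number_format = "$#,##0"

ws["A7"] = "Gross Profit"
ws["B7"] = "=B6*B5"                                       # references cells, no embedded 0.15
ws["B7"].font = Font(name="Arial", color="000000")        # black text: formula
# positive ; negative in parentheses ; zero shown as a dash
ws["B7"].number_format = '$#,##0;($#,##0);"–"'
```

The three-section format string is one way to satisfy the negative-number and zero-value rules with a single format; a model could equally use separate formats per row.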
|
|
54
|
+
|
|
55
|
+
## Critical: Use Formulas, Not Hardcoded Values
|
|
56
|
+
|
|
57
|
+
This is the core difference between dynamic spreadsheets and static reports. The spreadsheet must recalculate automatically when users change inputs.
|
|
58
|
+
|
|
59
|
+
### Wrong Approach
|
|
60
|
+
```python
|
|
61
|
+
import pandas as pd
|
|
62
|
+
df = pd.read_csv('sales.csv')
|
|
63
|
+
total = df['Sales'].sum()
|
|
64
|
+
sheet['B10'] = total # ❌ Hardcoded—won't update if sales change
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### Correct Approach
|
|
68
|
+
```python
|
|
69
|
+
# Load data, analyze it to understand structure
|
|
70
|
+
df = pd.read_csv('sales.csv')
|
|
71
|
+
print(df.shape, df.columns)
|
|
72
|
+
|
|
73
|
+
# But put the data in the spreadsheet
|
|
74
|
+
for idx, row in df.iterrows():
|
|
75
|
+
    sheet[f'A{idx+2}'] = row['Product']
|
|
76
|
+
    sheet[f'B{idx+2}'] = row['Sales']
|
|
77
|
+
|
|
78
|
+
# Use formulas for calculations
|
|
79
|
+
sheet['B10'] = '=SUM(B2:B9)' # ✅ Dynamic—updates when sales change
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
## Reading and Analyzing Data
|
|
83
|
+
|
|
84
|
+
Use pandas to understand data before building the spreadsheet:
|
|
85
|
+
|
|
86
|
+
```python
|
|
87
|
+
import pandas as pd
|
|
88
|
+
|
|
89
|
+
# Single sheet
|
|
90
|
+
df = pd.read_excel('file.xlsx')
|
|
91
|
+
|
|
92
|
+
# All sheets
|
|
93
|
+
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)
|
|
94
|
+
for sheet_name, data in all_sheets.items():
|
|
95
|
+
    print(f"{sheet_name}: {data.shape}")
|
|
96
|
+
|
|
97
|
+
# Specify dtypes and ranges
|
|
98
|
+
df = pd.read_excel('file.xlsx', sheet_name='Sales',
|
|
99
|
+
                   dtype={'Year': str, 'Amount': float},
|
|
100
|
+
                   usecols=['Product', 'Amount', 'Date'])
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
## Common Workflow
|
|
104
|
+
|
|
105
|
+
1. **Choose the tool**: pandas for data analysis, openpyxl for formulas and formatting
|
|
106
|
+
2. **Load/create workbook**: `from openpyxl import Workbook` or `load_workbook()`
|
|
107
|
+
3. **Modify data, formulas, formatting**: Write cells with openpyxl
|
|
108
|
+
4. **Save**: `wb.save('output.xlsx')`
|
|
109
|
+
5. **Recalculate formulas** (MANDATORY): `python scripts/recalc.py output.xlsx`
|
|
110
|
+
6. **Verify and fix**: Check recalc output for errors
|
|
111
|
+
|
|
112
|
+
## Dashboard Design
|
|
113
|
+
|
|
114
|
+
Dashboards are high-visibility outputs. They must look polished when opened — not cramped, clipped, or hard to read.
|
|
115
|
+
|
|
116
|
+
### KPI Cards That Breathe
|
|
117
|
+
|
|
118
|
+
The #1 mistake: cramming large KPI values into tiny merged cells. Excel clips text that doesn't fit, making stats invisible.
|
|
119
|
+
|
|
120
|
+
**Rules for KPI cards:**
|
|
121
|
+
- **Column width ≥ 18** per card (not 10-14). If merging 2 columns, each should be ≥ 18
|
|
122
|
+
- **Row height for large fonts**: `ws.row_dimensions[row].height = font_size * 2.2` minimum
|
|
123
|
+
- **Explicit row heights** for every KPI row (value, label, change indicator)
|
|
124
|
+
- **Spacer rows** (height 8-12) above and below the KPI band to separate visually
|
|
125
|
+
- **Max 4 KPIs per row** — more than that and they crowd. Use a second row if needed
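The sizing rules above can be captured in a couple of small helpers. This is a sketch under stated assumptions: the function and constant names are hypothetical, and rounding the height up to a whole point is a choice made here, not something the rules require.

```python
import math

MIN_CARD_COLUMN_WIDTH = 18  # per-column minimum from the rules above
MAX_KPIS_PER_ROW = 4        # more than this and the cards crowd


def min_row_height(font_size: int) -> int:
    """Smallest whole-point height satisfying height >= font_size * 2.2."""
    # Integer arithmetic (x * 22 / 10) avoids float noise such as 10 * 2.2.
    return math.ceil(font_size * 22 / 10)


def kpi_bands(kpis, per_row=MAX_KPIS_PER_ROW):
    """Split KPIs into rows ("bands") of at most `per_row` cards each."""
    return [kpis[i:i + per_row] for i in range(0, len(kpis), per_row)]


print(min_row_height(22))
print(kpi_bands(["MAU", "MRR", "NPS", "CAC", "Churn", "ARPU"]))
```

With six KPIs the second helper yields one band of four and one of two, matching the "use a second row if needed" rule.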
|
|
126
|
+
|
|
127
|
+
**KPI card pattern** (2 merged columns per card, no gutter columns — keeps table alignment simple):
|
|
128
|
+
```python
|
|
129
|
+
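from openpyxl.styles import Alignment, Border, Font, PatternFill, Side
from openpyxl.utils import get_column_letter

# Assumes `ws` is an existing worksheet.
thin_border = Border(left=Side(style="thin"), right=Side(style="thin"),
                     top=Side(style="thin"), bottom=Side(style="thin"))

# 4 KPIs = 8 columns (2 per card). Tables below use columns A-D within the same grid.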
|
|
130
|
+
kpis = [
|
|
131
|
+
    ("12,400", "Current MAU", "+34% QoQ", True),
|
|
132
|
+
    ("$48K", "MRR", "+52% QoQ", True),
|
|
133
|
+
    ("62", "NPS", "+8 pts", True),
|
|
134
|
+
    ("$42", "CAC", "↓ from $58", True),
|
|
135
|
+
]
|
|
136
|
+
|
|
137
|
+
kpi_start = 3
|
|
138
|
+
ws.row_dimensions[kpi_start - 1].height = 10 # Spacer above
|
|
139
|
+
|
|
140
|
+
for i, (value, label, change, is_positive) in enumerate(kpis):
|
|
141
|
+
    col = i * 2 + 1  # 2 columns per card, no gutter
|
|
142
|
+
|
|
143
|
+
    # Merge 2 columns for each row
|
|
144
|
+
    for dr in range(3):
|
|
145
|
+
        ws.merge_cells(start_row=kpi_start + dr, start_column=col,
|
|
146
|
+
                       end_row=kpi_start + dr, end_column=col + 1)
|
|
147
|
+
|
|
148
|
+
    # Value: large, bold, colored
|
|
149
|
+
    ws.cell(kpi_start, col, value).font = Font(name="Arial", size=22, bold=True, color="2E86AB")
|
|
150
|
+
    ws.cell(kpi_start, col).alignment = Alignment(horizontal="center", vertical="center")
|
|
151
|
+
|
|
152
|
+
    # Label: muted
|
|
153
|
+
    ws.cell(kpi_start + 1, col, label).font = Font(name="Arial", size=9, color="7F8C8D")
|
|
154
|
+
    ws.cell(kpi_start + 1, col).alignment = Alignment(horizontal="center")
|
|
155
|
+
|
|
156
|
+
    # Change: green/red
|
|
157
|
+
    color = "27AE60" if is_positive else "E74C3C"
|
|
158
|
+
    ws.cell(kpi_start + 2, col, change).font = Font(name="Arial", size=9, bold=True, color=color)
|
|
159
|
+
    ws.cell(kpi_start + 2, col).alignment = Alignment(horizontal="center")
|
|
160
|
+
|
|
161
|
+
    # Card background + border
|
|
162
|
+
    card_fill = PatternFill("solid", fgColor="FFF8E1")
|
|
163
|
+
    for dr in range(3):
|
|
164
|
+
        for dc in range(2):
|
|
165
|
+
            ws.cell(kpi_start + dr, col + dc).fill = card_fill
|
|
166
|
+
            ws.cell(kpi_start + dr, col + dc).border = thin_border
|
|
167
|
+
|
|
168
|
+
    # Column widths: 18 minimum per column in the card
|
|
169
|
+
    ws.column_dimensions[get_column_letter(col)].width = 18
|
|
170
|
+
    ws.column_dimensions[get_column_letter(col + 1)].width = 18
|
|
171
|
+
|
|
172
|
+
# ROW HEIGHTS — the critical fix for merged cells
|
|
173
|
+
ws.row_dimensions[kpi_start].height = 48 # Value row — must fit size-22 font
|
|
174
|
+
ws.row_dimensions[kpi_start + 1].height = 20 # Label row
|
|
175
|
+
ws.row_dimensions[kpi_start + 2].height = 20 # Change row
|
|
176
|
+
|
|
177
|
+
# Spacer rows below KPIs before tables start
|
|
178
|
+
ws.row_dimensions[kpi_start + 3].height = 12
|
|
179
|
+
ws.row_dimensions[kpi_start + 4].height = 6
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
**Why no gutter columns:** Adding narrow spacer columns between KPI cards creates misalignment with the data tables below (which use columns A-D). Keep the column grid consistent — KPI cards and tables should share the same column structure where possible.
|
|
183
|
+
|
|
184
|
+
**Common dashboard mistakes to avoid:**
|
|
185
|
+
- Column width under 18 for KPI cards (text clips silently in merged cells)
|
|
186
|
+
- No `row_dimensions[].height` set (Excel auto-height fails with merged cells — this is the #1 cause of hidden stats)
|
|
187
|
+
- KPI cards touching the tables below (add spacer rows, height 8-12)
|
|
188
|
+
- Gutter columns between cards that break table alignment below
|
|
189
|
+
- Too many merged cells in data tables (breaks sorting and filtering)
|
|
190
|
+
- Using merged cells for data tables at all — only use them for titles and KPI cards
|
|
191
|
+
|
|
192
|
+
### Section Titles and Table Spacing
|
|
193
|
+
- Section titles: 1 empty spacer row above (height 16-20), bold font size 13-14
|
|
194
|
+
- Tables: Start on the row immediately after the header row, no gap
|
|
195
|
+
- Between tables: 2 empty rows minimum (or 1 spacer row with height 24+)
|
|
196
|
+
|
|
197
|
+
## Creating New Files
|
|
198
|
+
|
|
199
|
+
Use openpyxl directly. Here's a typical workflow:
|
|
200
|
+
|
|
201
|
+
```python
|
|
202
|
+
from openpyxl import Workbook
|
|
203
|
+
from openpyxl.styles import Font, PatternFill, Alignment, numbers
|
|
204
|
+
from openpyxl.utils import get_column_letter
|
|
205
|
+
|
|
206
|
+
wb = Workbook()
|
|
207
|
+
ws = wb.active
|
|
208
|
+
ws.title = "Sales"
|
|
209
|
+
|
|
210
|
+
# Headers with styling
|
|
211
|
+
headers = ["Product", "Q1 Sales", "Q2 Sales", "Total"]
|
|
212
|
+
for col, header in enumerate(headers, 1):
|
|
213
|
+
    cell = ws.cell(1, col)
|
|
214
|
+
    cell.value = header
|
|
215
|
+
    cell.font = Font(bold=True, color="FFFFFF", name="Arial")
|
|
216
|
+
    cell.fill = PatternFill(start_color="1B3A5C", end_color="1B3A5C", fill_type="solid")
|
|
217
|
+
    cell.alignment = Alignment(horizontal="center")
|
|
218
|
+
|
|
219
|
+
# Data rows
|
|
220
|
+
products = [("Widget A", 10000, 12000), ("Widget B", 8000, 9500)]
|
|
221
|
+
for row_idx, (product, q1, q2) in enumerate(products, 2):
|
|
222
|
+
    ws.cell(row_idx, 1).value = product
|
|
223
|
+
    ws.cell(row_idx, 2).value = q1
|
|
224
|
+
    ws.cell(row_idx, 3).value = q2
|
|
225
|
+
    # Formula for total
|
|
226
|
+
    ws.cell(row_idx, 4).value = f"=B{row_idx}+C{row_idx}"
|
|
227
|
+
    # Number formatting
|
|
228
|
+
    ws.cell(row_idx, 2).number_format = "$#,##0"
|
|
229
|
+
    ws.cell(row_idx, 3).number_format = "$#,##0"
|
|
230
|
+
    ws.cell(row_idx, 4).number_format = "$#,##0"
|
|
231
|
+
|
|
232
|
+
# Totals row
|
|
233
|
+
total_row = len(products) + 2
|
|
234
|
+
ws.cell(total_row, 1).value = "TOTAL"
|
|
235
|
+
ws.cell(total_row, 2).value = f"=SUM(B2:B{total_row-1})"
|
|
236
|
+
ws.cell(total_row, 3).value = f"=SUM(C2:C{total_row-1})"
|
|
237
|
+
ws.cell(total_row, 4).value = f"=SUM(D2:D{total_row-1})"
|
|
238
|
+
|
|
239
|
+
# Column widths
|
|
240
|
+
ws.column_dimensions['A'].width = 15
|
|
241
|
+
ws.column_dimensions['B'].width = 12
|
|
242
|
+
ws.column_dimensions['C'].width = 12
|
|
243
|
+
ws.column_dimensions['D'].width = 12
|
|
244
|
+
|
|
245
|
+
# Freeze header
|
|
246
|
+
ws.freeze_panes = "A2"
|
|
247
|
+
|
|
248
|
+
# Auto-filter
|
|
249
|
+
ws.auto_filter.ref = f"A1:D{total_row}"
|
|
250
|
+
|
|
251
|
+
wb.save('output.xlsx')
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
### Common Formatting Tasks
|
|
255
|
+
|
|
256
|
+
**Font colors for financial models**:
|
|
257
|
+
```python
|
|
258
|
+
cell = ws['B5']
|
|
259
|
+
cell.font = Font(color="0000FF") # Blue for inputs
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
**Cell number formats**:
|
|
263
|
+
```python
|
|
264
|
+
ws['B2'].number_format = '$#,##0' # Currency
|
|
265
|
+
ws['C3'].number_format = '0.0%' # Percentage
|
|
266
|
+
ws['D4'].number_format = '0.0x' # Multiple
|
|
267
|
+
ws['E5'].number_format = '0'       # Year as a plain number (2026, not 2,026); 'YYYY' is a date format
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
**Borders**:
|
|
271
|
+
```python
|
|
272
|
+
from openpyxl.styles import Border, Side
|
|
273
|
+
|
|
274
|
+
thin_border = Border(
|
|
275
|
+
    left=Side(style='thin'),
|
|
276
|
+
    right=Side(style='thin'),
|
|
277
|
+
    top=Side(style='thin'),
|
|
278
|
+
    bottom=Side(style='thin')
|
|
279
|
+
)
|
|
280
|
+
ws['B2'].border = thin_border
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
**Merged cells** (use sparingly, never in data tables):
|
|
284
|
+
```python
|
|
285
|
+
ws.merge_cells('A1:D1')
|
|
286
|
+
ws['A1'] = 'Q1 2026 Performance'
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
**Row heights for merged cells** (critical — Excel auto-height fails with merges):
|
|
290
|
+
```python
|
|
291
|
+
# ALWAYS set explicit row heights when using large fonts or merged cells
|
|
292
|
+
ws.row_dimensions[1].height = 36 # Title row
|
|
293
|
+
ws.row_dimensions[3].height = 52 # Large KPI values (size 20-26 font)
|
|
294
|
+
ws.row_dimensions[4].height = 22 # KPI labels
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
## Editing Existing Files
|
|
298
|
+
|
|
299
|
+
```python
|
|
300
|
+
from openpyxl import load_workbook
|
|
301
|
+
|
|
302
|
+
# Load without calculating (preserves formulas)
|
|
303
|
+
wb = load_workbook('existing.xlsx')
|
|
304
|
+
ws = wb['Sales']
|
|
305
|
+
|
|
306
|
+
# Modify cells
|
|
307
|
+
ws['B2'].value = 50000
|
|
308
|
+
|
|
309
|
+
# Insert rows (shifts everything below)
|
|
310
|
+
ws.insert_rows(5, 3) # Insert 3 rows at row 5
|
|
311
|
+
|
|
312
|
+
# Delete rows
|
|
313
|
+
ws.delete_rows(5, 2) # Delete 2 rows starting at row 5
|
|
314
|
+
|
|
315
|
+
# Add new sheet
|
|
316
|
+
new_sheet = wb.create_sheet('Assumptions', 0) # Insert at position 0
|
|
317
|
+
|
|
318
|
+
# Save
|
|
319
|
+
wb.save('existing.xlsx')
|
|
320
|
+
```
|
|
321
|
+
|
|
322
|
+
### Critical Warning: data_only=True Destroys Formulas
|
|
323
|
+
|
|
324
|
+
```python
|
|
325
|
+
# ❌ WRONG for editing: loads cached values only; saving this workbook replaces every formula with its last value
|
|
326
|
+
wb = load_workbook('file.xlsx', data_only=True)
|
|
327
|
+
|
|
328
|
+
# ✅ RIGHT: Preserves formulas for editing
|
|
329
|
+
wb = load_workbook('file.xlsx', data_only=False) # or just omit
|
|
330
|
+
```
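The difference between the two load modes can be seen with a small round trip. This is an illustrative sketch: the file name and cell layout are made up, and it relies on the fact that openpyxl writes formulas without computing them, so a freshly written file has no cached results until a spreadsheet application recalculates it.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")  # hypothetical file name

    wb = Workbook()
    ws = wb.active
    ws["A1"] = 10
    ws["A2"] = 32
    ws["A3"] = "=SUM(A1:A2)"
    wb.save(path)

    # Default load keeps the formula text.
    formula_view = load_workbook(path).active["A3"].value
    # data_only=True returns the cached result instead; nothing has
    # calculated this file yet, so there is no cached result to return.
    value_view = load_workbook(path, data_only=True).active["A3"].value

print(formula_view)
print(value_view)
```

Saving a workbook that was loaded with `data_only=True` would write whatever the second view contains back to disk in place of the formula, which is why that mode is suitable for reading results only.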
|
|
331
|
+
|
|
332
|
+
## Formula Recalculation
|
|
333
|
+
|
|
334
|
+
After creating or editing any file with formulas, **always recalculate**:
|
|
10
335
|
|
|
11
336
|
```bash
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
337
|
+
python scripts/recalc.py output.xlsx [timeout_seconds]
|
|
338
|
+
```
|
|
339
|
+
|
|
340
|
+
### What It Does
|
|
341
|
+
- Sets up LibreOffice macro (RecalculateAndSave) on first run
|
|
342
|
+
- Runs soffice headless to open the file and recalculate all formulas
|
|
343
|
+
- Scans all cells for Excel error values: #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A
|
|
344
|
+
- Returns JSON with status, error count, and locations
|
|
345
|
+
|
|
346
|
+
### Output Format
|
|
347
|
+
```json
|
|
348
|
+
{
|
|
349
|
+
"status": "success",
|
|
350
|
+
"total_errors": 0,
|
|
351
|
+
"total_formulas": 145,
|
|
352
|
+
"error_summary": {},
|
|
353
|
+
"error_details": [],
|
|
354
|
+
"file": "output.xlsx"
|
|
355
|
+
}
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
If errors exist:
|
|
359
|
+
```json
|
|
360
|
+
{
|
|
361
|
+
"status": "errors_found",
|
|
362
|
+
"total_errors": 2,
|
|
363
|
+
"error_summary": {
|
|
364
|
+
"#DIV/0!": 1,
|
|
365
|
+
"#REF!": 1
|
|
366
|
+
},
|
|
367
|
+
"error_details": [
|
|
368
|
+
{"cell": "C5", "error": "#DIV/0!", "sheet": "Sales"},
|
|
369
|
+
{"cell": "F12", "error": "#REF!", "sheet": "Assumptions"}
|
|
370
|
+
]
|
|
371
|
+
}
|
|
15
372
|
```
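A report in the shape shown above can be turned into a short, readable summary before deciding whether to fix and re-run. The sketch below assumes the JSON is available as a string (for example captured from the script's standard output, which is an assumption here); the function name is hypothetical.

```python
import json


def summarize_recalc(raw: str) -> str:
    """Summarize a recalc report with the fields shown in the examples above."""
    report = json.loads(raw)
    total = report.get("total_errors", 0)
    if report.get("status") == "success" and total == 0:
        return "OK: no formula errors"
    lines = [f"{total} formula error(s) found"]
    for detail in report.get("error_details", []):
        # Print each location as Sheet!Cell so it can be found quickly.
        lines.append(f"  {detail['sheet']}!{detail['cell']}: {detail['error']}")
    return "\n".join(lines)


sample = """
{
  "status": "errors_found",
  "total_errors": 2,
  "error_summary": {"#DIV/0!": 1, "#REF!": 1},
  "error_details": [
    {"cell": "C5", "error": "#DIV/0!", "sheet": "Sales"},
    {"cell": "F12", "error": "#REF!", "sheet": "Assumptions"}
  ]
}
"""
print(summarize_recalc(sample))
```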
|
|
16
373
|
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
-
|
|
21
|
-
-
|
|
374
|
+
## Formula Verification Checklist
|
|
375
|
+
|
|
376
|
+
### Essential Tests
|
|
377
|
+
- **Cell references**: Verify all cell addresses are correct (B2, not B2:B2)
|
|
378
|
+
- **Column mapping**: Ensure formula references match data layout
|
|
379
|
+
- **Row offsets**: Check that formulas adjust correctly when rows are inserted/deleted
|
|
380
|
+
- **Sheet references**: Confirm cross-sheet formulas use correct syntax: `='Sheet Name'!A1`
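For the sheet-reference check, one option is to build cross-sheet references through a single helper so the quoting is always consistent. This is a sketch with a hypothetical function name; it always wraps the sheet name in single quotes (which spreadsheet applications accept even for simple names) and doubles any apostrophe inside the name.

```python
def sheet_ref(sheet_name: str, cell: str) -> str:
    """Return a reference like 'Sheet Name'!A1 (without the leading '=')."""
    escaped = sheet_name.replace("'", "''")  # apostrophes are escaped by doubling
    return f"'{escaped}'!{cell}"


# Building formulas from the helper keeps the syntax uniform.
link_formula = "=" + sheet_ref("Sheet Name", "A1")
sum_formula = f"=SUM({sheet_ref('Data', 'B:B')})"
print(link_formula)
print(sum_formula)
```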
|
|
381
|
+
|
|
382
|
+
### Common Pitfalls
|
|
383
|
+
- **NaN in formulas**: If source data contains empty cells, wrap with IF: `=IF(ISBLANK(B2),0,B2*C2)`
|
|
384
|
+
- **Far-right columns**: Column Z, AA, AB—verify Excel alphabet mapping
|
|
385
|
+
- **Division by zero**: Always guard: `=IF(B2=0,0,A2/B2)`
|
|
386
|
+
- **Wrong sheet references**: `=SUM(Data!B:B)` not `=SUM(Data.B:B)`
|
|
387
|
+
- **Cross-sheet format mismatch**: If Formula sheet references Data sheet, ensure same row structure
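Two of the pitfalls above, far-right column letters and division by zero, are easy to get wrong by hand. The helpers below are a minimal sketch with hypothetical names; in practice `openpyxl.utils.get_column_letter` already performs the column mapping, and the pure-Python version is shown only to make the rule explicit.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to its letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        # Columns are a bijective base-26 system, hence the index - 1.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def safe_div(numerator: str, denominator: str) -> str:
    """Build a division formula guarded against a zero denominator."""
    return f"=IF({denominator}=0,0,{numerator}/{denominator})"


print([column_letter(i) for i in (26, 27, 28)])
print(safe_div("A2", "B2"))
```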
|
|
388
|
+
|
|
389
|
+
### Testing Strategy
|
|
390
|
+
1. **Start small**: Create 5-row test version, verify formulas calculate
|
|
391
|
+
2. **Verify dependencies**: If C = A + B, test that changing A updates C
|
|
392
|
+
3. **Test edge cases**: Empty cells, zeros, negative numbers, very large numbers
|
|
393
|
+
4. **Run recalc.py**: Always validate before delivering
|
|
394
|
+
|
|
395
|
+
## Best Practices
|
|
396
|
+
|
|
397
|
+
### Choosing Tools
|
|
398
|
+
|
|
399
|
+
**Use pandas when**:
|
|
400
|
+
- Loading CSV, Excel, database data
|
|
401
|
+
- Data cleaning: pivots, filters, groupby
|
|
402
|
+
- Analysis: aggregations, calculations on entire dataset
|
|
403
|
+
- Output: single DataFrame to spreadsheet
|
|
404
|
+
|
|
405
|
+
**Use openpyxl when**:
|
|
406
|
+
- Building custom layouts (merged cells, KPI cards)
|
|
407
|
+
- Complex formatting (fonts, colors, borders)
|
|
408
|
+
- Formulas and dynamic calculations
|
|
409
|
+
- Editing existing files
|
|
410
|
+
- Multi-sheet workbooks with cross-references
|
|
411
|
+
|
|
412
|
+
### openpyxl Tips
|
|
413
|
+
|
|
414
|
+
**1-based indexing**: Row and column numbers start at 1
|
|
415
|
+
```python
|
|
416
|
+
ws.cell(1, 1) # A1
|
|
417
|
+
ws.cell(5, 3) # C5
|
|
418
|
+
ws['A1'] # Also A1
|
|
419
|
+
```
|
|
420
|
+
|
|
421
|
+
**Preserve formulas when loading**:
|
|
422
|
+
```python
|
|
423
|
+
wb = load_workbook('file.xlsx') # Preserves formulas
|
|
424
|
+
# NOT: load_workbook('file.xlsx', data_only=True)
|
|
425
|
+
```
|
|
426
|
+
|
|
427
|
+
**Large files**: Use read_only or write_only mode
|
|
428
|
+
```python
|
|
429
|
+
wb = load_workbook('huge.xlsx', read_only=True) # For reading only
|
|
430
|
+
ws = wb.active
|
|
431
|
+
for row in ws.iter_rows():
|
|
432
|
+
    print([cell.value for cell in row])  # process each row here
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
**Iterating ranges**:
|
|
436
|
+
```python
|
|
437
|
+
# By row
|
|
438
|
+
for row in ws.iter_rows(min_row=2, max_row=100, values_only=False):
|
|
439
|
+
    for cell in row:
|
|
440
|
+
        print(cell.value)
|
|
441
|
+
|
|
442
|
+
# By column
|
|
443
|
+
for col in ws.iter_cols(min_col=1, max_col=5):
|
|
444
|
+
    for cell in col:
|
|
445
|
+
        print(cell.value)
|
|
446
|
+
```
|
|
447
|
+
|
|
448
|
+
### pandas Tips
|
|
449
|
+
|
|
450
|
+
**Specify data types** to avoid parsing errors:
|
|
451
|
+
```python
|
|
452
|
+
df = pd.read_excel('file.xlsx', dtype={'Year': str, 'Amount': float})
|
|
453
|
+
```
|
|
454
|
+
|
|
455
|
+
**Load specific columns** to reduce memory:
|
|
456
|
+
```python
|
|
457
|
+
df = pd.read_excel('file.xlsx', usecols=['Product', 'Sales', 'Date'])
|
|
458
|
+
```
|
|
459
|
+
|
|
460
|
+
**Parse dates**:
|
|
461
|
+
```python
|
|
462
|
+
df = pd.read_excel('file.xlsx', parse_dates=['Close Date'])
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
## Code Style Guidelines
|
|
466
|
+
|
|
467
|
+
Write minimal, concise Python. Avoid unnecessary comments—the code should be clear. But DO add Excel cell comments for complex formulas.
|
|
468
|
+
|
|
469
|
+
```python
|
|
470
|
+
# Good: Clear variable names, no fluff
|
|
471
|
+
df = pd.read_csv('sales.csv')
|
|
472
|
+
for idx, row in df.iterrows():
|
|
473
|
+
    ws[f'A{idx+2}'] = row['Product']
|
|
474
|
+
    ws[f'B{idx+2}'] = row['Amount']
|
|
475
|
+
|
|
476
|
+
# Bad: Excessive comments
|
|
477
|
+
# Load the Excel file from disk
|
|
478
|
+
df = pd.read_csv('sales.csv')  # Read the CSV
|
|
479
|
+
# Loop through each row
|
|
480
|
+
for idx, row in df.iterrows(): # Iterate rows
|
|
481
|
+
    ws[f'A{idx+2}'] = row['Product']  # Set product name
|
|
482
|
+
```
|
|
483
|
+
|
|
484
|
+
**For complex formulas, add Excel comments**:
|
|
485
|
+
```python
|
|
486
|
+
from openpyxl.comments import Comment

# Write the formula, then explain it in a cell comment
|
|
487
|
+
ws['E5'].value = '=IF(B5=0,0,(C5-B5)/B5)'  # YoY growth, guarded on the divisor B5
|
|
488
|
+
ws['E5'].comment = Comment("YoY growth % = (Current - Prior) / Prior", author="Model")
|
|
489
|
+
```
|
|
490
|
+
|
|
491
|
+
## Financial Model Structure Example
|
|
492
|
+
|
|
493
|
+
```
|
|
494
|
+
Row 1: [Metric] [2024A] [2025E] [2026E] [2027E]
|
|
495
|
+
Row 2: Revenue ($K) [1000] [1500] [2100] [2800] <- B2 blue input; C2:E2 formulas (black)
|
|
496
|
+
Row 3: Growth Rate — 50% 40% 33% <- Blue inputs, yellow bg
|
|
497
|
+
Row 4: COGS ($K) [400] [600] [820] [1090] <- Formulas (black)
|
|
498
|
+
Row 5: COGS % of Revenue 40.0% 40.0% 39.0% 39.0% <- Blue inputs, yellow bg
|
|
499
|
+
Row 6: OpEx ($K) [300] [350] [420] [500] <- Blue inputs
|
|
500
|
+
Row 7: EBITDA ($K) [300] [550] [860] [1210] <- Formulas (black)
|
|
501
|
+
|
|
502
|
+
Formulas:
|
|
503
|
+
C2 = B2 * (1 + C3) [2024 * (1 + growth)]
|
|
504
|
+
C4 = C2 * C5 [Revenue * COGS %]
|
|
505
|
+
C7 = C2 - C4 - C6 [Revenue - COGS - OpEx]
|
|
506
|
+
```
|
|
22
507
|
|
|
23
|
-
|
|
508
|
+
Cell colors:
|
|
509
|
+
- B2, C3, D3, E3: Blue text (inputs)
|
|
510
|
+
- C3, D3, E3: Yellow background (key assumptions)
|
|
511
|
+
- C2, C4, C7: Black text (formulas)
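The revenue and EBITDA formulas listed for column C extend to the other forecast columns by shifting the column letters. The generator below is a sketch with a hypothetical function name; it handles single-letter columns only and covers just the two formulas whose pattern is stated explicitly above (rows 2 and 7).

```python
def forecast_formulas(first_col: str = "C", last_col: str = "E") -> dict:
    """Map cell addresses to formulas for revenue (row 2) and EBITDA (row 7)."""
    formulas = {}
    for code in range(ord(first_col), ord(last_col) + 1):
        col, prev = chr(code), chr(code - 1)
        # Revenue = prior-year revenue * (1 + growth rate in row 3)
        formulas[f"{col}2"] = f"={prev}2*(1+{col}3)"
        # EBITDA = Revenue - COGS - OpEx
        formulas[f"{col}7"] = f"={col}2-{col}4-{col}6"
    return formulas


for address, formula in forecast_formulas().items():
    print(address, formula)
```

Each generated string can be assigned directly to the matching worksheet cell, for example `ws[address] = formula`, so that every projected year recalculates from the assumptions in rows 3 and 6.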
|