@yottagraph-app/data-model-skill 0.0.31 → 0.0.33

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@yottagraph-app/data-model-skill",
-   "version": "0.0.31",
+   "version": "0.0.33",
    "description": "Data model skill documentation for AI agents - entity types, properties, and schemas from Lovelace fetch sources",
    "repository": {
      "type": "git",
@@ -0,0 +1,145 @@
+ # Data Dictionary: Companies House Accounts
+
+ ## Source Overview
+
+ UK Companies House Accounts Bulk Data Product — daily ZIP files containing iXBRL (.html) and XBRL (.xml) annual accounts filings submitted by UK-registered companies.
+
+ - Data source: `https://download.companieshouse.gov.uk/en_accountsdata.html`
+ - Publisher: Companies House (UK government registrar)
+ - Cadence: daily bulk ZIPs, each containing filings processed that day
+ - Coverage: all companies filing annual accounts with Companies House (primarily small and micro companies using iXBRL templates; large companies with bespoke filings may have sparse financial data)
+
+ Financial figures are extracted from iXBRL/XBRL taxonomies (UK GAAP, FRS 102, IFRS). Not all companies report all fields — micro accounts may only include balance sheet totals, while full accounts include profit/loss and employee data. All monetary values are in GBP (£).
+
+ | Pipeline | `Record.Source` |
+ |----------|----------------|
+ | Accounts filings | `companieshouse` |
+
+ ---
+
+ ## Entity Types
+
+ ### `organization`
+
+ A UK-registered company that has filed annual accounts with Companies House.
+
+ - Primary key: `companies_house_number` (8-character alphanumeric, e.g. `"00000006"`, `"SC123456"`)
+ - Entity resolver: named entity, mergeable. Strong ID = `companies_house_number`. Disambiguation via company name and registered address.
+
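As a reading aid for the primary key above, here is a minimal sketch of a shape check for `companies_house_number`. It assumes "8-character alphanumeric" means exactly eight characters from `A-Z` and `0-9`; the name `is_company_number` is hypothetical and not part of the package.

```python
import re

# Assumption: exactly eight upper-case letters or digits. The registry may
# restrict which letter prefixes are valid; this sketch does not check that.
COMPANY_NUMBER_RE = re.compile(r"[A-Z0-9]{8}")


def is_company_number(value: str) -> bool:
    """Return True if value has the 8-character alphanumeric shape."""
    return COMPANY_NUMBER_RE.fullmatch(value) is not None
```

Both documented examples, `"00000006"` and `"SC123456"`, pass this check; a 7-character value does not.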
+ ### `companieshouse::accounts_filing`
+
+ An annual accounts filing document submitted to Companies House, identified by the combination of company number and balance sheet date.
+
+ - Primary key: `filing_id` (format: `"{company_number}-{YYYY-MM-DD}"`)
+ - Entity resolver: named entity, not mergeable. Strong ID = `filing_id`.
+
+ ---
+
+ ## Properties
+
+ ### Organization Properties
+
+ #### Identity
+
+ * `companies_house_number`
+     * Definition: Companies House registered company number uniquely identifying a UK company.
+     * Examples: `"00000006"`, `"12345678"`, `"SC123456"`
+     * Derivation: extracted from the `UKCompaniesHouseRegisteredNumber` or `EntityRegistrationNumber` XBRL concept in the filing, or parsed from the bulk data filename pattern `Prod{NNN}_{NNN}_{CCCCCCCC}_{YYYYMMDD}` where `CCCCCCCC` is the company number.
+
+ * `registered_address`
+     * Definition: registered office address of the company as reported in the accounts filing.
+     * Examples: `"10 Downing Street, London, SW1A 2AA"`
+     * Derivation: composed from XBRL address concepts `AddressLine1`, `AddressLine2`, `AddressLine3`, `PrincipalLocation-CityOrTown`, `CountyRegion`, and `PostalCodeZip` (from the `bus:` namespace). Empty components are omitted; the result is a comma-separated string.
+
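The `registered_address` derivation can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the `facts` dict (concept name to extracted text) and the function name `compose_address` are hypothetical, and the component order is taken from the order in which the concepts are listed above.

```python
# Concept names in the order listed in the derivation note.
ADDRESS_CONCEPTS = [
    "AddressLine1",
    "AddressLine2",
    "AddressLine3",
    "PrincipalLocation-CityOrTown",
    "CountyRegion",
    "PostalCodeZip",
]


def compose_address(facts: dict) -> str:
    """Join the non-empty address components with ', '."""
    parts = []
    for concept in ADDRESS_CONCEPTS:
        # Missing or None values are treated the same as empty strings.
        value = (facts.get(concept) or "").strip()
        if value:
            parts.append(value)
    return ", ".join(parts)
```

With `AddressLine1="10 Downing Street"`, `PrincipalLocation-CityOrTown="London"`, and `PostalCodeZip="SW1A 2AA"`, the result matches the documented example string.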
+ ### Accounts Filing Properties
+
+ #### Filing Metadata
+
+ * `filing_id`
+     * Definition: unique identifier for the accounts filing, composed of the company number and balance sheet date.
+     * Examples: `"12345678-2024-12-31"`, `"SC123456-2025-03-31"`
+     * Derivation: constructed as `"{company_number}-{balance_sheet_date}"`. The balance sheet date is extracted from the bulk data filename.
+
+ * `balance_sheet_date`
+     * Definition: balance sheet date of the filed accounts, as YYYY-MM-DD.
+     * Examples: `"2025-03-31"`, `"2024-12-31"`
+     * Derivation: extracted from the bulk data filename pattern `Prod{NNN}_{NNN}_{CCCCCCCC}_{YYYYMMDD}`.
+
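The filename-based derivation of `balance_sheet_date` and `filing_id` can be sketched like this. Several details are assumptions: the two `{NNN}` groups are treated as runs of digits of any length, the input is a base name without a directory, and `parse_filename` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumption: the {NNN} groups are digit runs; the company number is eight
# upper-case alphanumerics; the date is eight digits (YYYYMMDD).
FILENAME_RE = re.compile(
    r"Prod\d+_\d+_(?P<company>[A-Z0-9]{8})_(?P<date>\d{8})"
)


def parse_filename(filename: str) -> dict:
    """Derive company number, balance sheet date and filing_id from a name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised filename: {filename!r}")
    # strptime validates the date, so impossible dates raise ValueError too.
    parsed = datetime.strptime(match["date"], "%Y%m%d").date()
    balance_sheet_date = parsed.isoformat()
    company_number = match["company"]
    return {
        "companies_house_number": company_number,
        "balance_sheet_date": balance_sheet_date,
        "filing_id": f"{company_number}-{balance_sheet_date}",
    }
```

For a name such as `Prod223_0001_12345678_20241231.html` (an invented example), this yields the `filing_id` `"12345678-2024-12-31"`, the same shape as the documented examples.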
+ #### Balance Sheet
+
+ * `total_assets`
+     * Definition: total assets reported on the balance sheet.
+     * Examples: `1500000.0` (£1,500,000)
+     * Derivation: XBRL concept `TotalAssets`. Unit: GBP.
+
+ * `total_liabilities`
+     * Definition: total liabilities reported on the balance sheet.
+     * Examples: `800000.0` (£800,000)
+     * Derivation: XBRL concept `TotalLiabilities`. Unit: GBP.
+
+ * `net_assets`
+     * Definition: net assets or liabilities, equal to total assets minus total liabilities.
+     * Examples: `700000.0` (£700,000)
+     * Derivation: XBRL concepts `NetAssetsLiabilities`, `NetAssets`, or `TotalAssetsLessCurrentLiabilities` (first available). Unit: GBP.
+
+ * `fixed_assets`
+     * Definition: total fixed (non-current) assets including tangible, intangible, and investment assets.
+     * Examples: `350000.0` (£350,000)
+     * Derivation: XBRL concepts `FixedAssets`, `TotalFixedAssets`, `NonCurrentAssets`, or `TotalNonCurrentAssets` (first available). Unit: GBP.
+
+ * `current_assets`
+     * Definition: total current assets including cash, debtors, and stock.
+     * Examples: `250000.0` (£250,000)
+     * Derivation: XBRL concepts `CurrentAssets` or `TotalCurrentAssets` (first available). Unit: GBP.
+
+ * `shareholders_equity`
+     * Definition: total stockholders' or shareholders' equity.
+     * Examples: `500000.0` (£500,000)
+     * Derivation: XBRL concepts `ShareholderFunds`, `TotalShareholdersFunds`, `ShareholdersFunds`, or `Equity` (first available). Unit: GBP.
+
+ * `creditors_due_within_one_year`
+     * Definition: total creditors falling due within one year (current liabilities).
+     * Examples: `120000.0` (£120,000)
+     * Derivation: XBRL concepts `CreditorsDueWithinOneYear` or `CurrentLiabilities` (first available). Unit: GBP.
+
+ * `creditors_due_after_one_year`
+     * Definition: total creditors falling due after more than one year (non-current liabilities).
+     * Examples: `200000.0` (£200,000)
+     * Derivation: XBRL concepts `CreditorsDueAfterOneYear` or `NonCurrentLiabilities` (first available). Unit: GBP.
+
+ * `called_up_share_capital`
+     * Definition: called up share capital of the company.
+     * Examples: `100.0` (£100)
+     * Derivation: XBRL concepts `CalledUpShareCapital` or `CalledUpShareCapitalNotPaid` (first available). Unit: GBP.
+
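The "(first available)" rule used throughout the balance sheet derivations can be sketched as a priority lookup. This is a minimal illustration, assuming the parsed filing is available as a dict from concept name to numeric value; `CONCEPT_FALLBACKS`, `first_available`, and `extract_balance_sheet` are hypothetical names, and only three of the documented fields are shown.

```python
from typing import Optional

# Candidate concepts per output field, in the documented priority order.
CONCEPT_FALLBACKS = {
    "net_assets": [
        "NetAssetsLiabilities",
        "NetAssets",
        "TotalAssetsLessCurrentLiabilities",
    ],
    "current_assets": ["CurrentAssets", "TotalCurrentAssets"],
    "creditors_due_within_one_year": [
        "CreditorsDueWithinOneYear",
        "CurrentLiabilities",
    ],
}


def first_available(facts: dict, concepts: list) -> Optional[float]:
    """Return the value of the first concept present in facts, else None."""
    for concept in concepts:
        value = facts.get(concept)
        if value is not None:
            return float(value)
    return None


def extract_balance_sheet(facts: dict) -> dict:
    """Build the output fields, leaving out any field with no tagged concept."""
    out = {}
    for field, concepts in CONCEPT_FALLBACKS.items():
        value = first_available(facts, concepts)
        if value is not None:
            out[field] = value
    return out
```

One consequence worth keeping in mind when reading `net_assets`: if a filing tags only `TotalAssetsLessCurrentLiabilities`, the fallback value has not had creditors due after one year deducted, so it can differ from total assets minus total liabilities.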
+ #### Profit and Loss
+
+ * `revenue`
+     * Definition: total revenue or turnover for the reporting period.
+     * Examples: `2000000.0` (£2,000,000)
+     * Derivation: XBRL concepts `TurnoverRevenue`, `Turnover`, `TurnoverGrossOperatingRevenue`, or `Revenue` (first available). Unit: GBP.
+     * Note: many small company filings do not include a profit and loss account, so this field is frequently absent.
+
+ * `net_income`
+     * Definition: net income or loss for the reporting period.
+     * Examples: `150000.0` (£150,000), `-50000.0` (a loss of £50,000)
+     * Derivation: XBRL concepts `ProfitLoss`, `ProfitLossOnOrdinaryActivitiesBeforeTax`, `ProfitLossForPeriod`, or `ProfitLossForFinancialYear` (first available). Unit: GBP.
+     * Note: frequently absent for micro and abbreviated accounts.
+
+ #### Workforce
+
+ * `average_number_of_employees`
+     * Definition: average number of employees during the reporting period (headcount average, not FTE).
+     * Examples: `42.0`, `3.0`, `1250.0`
+     * Derivation: XBRL concepts `AverageNumberEmployeesDuringPeriod` or `EmployeesTotal` (first available).
+     * Note: only disclosed in filings that include employee information per FRS 102 / Companies Act 2006 requirements. Small companies are often exempt.
+
+ ---
+
+ ## Entity Relationships Summary
+
+ ```
+ organization ──[companieshouse::filed]──→ companieshouse::accounts_filing
+ ```
+
+ The `filed` relationship links a company to each of its annual accounts filings. One organization may have multiple `filed` edges (one per annual filing). The relationship is namespaced to `companieshouse` to distinguish it from the SEC `filed` relationship at the storage layer.
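To make the one-filing, two-records shape concrete, here is a rough sketch of how an organization record, a filing record, and the `filed` edge between them might be assembled. The record layout (plain dicts with `flavor`, `source`, `properties`, `relationships`) and the function name `build_records` are assumptions for illustration only; the actual record format is not shown in this diff.

```python
def build_records(company_number: str, balance_sheet_date: str, financials: dict):
    """Return (organization, filing) records linked by a 'filed' edge."""
    filing_id = f"{company_number}-{balance_sheet_date}"

    # The filing carries the balance sheet date and the financial figures.
    filing = {
        "flavor": "companieshouse::accounts_filing",
        "source": "companieshouse",
        "properties": {
            "filing_id": filing_id,
            "balance_sheet_date": balance_sheet_date,
            **financials,
        },
    }

    # The organization carries identity and points at the filing.
    organization = {
        "flavor": "organization",
        "source": "companieshouse",
        "properties": {"companies_house_number": company_number},
        "relationships": [
            {
                "name": "companieshouse::filed",
                "target_flavor": "companieshouse::accounts_filing",
                "target_strong_id": filing_id,
            }
        ],
    }
    return organization, filing
```

Calling this once per annual filing for the same company number would give one organization identity with several `filed` edges, which is the one-to-many shape described above.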
@@ -0,0 +1,190 @@
+ # Dataset schema for Companies House Accounts (UK).
+ #
+ # Architecture:
+ #   Each iXBRL/XBRL accounts filing produces TWO records:
+ #   1. An organization record (the company) with identity properties
+ #      (company number, name) and a "filed" relationship to the accounts.
+ #   2. An accounts_filing record (the document) carrying the balance sheet
+ #      date and financial figures.
+ #
+ # All elements are passive — created by the atomizer from parsed iXBRL/XBRL
+ # filings, not by LLM extraction.
+ #
+ # Source identifier: "companieshouse-source"
+ name: "companieshouse"
+ description: "UK company financial accounts filed with Companies House — balance sheet data, profit/loss, and company identity extracted from filings"
+
+ extraction:
+   flavors: closed
+   properties: closed
+   relationships: closed
+   attributes: closed
+   events: closed
+
+ flavors:
+   - name: "organization"
+     description: "A particular business, institution, or organization such as a corporation, university, government agency, or non-profit"
+     display_name: "Organization"
+     mergeability: not_mergeable
+     strong_id_properties: ["companies_house_number"]
+     passive: true
+
+   - name: "accounts_filing"
+     namespace: "companieshouse"
+     description: "An annual accounts filing submitted to Companies House, identified by company number and balance sheet date"
+     display_name: "Accounts Filing"
+     mergeability: not_mergeable
+     strong_id_properties: ["filing_id"]
+     passive: true
+
+ # --- Identity properties (on organization) ---
+
+ properties:
+   - name: "companies_house_number"
+     type: string
+     description: "Companies House registered company number uniquely identifying a UK company"
+     display_name: "Companies House Number"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     examples: ["00000006", "12345678", "SC123456"]
+     passive: true
+
+   - name: "registered_address"
+     type: string
+     description: "Registered office address of the company as reported in the accounts filing"
+     display_name: "Registered Address"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     examples: ["10 Downing Street, London, SW1A 2AA"]
+     passive: true
+
+   # --- Filing metadata (on accounts_filing) ---
+
+   - name: "filing_id"
+     type: string
+     description: "Unique identifier for the accounts filing, composed of the company number and balance sheet date (e.g. '12345678-2024-12-31')"
+     display_name: "Filing ID"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     examples: ["12345678-2024-12-31", "SC123456-2025-03-31"]
+     passive: true
+
+   - name: "balance_sheet_date"
+     type: string
+     description: "Balance sheet date of the filed accounts, as YYYY-MM-DD"
+     display_name: "Balance Sheet Date"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     examples: ["2025-03-31", "2024-12-31"]
+     passive: true
+
+   # --- Financial properties (on accounts_filing) ---
+
+   - name: "total_assets"
+     type: float
+     description: "Total assets"
+     display_name: "Total Assets"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "total_liabilities"
+     type: float
+     description: "Total liabilities"
+     display_name: "Total Liabilities"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "shareholders_equity"
+     type: float
+     description: "Total stockholders' or shareholders' equity"
+     display_name: "Shareholders Equity"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "net_assets"
+     type: float
+     description: "Net assets or liabilities of the company, equal to total assets minus total liabilities"
+     display_name: "Net Assets"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "fixed_assets"
+     type: float
+     description: "Total fixed (non-current) assets including tangible, intangible, and investment assets"
+     display_name: "Fixed Assets"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "current_assets"
+     type: float
+     description: "Total current assets including cash, debtors, and stock"
+     display_name: "Current Assets"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "creditors_due_within_one_year"
+     type: float
+     description: "Total creditors falling due within one year (current liabilities)"
+     display_name: "Creditors Due Within One Year"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "creditors_due_after_one_year"
+     type: float
+     description: "Total creditors falling due after more than one year (non-current liabilities)"
+     display_name: "Creditors Due After One Year"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "revenue"
+     type: float
+     description: "Total revenue or turnover for the reporting period"
+     display_name: "Revenue"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "net_income"
+     type: float
+     description: "Net income or loss for the reporting period"
+     display_name: "Net Income"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "called_up_share_capital"
+     type: float
+     description: "Called up share capital of the company"
+     display_name: "Called Up Share Capital"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+   - name: "average_number_of_employees"
+     type: float
+     description: "Average number of employees during the reporting period"
+     display_name: "Number of Employees"
+     mergeability: not_mergeable
+     domain_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+ # --- Relationships ---
+
+ relationships:
+   - name: "filed"
+     description: "Link from a company or person to a regulatory filing document they filed, or from a sub-record to its parent filing"
+     display_name: "Filed"
+     mergeability: not_mergeable
+     domain_flavors: ["organization"]
+     target_flavors: ["companieshouse::accounts_filing"]
+     passive: true
+
+ attributes: []
@@ -0,0 +1,99 @@
+ # Data Dictionary: Companies House PSC
+
+ ## Source Overview
+
+ UK Companies House People with Significant Control (PSC) Snapshot — a daily bulk download containing the full register of persons and entities with significant control over UK-registered companies.
+
+ - Data source: `https://download.companieshouse.gov.uk/en_pscdata.html`
+ - Publisher: Companies House (UK government registrar)
+ - Cadence: daily full snapshot, updated before 10am GMT
+ - Format: newline-delimited JSON (JSONL) inside ZIP archives; available as a single large file or ~32 smaller part files
+ - Coverage: all PSC records held by Companies House, including active and ceased entries; approximately 11 million records across ~8 million companies
+ - Limitations: "super-secure" PSC entries (protected persons) are redacted and skipped. Address data is a service address, not necessarily a residential address. Date of birth is limited to month and year.
+
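Reading JSONL out of a ZIP archive and skipping redacted entries, as described in the Format and Limitations bullets, can be sketched with the standard library alone. The function name `iter_psc_records` is hypothetical, and the `kind` value used to recognise super-secure entries is an assumption about the snapshot format rather than something stated in this diff.

```python
import io
import json
import zipfile
from typing import Iterator

# Assumption: redacted entries are marked with this value under data.kind.
SUPER_SECURE_KIND = "super-secure-person-with-significant-control"


def iter_psc_records(zip_source) -> Iterator[dict]:
    """Yield one dict per JSONL line, skipping blanks and redacted entries.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            with archive.open(member) as raw:
                # Decode lazily so a large part file is never fully in memory.
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if not line:
                        continue
                    record = json.loads(line)
                    kind = record.get("data", {}).get("kind")
                    if kind == SUPER_SECURE_KIND:
                        continue
                    yield record
```

Because the function is a generator, the same loop works for the single large snapshot file and for each of the smaller part files.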
+ | Pipeline | `Record.Source` |
+ |----------|----------------|
+ | PSC snapshot | `companieshousepsc` |
+
+ ---
+
+ ## Entity Types
+
+ ### `organization`
+
+ A UK-registered company that has one or more persons with significant control, or a corporate entity that itself has significant control over another company.
+
+ - Primary key: `companies_house_number` (8-character alphanumeric, e.g. `"09145694"`, `"SC123456"`) for controlled companies; `registration_number` for corporate PSC entities.
+ - Entity resolver: named entity, mergeable. Strong ID = `companies_house_number` or `registration_number`. Controlled companies merge with the same company from the `companieshouse` accounts dataset.
+
+ ### `person`
+
+ An individual person with significant control over a UK company.
+
+ - Primary key: none (resolved by name). Persons are mergeable entities identified by name and contextual snippets (address).
+ - Entity resolver: named entity, mergeable. No strong ID — relies on name-based resolution.
+
+ ---
+
+ ## Properties
+
+ ### Organization Properties (controlled company)
+
+ #### Identity
+
+ * `companies_house_number`
+     * Definition: Companies House registered company number uniquely identifying a UK company.
+     * Examples: `"09145694"`, `"00001234"`, `"SC123456"`
+     * Derivation: `company_number` field from the top-level JSON record.
+
+ ### Organization Properties (corporate PSC)
+
+ #### Identity
+
+ * `registration_number`
+     * Definition: registration number of a corporate entity PSC as provided in their identification details.
+     * Examples: `"98765432"`, `"HRB 12345"`
+     * Derivation: `data.identification.registration_number` field from the JSON record. Only present for corporate-entity PSCs that provide identification.
+
+ ### Person Properties (individual PSC)
+
+ #### Identity and Background
+
+ * `nationality`
+     * Definition: nationality of an individual person with significant control.
+     * Examples: `"British"`, `"American"`, `"German"`
+     * Derivation: `data.nationality` field from the JSON record. Only present for individual PSCs.
+
+ * `country_of_residence`
+     * Definition: country of residence of an individual person with significant control.
+     * Examples: `"England"`, `"Scotland"`, `"United States"`
+     * Derivation: `data.country_of_residence` field from the JSON record. Only present for individual PSCs.
+
+ ---
+
+ ## Entity Relationships Summary
+
+ ```
+ person/organization (PSC) ──[controls]──→ organization (company)
+ ```
+
+ The `controls` relationship links a person or corporate entity directly to the company they have significant control over. One PSC may control multiple companies (multiple `controls` edges from the same entity). One company may be controlled by multiple PSCs (multiple `controls` edges from different entities pointing to the same company).
+
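The many-to-many shape of `controls` can be illustrated with a small index built from edge pairs. This is a sketch only: the `(psc_id, company_number)` tuple form and the name `index_controls` are hypothetical, and how a PSC is identified after entity resolution is outside what this diff shows.

```python
from collections import defaultdict


def index_controls(edges):
    """Index (psc_id, company_number) pairs in both directions."""
    companies_by_psc = defaultdict(set)
    pscs_by_company = defaultdict(set)
    for psc_id, company_number in edges:
        companies_by_psc[psc_id].add(company_number)
        pscs_by_company[company_number].add(psc_id)
    return companies_by_psc, pscs_by_company
```

A PSC with two entries in `companies_by_psc` controls two companies; a company with two entries in `pscs_by_company` has two PSCs.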
+ ---
+
+ ## Attributes (on the `controls` relationship)
+
+ * `natures_of_control`
+     * Definition: semicolon-separated list describing the nature of significant control, such as share ownership bands, voting rights, or the right to appoint/remove directors.
+     * Examples: `"ownership of shares 50 to 75 percent"`, `"ownership of shares 75 to 100 percent; right to appoint and remove directors"`
+     * Derivation: `data.natures_of_control` array from the JSON record. Each entry has hyphens replaced with spaces; multiple entries are joined with `"; "`.
+
+ * `notified_on`
+     * Definition: date on which Companies House was notified of this person with significant control, as YYYY-MM-DD.
+     * Examples: `"2016-04-06"`, `"2023-01-15"`
+     * Derivation: `data.notified_on` field from the JSON record. The PSC regime started on 2016-04-06, so most entries are notified on or after that date.
+
+ * `ceased_on`
+     * Definition: date on which this person ceased to have significant control, as YYYY-MM-DD. Absent for active PSCs.
+     * Examples: `"2024-06-01"`, `"2023-12-31"`
+     * Derivation: `data.ceased_on` field from the JSON record. Only present for ceased PSCs.
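
The attribute derivations above can be sketched as a small transformation over the record's `data` object. The function names `controls_attributes` and `is_active` are hypothetical, and the raw hyphenated input strings in the usage note are assumed from the "hyphens replaced with spaces" rule rather than quoted from the source data.

```python
def controls_attributes(data: dict) -> dict:
    """Build the attributes for one controls edge from a record's data object."""
    attrs = {}
    natures = data.get("natures_of_control") or []
    if natures:
        # Replace hyphens with spaces in each entry, then join with "; ".
        attrs["natures_of_control"] = "; ".join(
            nature.replace("-", " ") for nature in natures
        )
    # Dates are copied through only when present; ceased_on is absent
    # for active PSCs.
    for key in ("notified_on", "ceased_on"):
        if data.get(key):
            attrs[key] = data[key]
    return attrs


def is_active(attrs: dict) -> bool:
    """A PSC is treated as active when no ceased_on date was recorded."""
    return "ceased_on" not in attrs
```

Given `["ownership-of-shares-75-to-100-percent", "right-to-appoint-and-remove-directors"]`, the joined value is `"ownership of shares 75 to 100 percent; right to appoint and remove directors"`, matching the second documented example.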