@hiiretail/gcp-infra-cli 0.80.1 → 0.81.0
@@ -2,47 +2,88 @@
 
 ## General
 
-<!-- Describe in short what the
+<!-- Describe, in short, what the system does and the interactions with external and/or third party systems.
+
+Example:
+
+Transaction Repository is a centralized product which receives transactions from many parts of Hii Retail. The intention is to store transactions long term and to expose a search API to be able to search for transactions.
+-->
 
 ## Architecture
 
-<!-- Include C4 diagrams
+<!-- Include the C4 diagrams and a link to the Software Guidebook. -->
 
 ## Business Continuity and Disaster Recovery Plan
 
-<!--
+<!-- Add a link to the Business Continuity and Disaster Recovery Plan. -->
 
 ## Services
 
-<!--
+<!-- List all internal services that are a part of the system with a short explanation of their purpose and a link to the logs of the service.
+-->
 
 ## Dashboard
 
-<!--
+<!-- Add link(s) to the dashboard(s) that are setup. Write a short explanation what the dashboard displays and the purpose of it.
+-->
 
 ## Service Level Objectives
 
-<!--
+<!-- Add a link or include the SLOs for each service that are defined. -->
 
 ## Alerts
 
-<!--
+<!-- List the alerts according to the format:
+
+Alert name
+* Description
+* Notification channels
+* Remediation steps
+
+Example:
+
+[P1] che.checkout-engine-isrg-nl-checkout-api - Service is offline
+
+Description: Triggers when the uptime check fails
+
+Notification channels:
+* Slack, #monitoring-channel
+* SMS
+* Jira
+
+Remediation steps:
+1. Check if the memory usage (Link to where to check that) is higher than usual (Is there a threshold?)
+2. Check if the number of requests (Link to where to check that) are higher than usual
+3. Follow the Contact & Escalation Matrix
+-->
 
 ## Health Checks
 
-<!--
+<!-- Add links to the configured health checks in GCP -->
+
+## Accessibility (GCP)
+
+<!-- What permissions are required to access the GCP resources that are used by the system?
+
+Example:
+* Cloud SQL - roles/cloudsql.editor (txengine-prod-1c85)
+* Secret Manager - roles/secretmanager.admin (cardpayment-prod-d5b4)
+
+Add examples on how to use the Just-In-Time Access system (https://jit-access.retailsvc.com/)
+-->
 
 ## How do I..?
 
-<!-- Good to know things
+<!-- Good to know things, such as "How do I connect to the database?", "How do I find a specific item?" -->
 
 ## Known Issues
 
-<!-- Are there any known issues?
+<!-- Are there any known issues that requires manual intervention? Is there a workaround for the issue? There should be a short description of the issue and a link to the Jira where more details can be found.
+-->
 
 ## Contact & Escalation Matrix
 
-<!-- If the team is unable to resolve
+<!-- If the team is unable to resolve or need to escalate an incident, who is the first to contact?
 
 | # | Name | Role | E-Mail | Phone number |
 | --- | --- | --- | --- | --- |
@@ -85,6 +85,23 @@ cloud_sql:
               group_by_fields: ["resource.label.database_id"]
     documentation:
       content: <% if (runbookLink) { %>[Runbook](<%-runbookLink%>)<%} else { %> <% } %>
+  memory_over_90:
+    display_name: "[P2] <%-clan%> - CloudSQL | Memory utilization above 90%"
+    conditions:
+      - display_name: Cloud SQL Database - Memory utilization above 90% for 5 min
+        condition_threshold:
+          filter: |
+            resource.type = "cloudsql_database"
+            metric.type = "cloudsql.googleapis.com/database/memory/utilization"
+            resource.labels.project_id="<%-projectId%>"
+          threshold_value: 0.9
+          duration: 300s
+          aggregations:
+            - alignment_period: 60s
+              per_series_aligner: ALIGN_MAX
+              group_by_fields: ["resource.label.database_id"]
+    documentation:
+      content: <% if (runbookLink) { %>[Runbook](<%-runbookLink%>)<%} else { %> <% } %>
   query_over_1s:
     display_name: "[P4] <%-clan%> - CloudSQL | Query resolve time"
     conditions: