cassback 0.1.8 → 0.1.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +30 -3
- data/bin/cassback +2 -1
- data/lib/cassandra.rb +10 -1
- data/lib/cassback/version.rb +1 -1
- metadata +4 -13
- data/scripts/deploy.sh +0 -3
- data/scripts/manualbackups/ansible.cfg +0 -12
- data/scripts/manualbackups/inventory.txt +0 -18
- data/scripts/manualbackups/play_book.sh +0 -13
- data/scripts/manualbackups/playbooks/backups.yml +0 -6
- data/scripts/manualbackups/roles/planb/files/backup.sh +0 -27
- data/scripts/manualbackups/roles/planb/files/httpfs.sh +0 -27
- data/scripts/manualbackups/roles/planb/files/krb5.conf +0 -26
- data/scripts/manualbackups/roles/planb/tasks/main.yml +0 -34
- /data/{.rubocop.yml_disabled → .rubocop.yml} +0 -0
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 493265549ae69ac85882959fd0751368fcf1e940
+  data.tar.gz: 1f7e6550297e0451c35c62a52b8002883d343e81
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b03e231ff64cf1b6ef5d72c6bf28aef338ecab1f90fac64d75c2a6e811d914edb9df4d83fb959bf2383d22ee0e5b118dc2a02c76e60626a12dfa8b2323b4eb36
+  data.tar.gz: cf5ba0d94bdda1a1cbf490e6d54a8413fa28e000e61b3d4fabe55528ba690390c8e55376a4a82f2d51ef12216052516834ed8e412c169533f6422adc45914583
```
data/README.md
CHANGED
```diff
@@ -2,6 +2,7 @@
 
 Welcome to your Cassback!
 This is a project that aims backup Cassandra SSTables and load them into HDFS for further usage.
+It intended to be used as a command line tool - for now it can be triggered only by commands.
 
 ## Installation
 
@@ -40,6 +41,31 @@ A simple command that you can use for starting a backup is :
 
 cassback -S -C path_to_some_config_file.yml
 
+## Structure of the backup
+
+A backup, that has been pushed from multiple Cassandra nodes to one HDFS location, has the following structure :
+
+**a) Backup files**
+
+<hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
+
+
+**b) Backup complete flags** (stored at cluster level in the metadata folder) :
+
+<hadoop_root_folder>/<cass_snap_metadata>/<cluster_name>/BACKUP_COMPLETE_<date>
+
+
+**c) Metadata files** (stored at node level in metadata folder) :
+
+<hadoop_root_folder>/cass_snap_metadata/<cluster_name>/<node_name>/cass_snap_<date>
+
+## Incremental or full backups ?
+
+**Backups are done incrementally, but published as full backups** - the tool checks locally which files will have to be uploaded to HDFS and checks
+if those files are already present in HDFS (because Cassandra files are immutable we don't have to risk to have two files
+with same name but different content). However when the metadata file is published it points to all the files that
+compose the backup so it basically publishes it as being a full backup.
+
 ## Configuration
 
 The application has some default configuration defined.
@@ -50,6 +76,8 @@ You can overwrite the default configuration using two meanings :
 2. Using individual configuration properties passed as parameters on the command line.
 The command line parameters have precedence over the configuration file.
 
+An example of configuration file is provided under conf/local.yml.
+
 ## Orchestration
 
 The tool is designed to do snapshots at **node level** (and not at **cluster level**) - basically it has to be installed
@@ -94,13 +122,12 @@ The command for triggering a cleanup is :
 
 cassback -A -C conf/path_to_some_config_file.yml
 
 # Unit tests
+
 Unit tests can be executed locally by running the following command :
 
 rake test
 
 ## Contributing
 
-
-
-Issue reports and merge requests are welcome on Criteo's GitLab at : https://gitlab.criteois.com/ruby-gems/cassback
+Bug reports and pull requests are welcome on GitHub at : https://github.com/criteo/cassback
 
```
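To make the layout added to the README concrete, here is a minimal Ruby sketch that builds one example of each of the three path families (a, b, c). All concrete values (root folder, cluster, node, keyspace, table, snapshot path, file name) are placeholders for illustration, not values taken from the gem.

```ruby
# Placeholder values for illustration only; none of these names come from the gem.
hadoop_root = '/backups'
cluster     = 'my_cluster'
node        = 'node01'
date        = '2016_05_18'

# a) a backup file, stored per node / keyspace / table / snapshot
backup_file = File.join(hadoop_root, cluster, node, 'my_keyspace', 'my_table',
                        "snapshots/cass_snap_#{date}", 'my_table-ka-1-Data.db')

# b) the cluster-level "backup complete" flag
complete_flag = File.join(hadoop_root, 'cass_snap_metadata', cluster,
                          "BACKUP_COMPLETE_#{date}")

# c) the node-level metadata file that lists every file composing the backup
metadata_file = File.join(hadoop_root, 'cass_snap_metadata', cluster, node,
                          "cass_snap_#{date}")

puts backup_file, complete_flag, metadata_file
```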
data/bin/cassback
CHANGED
```diff
@@ -49,6 +49,7 @@ command_line_config = {
 options = {
   'cassandra' => {
     'config' => '/etc/cassandra/conf/cassandra.yaml',
+    'disk_threshold' => 75
   },
   'hadoop' => {
     'hostname' => 'localhost',
@@ -165,7 +166,7 @@ begin
     retry_interval: options['hadoop']['retryInterval'], read_timeout: options['hadoop']['readTimeout'])
 
   # Create the Cassandra object
-  cassandra = Cassandra.new(options['cassandra']['config'], logger)
+  cassandra = Cassandra.new(options['cassandra']['config'], options['cassandra']['disk_threshold'], logger)
 
   # Create the backup object
   bck = BackupTool.new(cassandra, hadoop, logger)
```
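The new `disk_threshold` default (75) follows the override rules the README describes: a value from the configuration file replaces the default, and a command-line value replaces both. Below is a minimal sketch of that precedence, assuming a simple recursive hash merge; the `deep_merge` helper and the sample override values are illustrative, not the gem's actual code.

```ruby
require 'yaml'

# Built-in defaults, mirroring the options hash in bin/cassback.
defaults = {
  'cassandra' => {
    'config'         => '/etc/cassandra/conf/cassandra.yaml',
    'disk_threshold' => 75,
  },
}

# Hypothetical overrides: one from a YAML config file, one from the command line.
file_config = YAML.safe_load("cassandra:\n  disk_threshold: 80\n")
cli_config  = { 'cassandra' => { 'disk_threshold' => 90 } }

# Recursively merge nested hashes, letting the right-hand side win.
def deep_merge(base, other)
  base.merge(other) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

options = deep_merge(deep_merge(defaults, file_config), cli_config)
puts options['cassandra']['disk_threshold'] # => 90, the command-line value wins
```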
data/lib/cassandra.rb
CHANGED
```diff
@@ -5,7 +5,7 @@ require 'yaml'
 class Cassandra
   attr_reader :data_path, :cluster_name, :node_name
 
-  def initialize(config_file, logger)
+  def initialize(config_file, disk_threshold, logger)
     @logger = logger
 
     read_config_file(config_file)
@@ -15,6 +15,8 @@ class Cassandra
     @logger.info("Cassandra cluster name = #{@cluster_name}")
     @logger.info("Cassandra node name = #{@node_name}")
     @logger.info("Cassandra data path = #{@data_path}")
+
+    @disk_threshold = disk_threshold
   end
 
   def read_config_file(config_file)
@@ -45,6 +47,13 @@ class Cassandra
     # First delete the snapshot if it exists.
     nodetool_clearsnapshot(name)
 
+    # Check if we have enough disk space left
+    m = /\ ([0-9]+)%\ /.match(IO.popen("df #{@data_path}").readlines[1])
+    used = Integer(m[1])
+    if used > @disk_threshold
+      raise("Not enough disk space remaining for snapshot (#{used}% used > #{@disk_threshold}% required)")
+    end
+
     # Then trigger it.
     @logger.debug("Starting a new Cassandra snapshot #{name}")
     begin
```
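The guard added above shells out to `df` and refuses to snapshot when the data volume is fuller than the threshold. The following standalone sketch demonstrates the same check; it uses `/` as the data path so it runs anywhere, and both the path and the threshold are example values rather than cassback's configured ones.

```ruby
# Example values; cassback reads these from its configuration instead.
data_path      = '/'
disk_threshold = 75

# The second line of `df <path>` holds the usage figures; grab the "Use%" column.
df_line = IO.popen(['df', data_path]) { |io| io.readlines[1] }
used    = Integer(df_line[/ (\d+)% /, 1])

if used > disk_threshold
  raise "Not enough disk space remaining for snapshot (#{used}% used, threshold #{disk_threshold}%)"
else
  puts "Disk usage #{used}% is below the #{disk_threshold}% threshold, snapshot can proceed"
end
```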
data/lib/cassback/version.rb
CHANGED
metadata
CHANGED
```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: cassback
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.9
 platform: ruby
 authors:
 - Vincent Van Hollebeke
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-05-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -130,7 +130,7 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".rubocop.
+- ".rubocop.yml"
 - Gemfile
 - LICENSE
 - README.md
@@ -144,15 +144,6 @@ files:
 - lib/cassandra.rb
 - lib/cassback/version.rb
 - lib/hadoop.rb
-- scripts/deploy.sh
-- scripts/manualbackups/ansible.cfg
-- scripts/manualbackups/inventory.txt
-- scripts/manualbackups/play_book.sh
-- scripts/manualbackups/playbooks/backups.yml
-- scripts/manualbackups/roles/planb/files/backup.sh
-- scripts/manualbackups/roles/planb/files/httpfs.sh
-- scripts/manualbackups/roles/planb/files/krb5.conf
-- scripts/manualbackups/roles/planb/tasks/main.yml
 - scripts/pre-push
 - test/cassandra_stub.rb
 - test/hadoop_stub.rb
@@ -177,7 +168,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.6.3
 signing_key:
 specification_version: 4
 summary: Cassandra backup to HDFS.
```
data/scripts/deploy.sh
DELETED

data/scripts/manualbackups/inventory.txt
DELETED
```diff
@@ -1,18 +0,0 @@
-[cstars02-par]
-cstars02e01-par ansible_ssh_host="cstars02e01-par.storage.criteo.prod"
-cstars02e02-par ansible_ssh_host="cstars02e02-par.storage.criteo.prod"
-cstars02e03-par ansible_ssh_host="cstars02e03-par.storage.criteo.prod"
-cstars02e04-par ansible_ssh_host="cstars02e04-par.storage.criteo.prod"
-cstars02e05-par ansible_ssh_host="cstars02e05-par.storage.criteo.prod"
-cstars02e06-par ansible_ssh_host="cstars02e06-par.storage.criteo.prod"
-cstars02e07-par ansible_ssh_host="cstars02e07-par.storage.criteo.prod"
-cstars02e08-par ansible_ssh_host="cstars02e08-par.storage.criteo.prod"
-cstars02e09-par ansible_ssh_host="cstars02e09-par.storage.criteo.prod"
-cstars02e10-par ansible_ssh_host="cstars02e10-par.storage.criteo.prod"
-cstars02e11-par ansible_ssh_host="cstars02e11-par.storage.criteo.prod"
-cstars02e12-par ansible_ssh_host="cstars02e12-par.storage.criteo.prod"
-cstars02e13-par ansible_ssh_host="cstars02e13-par.storage.criteo.prod"
-cstars02e14-par ansible_ssh_host="cstars02e14-par.storage.criteo.prod"
-cstars02e15-par ansible_ssh_host="cstars02e15-par.storage.criteo.prod"
-cstars02e16-par ansible_ssh_host="cstars02e16-par.storage.criteo.prod"
-cstars02e17-par ansible_ssh_host="cstars02e17-par.storage.criteo.prod"
```
data/scripts/manualbackups/roles/planb/files/backup.sh
DELETED
```diff
@@ -1,27 +0,0 @@
-#!/bin/bash
-
-kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-
-date=`date +%Y_%m_%d`
-
-nodetool clearsnapshot
-
-snapdir=$(nodetool snapshot| grep directory| awk '{print $NF}')
-echo "Snapshot is $snapdir"
-
-for dir in $(find /var/opt/cassandra/data -type d |grep snapshots/$snapdir); do
-  kok=$(klist -l|grep v.vanhollebeke@CRITEOIS.LAN|grep -v Expired|wc -l)
-  if [ $kok == 0 ]; then
-    echo "Must renew Kerberos ticket"
-    kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-  else
-    echo "Kerberos ticket OK"
-  fi
-  keyspace=`echo $dir|awk -F\/ '{print $6}'`
-  table=`echo $dir|awk -F\/ '{print $7}'`
-  echo "Saving $keyspace $table"
-  ./httpfs.sh /var/opt/cassandra/data/$keyspace/$table/snapshots/$snapdir tmp/cassandrabackups/prod/cstars02/$date/$HOSTNAME/$table
-
-done
-
-echo "FINISHED !!!!"
```
data/scripts/manualbackups/roles/planb/files/httpfs.sh
DELETED
```diff
@@ -1,27 +0,0 @@
-#!/bin/sh
-
-BASE='http://0.httpfs.hpc.criteo.prod:14000/webhdfs/v1'
-#BASE='http://httpfs.pa4.hpc.criteo.prod:14000'
-
-IN=$1
-OUT=$2
-
-echo "Creating destination directory: $OUT"
-curl --negotiate -u : "$BASE/$OUT?op=MKDIRS&permission=0777" -X PUT -s > /dev/null
-
-for p in $(find $IN -type f)
-do
-  f=$(basename $p)
-  echo "$IN/$f"
-
-  # Create file
-  dest=$(curl --negotiate -u : "$BASE/$OUT/$f?op=CREATE&overwrite=true&permission=0777" -i -X PUT -s | grep Location | tail -n1 | cut -d\ -f2 | tr -d '\r\n')
-  [ $? != 0 ] && echo "ERROR"
-
-  echo "DEST IS ${dest}"
-
-  # Upload file
-  curl --negotiate -u : "$dest" -i -X PUT -T "$IN/$f" -H 'Content-Type: application/octet-stream' > /dev/null
-  [ $? != 0 ] && echo "ERROR"
-
-done
```
data/scripts/manualbackups/roles/planb/files/krb5.conf
DELETED
```diff
@@ -1,26 +0,0 @@
-[libdefaults]
-dns_lookup_realm = true
-dns_lookup_kdc = true
-ticket_lifetime = 24h
-renew_lifetime = 7d
-forwardable = true
-default_realm = CRITEOIS.LAN
-udp_preference_limit = 1
-realm_try_domains = 1
-permitted_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
-default_tkt_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
-[domain_realm]
-.hpc.criteo.preprod = HPC.CRITEO.PREPROD
-.hpc.criteo.prod = AMS.HPC.CRITEO.PROD
-.pa4.hpc.criteo.prod = PA4.HPC.CRITEO.PROD
-.as.hpc.criteo.prod = AS.HPC.CRITEO.PROD
-.na.hpc.criteo.prod = NA.HPC.CRITEO.PROD
-.cn.hpc.criteo.prod = CN.HPC.CRITEO.PROD
-[capaths]
-CRITEOIS.LAN = {
-AMS.HPC.CRITEO.PROD = .
-PA4.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-AS.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-NA.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-CN.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-}
```
data/scripts/manualbackups/roles/planb/tasks/main.yml
DELETED
```diff
@@ -1,34 +0,0 @@
----
-
-- name: Copy krb5.conf into /etc
-  copy: src=krb5.conf dest=/etc/krb5.conf
-  sudo: yes
-  tags: keytab
-
-- name: Copy my keytab
-  copy: src=keytab dest=~/keytab
-  tags: keytab
-
-- name: Check if keytab works
-  command: kinit $USER@CRITEOIS.LAN -k -t ~/keytab
-  tags: keytab
-
-- name: Copy httpfs.sh script
-  copy: src=httpfs.sh dest=~/httpfs.sh mode=750
-  tags: backup
-
-- name: Copy backup.sh script
-  copy: src=backup.sh dest=~/backup.sh mode=750
-  tags: backup
-
-- name: Start Backup
-  shell: ./backup.sh >logfile 2>&1 chdir=~
-  tags: backup
-
-- name: Clear snapshots
-  shell: sudo nodetool clearsnapshot
-  tags: clear
-
-- name: Verify if snapshots are REALLY deleted
-  shell: "[ $(find /var/opt/cassandra -type d |grep snap|wc -l) == 0 ]"
-  tags: verify
```
data/{.rubocop.yml_disabled → .rubocop.yml}
RENAMED
File without changes