cassback 0.1.8 → 0.1.9
- checksums.yaml +4 -4
- data/README.md +30 -3
- data/bin/cassback +2 -1
- data/lib/cassandra.rb +10 -1
- data/lib/cassback/version.rb +1 -1
- metadata +4 -13
- data/scripts/deploy.sh +0 -3
- data/scripts/manualbackups/ansible.cfg +0 -12
- data/scripts/manualbackups/inventory.txt +0 -18
- data/scripts/manualbackups/play_book.sh +0 -13
- data/scripts/manualbackups/playbooks/backups.yml +0 -6
- data/scripts/manualbackups/roles/planb/files/backup.sh +0 -27
- data/scripts/manualbackups/roles/planb/files/httpfs.sh +0 -27
- data/scripts/manualbackups/roles/planb/files/krb5.conf +0 -26
- data/scripts/manualbackups/roles/planb/tasks/main.yml +0 -34
- /data/{.rubocop.yml_disabled → .rubocop.yml} +0 -0
checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 493265549ae69ac85882959fd0751368fcf1e940
+  data.tar.gz: 1f7e6550297e0451c35c62a52b8002883d343e81
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b03e231ff64cf1b6ef5d72c6bf28aef338ecab1f90fac64d75c2a6e811d914edb9df4d83fb959bf2383d22ee0e5b118dc2a02c76e60626a12dfa8b2323b4eb36
+  data.tar.gz: cf5ba0d94bdda1a1cbf490e6d54a8413fa28e000e61b3d4fabe55528ba690390c8e55376a4a82f2d51ef12216052516834ed8e412c169533f6422adc45914583
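The checksum block above lists SHA1 and SHA512 digests for the released artifacts. A minimal Ruby sketch of how such a digest could be verified locally; the helper name and gem file path are assumptions for illustration, not part of the release:

```ruby
require 'digest'

# Hypothetical helper: compute the SHA1 digest of a downloaded artifact
# so it can be compared against the value listed in checksums.yaml.
def sha1_matches?(path, expected)
  Digest::SHA1.hexdigest(File.binread(path)) == expected
end

# Sanity check of Digest::SHA1 on a well-known input:
Digest::SHA1.hexdigest('hello')
# => "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
```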
data/README.md CHANGED

@@ -2,6 +2,7 @@
 
 Welcome to your Cassback!
 This is a project that aims to backup Cassandra SSTables and load them into HDFS for further usage.
+It is intended to be used as a command line tool - for now it can be triggered only by commands.
 
 ## Installation
 
@@ -40,6 +41,31 @@ A simple command that you can use for starting a backup is:
 
     cassback -S -C path_to_some_config_file.yml
 
+## Structure of the backup
+
+A backup that has been pushed from multiple Cassandra nodes to one HDFS location has the following structure:
+
+**a) Backup files**
+
+    <hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
+
+**b) Backup complete flags** (stored at cluster level in the metadata folder):
+
+    <hadoop_root_folder>/<cass_snap_metadata>/<cluster_name>/BACKUP_COMPLETE_<date>
+
+**c) Metadata files** (stored at node level in the metadata folder):
+
+    <hadoop_root_folder>/cass_snap_metadata/<cluster_name>/<node_name>/cass_snap_<date>
+
+## Incremental or full backups?
+
+**Backups are done incrementally, but published as full backups** - the tool checks locally which files have to be uploaded to HDFS and
+whether those files are already present in HDFS (because Cassandra files are immutable, there is no risk of two files
+with the same name having different content). However, when the metadata file is published it points to all the files that
+compose the backup, so it effectively publishes it as a full backup.
+
 ## Configuration
 
 The application has some default configuration defined.
@@ -50,6 +76,8 @@ You can overwrite the default configuration using two means:
 2. Using individual configuration properties passed as parameters on the command line.
 The command line parameters have precedence over the configuration file.
 
+An example configuration file is provided under conf/local.yml.
+
 ## Orchestration
 
 The tool is designed to do snapshots at **node level** (and not at **cluster level**) - basically it has to be installed
@@ -94,13 +122,12 @@ The command for triggering a cleanup is:
 
     cassback -A -C conf/path_to_some_config_file.yml
 
 # Unit tests
+
 Unit tests can be executed locally by running the following command:
 
     rake test
 
 ## Contributing
 
-
-
-Issue reports and merge requests are welcome on Criteo's GitLab at: https://gitlab.criteois.com/ruby-gems/cassback
+Bug reports and pull requests are welcome on GitHub at: https://github.com/criteo/cassback
 
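The three path layouts described in the new README section can be illustrated with a small Ruby sketch. The `backup_file_path` helper and all sample values are illustrative assumptions, not the gem's actual API:

```ruby
# Assemble a backup file path from the components listed in the README
# (hypothetical helper; cluster/node/keyspace names are placeholders).
def backup_file_path(root, cluster, node, keyspace, table, snapshot, file)
  File.join(root, cluster, node, keyspace, table, snapshot, file)
end

backup_file_path('/backups', 'cluster1', 'node1', 'ks1', 'table1',
                 'snapshots/snap_2016_05_18', 'Data.db')
# => "/backups/cluster1/node1/ks1/table1/snapshots/snap_2016_05_18/Data.db"
```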
data/bin/cassback CHANGED

@@ -49,6 +49,7 @@ command_line_config = {
 options = {
   'cassandra' => {
     'config' => '/etc/cassandra/conf/cassandra.yaml',
+    'disk_threshold' => 75
   },
   'hadoop' => {
     'hostname' => 'localhost',
@@ -165,7 +166,7 @@ begin
                      retry_interval: options['hadoop']['retryInterval'], read_timeout: options['hadoop']['readTimeout'])
 
   # Create the Cassandra object
-  cassandra = Cassandra.new(options['cassandra']['config'], logger)
+  cassandra = Cassandra.new(options['cassandra']['config'], options['cassandra']['disk_threshold'], logger)
 
   # Create the backup object
   bck = BackupTool.new(cassandra, hadoop, logger)
data/lib/cassandra.rb CHANGED

@@ -5,7 +5,7 @@ require 'yaml'
 class Cassandra
   attr_reader :data_path, :cluster_name, :node_name
 
-  def initialize(config_file, logger)
+  def initialize(config_file, disk_threshold, logger)
     @logger = logger
 
     read_config_file(config_file)
@@ -15,6 +15,8 @@ class Cassandra
     @logger.info("Cassandra cluster name = #{@cluster_name}")
     @logger.info("Cassandra node name = #{@node_name}")
     @logger.info("Cassandra data path = #{@data_path}")
+
+    @disk_threshold = disk_threshold
   end
 
   def read_config_file(config_file)
@@ -45,6 +47,13 @@ class Cassandra
     # First delete the snapshot if it exists.
     nodetool_clearsnapshot(name)
 
+    # Check if we have enough disk space left.
+    m = /\ ([0-9]+)%\ /.match(IO.popen("df #{@data_path}").readlines[1])
+    used = Integer(m[1])
+    if used > @disk_threshold
+      raise("Not enough disk space remaining for snapshot (#{used}% used > #{@disk_threshold}% required)")
+    end
+
     # Then trigger it.
     @logger.debug("Starting a new Cassandra snapshot #{name}")
     begin
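The disk-space guard added above shells out to `df` and parses the "Use%" column of the second output line (the first line is the header). A self-contained sketch of the same parsing logic, using a fabricated `df` data line instead of a live filesystem:

```ruby
# Extract the percentage from the "Use%" column of a df data line,
# using the same regex the release uses. The sample line below is
# fabricated for illustration.
def disk_usage_percent(df_line)
  m = /\ ([0-9]+)%\ /.match(df_line)
  Integer(m[1])
end

DISK_THRESHOLD = 75
line = '/dev/sda1  103081248  60971908  36840104  63% /var/opt/cassandra'
used = disk_usage_percent(line)
if used > DISK_THRESHOLD
  raise("Not enough disk space remaining for snapshot (#{used}% used > #{DISK_THRESHOLD}% allowed)")
end
```

Because the regex anchors on a space-delimited `NN%` token, it picks out the usage column rather than any of the raw block counts.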
data/lib/cassback/version.rb
CHANGED
metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: cassback
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.9
 platform: ruby
 authors:
 - Vincent Van Hollebeke
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-05-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -130,7 +130,7 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".rubocop.
+- ".rubocop.yml"
 - Gemfile
 - LICENSE
 - README.md
@@ -144,15 +144,6 @@ files:
 - lib/cassandra.rb
 - lib/cassback/version.rb
 - lib/hadoop.rb
-- scripts/deploy.sh
-- scripts/manualbackups/ansible.cfg
-- scripts/manualbackups/inventory.txt
-- scripts/manualbackups/play_book.sh
-- scripts/manualbackups/playbooks/backups.yml
-- scripts/manualbackups/roles/planb/files/backup.sh
-- scripts/manualbackups/roles/planb/files/httpfs.sh
-- scripts/manualbackups/roles/planb/files/krb5.conf
-- scripts/manualbackups/roles/planb/tasks/main.yml
 - scripts/pre-push
 - test/cassandra_stub.rb
 - test/hadoop_stub.rb
@@ -177,7 +168,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.6.3
 signing_key:
 specification_version: 4
 summary: Cassandra backup to HDFS.
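The metadata hunk above bumps the gem version to 0.1.9. RubyGems compares such version strings segment-wise rather than lexically, which a short sketch can confirm:

```ruby
require 'rubygems'

# Gem::Version compares numerically per dot-separated segment,
# so '0.1.10' sorts after '0.1.9' (a lexical compare would not).
Gem::Version.new('0.1.9') > Gem::Version.new('0.1.8')   # => true
Gem::Version.new('0.1.10') > Gem::Version.new('0.1.9')  # => true
```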
data/scripts/deploy.sh DELETED

(diff not shown in this view)

data/scripts/manualbackups/inventory.txt DELETED

@@ -1,18 +0,0 @@
-[cstars02-par]
-cstars02e01-par ansible_ssh_host="cstars02e01-par.storage.criteo.prod"
-cstars02e02-par ansible_ssh_host="cstars02e02-par.storage.criteo.prod"
-cstars02e03-par ansible_ssh_host="cstars02e03-par.storage.criteo.prod"
-cstars02e04-par ansible_ssh_host="cstars02e04-par.storage.criteo.prod"
-cstars02e05-par ansible_ssh_host="cstars02e05-par.storage.criteo.prod"
-cstars02e06-par ansible_ssh_host="cstars02e06-par.storage.criteo.prod"
-cstars02e07-par ansible_ssh_host="cstars02e07-par.storage.criteo.prod"
-cstars02e08-par ansible_ssh_host="cstars02e08-par.storage.criteo.prod"
-cstars02e09-par ansible_ssh_host="cstars02e09-par.storage.criteo.prod"
-cstars02e10-par ansible_ssh_host="cstars02e10-par.storage.criteo.prod"
-cstars02e11-par ansible_ssh_host="cstars02e11-par.storage.criteo.prod"
-cstars02e12-par ansible_ssh_host="cstars02e12-par.storage.criteo.prod"
-cstars02e13-par ansible_ssh_host="cstars02e13-par.storage.criteo.prod"
-cstars02e14-par ansible_ssh_host="cstars02e14-par.storage.criteo.prod"
-cstars02e15-par ansible_ssh_host="cstars02e15-par.storage.criteo.prod"
-cstars02e16-par ansible_ssh_host="cstars02e16-par.storage.criteo.prod"
-cstars02e17-par ansible_ssh_host="cstars02e17-par.storage.criteo.prod"

data/scripts/manualbackups/roles/planb/files/backup.sh DELETED

@@ -1,27 +0,0 @@
-#!/bin/bash
-
-kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-
-date=`date +%Y_%m_%d`
-
-nodetool clearsnapshot
-
-snapdir=$(nodetool snapshot| grep directory| awk '{print $NF}')
-echo "Snapshot is $snapdir"
-
-for dir in $(find /var/opt/cassandra/data -type d |grep snapshots/$snapdir); do
-  kok=$(klist -l|grep v.vanhollebeke@CRITEOIS.LAN|grep -v Expired|wc -l)
-  if [ $kok == 0 ]; then
-    echo "Must renew Kerberos ticket"
-    kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-  else
-    echo "Kerberos ticket OK"
-  fi
-  keyspace=`echo $dir|awk -F\/ '{print $6}'`
-  table=`echo $dir|awk -F\/ '{print $7}'`
-  echo "Saving $keyspace $table"
-  ./httpfs.sh /var/opt/cassandra/data/$keyspace/$table/snapshots/$snapdir tmp/cassandrabackups/prod/cstars02/$date/$HOSTNAME/$table
-
-done
-
-echo "FINISHED !!!!"

data/scripts/manualbackups/roles/planb/files/httpfs.sh DELETED

@@ -1,27 +0,0 @@
-#!/bin/sh
-
-BASE='http://0.httpfs.hpc.criteo.prod:14000/webhdfs/v1'
-#BASE='http://httpfs.pa4.hpc.criteo.prod:14000'
-
-IN=$1
-OUT=$2
-
-echo "Creating destination directory: $OUT"
-curl --negotiate -u : "$BASE/$OUT?op=MKDIRS&permission=0777" -X PUT -s > /dev/null
-
-for p in $(find $IN -type f)
-do
-  f=$(basename $p)
-  echo "$IN/$f"
-
-  # Create file
-  dest=$(curl --negotiate -u : "$BASE/$OUT/$f?op=CREATE&overwrite=true&permission=0777" -i -X PUT -s | grep Location | tail -n1 | cut -d\  -f2 | tr -d '\r\n')
-  [ $? != 0 ] && echo "ERROR"
-
-  echo "DEST IS ${dest}"
-
-  # Upload file
-  curl --negotiate -u : "$dest" -i -X PUT -T "$IN/$f" -H 'Content-Type: application/octet-stream' > /dev/null
-  [ $? != 0 ] && echo "ERROR"
-
-done

data/scripts/manualbackups/roles/planb/files/krb5.conf DELETED

@@ -1,26 +0,0 @@
-[libdefaults]
-  dns_lookup_realm = true
-  dns_lookup_kdc = true
-  ticket_lifetime = 24h
-  renew_lifetime = 7d
-  forwardable = true
-  default_realm = CRITEOIS.LAN
-  udp_preference_limit = 1
-  realm_try_domains = 1
-  permitted_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
-  default_tkt_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
-[domain_realm]
-  .hpc.criteo.preprod = HPC.CRITEO.PREPROD
-  .hpc.criteo.prod = AMS.HPC.CRITEO.PROD
-  .pa4.hpc.criteo.prod = PA4.HPC.CRITEO.PROD
-  .as.hpc.criteo.prod = AS.HPC.CRITEO.PROD
-  .na.hpc.criteo.prod = NA.HPC.CRITEO.PROD
-  .cn.hpc.criteo.prod = CN.HPC.CRITEO.PROD
-[capaths]
-  CRITEOIS.LAN = {
-    AMS.HPC.CRITEO.PROD = .
-    PA4.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-    AS.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-    NA.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-    CN.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
-  }

data/scripts/manualbackups/roles/planb/tasks/main.yml DELETED

@@ -1,34 +0,0 @@
----
-
-- name: Copy krb5.conf into /etc
-  copy: src=krb5.conf dest=/etc/krb5.conf
-  sudo: yes
-  tags: keytab
-
-- name: Copy my keytab
-  copy: src=keytab dest=~/keytab
-  tags: keytab
-
-- name: Check if keytab works
-  command: kinit $USER@CRITEOIS.LAN -k -t ~/keytab
-  tags: keytab
-
-- name: Copy httpfs.sh script
-  copy: src=httpfs.sh dest=~/httpfs.sh mode=750
-  tags: backup
-
-- name: Copy backup.sh script
-  copy: src=backup.sh dest=~/backup.sh mode=750
-  tags: backup
-
-- name: Start Backup
-  shell: ./backup.sh >logfile 2>&1 chdir=~
-  tags: backup
-
-- name: Clear snapshots
-  shell: sudo nodetool clearsnapshot
-  tags: clear
-
-- name: Verify if snapshots are REALLY deleted
-  shell: "[ $(find /var/opt/cassandra -type d |grep snap|wc -l) == 0 ]"
-  tags: verify
data/{.rubocop.yml_disabled → .rubocop.yml} RENAMED

File without changes