cassback 0.1.8 → 0.1.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: d1b84e227b8ac885872dca30ef87343ef497d6b1
- data.tar.gz: b2f0923cc561e07b14c3c316ae86aa091bdd829f
+ metadata.gz: 493265549ae69ac85882959fd0751368fcf1e940
+ data.tar.gz: 1f7e6550297e0451c35c62a52b8002883d343e81
  SHA512:
- metadata.gz: f899bacdc79ae008b9e471cb2d5835dbf31140a68adde2166efed9461f0cbf9ae006f84a43436d68dbaa2781e601e902f012d2e8d6956cd8048bd14bf1722ad0
- data.tar.gz: 8719a219fb94a1ce7337b388c460add9b3e7ead073b3e15d1d8eec3780a65004ff403f994687c1e48a5c1cdd0c12c86eb351379999df63006aff224c20a23c94
+ metadata.gz: b03e231ff64cf1b6ef5d72c6bf28aef338ecab1f90fac64d75c2a6e811d914edb9df4d83fb959bf2383d22ee0e5b118dc2a02c76e60626a12dfa8b2323b4eb36
+ data.tar.gz: cf5ba0d94bdda1a1cbf490e6d54a8413fa28e000e61b3d4fabe55528ba690390c8e55376a4a82f2d51ef12216052516834ed8e412c169533f6422adc45914583
data/README.md CHANGED
@@ -2,6 +2,7 @@
 
  Welcome to your Cassback!
  This is a project that aims backup Cassandra SSTables and load them into HDFS for further usage.
+ It intended to be used as a command line tool - for now it can be triggered only by commands.
 
  ## Installation
 
@@ -40,6 +41,31 @@ A simple command that you can use for starting a backup is :
 
  cassback -S -C path_to_some_config_file.yml
 
+ ## Structure of the backup
+
+ A backup, that has been pushed from multiple Cassandra nodes to one HDFS location, has the following structure :
+
+ **a) Backup files**
+
+ <hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
+
+
+ **b) Backup complete flags** (stored at cluster level in the metadata folder) :
+
+ <hadoop_root_folder>/<cass_snap_metadata>/<cluster_name>/BACKUP_COMPLETE_<date>
+
+
+ **c) Metadata files** (stored at node level in metadata folder) :
+
+ <hadoop_root_folder>/cass_snap_metadata/<cluster_name>/<node_name>/cass_snap_<date>
+
+ ## Incremental or full backups ?
+
+ **Backups are done incrementally, but published as full backups** - the tool checks locally which files will have to be uploaded to HDFS and checks
+ if those files are already present in HDFS (because Cassandra files are immutable we don't have to risk to have two files
+ with same name but different content). However when the metadata file is published it points to all the files that
+ compose the backup so it basically publishes it as being a full backup.
+
  ## Configuration
 
  The application has some default configuration defined.
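
Review note on the README hunk above: the "incremental upload, full publication" behaviour it describes can be illustrated with a small Ruby sketch. This is not cassback's actual code. The helper names `backup_file_path` and `plan_backup`, the sample file names, and the sample paths are all hypothetical; only the path layout is taken from the README text.

```ruby
require 'set'

# Hypothetical helper: builds the HDFS path of one backup file, following the
# layout quoted in the README:
# <hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
def backup_file_path(root, cluster, node, keyspace, table, snapshot_path, file)
  File.join(root, cluster, node, keyspace, table, snapshot_path, file)
end

# Hypothetical helper: given the files in the local snapshot and the files
# already present remotely, returns the files that still need uploading and
# the complete list that the metadata file should reference.
def plan_backup(local_files, remote_files)
  remote = remote_files.to_set
  # SSTables are immutable, so a matching name is treated as matching content.
  to_upload = local_files.reject { |f| remote.include?(f) }
  { upload: to_upload, metadata: local_files.dup }
end

plan = plan_backup(%w[a-Data.db b-Data.db c-Data.db], %w[a-Data.db])
puts plan[:upload].inspect
puts plan[:metadata].inspect
puts backup_file_path('/backups', 'cluster1', 'node1', 'ks', 'tbl', 'snapshots/snap1', 'b-Data.db')
```

Only the missing files are transferred, while the metadata entry lists every file, which is why each published backup can be read as a full one.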
@@ -50,6 +76,8 @@ You can overwrite the default configuration using two meanings :
  2. Using individual configuration properties passed as parameters on the command line.
  The command line parameters have precedence over the configuration file.
 
+ An example of configuration file is provided under conf/local.yml.
+
  ## Orchestration
 
  The tool is designed to do snapshots at **node level** (and not at **cluster level**) - basically it has to be installed
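
Review note on the configuration text above: the stated precedence (defaults, then configuration file, then command line) could be realised with a recursive hash merge. The sketch below is an assumption about one way to do it, not the merging code cassback ships, which this diff does not show. `deep_merge`, the inline YAML snippet, and the sample values 80 and 90 are hypothetical; the default `disk_threshold` of 75 is the one added to `bin/cassback` in this release.

```ruby
require 'yaml'

# Recursively merges +override+ into +base+; values from +override+ win,
# and nested hashes are merged key by key instead of being replaced whole.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Defaults as they appear in the bin/cassback hunk of this diff.
defaults = {
  'cassandra' => {
    'config' => '/etc/cassandra/conf/cassandra.yaml',
    'disk_threshold' => 75
  }
}

# Hypothetical configuration file content and command line values.
from_file = YAML.safe_load("cassandra:\n  disk_threshold: 80\n")
from_cli = { 'cassandra' => { 'disk_threshold' => 90 } }

# Later layers take precedence: defaults < file < command line.
options = deep_merge(deep_merge(defaults, from_file), from_cli)
puts options['cassandra']['disk_threshold']
puts options['cassandra']['config']
```

Merging nested hashes key by key keeps untouched defaults such as `config` intact when a later layer overrides only `disk_threshold`.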
@@ -94,13 +122,12 @@ The command for triggering a cleanup is :
  cassback -A -C conf/path_to_some_config_file.yml
 
  # Unit tests
+
  Unit tests can be executed locally by running the following command :
 
  rake test
 
  ## Contributing
 
- For now this is an internal Criteo project, but were aiming for making it open source and publishing to GitHub.
-
- Issue reports and merge requests are welcome on Criteo's GitLab at : https://gitlab.criteois.com/ruby-gems/cassback
+ Bug reports and pull requests are welcome on GitHub at : https://github.com/criteo/cassback
 
data/bin/cassback CHANGED
@@ -49,6 +49,7 @@ command_line_config = {
  options = {
  'cassandra' => {
  'config' => '/etc/cassandra/conf/cassandra.yaml',
+ 'disk_threshold' => 75
  },
  'hadoop' => {
  'hostname' => 'localhost',
@@ -165,7 +166,7 @@ begin
  retry_interval: options['hadoop']['retryInterval'], read_timeout: options['hadoop']['readTimeout'])
 
  #  Create the Cassandra object
- cassandra = Cassandra.new(options['cassandra']['config'], logger)
+ cassandra = Cassandra.new(options['cassandra']['config'], options['cassandra']['disk_threshold'], logger)
 
  #  Create the backup object
  bck = BackupTool.new(cassandra, hadoop, logger)
data/lib/cassandra.rb CHANGED
@@ -5,7 +5,7 @@ require 'yaml'
  class Cassandra
  attr_reader :data_path, :cluster_name, :node_name
 
- def initialize(config_file, logger)
+ def initialize(config_file, disk_threshold, logger)
  @logger = logger
 
  read_config_file(config_file)
@@ -15,6 +15,8 @@ class Cassandra
  @logger.info("Cassandra cluster name = #{@cluster_name}")
  @logger.info("Cassandra node name = #{@node_name}")
  @logger.info("Cassandra data path = #{@data_path}")
+
+ @disk_threshold = disk_threshold
  end
 
  def read_config_file(config_file)
@@ -45,6 +47,13 @@ class Cassandra
  # First delete the snapshot if it exists.
  nodetool_clearsnapshot(name)
 
+ # Check if we have enough disk space left
+ m = /\ ([0-9]+)%\ /.match(IO.popen("df #{@data_path}").readlines[1])
+ used = Integer(m[1])
+ if used > @disk_threshold
+ raise("Not enough disk space remaining for snapshot (#{used}% used > #{@disk_threshold}% required)")
+ end
+
  # Then trigger it.
  @logger.debug("Starting a new Cassandra snapshot #{name}")
  begin
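
Review note on the disk space check added above: the new code reads the second line of `df` output and extracts the first `NN%` figure from it. A slightly more defensive variant is sketched below as an illustration only, not as the gem's code. The helper names `parse_used_percent` and `enough_disk_space?` are hypothetical, and the sample text assumes the single-line-per-filesystem layout that POSIX `df -P` produces.

```ruby
# Hypothetical helper: extracts the used-space percentage from df output text.
# Raises a descriptive error instead of failing on nil when nothing matches.
def parse_used_percent(df_output)
  line = df_output.lines[1]
  match = line && /\s(\d+)%\s/.match(line)
  raise ArgumentError, 'could not parse df output' unless match

  Integer(match[1])
end

# Hypothetical helper: mirrors the comparison in the diff, where a snapshot
# is refused once usage is strictly above the threshold.
def enough_disk_space?(used_percent, threshold)
  used_percent <= threshold
end

# Sample text in the layout printed by `df -P` (values are made up).
sample = <<~DF
  Filesystem     1024-blocks      Used Available Capacity Mounted on
  /dev/sda1        103081248  82464998  20616250      80% /var/opt/cassandra
DF

used = parse_used_percent(sample)
puts used
puts enough_disk_space?(used, 75)
```

Taking the output text as an argument keeps the parsing testable without shelling out; a caller could pass it something like the result of running `df -P` on the data path.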
@@ -1,3 +1,3 @@
  module Cassback
- VERSION = '0.1.8'.freeze
+ VERSION = '0.1.9'.freeze
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: cassback
  version: !ruby/object:Gem::Version
- version: 0.1.8
+ version: 0.1.9
  platform: ruby
  authors:
  - Vincent Van Hollebeke
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-04-28 00:00:00.000000000 Z
+ date: 2016-05-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -130,7 +130,7 @@ extensions: []
  extra_rdoc_files: []
  files:
  - ".gitignore"
- - ".rubocop.yml_disabled"
+ - ".rubocop.yml"
  - Gemfile
  - LICENSE
  - README.md
@@ -144,15 +144,6 @@ files:
  - lib/cassandra.rb
  - lib/cassback/version.rb
  - lib/hadoop.rb
- - scripts/deploy.sh
- - scripts/manualbackups/ansible.cfg
- - scripts/manualbackups/inventory.txt
- - scripts/manualbackups/play_book.sh
- - scripts/manualbackups/playbooks/backups.yml
- - scripts/manualbackups/roles/planb/files/backup.sh
- - scripts/manualbackups/roles/planb/files/httpfs.sh
- - scripts/manualbackups/roles/planb/files/krb5.conf
- - scripts/manualbackups/roles/planb/tasks/main.yml
  - scripts/pre-push
  - test/cassandra_stub.rb
  - test/hadoop_stub.rb
@@ -177,7 +168,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.5.2
+ rubygems_version: 2.6.3
  signing_key:
  specification_version: 4
  summary: Cassandra backup to HDFS.
data/scripts/deploy.sh DELETED
@@ -1,3 +0,0 @@
- #!/bin/bash
-
- while [ 1 = 1 ]; do inotifywait .;scp -r . cstars01e01-par.storage.criteo.preprod:cassback2;scp -r . cstars01e02-par.storage.criteo.preprod:cassback2;done
@@ -1,12 +0,0 @@
- [defaults]
- host_key_checking=false
- record_host_keys=false
- remote_tmp=/tmp/.ansible/tmp
- forks=128
- roles_path=roles
- library=library
-
- [ssh_connection]
- control_path=%(directory)s/%%h-%%r
- pipelining=True
- scp_if_ssh=True
@@ -1,18 +0,0 @@
- [cstars02-par]
- cstars02e01-par ansible_ssh_host="cstars02e01-par.storage.criteo.prod"
- cstars02e02-par ansible_ssh_host="cstars02e02-par.storage.criteo.prod"
- cstars02e03-par ansible_ssh_host="cstars02e03-par.storage.criteo.prod"
- cstars02e04-par ansible_ssh_host="cstars02e04-par.storage.criteo.prod"
- cstars02e05-par ansible_ssh_host="cstars02e05-par.storage.criteo.prod"
- cstars02e06-par ansible_ssh_host="cstars02e06-par.storage.criteo.prod"
- cstars02e07-par ansible_ssh_host="cstars02e07-par.storage.criteo.prod"
- cstars02e08-par ansible_ssh_host="cstars02e08-par.storage.criteo.prod"
- cstars02e09-par ansible_ssh_host="cstars02e09-par.storage.criteo.prod"
- cstars02e10-par ansible_ssh_host="cstars02e10-par.storage.criteo.prod"
- cstars02e11-par ansible_ssh_host="cstars02e11-par.storage.criteo.prod"
- cstars02e12-par ansible_ssh_host="cstars02e12-par.storage.criteo.prod"
- cstars02e13-par ansible_ssh_host="cstars02e13-par.storage.criteo.prod"
- cstars02e14-par ansible_ssh_host="cstars02e14-par.storage.criteo.prod"
- cstars02e15-par ansible_ssh_host="cstars02e15-par.storage.criteo.prod"
- cstars02e16-par ansible_ssh_host="cstars02e16-par.storage.criteo.prod"
- cstars02e17-par ansible_ssh_host="cstars02e17-par.storage.criteo.prod"
@@ -1,13 +0,0 @@
- #!/bin/bash
-
- PLAYBOOK=$1
-
- if [ "$PLAYBOOK" = "" ]; then
- echo "Usage: $0 <playbook> [ansible options]"
- exit 65
- fi
-
- shift
- ansible-playbook --inventory-file=inventory.txt playbooks/$PLAYBOOK.yml --extra-vars $*
-
- exit $?
@@ -1,6 +0,0 @@
- ---
-
- - gather_facts: no
- hosts: cstars02-par
- roles:
- - role: planb
@@ -1,27 +0,0 @@
- #!/bin/bash
-
- kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-
- date=`date +%Y_%m_%d`
-
- nodetool clearsnapshot
-
- snapdir=$(nodetool snapshot| grep directory| awk '{print $NF}')
- echo "Snapshot is $snapdir"
-
- for dir in $(find /var/opt/cassandra/data -type d |grep snapshots/$snapdir); do
- kok=$(klist -l|grep v.vanhollebeke@CRITEOIS.LAN|grep -v Expired|wc -l)
- if [ $kok == 0 ]; then
- echo "Must renew Kerberos ticket"
- kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
- else
- echo "Kerberos ticket OK"
- fi
- keyspace=`echo $dir|awk -F\/ '{print $6}'`
- table=`echo $dir|awk -F\/ '{print $7}'`
- echo "Saving $keyspace $table"
- ./httpfs.sh /var/opt/cassandra/data/$keyspace/$table/snapshots/$snapdir tmp/cassandrabackups/prod/cstars02/$date/$HOSTNAME/$table
-
- done
-
- echo "FINISHED !!!!"
@@ -1,27 +0,0 @@
- #!/bin/sh
-
- BASE='http://0.httpfs.hpc.criteo.prod:14000/webhdfs/v1'
- #BASE='http://httpfs.pa4.hpc.criteo.prod:14000'
-
- IN=$1
- OUT=$2
-
- echo "Creating destination directory: $OUT"
- curl --negotiate -u : "$BASE/$OUT?op=MKDIRS&permission=0777" -X PUT -s > /dev/null
-
- for p in $(find $IN -type f)
- do
- f=$(basename $p)
- echo "$IN/$f"
-
- # Create file
- dest=$(curl --negotiate -u : "$BASE/$OUT/$f?op=CREATE&overwrite=true&permission=0777" -i -X PUT -s | grep Location | tail -n1 | cut -d\ -f2 | tr -d '\r\n')
- [ $? != 0 ] && echo "ERROR"
-
- echo "DEST IS ${dest}"
-
- # Upload file
- curl --negotiate -u : "$dest" -i -X PUT -T "$IN/$f" -H 'Content-Type: application/octet-stream' > /dev/null
- [ $? != 0 ] && echo "ERROR"
-
- done
@@ -1,26 +0,0 @@
- [libdefaults]
- dns_lookup_realm = true
- dns_lookup_kdc = true
- ticket_lifetime = 24h
- renew_lifetime = 7d
- forwardable = true
- default_realm = CRITEOIS.LAN
- udp_preference_limit = 1
- realm_try_domains = 1
- permitted_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
- default_tkt_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
- [domain_realm]
- .hpc.criteo.preprod = HPC.CRITEO.PREPROD
- .hpc.criteo.prod = AMS.HPC.CRITEO.PROD
- .pa4.hpc.criteo.prod = PA4.HPC.CRITEO.PROD
- .as.hpc.criteo.prod = AS.HPC.CRITEO.PROD
- .na.hpc.criteo.prod = NA.HPC.CRITEO.PROD
- .cn.hpc.criteo.prod = CN.HPC.CRITEO.PROD
- [capaths]
- CRITEOIS.LAN = {
- AMS.HPC.CRITEO.PROD = .
- PA4.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- AS.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- NA.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- CN.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- }
@@ -1,34 +0,0 @@
- ---
-
- - name: Copy krb5.conf into /etc
- copy: src=krb5.conf dest=/etc/krb5.conf
- sudo: yes
- tags: keytab
-
- - name: Copy my keytab
- copy: src=keytab dest=~/keytab
- tags: keytab
-
- - name: Check if keytab works
- command: kinit $USER@CRITEOIS.LAN -k -t ~/keytab
- tags: keytab
-
- - name: Copy httpfs.sh script
- copy: src=httpfs.sh dest=~/httpfs.sh mode=750
- tags: backup
-
- - name: Copy backup.sh script
- copy: src=backup.sh dest=~/backup.sh mode=750
- tags: backup
-
- - name: Start Backup
- shell: ./backup.sh >logfile 2>&1 chdir=~
- tags: backup
-
- - name: Clear snapshots
- shell: sudo nodetool clearsnapshot
- tags: clear
-
- - name: Verify if snapshots are REALLY deleted
- shell: "[ $(find /var/opt/cassandra -type d |grep snap|wc -l) == 0 ]"
- tags: verify
File without changes