cassback 0.1.8 → 0.1.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: d1b84e227b8ac885872dca30ef87343ef497d6b1
- data.tar.gz: b2f0923cc561e07b14c3c316ae86aa091bdd829f
+ metadata.gz: 493265549ae69ac85882959fd0751368fcf1e940
+ data.tar.gz: 1f7e6550297e0451c35c62a52b8002883d343e81
  SHA512:
- metadata.gz: f899bacdc79ae008b9e471cb2d5835dbf31140a68adde2166efed9461f0cbf9ae006f84a43436d68dbaa2781e601e902f012d2e8d6956cd8048bd14bf1722ad0
- data.tar.gz: 8719a219fb94a1ce7337b388c460add9b3e7ead073b3e15d1d8eec3780a65004ff403f994687c1e48a5c1cdd0c12c86eb351379999df63006aff224c20a23c94
+ metadata.gz: b03e231ff64cf1b6ef5d72c6bf28aef338ecab1f90fac64d75c2a6e811d914edb9df4d83fb959bf2383d22ee0e5b118dc2a02c76e60626a12dfa8b2323b4eb36
+ data.tar.gz: cf5ba0d94bdda1a1cbf490e6d54a8413fa28e000e61b3d4fabe55528ba690390c8e55376a4a82f2d51ef12216052516834ed8e412c169533f6422adc45914583
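The SHA1 and SHA512 entries above are digests of the two archives packed inside the `.gem` file (a tar archive holding `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`). A minimal Ruby sketch of how one might check them; `verify_gem_checksums` and `read_gem_entries` are hypothetical helpers, not part of RubyGems' public API, and the sketch assumes the parsed checksums have the same shape as the YAML shown above.

```ruby
require 'digest'
require 'rubygems/package'

# Digest classes for the algorithm names used as keys in checksums.yaml.
ALGORITHMS = {
  'SHA1' => Digest::SHA1,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: compare recorded digests with the actual bytes.
# `checksums` looks like { 'SHA1' => { 'metadata.gz' => '...' }, ... }
# `entries` maps an archive name ('metadata.gz', 'data.tar.gz') to its bytes.
# Returns a list of "ALGO name" strings that do not match (empty if all match).
def verify_gem_checksums(checksums, entries)
  mismatches = []
  checksums.each do |algorithm, files|
    digest_class = ALGORITHMS.fetch(algorithm)
    files.each do |name, expected|
      actual = digest_class.hexdigest(entries.fetch(name))
      mismatches << "#{algorithm} #{name}" unless actual == expected
    end
  end
  mismatches
end

# Hypothetical helper: read every top-level entry of a .gem (tar) file.
def read_gem_entries(gem_path)
  entries = {}
  File.open(gem_path, 'rb') do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      entries[entry.full_name] = entry.read
    end
  end
  entries
end
```

With a real file, `checksums.yaml.gz` would first be decompressed (for example with `Zlib.gunzip`) and parsed with `YAML.safe_load` before being passed in.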
data/README.md CHANGED
@@ -2,6 +2,7 @@
 
  Welcome to your Cassback!
  This is a project that aims backup Cassandra SSTables and load them into HDFS for further usage.
+ It intended to be used as a command line tool - for now it can be triggered only by commands.
 
  ## Installation
 
@@ -40,6 +41,31 @@ A simple command that you can use for starting a backup is :
 
  cassback -S -C path_to_some_config_file.yml
 
+ ## Structure of the backup
+
+ A backup, that has been pushed from multiple Cassandra nodes to one HDFS location, has the following structure :
+
+ **a) Backup files**
+
+ <hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
+
+
+ **b) Backup complete flags** (stored at cluster level in the metadata folder) :
+
+ <hadoop_root_folder>/<cass_snap_metadata>/<cluster_name>/BACKUP_COMPLETE_<date>
+
+
+ **c) Metadata files** (stored at node level in metadata folder) :
+
+ <hadoop_root_folder>/cass_snap_metadata/<cluster_name>/<node_name>/cass_snap_<date>
+
+ ## Incremental or full backups ?
+
+ **Backups are done incrementally, but published as full backups** - the tool checks locally which files will have to be uploaded to HDFS and checks
+ if those files are already present in HDFS (because Cassandra files are immutable we don't have to risk to have two files
+ with same name but different content). However when the metadata file is published it points to all the files that
+ compose the backup so it basically publishes it as being a full backup.
+
  ## Configuration
 
  The application has some default configuration defined.
@@ -50,6 +76,8 @@ You can overwrite the default configuration using two meanings :
  2. Using individual configuration properties passed as parameters on the command line.
  The command line parameters have precedence over the configuration file.
 
+ An example of configuration file is provided under conf/local.yml.
+
  ## Orchestration
 
  The tool is designed to do snapshots at **node level** (and not at **cluster level**) - basically it has to be installed
@@ -94,13 +122,12 @@ The command for triggering a cleanup is :
  cassback -A -C conf/path_to_some_config_file.yml
 
  # Unit tests
+
  Unit tests can be executed locally by running the following command :
 
  rake test
 
  ## Contributing
 
- For now this is an internal Criteo project, but were aiming for making it open source and publishing to GitHub.
-
- Issue reports and merge requests are welcome on Criteo's GitLab at : https://gitlab.criteois.com/ruby-gems/cassback
+ Bug reports and pull requests are welcome on GitHub at : https://github.com/criteo/cassback
 
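The HDFS layout and the "incremental upload, full publication" behaviour described in the README changes can be illustrated with a small Ruby sketch. The module and method names are hypothetical, not the gem's actual API; the sketch assumes local and remote listings are plain arrays of file names, and treats the metadata folder as the literal name `cass_snap_metadata` (the README writes it once as `<cass_snap_metadata>` and once without brackets).

```ruby
require 'set'

# Hypothetical helpers mirroring the HDFS layout described in the README.
module BackupLayout
  METADATA_DIR = 'cass_snap_metadata' # assumed to be a literal folder name

  module_function

  # <hadoop_root_folder>/<cluster_name>/<node_name>/<keyspace>/<table>/<snapshot_path>/<backup_file>
  def backup_file_path(root, cluster, node, keyspace, table, snapshot_path, file)
    File.join(root, cluster, node, keyspace, table, snapshot_path, file)
  end

  # <hadoop_root_folder>/cass_snap_metadata/<cluster_name>/BACKUP_COMPLETE_<date>
  def backup_complete_flag_path(root, cluster, date)
    File.join(root, METADATA_DIR, cluster, "BACKUP_COMPLETE_#{date}")
  end

  # <hadoop_root_folder>/cass_snap_metadata/<cluster_name>/<node_name>/cass_snap_<date>
  def metadata_file_path(root, cluster, node, date)
    File.join(root, METADATA_DIR, cluster, node, "cass_snap_#{date}")
  end

  # SSTables are immutable, so a file already present remotely under the same
  # name can be skipped (incremental upload), while the manifest still lists
  # every local file of the snapshot (full publication).
  def plan_backup(local_files, remote_files)
    remote = Set.new(remote_files)
    {
      to_upload: local_files.reject { |f| remote.include?(f) },
      manifest: local_files.sort
    }
  end
end
```

Listing every file in the manifest, rather than only the newly uploaded ones, is what lets a restore read a single metadata file without walking back through earlier backups.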
data/bin/cassback CHANGED
@@ -49,6 +49,7 @@ command_line_config = {
  options = {
  'cassandra' => {
  'config' => '/etc/cassandra/conf/cassandra.yaml',
+ 'disk_threshold' => 75
  },
  'hadoop' => {
  'hostname' => 'localhost',
@@ -165,7 +166,7 @@ begin
  retry_interval: options['hadoop']['retryInterval'], read_timeout: options['hadoop']['readTimeout'])
 
  #  Create the Cassandra object
- cassandra = Cassandra.new(options['cassandra']['config'], logger)
+ cassandra = Cassandra.new(options['cassandra']['config'], options['cassandra']['disk_threshold'], logger)
 
  #  Create the backup object
  bck = BackupTool.new(cassandra, hadoop, logger)
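The README states that command-line parameters take precedence over the configuration file, which in turn overrides built-in defaults such as the new `'disk_threshold' => 75`. A minimal sketch of that layering as a recursive hash merge; `deep_merge`, `effective_config` and `DEFAULTS` are hypothetical names, and the sketch does not claim to reproduce how `bin/cassback` actually combines `options` with `command_line_config`.

```ruby
# Recursively merge `override` into `base`: nested hashes are merged key by
# key, any other value from `override` replaces the one in `base`.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Defaults taken from the hunk above (only a subset of the real options).
DEFAULTS = {
  'cassandra' => {
    'config' => '/etc/cassandra/conf/cassandra.yaml',
    'disk_threshold' => 75
  },
  'hadoop' => { 'hostname' => 'localhost' }
}.freeze

# Precedence: defaults < configuration file < command line.
def effective_config(defaults, file_config, command_line_config)
  deep_merge(deep_merge(defaults, file_config), command_line_config)
end
```

A plain `Hash#merge` would not be enough here: overriding only `disk_threshold` from the command line would otherwise replace the whole `'cassandra'` section and drop `'config'`.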
data/lib/cassandra.rb CHANGED
@@ -5,7 +5,7 @@ require 'yaml'
  class Cassandra
  attr_reader :data_path, :cluster_name, :node_name
 
- def initialize(config_file, logger)
+ def initialize(config_file, disk_threshold, logger)
  @logger = logger
 
  read_config_file(config_file)
@@ -15,6 +15,8 @@ class Cassandra
  @logger.info("Cassandra cluster name = #{@cluster_name}")
  @logger.info("Cassandra node name = #{@node_name}")
  @logger.info("Cassandra data path = #{@data_path}")
+
+ @disk_threshold = disk_threshold
  end
 
  def read_config_file(config_file)
@@ -45,6 +47,13 @@ class Cassandra
  # First delete the snapshot if it exists.
  nodetool_clearsnapshot(name)
 
+ # Check if we have enough disk space left
+ m = /\ ([0-9]+)%\ /.match(IO.popen("df #{@data_path}").readlines[1])
+ used = Integer(m[1])
+ if used > @disk_threshold
+ raise("Not enough disk space remaining for snapshot (#{used}% used > #{@disk_threshold}% required)")
+ end
+
  # Then trigger it.
  @logger.debug("Starting a new Cassandra snapshot #{name}")
  begin
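The new disk check reads the second line of `df` output and extracts the usage percentage. Depending on the `df` implementation and version, a long device name may be printed on a line of its own, in which case `m` would be `nil` and `m[1]` would raise `NoMethodError`; the pipe opened by `IO.popen` is also never closed. A more defensive sketch of the same idea, offered as an assumption rather than the gem's code: it requests POSIX output with `df -P` (one line per filesystem), uses the block form of `IO.popen` so the pipe is closed, and raises a clear error when no percentage is found. `parse_df_percent`, `disk_usage_percent` and `check_disk_space!` are hypothetical names.

```ruby
# Extract the usage percentage ("Capacity"/"Use%" column) from `df -P` output.
# POSIX output is a header line followed by exactly one line per filesystem.
def parse_df_percent(df_output)
  line = df_output.lines[1]
  match = line && line.match(/\s(\d+)%\s/)
  raise ArgumentError, "cannot parse df output: #{df_output.inspect}" unless match
  Integer(match[1])
end

# Run `df -P` on the data directory. Passing an argument array bypasses the
# shell, and the block form closes the pipe and sets $? when it returns.
def disk_usage_percent(path)
  output = IO.popen(['df', '-P', path], &:read)
  raise "df failed for #{path}" unless $?.success?
  parse_df_percent(output)
end

# Return the current usage, or raise if it exceeds the configured threshold.
def check_disk_space!(path, threshold)
  used = disk_usage_percent(path)
  return used if used <= threshold
  raise "Not enough disk space remaining for snapshot (#{used}% used > #{threshold}% allowed)"
end
```

Keeping the parsing in its own method makes it testable against captured `df` output without touching the real filesystem.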
data/lib/cassback/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Cassback
- VERSION = '0.1.8'.freeze
+ VERSION = '0.1.9'.freeze
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: cassback
  version: !ruby/object:Gem::Version
- version: 0.1.8
+ version: 0.1.9
  platform: ruby
  authors:
  - Vincent Van Hollebeke
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-04-28 00:00:00.000000000 Z
+ date: 2016-05-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -130,7 +130,7 @@ extensions: []
  extra_rdoc_files: []
  files:
  - ".gitignore"
- - ".rubocop.yml_disabled"
+ - ".rubocop.yml"
  - Gemfile
  - LICENSE
  - README.md
@@ -144,15 +144,6 @@ files:
  - lib/cassandra.rb
  - lib/cassback/version.rb
  - lib/hadoop.rb
- - scripts/deploy.sh
- - scripts/manualbackups/ansible.cfg
- - scripts/manualbackups/inventory.txt
- - scripts/manualbackups/play_book.sh
- - scripts/manualbackups/playbooks/backups.yml
- - scripts/manualbackups/roles/planb/files/backup.sh
- - scripts/manualbackups/roles/planb/files/httpfs.sh
- - scripts/manualbackups/roles/planb/files/krb5.conf
- - scripts/manualbackups/roles/planb/tasks/main.yml
  - scripts/pre-push
  - test/cassandra_stub.rb
  - test/hadoop_stub.rb
@@ -177,7 +168,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.5.2
+ rubygems_version: 2.6.3
  signing_key:
  specification_version: 4
  summary: Cassandra backup to HDFS.
data/scripts/deploy.sh DELETED
@@ -1,3 +0,0 @@
- #!/bin/bash
-
- while [ 1 = 1 ]; do inotifywait .;scp -r . cstars01e01-par.storage.criteo.preprod:cassback2;scp -r . cstars01e02-par.storage.criteo.preprod:cassback2;done
data/scripts/manualbackups/ansible.cfg DELETED
@@ -1,12 +0,0 @@
- [defaults]
- host_key_checking=false
- record_host_keys=false
- remote_tmp=/tmp/.ansible/tmp
- forks=128
- roles_path=roles
- library=library
-
- [ssh_connection]
- control_path=%(directory)s/%%h-%%r
- pipelining=True
- scp_if_ssh=True
data/scripts/manualbackups/inventory.txt DELETED
@@ -1,18 +0,0 @@
- [cstars02-par]
- cstars02e01-par ansible_ssh_host="cstars02e01-par.storage.criteo.prod"
- cstars02e02-par ansible_ssh_host="cstars02e02-par.storage.criteo.prod"
- cstars02e03-par ansible_ssh_host="cstars02e03-par.storage.criteo.prod"
- cstars02e04-par ansible_ssh_host="cstars02e04-par.storage.criteo.prod"
- cstars02e05-par ansible_ssh_host="cstars02e05-par.storage.criteo.prod"
- cstars02e06-par ansible_ssh_host="cstars02e06-par.storage.criteo.prod"
- cstars02e07-par ansible_ssh_host="cstars02e07-par.storage.criteo.prod"
- cstars02e08-par ansible_ssh_host="cstars02e08-par.storage.criteo.prod"
- cstars02e09-par ansible_ssh_host="cstars02e09-par.storage.criteo.prod"
- cstars02e10-par ansible_ssh_host="cstars02e10-par.storage.criteo.prod"
- cstars02e11-par ansible_ssh_host="cstars02e11-par.storage.criteo.prod"
- cstars02e12-par ansible_ssh_host="cstars02e12-par.storage.criteo.prod"
- cstars02e13-par ansible_ssh_host="cstars02e13-par.storage.criteo.prod"
- cstars02e14-par ansible_ssh_host="cstars02e14-par.storage.criteo.prod"
- cstars02e15-par ansible_ssh_host="cstars02e15-par.storage.criteo.prod"
- cstars02e16-par ansible_ssh_host="cstars02e16-par.storage.criteo.prod"
- cstars02e17-par ansible_ssh_host="cstars02e17-par.storage.criteo.prod"
data/scripts/manualbackups/play_book.sh DELETED
@@ -1,13 +0,0 @@
- #!/bin/bash
-
- PLAYBOOK=$1
-
- if [ "$PLAYBOOK" = "" ]; then
- echo "Usage: $0 <playbook> [ansible options]"
- exit 65
- fi
-
- shift
- ansible-playbook --inventory-file=inventory.txt playbooks/$PLAYBOOK.yml --extra-vars $*
-
- exit $?
data/scripts/manualbackups/playbooks/backups.yml DELETED
@@ -1,6 +0,0 @@
- ---
-
- - gather_facts: no
- hosts: cstars02-par
- roles:
- - role: planb
data/scripts/manualbackups/roles/planb/files/backup.sh DELETED
@@ -1,27 +0,0 @@
- #!/bin/bash
-
- kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
-
- date=`date +%Y_%m_%d`
-
- nodetool clearsnapshot
-
- snapdir=$(nodetool snapshot| grep directory| awk '{print $NF}')
- echo "Snapshot is $snapdir"
-
- for dir in $(find /var/opt/cassandra/data -type d |grep snapshots/$snapdir); do
- kok=$(klist -l|grep v.vanhollebeke@CRITEOIS.LAN|grep -v Expired|wc -l)
- if [ $kok == 0 ]; then
- echo "Must renew Kerberos ticket"
- kinit v.vanhollebeke@CRITEOIS.LAN -k -t ~/keytab
- else
- echo "Kerberos ticket OK"
- fi
- keyspace=`echo $dir|awk -F\/ '{print $6}'`
- table=`echo $dir|awk -F\/ '{print $7}'`
- echo "Saving $keyspace $table"
- ./httpfs.sh /var/opt/cassandra/data/$keyspace/$table/snapshots/$snapdir tmp/cassandrabackups/prod/cstars02/$date/$HOSTNAME/$table
-
- done
-
- echo "FINISHED !!!!"
data/scripts/manualbackups/roles/planb/files/httpfs.sh DELETED
@@ -1,27 +0,0 @@
- #!/bin/sh
-
- BASE='http://0.httpfs.hpc.criteo.prod:14000/webhdfs/v1'
- #BASE='http://httpfs.pa4.hpc.criteo.prod:14000'
-
- IN=$1
- OUT=$2
-
- echo "Creating destination directory: $OUT"
- curl --negotiate -u : "$BASE/$OUT?op=MKDIRS&permission=0777" -X PUT -s > /dev/null
-
- for p in $(find $IN -type f)
- do
- f=$(basename $p)
- echo "$IN/$f"
-
- # Create file
- dest=$(curl --negotiate -u : "$BASE/$OUT/$f?op=CREATE&overwrite=true&permission=0777" -i -X PUT -s | grep Location | tail -n1 | cut -d\ -f2 | tr -d '\r\n')
- [ $? != 0 ] && echo "ERROR"
-
- echo "DEST IS ${dest}"
-
- # Upload file
- curl --negotiate -u : "$dest" -i -X PUT -T "$IN/$f" -H 'Content-Type: application/octet-stream' > /dev/null
- [ $? != 0 ] && echo "ERROR"
-
- done
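The removed `httpfs.sh` relied on the two-step file creation of the WebHDFS REST API (which HttpFS also exposes): a `PUT ...?op=CREATE` without data is answered with a redirect whose `Location` header says where to send the bytes, and a second `PUT` carries the file body. A Ruby sketch of the same exchange with `Net::HTTP`, under loud assumptions: it targets an unsecured endpoint and, unlike the script's `curl --negotiate`, performs no Kerberos/SPNEGO authentication; `webhdfs_uri` and `webhdfs_create` are hypothetical names.

```ruby
require 'net/http'
require 'uri'

# Build a WebHDFS REST URI such as
#   <base>/<path>?op=CREATE&overwrite=true&permission=0777
def webhdfs_uri(base, path, op, params = {})
  query = URI.encode_www_form({ 'op' => op }.merge(params))
  URI("#{base.chomp('/')}/#{path.sub(%r{\A/+}, '')}?#{query}")
end

# Two-step CREATE: the first PUT sends no data and should be redirected;
# the second PUT uploads the file to the Location returned by the first.
# Assumes an unsecured cluster (no SPNEGO authentication is attempted).
def webhdfs_create(base, hdfs_path, local_file)
  uri = webhdfs_uri(base, hdfs_path, 'CREATE',
                    'overwrite' => 'true', 'permission' => '0777')
  first = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(Net::HTTP::Put.new(uri))
  end
  unless first.is_a?(Net::HTTPRedirection) && first['Location']
    raise "CREATE was not redirected: #{first.code}"
  end

  target = URI(first['Location'])
  request = Net::HTTP::Put.new(target)
  request['Content-Type'] = 'application/octet-stream'
  request.body = File.binread(local_file)
  second = Net::HTTP.start(target.host, target.port) { |http| http.request(request) }
  raise "upload failed: #{second.code}" unless second.is_a?(Net::HTTPCreated)
  second
end
```

Unlike the script, which only printed `ERROR` and carried on, this sketch stops at the first failed step, so a partially written file is never reported as uploaded.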
data/scripts/manualbackups/roles/planb/files/krb5.conf DELETED
@@ -1,26 +0,0 @@
- [libdefaults]
- dns_lookup_realm = true
- dns_lookup_kdc = true
- ticket_lifetime = 24h
- renew_lifetime = 7d
- forwardable = true
- default_realm = CRITEOIS.LAN
- udp_preference_limit = 1
- realm_try_domains = 1
- permitted_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
- default_tkt_enctypes = aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac
- [domain_realm]
- .hpc.criteo.preprod = HPC.CRITEO.PREPROD
- .hpc.criteo.prod = AMS.HPC.CRITEO.PROD
- .pa4.hpc.criteo.prod = PA4.HPC.CRITEO.PROD
- .as.hpc.criteo.prod = AS.HPC.CRITEO.PROD
- .na.hpc.criteo.prod = NA.HPC.CRITEO.PROD
- .cn.hpc.criteo.prod = CN.HPC.CRITEO.PROD
- [capaths]
- CRITEOIS.LAN = {
- AMS.HPC.CRITEO.PROD = .
- PA4.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- AS.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- NA.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- CN.HPC.CRITEO.PROD = AMS.HPC.CRITEO.PROD
- }
data/scripts/manualbackups/roles/planb/tasks/main.yml DELETED
@@ -1,34 +0,0 @@
- ---
-
- - name: Copy krb5.conf into /etc
- copy: src=krb5.conf dest=/etc/krb5.conf
- sudo: yes
- tags: keytab
-
- - name: Copy my keytab
- copy: src=keytab dest=~/keytab
- tags: keytab
-
- - name: Check if keytab works
- command: kinit $USER@CRITEOIS.LAN -k -t ~/keytab
- tags: keytab
-
- - name: Copy httpfs.sh script
- copy: src=httpfs.sh dest=~/httpfs.sh mode=750
- tags: backup
-
- - name: Copy backup.sh script
- copy: src=backup.sh dest=~/backup.sh mode=750
- tags: backup
-
- - name: Start Backup
- shell: ./backup.sh >logfile 2>&1 chdir=~
- tags: backup
-
- - name: Clear snapshots
- shell: sudo nodetool clearsnapshot
- tags: clear
-
- - name: Verify if snapshots are REALLY deleted
- shell: "[ $(find /var/opt/cassandra -type d |grep snap|wc -l) == 0 ]"
- tags: verify