maxmind-db 1.0.0 → 1.1.0

Files changed (77)
  1. checksums.yaml +5 -5
  2. data/CHANGELOG.md +13 -2
  3. data/Gemfile +8 -0
  4. data/Gemfile.lock +34 -0
  5. data/README.md +2 -2
  6. data/Rakefile +2 -0
  7. data/bin/mmdb-benchmark.rb +4 -1
  8. data/lib/maxmind/db.rb +49 -27
  9. data/lib/maxmind/db/decoder.rb +26 -26
  10. data/lib/maxmind/db/errors.rb +3 -1
  11. data/lib/maxmind/db/file_reader.rb +5 -3
  12. data/lib/maxmind/db/memory_reader.rb +3 -1
  13. data/lib/maxmind/db/metadata.rb +3 -1
  14. data/maxmind-db.gemspec +5 -2
  15. data/test/mmdb_util.rb +2 -0
  16. data/test/test_decoder.rb +2 -0
  17. data/test/test_reader.rb +117 -6
  18. metadata +7 -64
  19. data/test/data/LICENSE +0 -4
  20. data/test/data/MaxMind-DB-spec.md +0 -558
  21. data/test/data/MaxMind-DB-test-metadata-pointers.mmdb +0 -0
  22. data/test/data/README.md +0 -4
  23. data/test/data/bad-data/README.md +0 -7
  24. data/test/data/bad-data/libmaxminddb/libmaxminddb-offset-integer-overflow.mmdb +0 -0
  25. data/test/data/bad-data/maxminddb-golang/cyclic-data-structure.mmdb +0 -0
  26. data/test/data/bad-data/maxminddb-golang/invalid-bytes-length.mmdb +0 -1
  27. data/test/data/bad-data/maxminddb-golang/invalid-data-record-offset.mmdb +0 -0
  28. data/test/data/bad-data/maxminddb-golang/invalid-map-key-length.mmdb +0 -0
  29. data/test/data/bad-data/maxminddb-golang/invalid-string-length.mmdb +0 -1
  30. data/test/data/bad-data/maxminddb-golang/metadata-is-an-uint128.mmdb +0 -1
  31. data/test/data/bad-data/maxminddb-golang/unexpected-bytes.mmdb +0 -0
  32. data/test/data/perltidyrc +0 -12
  33. data/test/data/source-data/GeoIP2-Anonymous-IP-Test.json +0 -41
  34. data/test/data/source-data/GeoIP2-City-Test.json +0 -12852
  35. data/test/data/source-data/GeoIP2-Connection-Type-Test.json +0 -102
  36. data/test/data/source-data/GeoIP2-Country-Test.json +0 -11347
  37. data/test/data/source-data/GeoIP2-DensityIncome-Test.json +0 -14
  38. data/test/data/source-data/GeoIP2-Domain-Test.json +0 -452
  39. data/test/data/source-data/GeoIP2-Enterprise-Test.json +0 -673
  40. data/test/data/source-data/GeoIP2-ISP-Test.json +0 -12585
  41. data/test/data/source-data/GeoIP2-Precision-Enterprise-Test.json +0 -1598
  42. data/test/data/source-data/GeoIP2-User-Count-Test.json +0 -2824
  43. data/test/data/source-data/GeoLite2-ASN-Test.json +0 -37
  44. data/test/data/source-data/README +0 -15
  45. data/test/data/test-data/GeoIP2-Anonymous-IP-Test.mmdb +0 -0
  46. data/test/data/test-data/GeoIP2-City-Test-Broken-Double-Format.mmdb +0 -0
  47. data/test/data/test-data/GeoIP2-City-Test-Invalid-Node-Count.mmdb +0 -0
  48. data/test/data/test-data/GeoIP2-City-Test.mmdb +0 -0
  49. data/test/data/test-data/GeoIP2-Connection-Type-Test.mmdb +0 -0
  50. data/test/data/test-data/GeoIP2-Country-Test.mmdb +0 -0
  51. data/test/data/test-data/GeoIP2-DensityIncome-Test.mmdb +0 -0
  52. data/test/data/test-data/GeoIP2-Domain-Test.mmdb +0 -0
  53. data/test/data/test-data/GeoIP2-Enterprise-Test.mmdb +0 -0
  54. data/test/data/test-data/GeoIP2-ISP-Test.mmdb +0 -0
  55. data/test/data/test-data/GeoIP2-Precision-Enterprise-Test.mmdb +0 -0
  56. data/test/data/test-data/GeoIP2-User-Count-Test.mmdb +0 -0
  57. data/test/data/test-data/GeoLite2-ASN-Test.mmdb +0 -0
  58. data/test/data/test-data/MaxMind-DB-no-ipv4-search-tree.mmdb +0 -0
  59. data/test/data/test-data/MaxMind-DB-string-value-entries.mmdb +0 -0
  60. data/test/data/test-data/MaxMind-DB-test-broken-pointers-24.mmdb +0 -0
  61. data/test/data/test-data/MaxMind-DB-test-broken-search-tree-24.mmdb +0 -0
  62. data/test/data/test-data/MaxMind-DB-test-decoder.mmdb +0 -0
  63. data/test/data/test-data/MaxMind-DB-test-ipv4-24.mmdb +0 -0
  64. data/test/data/test-data/MaxMind-DB-test-ipv4-28.mmdb +0 -0
  65. data/test/data/test-data/MaxMind-DB-test-ipv4-32.mmdb +0 -0
  66. data/test/data/test-data/MaxMind-DB-test-ipv6-24.mmdb +0 -0
  67. data/test/data/test-data/MaxMind-DB-test-ipv6-28.mmdb +0 -0
  68. data/test/data/test-data/MaxMind-DB-test-ipv6-32.mmdb +0 -0
  69. data/test/data/test-data/MaxMind-DB-test-metadata-pointers.mmdb +0 -0
  70. data/test/data/test-data/MaxMind-DB-test-mixed-24.mmdb +0 -0
  71. data/test/data/test-data/MaxMind-DB-test-mixed-28.mmdb +0 -0
  72. data/test/data/test-data/MaxMind-DB-test-mixed-32.mmdb +0 -0
  73. data/test/data/test-data/MaxMind-DB-test-nested.mmdb +0 -0
  74. data/test/data/test-data/README.md +0 -26
  75. data/test/data/test-data/maps-with-pointers.raw +0 -0
  76. data/test/data/test-data/write-test-data.pl +0 -620
  77. data/test/data/tidyall.ini +0 -5
@@ -1,10 +1,12 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'maxmind/db/errors'
2
4
 
3
5
  module MaxMind # :nodoc:
4
6
  class DB
5
7
  class FileReader # :nodoc:
6
8
  def initialize(filename)
7
- @fh = File.new(filename, 'rb'.freeze)
9
+ @fh = File.new(filename, 'rb')
8
10
  @size = @fh.size
9
11
  @mutex = Mutex.new
10
12
  end
@@ -16,7 +18,7 @@ module MaxMind # :nodoc:
16
18
  end
17
19
 
18
20
  def read(offset, size)
19
- return ''.freeze.b if size == 0
21
+ return ''.b if size == 0
20
22
 
21
23
  # When we support only Ruby 2.5+, remove this and require pread.
22
24
  if @fh.respond_to?(:pread)
@@ -28,7 +30,7 @@ module MaxMind # :nodoc:
28
30
  end
29
31
  end
30
32
 
31
- raise InvalidDatabaseError, 'The MaxMind DB file contains bad data'.freeze if buf.nil? || buf.length != size
33
+ raise InvalidDatabaseError, 'The MaxMind DB file contains bad data' if buf.nil? || buf.length != size
32
34
 
33
35
  buf
34
36
  end
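Note on the hunk above: `read` prefers `IO#pread` when the file handle supports it, and otherwise falls back to a different read path. The following is a minimal sketch of that pattern, not the gem's code. The helper name `read_at`, the `IOError`, and the temp-file demo are assumptions made for illustration, and the sketch leaves out the `Mutex` that the real class holds.

```ruby
require 'tempfile'

# Hypothetical helper (not part of maxmind-db): read `size` bytes starting at
# `offset`, using IO#pread when available and seek + read otherwise.
def read_at(file, offset, size)
  return ''.b if size == 0

  buf =
    if file.respond_to?(:pread)
      # pread takes (length, offset) and leaves the file position untouched.
      file.pread(size, offset)
    else
      file.seek(offset, IO::SEEK_SET)
      file.read(size)
    end

  # Mirror the diff's sanity check: a short or missing read is an error.
  raise IOError, 'short read' if buf.nil? || buf.length != size

  buf
end

Tempfile.create('read_at_demo') do |tmp|
  tmp.binmode
  tmp.write('0123456789')
  tmp.flush

  File.open(tmp.path, 'rb') do |fh|
    puts read_at(fh, 3, 4)
  end
end
```

Because `pread` does not move the file position, concurrent callers do not disturb one another on that path. The seek-then-read fallback does move it, so a shared handle would need external locking there.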
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module MaxMind # :nodoc:
2
4
  class DB
3
5
  class MemoryReader # :nodoc:
@@ -8,7 +10,7 @@ module MaxMind # :nodoc:
8
10
  return
9
11
  end
10
12
 
11
- @buf = File.read(filename, mode: 'rb'.freeze).freeze
13
+ @buf = File.read(filename, mode: 'rb').freeze
12
14
  @size = @buf.length
13
15
  end
14
16
 
@@ -1,7 +1,9 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module MaxMind # :nodoc:
2
4
  class DB
3
5
  # Metadata holds metadata about a {MaxMind
4
- # DB}[http://maxmind.github.io/MaxMind-DB/] file.
6
+ # DB}[https://maxmind.github.io/MaxMind-DB/] file.
5
7
  class Metadata
6
8
  # The number of nodes in the database.
7
9
  attr_reader :node_count
@@ -1,12 +1,14 @@
1
+ # frozen_string_literal: true
2
+
1
3
  Gem::Specification.new do |s|
2
4
  s.authors = ['William Storey']
3
5
  s.files = Dir['**/*']
4
6
  s.name = 'maxmind-db'
5
7
  s.summary = 'A gem for reading MaxMind DB files.'
6
- s.version = '1.0.0'
8
+ s.version = '1.1.0'
7
9
 
8
10
  s.description = 'A gem for reading MaxMind DB files. MaxMind DB is a binary file format that stores data indexed by IP address subnets (IPv4 or IPv6).'
9
- s.email = 'wstorey@maxmind.com'
11
+ s.email = 'support@maxmind.com'
10
12
  s.homepage = 'https://github.com/maxmind/MaxMind-DB-Reader-ruby'
11
13
  s.licenses = ['Apache-2.0', 'MIT']
12
14
  s.metadata = {
@@ -16,4 +18,5 @@ Gem::Specification.new do |s|
16
18
  'homepage_uri' => 'https://github.com/maxmind/MaxMind-DB-Reader-ruby',
17
19
  'source_code_uri' => 'https://github.com/maxmind/MaxMind-DB-Reader-ruby',
18
20
  }
21
+ s.required_ruby_version = '>= 2.4.0'
19
22
  end
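Note on the hunk above: the gemspec now declares `required_ruby_version = '>= 2.4.0'`. The sketch below shows how RubyGems evaluates such a constraint, using the standard `Gem::Requirement` and `Gem::Version` classes. The version strings in the loop are arbitrary examples and do not come from the diff.

```ruby
require 'rubygems'

# The same constraint string that the gemspec sets.
requirement = Gem::Requirement.new('>= 2.4.0')

# Arbitrary sample versions, chosen only to show both outcomes.
%w[2.3.8 2.4.0 2.7.1].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok ? 'satisfies' : 'does not satisfy'} #{requirement}"
end
```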
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  class MMDBUtil # :nodoc:
2
4
  def self.make_metadata_map(record_size)
3
5
  # Map
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'maxmind/db'
2
4
  require 'minitest/autorun'
3
5
  require 'mmdb_util'
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'maxmind/db'
2
4
  require 'minitest/autorun'
3
5
  require 'mmdb_util'
@@ -27,6 +29,115 @@ class ReaderTest < Minitest::Test # :nodoc:
27
29
  end
28
30
  end
29
31
 
32
+ def test_get_with_prefix_len
33
+ decoder_record = {
34
+ 'array' => [1, 2, 3],
35
+ 'boolean' => true,
36
+ 'bytes' => "\x00\x00\x00*",
37
+ 'double' => 42.123456,
38
+ 'float' => 1.100000023841858,
39
+ 'int32' => -268_435_456,
40
+ 'map' => {
41
+ 'mapX' => {
42
+ 'arrayX' => [7, 8, 9],
43
+ 'utf8_stringX' => 'hello',
44
+ },
45
+ },
46
+ 'uint128' => 1_329_227_995_784_915_872_903_807_060_280_344_576,
47
+ 'uint16' => 0x64,
48
+ 'uint32' => 0x10000000,
49
+ 'uint64' => 0x1000000000000000,
50
+ 'utf8_string' => 'unicode! ☯ - ♫',
51
+ }
52
+
53
+ tests = [{
54
+ 'ip' => '1.1.1.1',
55
+ 'file_name' => 'MaxMind-DB-test-ipv6-32.mmdb',
56
+ 'expected_prefix_length' => 8,
57
+ 'expected_record' => nil,
58
+ }, {
59
+ 'ip' => '::1:ffff:ffff',
60
+ 'file_name' => 'MaxMind-DB-test-ipv6-24.mmdb',
61
+ 'expected_prefix_length' => 128,
62
+ 'expected_record' => {
63
+ 'ip' => '::1:ffff:ffff'
64
+ },
65
+ }, {
66
+ 'ip' => '::2:0:1',
67
+ 'file_name' => 'MaxMind-DB-test-ipv6-24.mmdb',
68
+ 'expected_prefix_length' => 122,
69
+ 'expected_record' => {
70
+ 'ip' => '::2:0:0'
71
+ },
72
+ }, {
73
+ 'ip' => '1.1.1.1',
74
+ 'file_name' => 'MaxMind-DB-test-ipv4-24.mmdb',
75
+ 'expected_prefix_length' => 32,
76
+ 'expected_record' => {
77
+ 'ip' => '1.1.1.1'
78
+ },
79
+ }, {
80
+ 'ip' => '1.1.1.3',
81
+ 'file_name' => 'MaxMind-DB-test-ipv4-24.mmdb',
82
+ 'expected_prefix_length' => 31,
83
+ 'expected_record' => {
84
+ 'ip' => '1.1.1.2'
85
+ },
86
+ }, {
87
+ 'ip' => '1.1.1.3',
88
+ 'file_name' => 'MaxMind-DB-test-decoder.mmdb',
89
+ 'expected_prefix_length' => 24,
90
+ 'expected_record' => decoder_record,
91
+ }, {
92
+ 'ip' => '::ffff:1.1.1.128',
93
+ 'file_name' => 'MaxMind-DB-test-decoder.mmdb',
94
+ 'expected_prefix_length' => 120,
95
+ 'expected_record' => decoder_record,
96
+ }, {
97
+ 'ip' => '::1.1.1.128',
98
+ 'file_name' => 'MaxMind-DB-test-decoder.mmdb',
99
+ 'expected_prefix_length' => 120,
100
+ 'expected_record' => decoder_record,
101
+ }, {
102
+ 'ip' => '200.0.2.1',
103
+ 'file_name' => 'MaxMind-DB-no-ipv4-search-tree.mmdb',
104
+ 'expected_prefix_length' => 0,
105
+ 'expected_record' => '::0/64',
106
+ }, {
107
+ 'ip' => '::200.0.2.1',
108
+ 'file_name' => 'MaxMind-DB-no-ipv4-search-tree.mmdb',
109
+ 'expected_prefix_length' => 64,
110
+ 'expected_record' => '::0/64',
111
+ }, {
112
+ 'ip' => '0:0:0:0:ffff:ffff:ffff:ffff',
113
+ 'file_name' => 'MaxMind-DB-no-ipv4-search-tree.mmdb',
114
+ 'expected_prefix_length' => 64,
115
+ 'expected_record' => '::0/64',
116
+ }, {
117
+ 'ip' => 'ef00::',
118
+ 'file_name' => 'MaxMind-DB-no-ipv4-search-tree.mmdb',
119
+ 'expected_prefix_length' => 1,
120
+ 'expected_record' => nil,
121
+ }]
122
+
123
+ tests.each do |test|
124
+ reader = MaxMind::DB.new('test/data/test-data/' + test['file_name'])
125
+ record, prefix_length = reader.get_with_prefix_length(test['ip'])
126
+
127
+ assert_equal(test['expected_prefix_length'], prefix_length,
128
+ format('expected prefix_length of %d for %s in %s but got %p',
129
+ test['expected_prefix_length'], test['ip'],
130
+ test['file_name'], prefix_length))
131
+
132
+ msg = format('expected_record for %s in %s', test['ip'], test['file_name'])
133
+ if test['expected_record'].nil?
134
+ assert_nil(record, msg)
135
+ else
136
+ assert_equal(test['expected_record'], record, msg)
137
+ end
138
+ end
139
+ end
140
+
30
141
  def test_decoder
31
142
  reader = MaxMind::DB.new(
32
143
  'test/data/test-data/MaxMind-DB-test-decoder.mmdb'
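Note on the hunk above: the new test exercises `get_with_prefix_length`, which returns the record together with the prefix length of the network that matched. In two of the fixtures the expected record's `ip` is the network address you get by masking the looked-up address to that prefix length. The sketch below reproduces that relationship with the standard `ipaddr` library only. The helper `network_for` is hypothetical and does not touch the gem or any `.mmdb` file.

```ruby
require 'ipaddr'

# Hypothetical helper: the network address that contains `ip` when only the
# first `prefix_length` bits are significant.
def network_for(ip, prefix_length)
  IPAddr.new(ip).mask(prefix_length)
end

# Address / prefix-length pairs taken from the test table in the diff.
[
  ['1.1.1.3', 31],
  ['::2:0:1', 122],
].each do |ip, prefix_length|
  puts "#{ip}/#{prefix_length} -> #{network_for(ip, prefix_length)}"
end
```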
@@ -273,8 +384,8 @@ class ReaderTest < Minitest::Test # :nodoc:
273
384
  node_bytes: "\xab\xcd\xef".b + "\xbc\xfe\xfa".b,
274
385
  left: 11_259_375,
275
386
  right: 12_386_042,
276
- check_left: "\x00\xab\xcd\xef".b.unpack('N')[0],
277
- check_right: "\x00\xbc\xfe\xfa".b.unpack('N')[0],
387
+ check_left: "\x00\xab\xcd\xef".b.unpack1('N'),
388
+ check_right: "\x00\xbc\xfe\xfa".b.unpack1('N'),
278
389
  },
279
390
  {
280
391
  record_size: 28,
@@ -282,8 +393,8 @@ class ReaderTest < Minitest::Test # :nodoc:
282
393
  node_bytes: "\xab\xcd\xef".b + "\x12".b + "\xfd\xdc\xfa".b,
283
394
  left: 28_036_591,
284
395
  right: 50_191_610,
285
- check_left: "\x01\xab\xcd\xef".b.unpack('N')[0],
286
- check_right: "\x02\xfd\xdc\xfa".b.unpack('N')[0],
396
+ check_left: "\x01\xab\xcd\xef".b.unpack1('N'),
397
+ check_right: "\x02\xfd\xdc\xfa".b.unpack1('N'),
287
398
  },
288
399
  {
289
400
  record_size: 32,
@@ -291,8 +402,8 @@ class ReaderTest < Minitest::Test # :nodoc:
291
402
  node_bytes: "\xab\xcd\xef\x12".b + "\xfd\xdc\xfa\x15".b,
292
403
  left: 2_882_400_018,
293
404
  right: 4_259_117_589,
294
- check_left: "\xab\xcd\xef\x12".b.unpack('N')[0],
295
- check_right: "\xfd\xdc\xfa\x15".b.unpack('N')[0],
405
+ check_left: "\xab\xcd\xef\x12".b.unpack1('N'),
406
+ check_right: "\xfd\xdc\xfa\x15".b.unpack1('N'),
296
407
  },
297
408
  ]
298
409
 
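Note on the hunks above: `unpack('N')[0]` becomes `unpack1('N')`. `String#unpack1` was added in Ruby 2.4, which is consistent with the new `required_ruby_version`. The fixtures also show how a node's two records are laid out for 24-, 28-, and 32-bit record sizes. The sketch below decodes those same byte strings. The helper `decode_node` is hypothetical and is not the gem's implementation.

```ruby
# Hypothetical helper: split one search-tree node into [left, right] records.
# `node_bytes` is expected to be a binary (ASCII-8BIT) string.
def decode_node(node_bytes, record_size)
  case record_size
  when 24
    # Two 3-byte big-endian records; pad each to 4 bytes for 'N'.
    left = ("\x00".b + node_bytes[0, 3]).unpack1('N')
    right = ("\x00".b + node_bytes[3, 3]).unpack1('N')
  when 28
    # The middle byte carries the top 4 bits of each record.
    middle = node_bytes.getbyte(3)
    left = ([middle >> 4].pack('C') + node_bytes[0, 3]).unpack1('N')
    right = ([middle & 0x0f].pack('C') + node_bytes[4, 3]).unpack1('N')
  when 32
    left, right = node_bytes.unpack('NN')
  else
    raise ArgumentError, "unsupported record size: #{record_size}"
  end

  [left, right]
end

# Byte strings copied from the test fixtures in the diff.
p decode_node("\xab\xcd\xef".b + "\xbc\xfe\xfa".b, 24)
p decode_node("\xab\xcd\xef".b + "\x12".b + "\xfd\xdc\xfa".b, 28)
p decode_node("\xab\xcd\xef\x12".b + "\xfd\xdc\xfa\x15".b, 32)
```

For the 28-bit case the middle byte `0x12` contributes `0x1` to the left record and `0x2` to the right one, which matches the `check_left` and `check_right` byte strings in the fixtures.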
metadata CHANGED
@@ -1,23 +1,25 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: maxmind-db
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - William Storey
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-01-05 00:00:00.000000000 Z
11
+ date: 2020-01-08 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description: A gem for reading MaxMind DB files. MaxMind DB is a binary file format
14
14
  that stores data indexed by IP address subnets (IPv4 or IPv6).
15
- email: wstorey@maxmind.com
15
+ email: support@maxmind.com
16
16
  executables: []
17
17
  extensions: []
18
18
  extra_rdoc_files: []
19
19
  files:
20
20
  - CHANGELOG.md
21
+ - Gemfile
22
+ - Gemfile.lock
21
23
  - LICENSE-APACHE
22
24
  - LICENSE-MIT
23
25
  - README.dev.md
@@ -31,65 +33,6 @@ files:
31
33
  - lib/maxmind/db/memory_reader.rb
32
34
  - lib/maxmind/db/metadata.rb
33
35
  - maxmind-db.gemspec
34
- - test/data/LICENSE
35
- - test/data/MaxMind-DB-spec.md
36
- - test/data/MaxMind-DB-test-metadata-pointers.mmdb
37
- - test/data/README.md
38
- - test/data/bad-data/README.md
39
- - test/data/bad-data/libmaxminddb/libmaxminddb-offset-integer-overflow.mmdb
40
- - test/data/bad-data/maxminddb-golang/cyclic-data-structure.mmdb
41
- - test/data/bad-data/maxminddb-golang/invalid-bytes-length.mmdb
42
- - test/data/bad-data/maxminddb-golang/invalid-data-record-offset.mmdb
43
- - test/data/bad-data/maxminddb-golang/invalid-map-key-length.mmdb
44
- - test/data/bad-data/maxminddb-golang/invalid-string-length.mmdb
45
- - test/data/bad-data/maxminddb-golang/metadata-is-an-uint128.mmdb
46
- - test/data/bad-data/maxminddb-golang/unexpected-bytes.mmdb
47
- - test/data/perltidyrc
48
- - test/data/source-data/GeoIP2-Anonymous-IP-Test.json
49
- - test/data/source-data/GeoIP2-City-Test.json
50
- - test/data/source-data/GeoIP2-Connection-Type-Test.json
51
- - test/data/source-data/GeoIP2-Country-Test.json
52
- - test/data/source-data/GeoIP2-DensityIncome-Test.json
53
- - test/data/source-data/GeoIP2-Domain-Test.json
54
- - test/data/source-data/GeoIP2-Enterprise-Test.json
55
- - test/data/source-data/GeoIP2-ISP-Test.json
56
- - test/data/source-data/GeoIP2-Precision-Enterprise-Test.json
57
- - test/data/source-data/GeoIP2-User-Count-Test.json
58
- - test/data/source-data/GeoLite2-ASN-Test.json
59
- - test/data/source-data/README
60
- - test/data/test-data/GeoIP2-Anonymous-IP-Test.mmdb
61
- - test/data/test-data/GeoIP2-City-Test-Broken-Double-Format.mmdb
62
- - test/data/test-data/GeoIP2-City-Test-Invalid-Node-Count.mmdb
63
- - test/data/test-data/GeoIP2-City-Test.mmdb
64
- - test/data/test-data/GeoIP2-Connection-Type-Test.mmdb
65
- - test/data/test-data/GeoIP2-Country-Test.mmdb
66
- - test/data/test-data/GeoIP2-DensityIncome-Test.mmdb
67
- - test/data/test-data/GeoIP2-Domain-Test.mmdb
68
- - test/data/test-data/GeoIP2-Enterprise-Test.mmdb
69
- - test/data/test-data/GeoIP2-ISP-Test.mmdb
70
- - test/data/test-data/GeoIP2-Precision-Enterprise-Test.mmdb
71
- - test/data/test-data/GeoIP2-User-Count-Test.mmdb
72
- - test/data/test-data/GeoLite2-ASN-Test.mmdb
73
- - test/data/test-data/MaxMind-DB-no-ipv4-search-tree.mmdb
74
- - test/data/test-data/MaxMind-DB-string-value-entries.mmdb
75
- - test/data/test-data/MaxMind-DB-test-broken-pointers-24.mmdb
76
- - test/data/test-data/MaxMind-DB-test-broken-search-tree-24.mmdb
77
- - test/data/test-data/MaxMind-DB-test-decoder.mmdb
78
- - test/data/test-data/MaxMind-DB-test-ipv4-24.mmdb
79
- - test/data/test-data/MaxMind-DB-test-ipv4-28.mmdb
80
- - test/data/test-data/MaxMind-DB-test-ipv4-32.mmdb
81
- - test/data/test-data/MaxMind-DB-test-ipv6-24.mmdb
82
- - test/data/test-data/MaxMind-DB-test-ipv6-28.mmdb
83
- - test/data/test-data/MaxMind-DB-test-ipv6-32.mmdb
84
- - test/data/test-data/MaxMind-DB-test-metadata-pointers.mmdb
85
- - test/data/test-data/MaxMind-DB-test-mixed-24.mmdb
86
- - test/data/test-data/MaxMind-DB-test-mixed-28.mmdb
87
- - test/data/test-data/MaxMind-DB-test-mixed-32.mmdb
88
- - test/data/test-data/MaxMind-DB-test-nested.mmdb
89
- - test/data/test-data/README.md
90
- - test/data/test-data/maps-with-pointers.raw
91
- - test/data/test-data/write-test-data.pl
92
- - test/data/tidyall.ini
93
36
  - test/mmdb_util.rb
94
37
  - test/test_decoder.rb
95
38
  - test/test_reader.rb
@@ -111,7 +54,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
111
54
  requirements:
112
55
  - - ">="
113
56
  - !ruby/object:Gem::Version
114
- version: '0'
57
+ version: 2.4.0
115
58
  required_rubygems_version: !ruby/object:Gem::Requirement
116
59
  requirements:
117
60
  - - ">="
@@ -119,7 +62,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
119
62
  version: '0'
120
63
  requirements: []
121
64
  rubyforge_project:
122
- rubygems_version: 2.7.6
65
+ rubygems_version: 2.5.2.1
123
66
  signing_key:
124
67
  specification_version: 4
125
68
  summary: A gem for reading MaxMind DB files.
@@ -1,4 +0,0 @@
1
- This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
2
- Unported License. To view a copy of this license, visit
3
- http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative
4
- Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
@@ -1,558 +0,0 @@
1
- ---
2
- layout: default
3
- title: MaxMind DB File Format Specification
4
- version: v2.0
5
- ---
6
- # MaxMind DB File Format Specification
7
-
8
- ## Description
9
-
10
- The MaxMind DB file format is a database format that maps IPv4 and IPv6
11
- addresses to data records using an efficient binary search tree.
12
-
13
- ## Version
14
-
15
- This spec documents **version 2.0** of the MaxMind DB binary format.
16
-
17
- The version number consists of separate major and minor version numbers. It
18
- should not be considered a decimal number. In other words, version 2.10 comes
19
- after version 2.9.
20
-
21
- Code which is capable of reading a given major version of the format should
22
- not be broken by minor version changes to the format.
23
-
24
- ## Overview
25
-
26
- The binary database is split into three parts:
27
-
28
- 1. The binary search tree. Each level of the tree corresponds to a single bit
29
- in the 128 bit representation of an IPv6 address.
30
- 2. The data section. These are the values returned to the client for a
31
- specific IP address, e.g. "US", "New York", or a more complex map type made up
32
- of multiple fields.
33
- 3. Database metadata. Information about the database itself.
34
-
35
- ## Database Metadata
36
-
37
- This portion of the database is stored at the end of the file. It is
38
- documented first because understanding some of the metadata is key to
39
- understanding how the other sections work.
40
-
41
- This section can be found by looking for a binary sequence matching
42
- "\xab\xcd\xefMaxMind.com". The *last* occurrence of this string in the file
43
- marks the end of the data section and the beginning of the metadata. Since we
44
- allow for arbitrary binary data in the data section, some other piece of data
45
- could contain these values. This is why you need to find the last occurrence
46
- of this sequence.
47
-
48
- The maximum allowable size for the metadata section, including the marker that
49
- starts the metadata, is 128KiB.
50
-
51
- The metadata is stored as a map data structure. This structure is described
52
- later in the spec. Changing a key's data type or removing a key would
53
- constitute a major version change for this spec.
54
-
55
- Except where otherwise specified, each key listed is required for the database
56
- to be considered valid.
57
-
58
- Adding a key constitutes a minor version change. Removing a key or changing
59
- its type constitutes a major version change.
60
-
61
- The list of known keys for the current version of the format is as follows:
62
-
63
- ### node\_count
64
-
65
- This is an unsigned 32-bit integer indicating the number of nodes in the
66
- search tree.
67
-
68
- ### record\_size
69
-
70
- This is an unsigned 16-bit integer. It indicates the number of bits in a
71
- record in the search tree. Note that each node consists of *two* records.
72
-
73
- ### ip\_version
74
-
75
- This is an unsigned 16-bit integer which is always 4 or 6. It indicates
76
- whether the database contains IPv4 or IPv6 address data.
77
-
78
- ### database\_type
79
-
80
- This is a string that indicates the structure of each data record associated
81
- with an IP address. The actual definition of these structures is left up to
82
- the database creator.
83
-
84
- Names starting with "GeoIP" are reserved for use by MaxMind (and "GeoIP" is a
85
- trademark anyway).
86
-
87
- ### languages
88
-
89
- An array of strings, each of which is a locale code. A given record may
90
- contain data items that have been localized to some or all of these
91
- locales. Records should not contain localized data for locales not included in
92
- this array.
93
-
94
- This is an optional key, as this may not be relevant for all types of data.
95
-
96
- ### binary\_format\_major\_version
97
-
98
- This is an unsigned 16-bit integer indicating the major version number for the
99
- database's binary format.
100
-
101
- ### binary\_format\_minor\_version
102
-
103
- This is an unsigned 16-bit integer indicating the minor version number for the
104
- database's binary format.
105
-
106
- ### build\_epoch
107
-
108
- This is an unsigned 64-bit integer that contains the database build timestamp
109
- as a Unix epoch value.
110
-
111
- ### description
112
-
113
- This key will always point to a map. The keys of that map will be language
114
- codes, and the values will be a description in that language as a UTF-8
115
- string.
116
-
117
- The codes may include additional information such as script or country
118
- identifiers, like "zh-TW" or "mn-Cyrl-MN". The additional identifiers will be
119
- separated by a dash character ("-").
120
-
121
- This key is optional. However, creators of databases are strongly
122
- encouraged to include a description in at least one language.
123
-
124
- ### Calculating the Search Tree Section Size
125
-
126
- The formula for calculating the search tree section size *in bytes* is as
127
- follows:
128
-
129
- ( ( $record_size * 2 ) / 8 ) * $number_of_nodes
130
-
131
- The end of the search tree marks the beginning of the data section.
132
-
133
- ## Binary Search Tree Section
134
-
135
- The database file starts with a binary search tree. The number of nodes in the
136
- tree is dependent on how many unique netblocks are needed for the particular
137
- database. For example, the city database needs many more small netblocks than
138
- the country database.
139
-
140
- The top most node is always located at the beginning of the search tree
141
- section's address space. The top node is node 0.
142
-
143
- Each node consists of two records, each of which is a pointer to an address in
144
- the file.
145
-
146
- The pointers can point to one of three things. First, it may point to another
147
- node in the search tree address space. These pointers are followed as part of
148
- the IP address search algorithm, described below.
149
-
150
- The pointer can point to a value equal to `$number_of_nodes`. If this is the
151
- case, it means that the IP address we are searching for is not in the
152
- database.
153
-
154
- Finally, it may point to an address in the data section. This is the data
155
- relevant to the given netblock.
156
-
157
- ### Node Layout
158
-
159
- Each node in the search tree consists of two records, each of which is a
160
- pointer. The record size varies by database, but inside a single database node
161
- records are always the same size. A record may be anywhere from 24 to 128 bits
162
- long, depending on the number of nodes in the tree. These pointers are
163
- stored in big-endian format (most significant byte first).
164
-
165
- Here are some examples of how the records are laid out in a node for 24, 28,
166
- and 32 bit records. Larger record sizes follow this same pattern.
167
-
168
- #### 24 bits (small database), one node is 6 bytes
169
-
170
- | <------------- node --------------->|
171
- | 23 .. 0 | 23 .. 0 |
172
-
173
- #### 28 bits (medium database), one node is 7 bytes
174
-
175
- | <------------- node --------------->|
176
- | 23 .. 0 | 27..24 | 27..24 | 23 .. 0 |
177
-
178
- Note, the last 4 bits of each pointer are combined into the middle byte.
179
-
180
- #### 32 bits (large database), one node is 8 bytes
181
-
182
- | <------------- node --------------->|
183
- | 31 .. 0 | 31 .. 0 |
184
-
185
- ### Search Lookup Algorithm
186
-
187
- The first step is to convert the IP address to its big-endian binary
188
- representation. For an IPv4 address, this becomes 32 bits. For IPv6 you get
189
- 128 bits.
190
-
191
- The leftmost bit corresponds to the first node in the search tree. For each
192
- bit, a value of 0 means we choose the left record in a node, and a value of 1
193
- means we choose the right record.
194
-
195
- The record value is always interpreted as an unsigned integer. The maximum
196
- size of the integer is dependent on the number of bits in a record (24, 28, or
197
- 32).
198
-
199
- If the record value is a number that is less than the *number of nodes* (not
200
- in bytes, but the actual node count) in the search tree (this is stored in the
201
- database metadata), then the value is a node number. In this case, we find
202
- that node in the search tree and repeat the lookup algorithm from there.
203
-
204
- If the record value is equal to the number of nodes, that means that we do not
205
- have any data for the IP address, and the search ends here.
206
-
207
- If the record value is *greater* than the number of nodes in the search tree,
208
- then it is an actual pointer value pointing into the data section. The value
209
- of the pointer is calculated from the start of the data section, *not* from
210
- the start of the file.
211
-
212
- In order to determine where in the data section we should start looking, we use
213
- the following formula:
214
-
215
- $data_section_offset = ( $record_value - $node_count ) - 16
216
-
217
- The `16` is the size of the data section separator (see below for details).
218
-
219
- The reason that we subtract the `$node_count` is best demonstrated by an example.
220
-
221
- Let's assume we have a 24-bit tree with 1,000 nodes. Each node contains 48
222
- bits, or 6 bytes. The size of the tree is 6,000 bytes.
223
-
224
- When a record in the tree contains a number that is less than 1,000, this
225
- is a *node number*, and we look up that node. If a record contains a value
226
- greater than or equal to 1,016, we know that it is a data section value. We
227
- subtract the node count (1,000) and then subtract 16 for the data section
228
- separator, giving us the number 0, the first byte of the data section.
229
-
230
- If a record contained the value 6,000, this formula would give us an offset of
231
- 4,984 into the data section.
232
-
233
- In order to determine where in the file this offset really points to, we also
234
- need to know where the data section starts. This can be calculated by
235
- determining the size of the search tree in bytes and then adding an additional
236
- 16 bytes for the data section separator.
237
-
238
- So the final formula to determine the offset in the file is:
239
-
240
- $offset_in_file = ( $record_value - $node_count )
241
- + $search_tree_size_in_bytes
242
-
243
- ### IPv4 addresses in an IPv6 tree
244
-
245
- When storing IPv4 addresses in an IPv6 tree, they are stored as-is, so they
246
- occupy the first 32-bits of the address space (from 0 to 2**32 - 1).
247
-
248
- Creators of databases should decide on a strategy for handling the various
249
- mappings between IPv4 and IPv6.
250
-
251
- The strategy that MaxMind uses for its GeoIP databases is to include a pointer
252
- from the `::ffff:0:0/96` subnet to the root node of the IPv4 address space in
253
- the tree. This accounts for the
254
- [IPv4-mapped IPv6 address](http://en.wikipedia.org/wiki/IPv6#IPv4-mapped_IPv6_addresses).
255
-
256
- MaxMind also includes a pointer from the `2002::/16` subnet to the root node
257
- of the IPv4 address space in the tree. This accounts for the
258
- [6to4 mapping](http://en.wikipedia.org/wiki/6to4) subnet.
259
-
260
- Database creators are encouraged to document whether they are doing something
261
- similar for their databases.
262
-
263
- The Teredo subnet cannot be accounted for in the tree. Instead, code that
264
- searches the tree can offer to decode the IPv4 portion of a Teredo address and
265
- look that up.
266
-
267
- ## Data Section Separator
268
-
269
- There are 16 bytes of NULLs in between the search tree and the data
270
- section. This separator exists in order to make it possible for a verification
271
- tool to distinguish between the two sections.
272
-
273
- This separator is not considered part of the data section itself. In other
274
- words, the data section starts at `$size_of_search_tree + 16` bytes in the
275
- file.
276
-
277
- ## Output Data Section
278
-
279
- Each output data field has an associated type, and that type is encoded as a
280
- number that begins the data field. Some types are variable length. In those
281
- cases, the type indicator is also followed by a length. The data payload
282
- always comes at the end of the field.
283
-
284
- All binary data is stored in big-endian format.
285
-
286
- Note that the *interpretation* of a given data type's meaning is decided by
287
- higher-level APIs, not by the binary format itself.
288
-
289
- ### pointer - 1
290
-
291
- A pointer to another part of the data section's address space. The pointer
292
- will point to the beginning of a field. It is illegal for a pointer to point
293
- to another pointer.
294
-
295
- Pointer values start from the beginning of the data section, *not* the
296
- beginning of the file.
297
-
298
- ### UTF-8 string - 2
299
-
300
- A variable length byte sequence that contains valid utf8. If the length is
301
- zero then this is an empty string.
302
-
303
- ### double - 3
304
-
305
- This is stored as an IEEE-754 double (binary64) in big-endian format. The
306
- length of a double is always 8 bytes.
307
-
308
- ### bytes - 4
309
-
310
- A variable length byte sequence containing any sort of binary data. If the
311
- length is zero then this a zero-length byte sequence.
312
-
313
- This is not currently used but may be used in the future to embed non-text
314
- data (images, etc.).
315
-
316
- ### integer formats
317
-
318
- Integers are stored in variable length binary fields.
319
-
320
- We support 16-bit, 32-bit, 64-bit, and 128-bit unsigned integers. We also
321
- support 32-bit signed integers.
322
-
323
- A 128-bit integer can use up to 16 bytes, but may use fewer. Similarly, a
324
- 32-bit integer may use from 0-4 bytes. The number of bytes used is determined
325
- by the length specifier in the control byte. See below for details.
326
-
327
- A length of zero always indicates the number 0.
328
-
329
- When storing a signed integer, the left-most bit is the sign. A 1 is negative
330
- and a 0 is positive.
331
-
332
- The type numbers for our integer types are:
333
-
334
- * unsigned 16-bit int - 5
335
- * unsigned 32-bit int - 6
336
- * signed 32-bit int - 8
337
- * unsigned 64-bit int - 9
338
- * unsigned 128-bit int - 10
339
-
340
- The unsigned 32-bit and 128-bit types may be used to store IPv4 and IPv6
341
- addresses, respectively.
342
-
343
- The signed 32-bit integers are stored using the 2's complement representation.
344
-
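The integer rules above can be sketched in Ruby as follows. This is an illustrative sketch, not the gem's decoder: `decode_uint` and `decode_int32` are hypothetical names, and the signed helper assumes that a payload shorter than 4 bytes is zero-padded on the left before being read as 2's complement.

```ruby
# Decode an unsigned integer from 0..16 big-endian bytes.
# An empty payload yields 0, matching "a length of zero always indicates 0".
def decode_uint(bytes)
  bytes.each_byte.reduce(0) { |acc, byte| (acc << 8) | byte }
end

# Decode a signed 32-bit integer from 0..4 big-endian bytes.
# Assumption: shorter payloads are left-padded with zero bytes, then the
# 4 bytes are read as a big-endian 2's complement value.
def decode_int32(bytes)
  bytes.b.rjust(4, "\x00".b).unpack1('l>')
end
```

With these helpers, a 2-byte payload `01 00` decodes to 256 and a 4-byte payload of all `FF` bytes decodes to -1 as a signed 32-bit integer.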
345
- ### map - 7
346
-
347
- A map data type contains a set of key/value pairs. Unlike other data types,
348
- the length information for maps indicates how many key/value pairs it
349
- contains, not its length in bytes. This size can be zero.
350
-
351
- See below for the algorithm used to determine the number of pairs in the
352
- map. This algorithm is also used to determine the length of a field's
353
- payload.
354
-
355
- ### array - 11
356
-
357
- An array type contains a set of ordered values. The length information for
358
- arrays indicates how many values it contains, not its length in bytes. This
359
- size can be zero.
360
-
361
- This type uses the same algorithm as maps for determining the length of a
362
- field's payload.
363
-
364
- ### data cache container - 12
365
-
366
- This is a special data type that marks a container used to cache repeated
367
- data. For example, instead of repeating the string "United States" over and
368
- over in the database, we store it in the cache container and use pointers
369
- *into* this container instead.
370
-
371
- Nothing in the database will ever contain a pointer to this field
372
- itself. Instead, various fields will point into the container.
373
-
374
- The primary reason for making this a separate data type versus simply inlining
375
- the cached data is so that a database dumper tool can skip this cache when
376
- dumping the data section. The cache contents will end up being dumped as
377
- pointers into it are followed.
378
-
379
- ### end marker - 13
380
-
381
- The end marker marks the end of the data section. It is not strictly
382
- necessary, but including this marker allows a data section deserializer to
383
- process a stream of input, rather than having to find the end of the section
384
- before beginning the deserialization.
385
-
386
- This data type is not followed by a payload, and its size is always zero.
387
-
388
- ### boolean - 14
389
-
390
- A true or false value. The length information for a boolean type will always
391
- be 0 or 1, indicating the value. There is no payload for this field.
392
-
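Because a boolean carries its value in the size bits, reading one takes only the control byte and the extended type byte. A minimal sketch, assuming a hypothetical `decode_boolean` helper that is handed a buffer starting at the field:

```ruby
BOOLEAN_TYPE = 14

# Read a boolean field located at `offset` in `buf`.
# Boolean is an extended type, so the first three bits of the control byte
# are 0 and the following byte holds the type number minus 7.
def decode_boolean(buf, offset = 0)
  ctrl = buf.getbyte(offset)
  type = buf.getbyte(offset + 1) + 7
  unless (ctrl >> 5).zero? && type == BOOLEAN_TYPE
    raise ArgumentError, 'not a boolean field'
  end

  # The five size bits are the value itself: 0 is false, 1 is true.
  (ctrl & 0x1f) != 0
end
```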
393
- ### float - 15
394
-
395
- This is stored as an IEEE-754 float (binary32) in big-endian format. The
396
- length of a float is always 4 bytes.
397
-
398
- This type is provided primarily for completeness. Because of the way floating
399
- point numbers are stored, this type can easily lose precision when serialized
400
- and then deserialized. If this is an issue for you, consider using a double
401
- instead.
402
-
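Ruby's `String#unpack1` can read both floating point layouts directly: the `G` directive is a big-endian binary64 and `g` is a big-endian binary32. The helper names below are hypothetical, and the sketch only illustrates the precision difference described above.

```ruby
# Decode an 8-byte big-endian IEEE-754 double (type 3).
def decode_double(bytes)
  bytes.unpack1('G')
end

# Decode a 4-byte big-endian IEEE-754 float (type 15).
def decode_float(bytes)
  bytes.unpack1('g')
end

# Round-trip 0.1 through each encoding. The double comes back unchanged,
# while the float comes back as the nearest binary32 value, which differs
# slightly from 0.1.
double_round_trip = decode_double([0.1].pack('G'))
float_round_trip  = decode_float([0.1].pack('g'))
```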
403
- ### Data Field Format
404
-
405
- Each field starts with a control byte. This control byte provides information
406
- about the field's data type and payload size.
407
-
408
- The first three bits of the control byte tell you what type the field is. If
409
- these bits are all 0, then this is an "extended" type, which means that the
410
- *next* byte contains the actual type. Otherwise, the first three bits will
411
- contain a number from 1 to 7, the actual type for the field.
412
-
413
- We've tried to assign the most commonly used types as numbers 1-7 as an
414
- optimization.
415
-
416
- With an extended type, the type number in the second byte is the number
417
- minus 7. In other words, an array (type 11) will be stored with a 0 for the
418
- type in the first byte and a 4 in the second.
419
-
420
- Here is an example of how the control byte may combine with the next byte to
421
- tell us the type:
422
-
423
- 001XXXXX pointer
424
- 010XXXXX UTF-8 string
425
- 110XXXXX unsigned 32-bit int
426
- 000XXXXX 00000011 unsigned 128-bit int
427
- 000XXXXX 00000100 array
428
- 000XXXXX 00000110 end marker
429
-
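The type rules above can be sketched as a small Ruby function. `decode_type` is a hypothetical name; it returns the type number together with the offset of the first byte after the type-specifying bytes.

```ruby
# Read the type of the field starting at `offset` in `buf`.
# Returns [type, offset_after_type_bytes].
def decode_type(buf, offset = 0)
  type = buf.getbyte(offset) >> 5 # first three bits of the control byte
  offset += 1

  if type.zero?
    # Extended type: the next byte stores the type number minus 7.
    type = buf.getbyte(offset) + 7
    offset += 1
  end

  [type, offset]
end
```

For instance, the bytes `00000000 00000100` yield type 11 (array) and consume two bytes, while `11000001` yields type 6 (unsigned 32-bit int) and consumes one.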
430
- #### Payload Size
431
-
432
- The next five bits in the control byte tell you how long the data field's
433
- payload is, except for maps and pointers. Maps and pointers use this size
434
- information a bit differently. See below.
435
-
436
- If the value of these five bits is less than 29, then that value is the payload
437
- size in bytes. For example:
438
-
439
- 01000010 UTF-8 string - 2 bytes long
440
- 01011100 UTF-8 string - 28 bytes long
441
- 11000001 unsigned 32-bit int - 1 byte long
442
- 00000011 00000011 unsigned 128-bit int - 3 bytes long
443
-
444
- If the five bits are equal to 29, 30, or 31, then use the following algorithm
445
- to calculate the payload size.
446
-
447
- If the value is 29, then the size is 29 + *the next byte after the
448
- type-specifying bytes as an unsigned integer*.
449
-
450
- If the value is 30, then the size is 285 + *the next two bytes after the
450
- type-specifying bytes as a single unsigned integer*.
452
-
453
- If the value is 31, then the size is 65,821 + *the next three bytes after the
454
- type-specifying bytes as a single unsigned integer*.
455
-
456
- Some examples:
457
-
458
- 01011101 00110011 UTF-8 string - 80 bytes long
459
-
460
- In this case, the last five bits of the control byte equal 29. We treat the
461
- next byte as an unsigned integer. The next byte is 51, so the total size is
462
- (29 + 51) = 80.
463
-
464
- 01011110 00110011 00110011 UTF-8 string - 13,392 bytes long
465
-
466
- The last five bits of the control byte equal 30. We treat the next two bytes
467
- as a single unsigned integer. The next two bytes equal 13,107, so the total
468
- size is (285 + 13,107) = 13,392.
469
-
470
- 01011111 00110011 00110011 00110011 UTF-8 string - 3,421,264 bytes long
471
-
472
- The last five bits of the control byte equal 31. We treat the next three bytes
473
- as a single unsigned integer. The next three bytes equal 3,355,443, so the
474
- total size is (65,821 + 3,355,443) = 3,421,264.
475
-
476
- This means that the maximum payload size for a single field is 16,843,036
477
- bytes.
478
-
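The size algorithm above can be sketched in Ruby. `decode_size` is a hypothetical helper; it assumes `offset` already points at the first byte after the type-specifying bytes (one byte for types 1 to 7, two bytes for extended types).

```ruby
# Base values added when the five size bits are 29, 30, or 31.
SIZE_BASE = { 29 => 29, 30 => 285, 31 => 65_821 }.freeze

# Compute the payload size for a field.
# Returns [size, offset_after_size_bytes].
def decode_size(ctrl_byte, buf, offset)
  size = ctrl_byte & 0x1f # last five bits of the control byte
  return [size, offset] if size < 29

  extra = size - 28 # 29 => 1 byte, 30 => 2 bytes, 31 => 3 bytes
  value = buf.byteslice(offset, extra)
             .each_byte.reduce(0) { |acc, byte| (acc << 8) | byte }
  [SIZE_BASE.fetch(size) + value, offset + extra]
end
```

Feeding it the three worked examples above gives 80, 13,392, and 3,421,264. With size bits of 31 and three `FF` bytes it gives 65,821 + 16,777,215 = 16,843,036, the stated maximum.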
479
- The binary number types always have a known size, but for consistency's sake,
480
- the control byte will always specify the correct size for these types.
481
-
482
- #### Maps
483
-
484
- Maps use the size in the control byte (and any following bytes) to indicate
485
- the number of key/value pairs in the map, not the size of the payload in
486
- bytes.
487
-
488
- This means that the maximum number of pairs for a single map is 16,843,036.
489
-
490
- Maps are laid out with each key followed by its value, followed by the next
491
- pair, etc.
492
-
493
- The keys are **always** UTF-8 strings. The values may be any data type,
494
- including maps or pointers.
495
-
496
- Once we know the number of pairs, we can look at each pair in turn to
497
- determine the size of the key and the key name, as well as the value's type
498
- and payload.
499
-
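To illustrate the layout of a map, here is a deliberately limited Ruby sketch. `decode_field` is a hypothetical helper that handles only UTF-8 strings (type 2) and maps (type 7) whose size fits in the control byte; extended sizes, pointers, and all other types are left out to keep it short.

```ruby
# Decode the field at `offset` in `buf`. Returns [value, next_offset].
# Only short strings and short maps are supported in this sketch.
def decode_field(buf, offset)
  ctrl = buf.getbyte(offset)
  type = ctrl >> 5
  size = ctrl & 0x1f
  raise NotImplementedError, 'extended sizes are omitted here' if size >= 29

  offset += 1

  case type
  when 2 # UTF-8 string: size is the payload length in bytes
    [buf.byteslice(offset, size).force_encoding('UTF-8'), offset + size]
  when 7 # map: size is the number of key/value pairs, not a byte count
    map = {}
    size.times do
      key, offset = decode_field(buf, offset)   # the key comes first
      value, offset = decode_field(buf, offset) # followed by its value
      map[key] = value
    end
    [map, offset]
  else
    raise NotImplementedError, "type #{type} is omitted here"
  end
end
```

A one-pair map whose key is `en` and whose value is `US` occupies 7 bytes: one control byte for the map, then a control byte plus two payload bytes for each string.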
500
- #### Pointers
501
-
502
- Pointers use the last five bits in the control byte to calculate the pointer
503
- value.
504
-
505
- To calculate the pointer value, we start by subdividing the five bits into two
506
- groups. The first two bits indicate the size, and the next three bits are part
507
- of the value, so we end up with a control byte breaking down like this:
508
- 001SSVVV.
509
-
510
- The size can be 0, 1, 2, or 3.
511
-
512
- If the size is 0, the pointer is built by appending the next byte to the last
513
- three bits to produce an 11-bit value.
514
-
515
- If the size is 1, the pointer is built by appending the next two bytes to the
516
- last three bits to produce a 19-bit value + 2048.
517
-
518
- If the size is 2, the pointer is built by appending the next three bytes to the
519
- last three bits to produce a 27-bit value + 526336.
520
-
521
- Finally, if the size is 3, the pointer's value is contained in the next four
522
- bytes as a 32-bit value. In this case, the last three bits of the control byte
523
- are ignored.
524
-
525
- This means that we are limited to 4GB of address space for pointers, so the
526
- data section size for the database is limited to 4GB.
527
-
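The pointer calculation above can be sketched in Ruby. `decode_pointer` is a hypothetical helper; `offset` is assumed to point at the byte immediately after the control byte, and the returned value is relative to the start of the data section, as described earlier.

```ruby
# Decode a pointer from its control byte (001SSVVV) and the bytes after it.
# Returns [pointer_value, next_offset].
def decode_pointer(ctrl_byte, buf, offset)
  pointer_size = (ctrl_byte >> 3) & 0x3 # the SS bits
  prefix = ctrl_byte & 0x7              # the VVV bits
  byte_count = pointer_size + 1         # 1, 2, 3, or 4 following bytes
  value = buf.byteslice(offset, byte_count)
             .each_byte.reduce(0) { |acc, byte| (acc << 8) | byte }

  pointer =
    case pointer_size
    when 0 then (prefix << 8) | value               # 11-bit value
    when 1 then ((prefix << 16) | value) + 2048     # 19-bit value + 2048
    when 2 then ((prefix << 24) | value) + 526_336  # 27-bit value + 526336
    else value                                      # 32-bit value, VVV ignored
    end

  [pointer, offset + byte_count]
end
```

The added constants keep the ranges from overlapping: 2048 is the first value that does not fit in 11 bits, and 526,336 is 2048 plus 2^19.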
528
- ## Reference Implementations
529
-
530
- ### Writer
531
-
532
- * [Perl](https://github.com/maxmind/MaxMind-DB-Writer-perl)
533
-
534
- ### Reader
535
-
536
- * [C](https://github.com/maxmind/libmaxminddb)
537
- * [C#](https://github.com/maxmind/MaxMind-DB-Reader-dotnet)
538
- * [Java](https://github.com/maxmind/MaxMind-DB-Reader-java)
539
- * [Perl](https://github.com/maxmind/MaxMind-DB-Reader-perl)
540
- * [PHP](https://github.com/maxmind/MaxMind-DB-Reader-php)
541
- * [Python](https://github.com/maxmind/MaxMind-DB-Reader-python)
542
-
543
- ## Authors
544
-
545
- This specification was created by the following authors:
546
-
547
- * Greg Oschwald \<goschwald@maxmind.com\>
548
- * Dave Rolsky \<drolsky@maxmind.com\>
549
- * Boris Zentner \<bzentner@maxmind.com\>
550
-
551
- ## License
552
-
553
- This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
554
- Unported License. To view a copy of this license, visit
555
- [http://creativecommons.org/licenses/by-sa/3.0/](http://creativecommons.org/licenses/by-sa/3.0/)
556
- or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain
557
- View, California, 94041, USA.
558
-