datafusion 0.0.1 → 0.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 2cbf10e291f48bd2ef08847fcc61cd175c9fc5a9
- data.tar.gz: 745fbaea6b99886b511f353dd51c08220cfd39f4
+ metadata.gz: e053ca5d480cb3a5f873faad4374f958c74f72f8
+ data.tar.gz: c99231072632111edeea398f222963dabfd002a2
  SHA512:
- metadata.gz: efc5715f103b9b5713fb535d214fbdaef0106969e09dea2049d98ee54e0c0fa80ce6273e97d994a8c72eccc293fc2f05f5560a307b83b2bdaeb5bc2cf0704a1c
- data.tar.gz: 32c38f0c12f011c24415692908f8537c11115748323f202a7ab7337cb9213227f4066de3ac31e33a0401c1e72c20508f086d4a736c195bc83d08bc92ca2788fc
+ metadata.gz: b893f1e1661b1f4c3cfa3f731f5be0fd3195e2b68df4669c6080911f4c5f69fe2022fb4dbf32827e7018063be75f8e4032ca14db45a4c93b4a863c9005dce1ac
+ data.tar.gz: 15f66d30ac9a16dd484ab2334fd6cca0c9f954f9b9e8b2fecb7ec15f2fca072b8ce08b6d2e7a270d3985e591d3d20c34fde5db7f1673375c1c745a601ac82399
data/README.md CHANGED
@@ -1,59 +1,103 @@
- # Mediumize
+ # Datafusion
  
- [![Gem Version](https://img.shields.io/gem/v/mediumize.svg)](https://rubygems.org/gems/mediumize)
- [![Build Status](https://travis-ci.org/jondot/mediumize.svg?branch=master)](https://travis-ci.org/jondot/mediumize)
+ [![Gem Version](https://img.shields.io/gem/v/datafusion.svg)](https://rubygems.org/gems/datafusion)
+ [![Build Status](https://travis-ci.org/jondot/datafusion.svg?branch=master)](https://travis-ci.org/jondot/datafusion)
  
- Automatically post (and cross-post) your markdown style blog posts to your [Medium](http://medium.com) account from [Jekyll](http://jekyllrb.com/), [Middleman](middlemanapp.com), [Hugo](http://gohugo.io/) and others.
+ Fuse data from different databases and data sources using Postgres, and generate
+ a one-stop shop for your BI activity with simple SQL.
  
- Mediumize will only publish drafts, and never publicly.
  
  
- ## Installation
  
- Add this line to your application's Gemfile:
+ ## Installation
  
- ```ruby
- gem 'mediumize'
+ ```
+ $ gem install datafusion
  ```
  
- And then execute:
+ ## Usage
  
- $ bundle
+ This is the configurator part of Datafusion, which is used internally by the Docker image.
+ You can use the Docker image directly to get all of the needed functionality in one package.
  
- Or install it yourself as:
+ However, if you are composing your own image, or just want an easy way to set up foreign
+ data wrappers, use the instructions below.
  
- $ gem install mediumize
  
- ## Usage
+ You should have an `integrations.yaml` file (see below for more).
  
- Either via command line (suitable for manual / Hugo flows):
+ ```
+ $ datafusion -f integrations.yaml
+ :
+ : SQL output...
+ :
+ .
+ ```
+ The tool will emit all of the SQL setup code your database needs to run.
+ You can pipe it to `psql`, or capture it into a file to run with `psql -f`:
  
- $ mediumize -t your-medium-integration-token file1.md file2.md ... fileN.md
+ Piping:
  
- Or, integrate it via Ruby into your Jekyll / Middleman flow:
+ ```
+ $ datafusion -f integrations.yaml | psql -U postgres
+ ```
  
- ```ruby
- require 'mediumize'
- p = Mediumize::Publisher(
-   :token => "your-medium-integration-token",
-   :frontmatter => true
- )
+ With a file:
  
- %w{
-   file1.md
-   file2.md
-   fileN.md
- }.each do |file|
-   puts p.publish(file)
- end
  ```
+ $ datafusion -f integrations.yaml > /tmp/script.sql && psql -U postgres -f /tmp/script.sql
+ ```
+
+
+ _Not yet implemented_:
+
+ You can use the `-c` flag to provide a connection string, in the form of a URL, to a `postgres`
+ database:
  
- ## Development
+ ```
+ $ datafusion -f integrations.yaml -c postgres://postgres:pass@localhost:5432/mydb
+ ```
+
+ ## Integrations.yaml
+
+ This tool uses a special specification for data sources, typically in a file called
+ `integrations.yaml`. Here is an example:
+
+ ```yaml
+ postgres1:
+   kind: postgres
+   server:
+     address: localhost
+     port: 5432
+     username: u1
+     password: p1
+     dbname: users
+   tables:
+     - name: ware1
+       table_name: registrations
+       mapping:
+         id: TEXT
+         warehouse_id: TEXT
+ mysql1:
+   kind: mysql
+   server:
+     address: localhost
+     port: 3306
+     username: u1
+     password: p1
+     dbname: users
+   tables:
+     - name: ware1
+       table_name: registrations
+       mapping:
+         id: TEXT
+         warehouse_id: TEXT
+ ```
  
- 1. `git clone https://github.com/jondot/mediumize && cd mediumize`
- 2. `bundle`
- 3. `rake test`
- 4. Optionally, use guard
+ The idea is to specify your databases or data sources once, in a human-readable way,
+ and have datafusion parse that and set up a `postgres` instance that can
+ integrate with them, giving you the ability to fuse and dissect your data across
+ sources.
  
  
  # Contributing
@@ -62,7 +106,7 @@ Fork, implement, add tests, pull request, get my everlasting thanks and a respec
  
  ### Thanks:
  
- To all [contributors](https://github.com/jondot/mediumize/graphs/contributors)
+ To all [contributors](https://github.com/jondot/datafusion/graphs/contributors)
  
  # Copyright
  
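The `integrations.yaml` spec shown in the README above is plain YAML and can be inspected with Ruby's standard library. A minimal, standalone sketch (this is an illustration, not the gem's actual `Integrations` loader):

```ruby
require "yaml"

# Parse an integrations spec like the README example and summarize it.
spec = YAML.safe_load(<<~YAML)
  postgres1:
    kind: postgres
    server:
      address: localhost
      port: 5432
      username: u1
      password: p1
      dbname: users
    tables:
      - name: ware1
        table_name: registrations
        mapping:
          id: TEXT
          warehouse_id: TEXT
YAML

spec.each do |name, integ|
  tables = integ["tables"].map { |t| t["name"] }.join(",")
  puts "#{name}: kind=#{integ['kind']} tables=#{tables}"
end
```

Each top-level key names an integration; `kind` selects the foreign data wrapper template, and `tables` maps remote tables into local foreign tables.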
data/bin/datafusion CHANGED
@@ -14,11 +14,12 @@ end
  # $ datafusion --fuse integrations.yml
  # $ datafusion --agent
  #
- begin
  o = Slop::Options.new
  o.string '-f', '--fuse', ''
  o.string '-u', '--user', '', default: 'postgres'
- o.bool '-a', '--agent', '', default: false
+ o.string '-a', '--agent', 'Connection string (i.e. postgres://localhost)', default: ""
+ o.bool '-d', '--dryrun', 'dry run for refreshes', default: false
+
  o.on '--version', 'print the version' do
    puts Datafusion::VERSION
    exit
@@ -29,15 +30,22 @@ begin
  end
  opts = Slop::Parser.new(o).parse(ARGV)
  
- # if agent..
-
-
- if opts[:fuse] && !File.exist?(opts[:fuse])
-   bail "Error: please provide a file to fuse", opts
+ if opts[:fuse] && opts[:agent].empty?
+   if File.exist?(opts[:fuse])
+     puts Datafusion.fuse(opts[:user], opts[:fuse])
+   else
+     bail "Error: please provide a file to fuse", opts
+   end
+ elsif opts[:fuse] && opts[:agent]
+
+   exec_class = Datafusion::DebugExecutor
+   unless opts[:dryrun]
+     exec_class = Datafusion::DbExecutor
+   end
+   exec = exec_class.new(opts[:agent])
+   sched = Datafusion.refresh(opts[:fuse], exec)
+   Datafusion.log.info("Running refresh agent.")
+   sched.join
  end
- puts Datafusion.fuse(opts[:user], opts[:fuse])
  
  
- rescue
-   bail "Error: #{$!}", o
- end
data/datafusion.gemspec CHANGED
@@ -21,6 +21,8 @@ Gem::Specification.new do |spec|
  
  spec.add_dependency 'slop', '~> 4.2.1'
  spec.add_dependency 'colorize', '~> 0.7.7'
+ spec.add_dependency 'rufus-scheduler', '~> 3.2.0'
+ spec.add_dependency 'sequel', '~> 4.3.0'
  
  spec.add_development_dependency "bundler", "~> 1.10"
  spec.add_development_dependency "rake", "~> 10.0"
data/lib/datafusion/db_executor.rb ADDED
@@ -0,0 +1,34 @@
+ require 'sequel'
+
+ module Datafusion
+   class DbExecutor
+     TAG = "DBEXECUTOR"
+
+     def initialize(conn)
+       @db = Sequel.connect(conn)
+     end
+     def exec(schedule)
+       #
+       # TODO: use REFRESH [..] CONCURRENTLY
+       #
+       # This means we also need to define a unique index per materialized
+       # view so that PG will know how to use MVCC.
+       #
+       # This needs some code to detect:
+       # 1. At setup time - when an index is already there, don't add it.
+       # 2. At refresh time - if a table doesn't have any data, it cannot be
+       #    refreshed with CONCURRENTLY - it needs a normal refresh first.
+       #
+       # For now we refresh and block.
+       #
+       run = rand(36**5).to_s(36)
+
+       Datafusion.log.info("#{TAG}: starting run id:#{run} for #{schedule}")
+       refresh_sql = "REFRESH materialized view #{schedule['name']}"
+       @db[refresh_sql].each do |r|
+         Datafusion.log.info("#{TAG}: out: #{r}")
+       end
+       Datafusion.log.info("#{TAG}: finished run id:#{run}")
+     end
+   end
+ end
data/lib/datafusion/debug_executor.rb ADDED
@@ -0,0 +1,10 @@
+ module Datafusion
+   class DebugExecutor
+     def initialize(conn)
+     end
+     def exec(schedule)
+       puts "EXECUTE: #{schedule}"
+     end
+   end
+ end
+
data/lib/datafusion/kinds/_cached.erb ADDED
@@ -0,0 +1,7 @@
+ <% if data %>
+ -- cached defs
+ CREATE materialized view <%= data["name"] %>
+ WITH NO DATA
+ as <%= data["query"] %>;
+ <% end %>
+
data/lib/datafusion/kinds/mailchimp.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,9 +12,7 @@
  
  CREATE extension if not exists multicorn;
  
- <%
-   server = "#{name}_server"
- %>
+
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -31,4 +34,5 @@ OPTIONS (
    key '<%= table["key"]%>',
    list_name '<%= table["list_name"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/mongodb.erb CHANGED
@@ -1,4 +1,8 @@
-
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -8,11 +12,6 @@
  
  CREATE extension if not exists mongo_fdw;
  
-
- <%
-   server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -43,4 +42,5 @@ OPTIONS (
    database '<%= table["database"] %>',
    collection '<%= table["collection"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/mysql.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,10 +12,6 @@
  
  CREATE extension if not exists mysql_fdw;
  
- <%
-   server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -41,5 +42,6 @@ OPTIONS (
    dbname '<%= data["server"]["dbname"] %>',
    table_name '<%= table["table_name"]%>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
  
data/lib/datafusion/kinds/neo4j.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------
  CREATE extension if not exists neo4j_fdw;
  
-
- <%
-   server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE SERVER <%= server %>
@@ -31,4 +31,5 @@ SERVER <%= server %>
  OPTIONS (
    query '<%= table["query"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/parse.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,9 +12,6 @@
  
  CREATE extension if not exists multicorn;
  
- <%
-   server = "#{name}_server"
- %>
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -32,4 +34,5 @@ OPTIONS (
    rest_api_key '<%= table["rest_api_key"] %>',
    class_name '<%= table["class_name"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/postgres.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------
  
  CREATE extension if not exists postgres_fdw;
-
- <%
-   server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -41,4 +41,5 @@ SERVER <%= server %>
  OPTIONS (
    table_name '<%= table["table_name"]%>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/redis.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+   name = data["name"]
+   user = data["user"]
+   server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------
  CREATE extension if not exists redis_fdw;
  
-
- <%
-   server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE SERVER <%= server %>
@@ -39,4 +39,5 @@ SERVER <%= server %>
  OPTIONS (
    database '<%= table["database"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/snippet_renderer.rb CHANGED
@@ -6,16 +6,23 @@ module Datafusion
  class SnippetRenderer
    attr_reader :data, :name, :user
  
-   def initialize(user, name, data={})
-     @erb = ERB.new(File.read(KINDS_PATH.join(data["kind"]+".erb")))
+   def initialize(snippet, data={})
+     @erb = ERB.new(File.read(KINDS_PATH.join(snippet+".erb")))
      @data = data
-     @name = name
-     @user = user
+     if data
+       @name = data["name"]
+       @user = data["user"]
+     end
    end
  
    def render
      @erb.result(binding)
    end
+
+   def partial(desc)
+     pname, pdata = desc.first
+     SnippetRenderer.new("_#{pname}", pdata).render()
+   end
  end
  end
  
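The new `partial` helper above lets a parent template pull in a child template (e.g. `_cached.erb`) with its own data, as in `<%= partial :cached => table["cached"] %>`. A standalone sketch of the mechanism, using in-memory templates instead of `KINDS_PATH` files (`MiniRenderer` and `PARTIALS` are illustrative names, not part of the gem):

```ruby
require "erb"

# Child templates keyed by name; in the gem this is a "_#{name}.erb" file.
PARTIALS = {
  cached: %q{<% if data %>CREATE materialized view <%= data["name"] %>;<% end %>}
}

class MiniRenderer
  attr_reader :data

  def initialize(template, data = {})
    @erb  = ERB.new(template)
    @data = data
  end

  # ERB evaluates against this instance's binding, so templates can
  # call `data` and `partial` directly.
  def render
    @erb.result(binding)
  end

  # desc is a one-pair hash like { :cached => {...} }, mirroring the
  # `partial :cached => table["cached"]` call in the kind templates.
  def partial(desc)
    pname, pdata = desc.first
    MiniRenderer.new(PARTIALS[pname], pdata).render
  end
end

parent = MiniRenderer.new(
  %q{-- parent snippet
<%= partial :cached => data["cached"] %>},
  { "cached" => { "name" => "ware1_cache" } }
)
puts parent.render
```

Each partial gets a fresh renderer scoped to its own `data`, so the child template cannot accidentally read the parent's variables.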
data/lib/datafusion/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Datafusion
-   VERSION = "0.0.1"
+   VERSION = "0.0.2"
  end
data/lib/datafusion.rb CHANGED
@@ -1,16 +1,46 @@
  require "datafusion/version"
  require "datafusion/integrations"
  require "datafusion/snippet_renderer"
+ require "datafusion/db_executor"
+ require "datafusion/debug_executor"
+
+ require "logger"
+ require "rufus-scheduler"
  
  module Datafusion
+   def self.log
+     @log ||= Logger.new(STDOUT)
+     @log
+   end
+
+   def self.log=(logger)
+     @log = logger
+   end
+
    def self.fuse(pguser, file)
      integs = Integrations.load(file)
      out = ""
      integs.each do |k, v|
-       erb = SnippetRenderer.new(pguser, k, v)
+       erb = SnippetRenderer.new(v["kind"], v.merge({"user" => pguser, "name" => k}))
        out += erb.render()
      end
      out
    end
+
+   def self.refresh(file, executor)
+     integs = Integrations.load(file)
+     schedules = integs.map do |k, v|
+       v["tables"].map { |t| t["cached"] }.compact
+     end.flatten
+     Datafusion.log.info("Discovered #{schedules.size} schedule(s).")
+
+     scheduler = Rufus::Scheduler.new
+     schedules.each do |schedule|
+       scheduler.every(schedule["refresh"]) do
+         executor.exec(schedule)
+       end
+     end
+     scheduler
+   end
  end
  
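The schedule-discovery step in `Datafusion.refresh` above collects each table's `cached` block across all integrations and skips tables without one. In isolation (the `integs` hash here is a hand-built stand-in for what `Integrations.load` would return from an `integrations.yaml`):

```ruby
# Stand-in for Integrations.load(file); "ware2" has no cached block,
# so compact drops it from the schedule list.
integs = {
  "postgres1" => {
    "kind"   => "postgres",
    "tables" => [
      { "name" => "ware1",
        "cached" => { "name" => "ware1_cache", "refresh" => "10m" } },
      { "name" => "ware2" }
    ]
  }
}

# Same map/compact/flatten shape as Datafusion.refresh.
schedules = integs.map do |_k, v|
  v["tables"].map { |t| t["cached"] }.compact
end.flatten

puts schedules.inspect
```

Each surviving entry carries the materialized-view name and a `refresh` interval, which `refresh` then hands to `Rufus::Scheduler#every` with the chosen executor.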
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: datafusion
  version: !ruby/object:Gem::Version
-   version: 0.0.1
+   version: 0.0.2
  platform: ruby
  authors:
  - Dotan Nahum
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-01-07 00:00:00.000000000 Z
+ date: 2016-01-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: slop
@@ -38,6 +38,34 @@ dependencies:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 0.7.7
+ - !ruby/object:Gem::Dependency
+   name: rufus-scheduler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 3.2.0
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 3.2.0
+ - !ruby/object:Gem::Dependency
+   name: sequel
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 4.3.0
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 4.3.0
  - !ruby/object:Gem::Dependency
    name: bundler
    requirement: !ruby/object:Gem::Requirement
@@ -97,7 +125,10 @@ files:
  - bin/datafusion
  - datafusion.gemspec
  - lib/datafusion.rb
+ - lib/datafusion/db_executor.rb
+ - lib/datafusion/debug_executor.rb
  - lib/datafusion/integrations.rb
+ - lib/datafusion/kinds/_cached.erb
  - lib/datafusion/kinds/mailchimp.erb
  - lib/datafusion/kinds/mongodb.erb
  - lib/datafusion/kinds/mysql.erb