datafusion 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 2cbf10e291f48bd2ef08847fcc61cd175c9fc5a9
- data.tar.gz: 745fbaea6b99886b511f353dd51c08220cfd39f4
+ metadata.gz: e053ca5d480cb3a5f873faad4374f958c74f72f8
+ data.tar.gz: c99231072632111edeea398f222963dabfd002a2
  SHA512:
- metadata.gz: efc5715f103b9b5713fb535d214fbdaef0106969e09dea2049d98ee54e0c0fa80ce6273e97d994a8c72eccc293fc2f05f5560a307b83b2bdaeb5bc2cf0704a1c
- data.tar.gz: 32c38f0c12f011c24415692908f8537c11115748323f202a7ab7337cb9213227f4066de3ac31e33a0401c1e72c20508f086d4a736c195bc83d08bc92ca2788fc
+ metadata.gz: b893f1e1661b1f4c3cfa3f731f5be0fd3195e2b68df4669c6080911f4c5f69fe2022fb4dbf32827e7018063be75f8e4032ca14db45a4c93b4a863c9005dce1ac
+ data.tar.gz: 15f66d30ac9a16dd484ab2334fd6cca0c9f954f9b9e8b2fecb7ec15f2fca072b8ce08b6d2e7a270d3985e591d3d20c34fde5db7f1673375c1c745a601ac82399
data/README.md CHANGED
@@ -1,59 +1,103 @@
- # Mediumize
+ # Datafusion

- [![Gem Version](https://img.shields.io/gem/v/mediumize.svg)](https://rubygems.org/gems/mediumize)
- [![Build Status](https://travis-ci.org/jondot/mediumize.svg?branch=master)](https://travis-ci.org/jondot/mediumize)
+ [![Gem Version](https://img.shields.io/gem/v/datafusion.svg)](https://rubygems.org/gems/datafusion)
+ [![Build Status](https://travis-ci.org/jondot/datafusion.svg?branch=master)](https://travis-ci.org/jondot/datafusion)

- Automatically post (and cross-post) your markdown style blog posts to your [Medium](http://medium.com) account from [Jekyll](http://jekyllrb.com/), [Middleman](middlemanapp.com), [Hugo](http://gohugo.io/) and others.
+ Fuse various data from different databases and data sources using Postgres, and generate
+ a one-stop-shop for your BI activity with simple SQL.

- Mediumize will only publish drafts, and never publicly.


- ## Installation

- Add this line to your application's Gemfile:
+ ## Installation

- ```ruby
- gem 'mediumize'
+ ```
+ $ gem install datafusion
  ```

- And then execute:
+ ## Usage

- $ bundle
+ This is the configurator part of Datafusion, which is used internally with the Docker image.
+ You can use the docker image directly to get all of the functionality needed in one package.

- Or install it yourself as:
+ However, if you are composing your own image, or just wanting an easy way to do foreign
+ data wrapper, use the instructions below.

- $ gem install mediumize

- ## Usage
+ You should have an `integrations.yaml` file (see below for more).

- Either via command line (suitable for manual / Hugo flows):
+ ```
+ $ datafusion -f integrations.yaml
+ :
+ : SQL output...
+ :
+ .
+ ```
+ The tool will spit out all of the necessary SQL setup code for your database to run.
+ You can pipe it to `psql` or capture into a file to run with the `psql -f` command:

- $ mediumize -t your-medium-integration-token file1.md file2.md ... fileN.md
+ Piping:

- Or, integrate it via Ruby into your Jekyll / Middleman flow:
+ ```
+ $ datafusion -f integrations.yaml | psql -U postgres
+ ```

- ```ruby
- require 'mediumize'
- p = Mediumize::Publisher(
-   :token => "your-medium-integration-token",
-   :frontmatter => true
- )
+ With a file:

- %w{
-   file1.md
-   file2.md
-   fileN.md
- }.each do |file|
-   puts p.publish(file)
- end
  ```
+ $ datafusion -f integrations.yaml > /tmp/script.sql && psql -U postgres -f /tmp/script.sql
+ ```
+
+
+ _Not yet implemented_:
+
+ You can use the -c flag to provide a connection in the form of a url to a `postgres`
+ database:

- ## Development
+ ```
+ $ datafusion -f integrations.yaml -c posgres://postgres:pass@localhost:5432/mydb
+ ```
+
+ ## Integrations.yaml
+
+ This tool uses a special specification for data sources, typically in a file called
+ `integrations.yaml`. Here is an example:
+
+ ```yaml
+ postgres1:
+   kind: postgres
+   server:
+     address: localhost
+     port: 5432
+     username: u1
+     password: p1
+     dbname: users
+   tables:
+     - name: ware1
+       table_name: registrations
+       mapping:
+         id: TEXT
+         warehouse_id: TEXT
+ mysql1:
+   kind: mysql
+   server:
+     address: localhost
+     port: 3306
+     username: u1
+     password: p1
+     dbname: users
+   tables:
+     - name: ware1
+       table_name: registrations
+       mapping:
+         id: TEXT
+         warehouse_id: TEXT
+ ```

- 1. `git clone https://github.com/jondot/mediumize && cd mediumize`
- 2. `bundle`
- 3. `rake test`
- 4. Optionally, use guard
+ The idea is to specify your databases or data source in a human-readable way once,
+ and have that parsed by datafusion and set up a `postgres` instance to be able to
+ integrate with them and give you the ability to fuse and dissect your data across
+ sources.


  # Contributing
@@ -62,7 +106,7 @@ Fork, implement, add tests, pull request, get my everlasting thanks and a respec

  ### Thanks:

- To all [contributors](https://github.com/jondot/mediumize/graphs/contributors)
+ To all [contributors](https://github.com/jondot/datafusion/graphs/contributors)

  # Copyright

data/bin/datafusion CHANGED
@@ -14,11 +14,12 @@ end
  # $ datafusion --fuse integrations.yml
  # $ datafusion --agent
  #
- begin
  o = Slop::Options.new
  o.string '-f', '--fuse', ''
  o.string '-u', '--user', '', default: 'postgres'
- o.bool '-a', '--agent', '', default: false
+ o.string '-a', '--agent', 'Connection string (i.e postgres://localhost)', default: ""
+ o.bool '-d', '--dryrun', 'dry run for refreshes', default: false
+
  o.on '--version', 'print the version' do
    puts Datafusion::VERSION
    exit
@@ -29,15 +30,22 @@ begin
  end
  opts = Slop::Parser.new(o).parse(ARGV)

- # if agent..
-
-
- if opts[:fuse] && !File.exist?(opts[:fuse])
-   bail "Error: please provide a file to fuse", opts
+ if opts[:fuse] && opts[:agent].empty?
+   if File.exist?(opts[:fuse])
+     puts Datafusion.fuse(opts[:user], opts[:fuse])
+   else
+     bail "Error: please provide a file to fuse", opts
+   end
+ elsif opts[:fuse] && opts[:agent]
+
+   exec_class = Datafusion::DebugExecutor
+   unless opts[:dryrun]
+     exec_class = Datafusion::DbExecutor
+   end
+   exec = exec_class.new(opts[:agent])
+   sched = Datafusion.refresh(opts[:fuse], exec)
+   Datafusion.log.info("Running refresh agent.")
+   sched.join
  end
- puts Datafusion.fuse(opts[:user], opts[:fuse])


- rescue
-   bail "Error: #{$!}", o
- end
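For orientation, the new branch above is what wires `--agent` and `--dryrun` into the refresh loop: a dry run uses `DebugExecutor`, otherwise `DbExecutor` opens a Sequel connection, and the scheduler returned by `Datafusion.refresh` is joined. A minimal sketch of the same flow in plain Ruby, using only the 0.0.2 API shown in this diff (the file name and connection URL are illustrative):

```ruby
require "datafusion"

# Roughly what `datafusion -f integrations.yaml -a postgres://localhost -d` does:
executor  = Datafusion::DebugExecutor.new("postgres://localhost") # dry run; use DbExecutor for real refreshes
scheduler = Datafusion.refresh("integrations.yaml", executor)     # returns a rufus-scheduler instance
scheduler.join                                                    # block while scheduled refreshes fire
```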
data/datafusion.gemspec CHANGED
@@ -21,6 +21,8 @@ Gem::Specification.new do |spec|

  spec.add_dependency 'slop', '~> 4.2.1'
  spec.add_dependency 'colorize', '~> 0.7.7'
+ spec.add_dependency 'rufus-scheduler', '~> 3.2.0'
+ spec.add_dependency 'sequel', '~> 4.3.0'

  spec.add_development_dependency "bundler", "~> 1.10"
  spec.add_development_dependency "rake", "~> 10.0"
data/lib/datafusion/db_executor.rb ADDED
@@ -0,0 +1,34 @@
+ require 'sequel'
+
+ module Datafusion
+   class DbExecutor
+     TAG = "DBEXECUTOR"
+
+     def initialize(conn)
+       @db = Sequel.connect(conn)
+     end
+     def exec(schedule)
+       #
+       # TODO use refresh [..] concurrently
+       #
+       # This means we also need to define a unique index per materialized
+       # view so that PG will know how to use MVCC.
+       #
+       # This needs some code to detect:
+       # 1. At setup time - when an index is already there, don't add it.
+       # 2. At refresh time - if a table doesn't have any data, it cannot be
+       #    refreshed with concurrently - it needs a normal refresh first.
+       #
+       # For now we refresh and block.
+       #
+       run = rand(36**5).to_s(36)
+
+       Datafusion.log.info("#{TAG}: starting run id:#{run} for #{schedule}")
+       refresh_sql = "REFRESH materialized view #{schedule['name']}"
+       @db[refresh_sql].each do |r|
+         Datafusion.log.info("#{TAG}: out: #{r}")
+       end
+       Datafusion.log.info("#{TAG}: finished run id:#{run}")
+     end
+   end
+ end
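The TODO above refers to the Postgres rule that `REFRESH MATERIALIZED VIEW CONCURRENTLY` only works on a view that has a unique index and has already been populated once. A rough sketch of what that path could look like over the same Sequel connection; this is not gem code, and the index and column names are hypothetical:

```ruby
# Hypothetical concurrent-refresh helper for the TODO above (not part of the gem).
def refresh_concurrently(db, view)
  # Setup time: the unique index CONCURRENTLY requires (the TODO notes this
  # should be skipped if the index already exists). Assumes an "id" column.
  db.run("CREATE UNIQUE INDEX #{view}_uidx ON #{view} (id)")
  # The first refresh must be a normal, blocking one so the view has data.
  db.run("REFRESH MATERIALIZED VIEW #{view}")
  # Subsequent refreshes can then run without blocking readers.
  db.run("REFRESH MATERIALIZED VIEW CONCURRENTLY #{view}")
end
```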
data/lib/datafusion/debug_executor.rb ADDED
@@ -0,0 +1,10 @@
+ module Datafusion
+   class DebugExecutor
+     def initialize(conn)
+     end
+     def exec(schedule)
+       puts "EXECUTE: #{schedule}"
+     end
+   end
+ end
+
data/lib/datafusion/kinds/_cached.erb ADDED
@@ -0,0 +1,7 @@
+ <% if data %>
+ -- cached defs
+ CREATE materialized view <%= data["name"] %>
+ WITH NO DATA
+ as <%= data["query"] %>;
+ <% end %>
+
data/lib/datafusion/kinds/mailchimp.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,9 +12,7 @@

  CREATE extension if not exists multicorn;

- <%
- server = "#{name}_server"
- %>
+
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -31,4 +34,5 @@ OPTIONS (
  key '<%= table["key"]%>',
  list_name '<%= table["list_name"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/mongodb.erb CHANGED
@@ -1,4 +1,8 @@
-
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -8,11 +12,6 @@

  CREATE extension if not exists mongo_fdw;

-
- <%
- server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -43,4 +42,5 @@ OPTIONS (
  database '<%= table["database"] %>',
  collection '<%= table["collection"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/mysql.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,10 +12,6 @@

  CREATE extension if not exists mysql_fdw;

- <%
- server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -41,5 +42,6 @@ OPTIONS (
  dbname '<%= data["server"]["dbname"] %>',
  table_name '<%= table["table_name"]%>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>

data/lib/datafusion/kinds/neo4j.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------
  CREATE extension if not exists neo4j_fdw;

-
- <%
- server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE SERVER <%= server %>
@@ -31,4 +31,5 @@ SERVER <%= server %>
  OPTIONS (
  query '<%= table["query"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -7,9 +12,6 @@

  CREATE extension if not exists multicorn;

- <%
- server = "#{name}_server"
- %>
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -32,4 +34,5 @@ OPTIONS (
  rest_api_key '<%= table["rest_api_key"] %>',
  class_name '<%= table["class_name"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/postgres.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------

  CREATE extension if not exists postgres_fdw;
-
- <%
- server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE server <%= server %>
@@ -41,4 +41,5 @@ SERVER <%= server %>
  OPTIONS (
  table_name '<%= table["table_name"]%>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/kinds/redis.erb CHANGED
@@ -1,3 +1,8 @@
+ <%
+ name = data["name"]
+ user = data["user"]
+ server = "#{name}_server"
+ %>
  --------------------------------------
  -- Set up data fusion for:
  -- name: <%= name %>
@@ -6,11 +11,6 @@
  --------------------------------------
  CREATE extension if not exists redis_fdw;

-
- <%
- server = "#{name}_server"
- %>
-
  -- create server object
  DROP server if exists <%= server %> CASCADE;
  CREATE SERVER <%= server %>
@@ -39,4 +39,5 @@ SERVER <%= server %>
  OPTIONS (
  database '<%= table["database"] %>'
  );
+ <%= partial :cached => table["cached"] %>
  <% end %>
data/lib/datafusion/snippet_renderer.rb CHANGED
@@ -6,16 +6,23 @@ module Datafusion
  class SnippetRenderer
    attr_reader :data, :name, :user

-   def initialize(user, name, data={})
-     @erb = ERB.new(File.read(KINDS_PATH.join(data["kind"]+".erb")))
+   def initialize(snippet, data={})
+     @erb = ERB.new(File.read(KINDS_PATH.join(snippet+".erb")))
      @data = data
-     @name = name
-     @user = user
+     if data
+       @name = data["name"]
+       @user = data["user"]
+     end
    end

    def render
      @erb.result(binding)
    end
+
+   def partial(desc)
+     pname, pdata = desc.first
+     SnippetRenderer.new("_#{pname}", pdata).render()
+   end
  end
  end
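For context, the new `partial` helper is what backs the `<%= partial :cached => table["cached"] %>` lines added to the kind templates: it takes a one-pair hash, prefixes the key with an underscore, and renders that snippet (`kinds/_cached.erb`) with the value as its `data`. A small illustration, assuming the gem is installed so the ERB templates can be read; the hash contents are hypothetical:

```ruby
require "datafusion"

renderer = Datafusion::SnippetRenderer.new("mysql",
  "name" => "mysql1", "user" => "postgres")

# partial({:cached => {...}}) takes the first key/value pair and renders
# kinds/_cached.erb with the value as its data, yielding the
# "CREATE materialized view ... WITH NO DATA" snippet.
puts renderer.partial(:cached => { "name"  => "ware1_cache",
                                   "query" => "SELECT * FROM ware1" })
```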
 
data/lib/datafusion/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Datafusion
-   VERSION = "0.0.1"
+   VERSION = "0.0.2"
  end
data/lib/datafusion.rb CHANGED
@@ -1,16 +1,46 @@
  require "datafusion/version"
  require "datafusion/integrations"
  require "datafusion/snippet_renderer"
+ require "datafusion/db_executor"
+ require "datafusion/debug_executor"
+
+ require "logger"
+ require "rufus-scheduler"

  module Datafusion
+   def self.log
+     @log ||= Logger.new(STDOUT)
+     @log
+   end
+
+   def self.log=(logger)
+     @log = logger
+   end
+
    def self.fuse(pguser, file)
      integs = Integrations.load(file)
      out = ""
      integs.each do |k, v|
-       erb = SnippetRenderer.new(pguser, k, v)
+       erb = SnippetRenderer.new(v["kind"], v.merge({"user" => pguser, "name" => k}))
        out += erb.render()
      end
      out
    end
+
+   def self.refresh(file, executor)
+     integs = Integrations.load(file)
+     schedules = integs.map do |k, v|
+       v["tables"].map{|t| t["cached"] }.compact
+     end.flatten
+     Datafusion.log.info("Discovered #{schedules.size} schedule(s).")
+
+     scheduler = Rufus::Scheduler.new
+     schedules.each do |schedule|
+       scheduler.every(schedule["refresh"]) do
+         executor.exec(schedule)
+       end
+     end
+     scheduler
+   end
  end
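To tie `refresh` back to `integrations.yaml`: it collects the `cached` hash of every table and hands each one to the executor on its `refresh` interval. A hypothetical entry, using only the keys this diff actually reads (`refresh` for the rufus-scheduler interval, `name` and `query` for `_cached.erb` and `DbExecutor`):

```ruby
require "datafusion"

# Hypothetical "cached" entry as Datafusion.refresh would pass it to an executor.
schedule = {
  "name"    => "ware1_cache",                        # materialized view to refresh
  "query"   => "SELECT id, warehouse_id FROM ware1", # view body used by _cached.erb
  "refresh" => "10m",                                # rufus-scheduler interval
}

Datafusion::DebugExecutor.new(nil).exec(schedule)    # prints EXECUTE: {...}
```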
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: datafusion
  version: !ruby/object:Gem::Version
- version: 0.0.1
+ version: 0.0.2
  platform: ruby
  authors:
  - Dotan Nahum
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-01-07 00:00:00.000000000 Z
+ date: 2016-01-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: slop
@@ -38,6 +38,34 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: 0.7.7
+ - !ruby/object:Gem::Dependency
+ name: rufus-scheduler
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 3.2.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 3.2.0
+ - !ruby/object:Gem::Dependency
+ name: sequel
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 4.3.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: 4.3.0
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
@@ -97,7 +125,10 @@ files:
  - bin/datafusion
  - datafusion.gemspec
  - lib/datafusion.rb
+ - lib/datafusion/db_executor.rb
+ - lib/datafusion/debug_executor.rb
  - lib/datafusion/integrations.rb
+ - lib/datafusion/kinds/_cached.erb
  - lib/datafusion/kinds/mailchimp.erb
  - lib/datafusion/kinds/mongodb.erb
  - lib/datafusion/kinds/mysql.erb