fluent-plugin-aliyun-odps 0.1.2 → 0.1.3
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +5 -3
- data/README.cn.md +163 -87
- data/README.md +3 -3
- data/VERSION +1 -1
- data/build.sh +10 -0
- data/example.conf +26 -0
- data/fluent-plugin-aliyun-odps.gemspec +5 -6
- data/lib/fluent/plugin/http/http_flag.rb +1 -1
- data/lib/fluent/plugin/out_aliyun_odps.rb +54 -45
- metadata +28 -18
- data/lib/fluent/plugin/crc/src/extconf.rb +0 -3
- data/odps_example.conf +0 -31
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5b96ab3b2194318e749ee184cd5bdf8bb4a4875c
+  data.tar.gz: a846f80a43b87c5551491a5c8e4c3d387b8e003f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 592b3a66fa676ecd783e2e999c82a6757d1a387fe3a0731c5f195e8d44ba2065f55df0b90535611040ff6bbdfda0449d4d16f9c6e6c61db6153e8b7fe5076403
+  data.tar.gz: 56164832ec7b335daf343ba0a02a464e89830028dbad1430177fa19bf153177e31d81938da81708fb59a18e1374249cd013451061c8594f528b651a9fa4dcb80
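The recorded digests can be compared against local files. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive; the helper names `digests_for` and `sha1_matches?` are hypothetical and not part of the gem.

```ruby
require 'digest'

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of a file,
# in the same hex form that checksums.yaml records them.
def digests_for(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compares a file against an expected SHA1 value
# copied from checksums.yaml.
def sha1_matches?(path, expected_sha1)
  digests_for(path)['SHA1'] == expected_sha1.strip.downcase
end
```

For example, `sha1_matches?('data.tar.gz', 'a846f80a43b87c5551491a5c8e4c3d387b8e003f')` would be expected to return true for the 0.1.3 payload if the file is intact.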
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -3,9 +3,9 @@ Fix datetime format bug, support String, DateTime, Time type when write to a dat
 0.0.5
 Add reload shard when import fails, and remove unload shard operation when shut down.
 0.0.6
-Add decimal support
+Add decimal support,fix string input while setting double and int.
 0.0.7
-Add error msg when add partition fail, support fast crc
+Add error msg when add partition fail, support fast crc, remove pack size limit.
 0.0.8
 Add abandon mode, fix fluent retry bug, fix partition mixed mode bug.
 0.0.9
@@ -15,4 +15,6 @@ Add partition when catch NoSuchPartition.
 0.1.1
 Fix some log format.
 0.1.2
-Use XStreamPack.
+Use XStreamPack.
+0.1.3
+Drop record with error log when parse partition failed.
data/README.cn.md
CHANGED
@@ -1,47 +1,122 @@
 # Aliyun ODPS Plugin for Fluentd
 
-##
+## Getting Started
 ---
 
-###
+### Introduction
 
--
-- ODPS DataHub Service(DHS)
+- Open Data Processing Service (ODPS) is a massive-scale data processing platform developed in-house by Alibaba. It mainly serves the storage and computation of batch structured data, and provides massive data warehouse solutions as well as analysis and modeling services for big data.
+- ODPS DataHub Service (DHS) is a built-in ODPS service that gives users publish and subscribe functionality for real-time data. Published data is written to ODPS tables automatically, so DHS can also serve as an entry point for importing data into ODPS.
+- This plugin writes data to ODPS tables through the DataHub service and can automatically create partitions in a user-specified format.
 
 
-###
+### Requirements
 
-
+To use this plugin, you need the following environment:
 
-1. Ruby 2.1.0
-2. Gem 2.4.5
-3. Fluentd-0.10.49
-4. Protobuf-3.5.1
+1. Ruby 2.1.0 or later
+2. Gem 2.4.5 or later
+3. Fluentd-0.10.49 or later (*[Home Page](http://www.fluentd.org/)*)
+4. Protobuf-3.5.1 or later (Ruby protobuf)
 5. Ruby-devel
 
-###
+### Installation and Deployment
+Fluentd can be installed and deployed in either of the following two ways.
+1. The one-click installation package suits users installing the Ruby & Fluentd environment for the first time, or users on a local network. It contains the required Ruby environment and Fluentd. The one-click package currently supports Linux only.
+2. Installation over the network suits users who are familiar with Ruby. Check the Ruby version first; if it is lower than 2.1.0, upgrade or install a newer Ruby environment, then install Fluentd through RubyGems.
 
-
+Notes:
+* Changing the RubyGems source to https://ruby.taobao.org/ is recommended
+* In a local-network environment, the plugin can be installed from a local gem file
+```
+gem install --local fluent-plugin-aliyun-odps-0.1.2.gem
+```
 
+#### Installation Method 1: One-click installation package
+1. Download and extract [fluentd_package.tar.gz](http://gitlab.alibaba-inc.com/aliopensource/aliyun-odps-fluentd-plugin/blob/master/package/fluentd_package.tar.gz)
+2. Optionally change $DIR in install_agent.sh to the path where Ruby should be installed; by default it is installed under the current path
+3. Run the following command; the message "Success" indicates that the installation succeeded
 ```
-
+bash install_agent.sh
 ```
+4. The fluentd program is installed in the bin directory under the current directory
 
-
+#### Installation Method 2: Installation over the network
+1. Install Ruby (skip this step if Ruby 2.1.0 or later is already present):
+```
+wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.0.tar.gz
+tar xzvf ruby-2.3.0.tar.gz
+cd ruby-2.3.0
+./configure --prefix=DIR
+make
+make install
+```
+2. Install Fluentd and the plugin
+```
+$ gem install fluent-plugin-aliyun-odps
+```
 
+### Plugin Usage Examples
+#### Example 1: Upload data from a CSV file
+1. First prepare a table in ODPS. Here the table is assumed to be named students, with three columns id, name, score of types string, string, bigint
+2. Prepare the CSV data file; assume its contents are as follows
+```
+1, jack ma, 90
+2, pony zhang, 85
+3, lucy wang, 88
+```
+3. Prepare the fluentd configuration file; save the following content as fluentd.conf.
 ```
 <source>
 type tail
-path /
-
-
-tag in.log
-format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/
-time_format %Y%b%d %H:%M:%S %z
+path /path/to/students.csv
+tag input.csv
+format csv
 </source>
+<match input.*>
+type aliyun_odps
+aliyun_access_id ************
+aliyun_access_key *********
+aliyun_odps_endpoint http://service.odps.aliyun.com/api
+aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
+buffer_chunk_limit 2m
+buffer_queue_limit 128
+flush_interval 5s
+project your_projectName # name of the project to import data into
+enable_fast_crc true
+<table input.csv>
+table students
+fields id,name,score
+shard_number 1
+retry_time 3
+retry_interval 1
+abandon_mode true
+</table>
+</match>
+```
+4. Run the fluentd command, specifying the configuration file with -c
+```
+fluentd -c fluentd.conf
+```
+5. When it finishes, the data can be queried with the following SQL command
 ```
+select * from students;
 ```
-
+
+#### Example 2: Collect and upload real-time nginx log files
+1. For nginx log files, fluentd can parse the data with a regular expression.
+2. Use the following configuration file as a reference; the command to run is the same as in Example 1.
+```
+<source>
+type tail
+path /home/admin/nginx/logs/access.log # nginx log path
+pos_file /tmp/nginx.access.pos
+refresh_interval 5s
+tag nginx.access
+format /^(?<remote>[^ ]*) - \[(?<dt>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/ # regular expression for parsing the log
+time_format %d/%b/%Y:%H:%M:%S %z
+</source>
+<match nginx.access>
 type aliyun_odps
 aliyun_access_id ************
 aliyun_access_key *********
@@ -52,9 +127,10 @@ $ gem install fluent-plugin-aliyun-odps
 flush_interval 5s
 project your_projectName
 enable_fast_crc true
-<table
-
-
+<table nginx.access>
+table nginx_logs # ODPS table the logs are written to
+fields remote,method,path,code,size,agent,requesttime
+shard_number 5
 partition ctime=${datetime.strftime('%Y%m%d')}
 time_format %d/%b/%Y:%H:%M:%S %z
 shard_number 1
@@ -64,68 +140,68 @@ $ gem install fluent-plugin-aliyun-odps
 </table>
 </match>
 ```
-### Parameter Description
-
-- type(Fixed): fixed value aliyun_odps.
-- aliyun_access_id(Required): your Aliyun access_id.
-- aliyun_access_key(Required): your Aliyun access key.
-- aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com; otherwise set it to http://dh.odps.aliyun.com.
-- aliyunodps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api; otherwise set it to http://service.odps.aliyun.com/api .
-- buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB); default 8MB, recommended value 2MB, current maximum 20MB.
-- buffer_queue_limit(Optional): chunk queue size; together with buffer_chunk_limit it determines the total buffer size.
-- flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway; default 60s.
-- abandon_mode(Optional): drop the pack after three built-in retries.
-- project(Required): project name.
-- table(Required): table name.
-- fields(Required): corresponds to the source; every field name must exist in the source.
-- partition(Optional): set this if the table is partitioned.
-- Supported ways to set the partition name:
-- Fixed value: partition ctime=20150804
-- Key: partition ctime=${remote} (where remote is a field in the source)
-- Time-format key: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; its value rendered in %Y%m%d format is used as the partition name)
-- time_format(Optional):
-- If a time-format key is used for <partition>, set this parameter. For example: if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
-- shard_number(Optional): number of shards; data is written to a random shard in the range shard[0,shard_number-1]; must be an integer greater than 0 and less than the shard limit of the table.
-- enable_fast_crc(Optional): use fast CRC computation, which greatly improves performance; because it relies on an externally loaded dynamic library, only 64-bit Linux and Windows systems are currently supported.
-- retry_time(Optional): number of built-in retries when sending each pack; default 3.
-- retry_interval(Optional): retry interval; default 1s.
-- abandon_mode(Optional): default false; when set to true, the pack is dropped after retry_time retries; otherwise the exception is passed to fluentd and retried through fluentd's own retry mechanism, which may cause duplicate data.
-
-## Official Website
----
-
-- [Fluentd User Guide](http://docs.fluentd.org/)
-
-## Authors
----
 
-
-
-
-
-
-
-
-
-
-
-
+#### Example 3: Import data from MySQL
+1. mysql
+
+### Parameter Description
+
+- type(Fixed): fixed value aliyun_odps.
+- aliyun_access_id(Required): your Aliyun access_id.
+- aliyun_access_key(Required): your Aliyun access key.
+- aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com; otherwise set it to http://dh.odps.aliyun.com.
+- aliyunodps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api; otherwise set it to http://service.odps.aliyun.com/api .
+- buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB); default 8MB, recommended value 2MB, current maximum 20MB.
+- buffer_queue_limit(Optional): chunk queue size; together with buffer_chunk_limit it determines the total buffer size.
+- flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway; default 60s.
+- abandon_mode(Optional): drop the pack after three built-in retries.
+- project(Required): project name.
+- table(Required): table name.
+- fields(Required): corresponds to the source; every field name must exist in the source.
+- partition(Optional): set this if the table is partitioned.
+- Supported ways to set the partition name:
+- Fixed value: partition ctime=20150804
+- Key: partition ctime=${remote} (where remote is a field in the source)
+- Time-format key: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; its value rendered in %Y%m%d format is used as the partition name)
+- time_format(Optional):
+- If a time-format key is used for <partition>, set this parameter. For example: if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
+- shard_number(Optional): number of shards; data is written to a random shard in the range shard[0,shard_number-1]; must be an integer greater than 0 and less than the shard limit of the table.
+- enable_fast_crc(Optional): use fast CRC computation, which greatly improves performance; because it relies on an externally loaded dynamic library, only 64-bit Linux and Windows systems are currently supported.
+- retry_time(Optional): number of built-in retries when sending each pack; default 3.
+- retry_interval(Optional): retry interval; default 1s.
+- abandon_mode(Optional): default false; when set to true, the pack is dropped after retry_time retries; otherwise the exception is passed to fluentd and retried through fluentd's own retry mechanism, which may cause duplicate data.
+
+## Common Usage Questions and Exception Descriptions
 ---
-*
--
-- fluentd
--
-* enable_fast_crc
--
-* retry_time/retry_interval
-- fluentd
-* Warning
--
-* Fluent::BufferQueueLimitError error="queue size exceeds limit"
-- fluentd
-*
--
-* partition has no corresponding source key or the partition expression is wrong
--
-* Failed to format the data
--
+* What causes the exceptions InvalidShardId/ShardNotReady?
+- The system may be undergoing an upgrade; the problem appears briefly and recovers within a short time;
+- If several fluentd processes exist, check that shard_num is configured to the same value in all of them (or left unset in all). Different values cause this problem, because the process with the smaller shard_number unloads the extra shards;
+- Something else, such as the SDK, may have performed loadshard/unloadshard operations.
+* How can I check whether enable_fast_crc is compatible?
+- Enable this option and then start the fluentd process. It is verified at startup and the cause is reported on failure (a reload does not run the verification). Alternatively, go to the plugin directory and inspect aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so with ldd.
+* How do retry_time/retry_interval differ from fluentd's built-in retry?
+- fluentd's built-in retry lasts 36 hours by default and resends the whole buffer_chunk; with dynamic partitions configured, resending all the data may cause duplicates. Setting these two options enables the plugin's internal retry. If that retry fails, the value of abandon_mode decides whether the pack is dropped or the whole buffer is handed to fluentd for resending.
+* What does Warning:ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist. mean?
+- The plugin creates the partition itself when it catches NoSuchPartition from ODPS. This warning means the partition for the data does not exist in the ODPS table and will be created automatically; a message is logged if creation succeeds.
+* What causes Fluent::BufferQueueLimitError error="queue size exceeds limit"?
+- While reading and sending data, fluentd first reads into a buffer whose size is determined by buffer_chunk_limit and buffer_queue_limit together. This error most likely means that accumulated data has exhausted the buffer; try increasing buffer_queue_limit.
+* How do I start a separate fluentd process for each of several config files?
+- If there are several config files, the in_multiprocess plugin can start different processes to serve them at the same time.
+* What causes the exception partition has no corresponding source key or the partition expression is wrong.?
+- This exception means that a value configured in the partition field cannot be found in the source data, for example partition ctime=${remote} when remote does not appear in the source; check the configuration.
+* What causes the exception Failed to format the data.?
+- This error indicates a problem while parsing the partition; check the partition configuration. Dirty data in the input can also trigger it.
+* How do I switch to the Taobao RubyGems source?
+- RubyGems mirror [https://ruby.taobao.org/]
+
+## Official Website
+- [Fluentd User Guide](http://docs.fluentd.org/)
+
+## Authors
+- [Sun Zongtao]()
+- [Cai Ying]()
+- [Dong Xiao](https://github.com/dongxiao1198)
+- [Yang Hongbo](https://github.com/hongbosoftware)
+
+## License
+licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
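The retry behaviour described in the FAQ above (a built-in retry governed by retry_time and retry_interval, followed by either dropping the pack or handing the error to fluentd depending on abandon_mode) can be sketched as follows. This is an illustration only, not the plugin's code: the method name `send_with_retry`, the `sleeper` parameter, and the reading of retry_time as "retries after the first attempt" are all assumptions.

```ruby
# Hypothetical sketch of the documented retry semantics. The block passed in
# stands for "send one pack"; it is expected to raise on failure.
def send_with_retry(pack, retry_time: 3, retry_interval: 1, abandon_mode: false,
                    sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield pack
    :sent
  rescue StandardError
    if attempts <= retry_time
      # Built-in retry: wait, then run the begin block again.
      sleeper.call(retry_interval)
      retry
    end
    # Retries exhausted: drop the pack, or re-raise so the caller
    # (fluentd, in the plugin's case) can resend the whole buffer.
    return :abandoned if abandon_mode
    raise
  end
end
```

With abandon_mode set to true a permanently failing pack is dropped after the internal retries; with false the last error propagates, which matches the FAQ's warning that fluentd's own resend may duplicate data.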
data/README.md
CHANGED
@@ -69,14 +69,14 @@ $ gem install fluent-plugin-aliyun-odps
 - aliyun_access_key(Required):your aliyun access key.
 - aliyun_odps_hub_endpoint(Required):if you are using ECS, set it as http://dh-ext.odps.aliyun-inc.com, otherwise using http://dh.odps.aliyun.com.
 - aliyunodps_endpoint(Required):if you are using ECS, set it as http://odps-ext.aiyun-inc.com/api, otherwise using http://service.odps.aliyun.com/api .
-- buffer_chunk_limit(Optional):chunk size
-- buffer_queue_limit(Optional):buffer chunk size
+- buffer_chunk_limit(Optional):chunk size,“k” (KB), “m” (MB), and “g” (GB) ,default 8MB,recommended number is 2MB, max size is 20MB.
+- buffer_queue_limit(Optional):buffer chunk size,example: buffer_chunk_limit2m,buffer_queue_limit 128,then the total buffer size is 2*128MB.
 - flush_interval(Optional):interval to flush data buffer, default 60s.
 - abandon_mode(Optional):drop pack after retry 3 times.
 - project(Required):your project name.
 - table(Required):your table name.
 - fields(Required): must match the keys in source.
-- partition(Optional)
+- partition(Optional):set this if your table is partitioned.
 - partition format:
 - fix string: partition ctime=20150804
 - key words: partition ctime=${remote}
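The buffer arithmetic in the added buffer_queue_limit line (chunk size times queue length) can be worked through in a few lines. This is a minimal sketch: the helper names `size_in_bytes` and `total_buffer_bytes` are hypothetical, and the 1024-based interpretation of the "k", "m", and "g" suffixes is an assumption.

```ruby
# Assumed multipliers for the size suffixes mentioned in the README.
UNIT_BYTES = { 'k' => 1024, 'm' => 1024**2, 'g' => 1024**3 }.freeze

# Hypothetical helper: converts a value such as "2m" or "512" into bytes.
def size_in_bytes(value)
  match = /\A(\d+)([kmg])?\z/i.match(value.to_s.strip)
  raise ArgumentError, "unrecognized size: #{value.inspect}" unless match

  match[1].to_i * (match[2] ? UNIT_BYTES.fetch(match[2].downcase) : 1)
end

# Hypothetical helper: upper bound on buffered data,
# buffer_chunk_limit multiplied by buffer_queue_limit.
def total_buffer_bytes(chunk_limit, queue_limit)
  size_in_bytes(chunk_limit) * Integer(queue_limit)
end
```

With the values from the README, `total_buffer_bytes('2m', 128)` gives 268435456 bytes, that is 2*128 = 256 MB.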
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.3
data/build.sh
ADDED
@@ -0,0 +1,10 @@
+#!/bin/bash
+mkdir package
+mkdir package/temp
+mkdir package/temp/fluentd
+gem build fluent-plugin-aliyun-odps.gemspec
+cp ext/* ./package/temp/fluentd/ -r
+cp README.cn.md ./package/temp/fluentd/README.cn.md
+cp example.conf ./package/temp/fluentd/example.conf
+cp fluent-plugin-aliyun-odps-*.gem ./package/temp/fluentd/dependency_gem/
+tar zcvf ./package/fluentd_package.tar.gz -C ./package/temp/ .
data/example.conf
ADDED
@@ -0,0 +1,26 @@
+<source>
+type tail
+path /path/to/students.csv
+tag input.csv
+format csv
+</source>
+<match input.*>
+type aliyun_odps
+aliyun_access_id ************
+aliyun_access_key *********
+aliyun_odps_endpoint http://service.odps.aliyun.com/api
+aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
+buffer_chunk_limit 2m
+buffer_queue_limit 128
+flush_interval 5s
+project your_projectName # name of the project to import data into
+enable_fast_crc true
+<table input.csv>
+table students
+fields id,name,score
+shard_number 1
+retry_time 3
+retry_interval 1
+abandon_mode true
+</table>
+</match>
data/fluent-plugin-aliyun-odps.gemspec
CHANGED
@@ -12,16 +12,15 @@ Gem::Specification.new do |gem|
   gem.email = "dongxiao.dx@alibaba-inc.com"
   gem.has_rdoc = false
   #gem.platform = Gem::Platform::RUBY
-  gem.files = `git ls-files`.split("\n")
+  gem.files = `git ls-files | grep -v ext | grep -v package`.split("\n")
   gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
   gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
   gem.require_paths = ['lib']
 
   gem.add_dependency "fluentd", [">= 0.10.49", "< 2"]
-  gem.add_dependency "protobuf",
+  gem.add_dependency "protobuf", '~> 3.5', '>= 3.5.1'
   gem.add_dependency "yajl-ruby", "~> 1.0"
-  gem.
-  gem.add_development_dependency "
-  gem.add_development_dependency "
-  gem.add_development_dependency "test-unit", ">= 3.0.8"
+  gem.add_development_dependency "rake", '~> 0.9', '>= 0.9.2'
+  gem.add_development_dependency "flexmock", '~> 1.2', '>= 1.2.0'
+  gem.add_development_dependency "test-unit", '~> 3.0', '>= 3.0.8'
 end
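The combined constraint added for protobuf, `'~> 3.5', '>= 3.5.1'`, accepts versions from 3.5.1 up to but not including 4.0. A small sketch using RubyGems' own `Gem::Requirement` shows which versions pass; the constant and method names here are hypothetical.

```ruby
require 'rubygems'

# The two constraints from the gemspec, combined into one requirement.
PROTOBUF_REQUIREMENT = Gem::Requirement.new('~> 3.5', '>= 3.5.1')

# Hypothetical helper: true when the given version string satisfies
# both the pessimistic ('~> 3.5') and the minimum ('>= 3.5.1') constraint.
def protobuf_version_allowed?(version)
  PROTOBUF_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end
```

Under these constraints 3.5.0 is rejected by the minimum, and 4.0.0 is rejected by the pessimistic operator.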
data/lib/fluent/plugin/out_aliyun_odps.rb
CHANGED
@@ -110,66 +110,73 @@ module Fluent
 begin
 #if partition is not empty
 unless @partition.blank? then
-
-
-
-
-
-
-
-
-
-
-
-
-
-if
-
+begin
+#if partition has params in it
+if @partition.include? "=${"
+#split partition
+partition_arrays=@partition.split(',')
+partition_name=''
+i=1
+for p in partition_arrays do
+#if partition is time formated
+if p.include? "strftime"
+key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
+partition_column=p[0, p.index("=")]
+timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
+if data.has_key?(key)
+if time_format == nil
+partition_value=Time.parse(data[key]).strftime(timeFormat)
+else
+partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
+end
+if i==1
+partition_name+=partition_column+"="+partition_value
+else
+partition_name+=","+partition_column+"="+partition_value
+end
 else
-
+raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
 end
-
-
+elsif p.include? "=${"
+key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
+partition_column=p[0, p.index("=")]
+if data.has_key?(key)
+partition_value=data[key]
+if i==1
+partition_name+=partition_column+"="+partition_value
+else
+partition_name+=","+partition_column+"="+partition_value
+end
 else
-
+raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
 end
 else
-raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
-end
-elsif p.include? "=${"
-key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
-partition_column=p[0, p.index("=")]
-if data.has_key?(key)
-partition_value=data[key]
 if i==1
-partition_name+=
+partition_name+=p
 else
-partition_name+=","+
+partition_name+=","+p
 end
-else
-raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
-end
-else
-if i==1
-partition_name+=p
-else
-partition_name+=","+p
 end
+i+=1
 end
-
+else
+partition_name=@partition
+end
+if partitions[partition_name]==nil
+partitions[partition_name]=[]
+end
+partitions[partition_name] << @format_proc.call(data)
+rescue => ex
+if (@abandon_mode)
+@log.error "Format partition failed, abandon this record. Msg:" +ex.message + " Table:" + @table
+@log.error "Drop data:" + data.to_s
+else
+raise e
 end
-else
-partition_name=@partition
-end
-if partitions[partition_name]==nil
-partitions[partition_name]=[]
 end
-partitions[partition_name] << @format_proc.call(data)
-
 else
 records << @format_proc.call(data)
 end
-
 rescue => e
 raise "Failed to format the data:"+ e.message + " " +e.backtrace.inspect.to_s
 end
@@ -214,6 +221,7 @@ module Fluent
 else
 if (@abandon_mode)
 @log.error "Retry failed, abandon this pack. Msg:" + e.message + " partitions:" + k.to_s + " table:" + @table
+@log.error v[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
 else
 raise e
 end
@@ -252,6 +260,7 @@ module Fluent
 else
 if (@abandon_mode)
 @log.error "Retry failed, abandon this pack. Msg:" + e.message + " Table:" + @table
+@log.error records[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
 else
 raise e
 end
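The partition handling changed in the first hunk of this file (fixed values, `${key}` lookups, and `${key.strftime('...')}` time formatting, with per-record dropping under abandon_mode) can be restated compactly. This is a sketch under stated assumptions rather than the plugin's implementation: `expand_partition` and `group_by_partition` are hypothetical names, regular expressions replace the index arithmetic, and a plain IO object stands in for the fluentd logger. The new rescue in the hunk binds the exception as `ex` but re-raises `e`; the sketch re-raises the bound exception, which may or may not be what the authors intended.

```ruby
require 'time'

# Hypothetical helper: expands a partition spec such as
# "ctime=${datetime.strftime('%Y%m%d')},host=${remote},env=prod"
# against one record (a Hash with String keys).
def expand_partition(spec, record, time_format = nil)
  spec.split(',').map do |part|
    column, expr = part.split('=', 2)
    case expr
    when /\A\$\{(\w+)\.strftime\('([^']*)'\)\}\z/
      key, out_format = $1, $2
      raise ArgumentError, "partition has no corresponding source key: #{key}" unless record.key?(key)

      time = time_format ? Time.strptime(record[key], time_format) : Time.parse(record[key])
      "#{column}=#{time.strftime(out_format)}"
    when /\A\$\{(\w+)\}\z/
      key = $1
      raise ArgumentError, "partition has no corresponding source key: #{key}" unless record.key?(key)

      "#{column}=#{record[key]}"
    else
      part # fixed value, used as written
    end
  end.join(',')
end

# Hypothetical helper: groups records by expanded partition name. With
# abandon_mode a record whose partition cannot be built is logged and
# dropped; otherwise the error propagates to the caller.
def group_by_partition(records, spec, time_format: nil, abandon_mode: false, log: $stderr)
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |record, groups|
    begin
      groups[expand_partition(spec, record, time_format)] << record
    rescue StandardError => ex
      raise ex unless abandon_mode
      log.puts "Format partition failed, abandon this record. Msg:#{ex.message}"
      log.puts "Drop data:#{record}"
    end
  end
end
```

Using the README's example value `29/Aug/2015:11:10:16 +0800` with time_format `%d/%b/%Y:%H:%M:%S %z`, the spec `ctime=${datetime.strftime('%Y%m%d')}` expands to `ctime=20150829`.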
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: fluent-plugin-aliyun-odps
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.3
 platform: ruby
 authors:
 - Xiao Dong
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-03-
+date: 2016-03-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: fluentd
@@ -36,6 +36,9 @@ dependencies:
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.5'
+    - - ">="
       - !ruby/object:Gem::Version
         version: 3.5.1
   type: :runtime
@@ -43,6 +46,9 @@ dependencies:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.5'
+    - - ">="
       - !ruby/object:Gem::Version
         version: 3.5.1
 - !ruby/object:Gem::Dependency
@@ -59,24 +65,13 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.0'
-- !ruby/object:Gem::Dependency
-  name: fluent-mixin-config-placeholders
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
     - - ">="
       - !ruby/object:Gem::Version
         version: 0.9.2
@@ -84,6 +79,9 @@ dependencies:
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
     - - ">="
       - !ruby/object:Gem::Version
         version: 0.9.2
@@ -91,6 +89,9 @@ dependencies:
   name: flexmock
   requirement: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.2.0
@@ -98,6 +99,9 @@ dependencies:
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
     - - ">="
       - !ruby/object:Gem::Version
         version: 1.2.0
@@ -105,6 +109,9 @@ dependencies:
   name: test-unit
   requirement: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
     - - ">="
       - !ruby/object:Gem::Version
         version: 3.0.8
@@ -112,6 +119,9 @@ dependencies:
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
     - - ">="
       - !ruby/object:Gem::Version
         version: 3.0.8
@@ -129,6 +139,8 @@ files:
 - README.md
 - Rakefile
 - VERSION
+- build.sh
+- example.conf
 - fluent-plugin-aliyun-odps.gemspec
 - lib/fluent/plugin/conf/config.rb
 - lib/fluent/plugin/crc/crc.rb
@@ -139,7 +151,6 @@ files:
 - lib/fluent/plugin/crc/origin/crc32c.rb
 - lib/fluent/plugin/crc/src/crc32c.c
 - lib/fluent/plugin/crc/src/crc32c.h
-- lib/fluent/plugin/crc/src/extconf.rb
 - lib/fluent/plugin/exceptions.rb
 - lib/fluent/plugin/http/http_connection.rb
 - lib/fluent/plugin/http/http_flag.rb
@@ -154,7 +165,6 @@ files:
 - lib/fluent/plugin/stream_client.rb
 - lib/fluent/plugin/stream_reader.rb
 - lib/fluent/plugin/stream_writer.rb
-- odps_example.conf
 homepage: https://github.com/aliyun/aliyun-odps-fluentd-plugin
 licenses:
 - Apache-2.0
data/odps_example.conf
DELETED
@@ -1,31 +0,0 @@
-####
-## Output descriptions:
-##
-
-<source>
-type tail
-path /opt/log/in/in.log
-refresh_interval 5s
-tag in.log
-format csv
-keys dt,week,r1,r2,r3,r4,r5,r6,r7,blue
-</source>
-
-<match in.**>
-type aliyun_odps
-aliyun_access_id ************
-aliyun_access_key *********
-aliyun_odps_endpoint http://service.odps.aliyun.com/api
-aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
-buffer_chunk_limit 2m
-buffer_queue_limit 128
-flush_interval 5s
-project your_projectName
-enable_fast_crc false
-<table in.log>
-table your_tableName
-fields r1,r2,r3,r4,r5,r6,blue
-partition ctime=${dt.strftime('%Y%m%d')}
-shard_number 1
-</table>
-</match>