fluent-plugin-aliyun-odps 0.1.2 → 0.1.3
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/CHANGELOG.md +5 -3
- data/README.cn.md +163 -87
- data/README.md +3 -3
- data/VERSION +1 -1
- data/build.sh +10 -0
- data/example.conf +26 -0
- data/fluent-plugin-aliyun-odps.gemspec +5 -6
- data/lib/fluent/plugin/http/http_flag.rb +1 -1
- data/lib/fluent/plugin/out_aliyun_odps.rb +54 -45
- metadata +28 -18
- data/lib/fluent/plugin/crc/src/extconf.rb +0 -3
- data/odps_example.conf +0 -31
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5b96ab3b2194318e749ee184cd5bdf8bb4a4875c
|
4
|
+
data.tar.gz: a846f80a43b87c5551491a5c8e4c3d387b8e003f
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 592b3a66fa676ecd783e2e999c82a6757d1a387fe3a0731c5f195e8d44ba2065f55df0b90535611040ff6bbdfda0449d4d16f9c6e6c61db6153e8b7fe5076403
|
7
|
+
data.tar.gz: 56164832ec7b335daf343ba0a02a464e89830028dbad1430177fa19bf153177e31d81938da81708fb59a18e1374249cd013451061c8594f528b651a9fa4dcb80
|
data/.gitignore
CHANGED
data/CHANGELOG.md
CHANGED
@@ -3,9 +3,9 @@ Fix datetime format bug, support String, DateTime, Time type when write to a dat
|
|
3
3
|
0.0.5
|
4
4
|
Add reload shard when import fails, and remove unload shard operation when shut down.
|
5
5
|
0.0.6
|
6
|
-
Add decimal support
|
6
|
+
Add decimal support, fix string input while setting double and int.
|
7
7
|
0.0.7
|
8
|
-
Add error msg when add partition fail, support fast crc
|
8
|
+
Add error msg when add partition fail, support fast crc, remove pack size limit.
|
9
9
|
0.0.8
|
10
10
|
Add abandon mode, fix fluent retry bug, fix partition mixed mode bug.
|
11
11
|
0.0.9
|
@@ -15,4 +15,6 @@ Add partition when catch NoSuchPartition.
|
|
15
15
|
0.1.1
|
16
16
|
Fix some log format.
|
17
17
|
0.1.2
|
18
|
-
Use XStreamPack.
|
18
|
+
Use XStreamPack.
|
19
|
+
0.1.3
|
20
|
+
Drop the record and log an error when parsing the partition fails.
|
data/README.cn.md
CHANGED
@@ -1,47 +1,122 @@
|
|
1
1
|
# Aliyun ODPS Plugin for Fluentd
|
2
2
|
|
3
|
-
##
|
3
|
+
## Getting Started
|
4
4
|
---
|
5
5
|
|
6
|
-
###
|
6
|
+
### Introduction
|
7
7
|
|
8
|
-
-
|
9
|
-
- ODPS DataHub Service(DHS)
|
8
|
+
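- Open Data Processing Service (ODPS) is a massive-scale data processing platform developed in-house by Alibaba. It mainly serves the storage and computation of batch structured data, and provides data warehouse solutions for massive data sets as well as analysis and modeling services for big data.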
|
9
|
+
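- ODPS DataHub Service (DHS) is a built-in ODPS service that lets users publish and subscribe to real-time data. Published data is written to ODPS tables automatically, so DHS can also serve as an entry point for importing data into ODPS.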
|
10
|
+
- This plugin writes data to ODPS tables through the DataHub service and can automatically create partitions in the format the user specifies.
|
10
11
|
|
11
12
|
|
12
|
-
###
|
13
|
+
### Requirements
|
13
14
|
|
14
|
-
|
15
|
+
To use this plugin, you need the following environment:
|
15
16
|
|
16
|
-
1. Ruby 2.1.0
|
17
|
-
2. Gem 2.4.5
|
18
|
-
3. Fluentd-0.10.49
|
19
|
-
4. Protobuf-3.5.1
|
17
|
+
1. Ruby 2.1.0 or later
|
18
|
+
2. Gem 2.4.5 or later
|
19
|
+
3. Fluentd-0.10.49 or later (*[Home Page](http://www.fluentd.org/)*)
|
20
|
+
4. Protobuf-3.5.1 or later (Ruby protobuf)
|
20
21
|
5. Ruby-devel
|
21
22
|
|
22
|
-
###
|
23
|
+
### Installation and Deployment
|
24
|
+
Fluentd can be installed and deployed in either of the following two ways.
|
25
|
+
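1. The one-click installation package suits users who are installing Ruby and Fluentd for the first time, or users on a LAN. It bundles the required Ruby environment and Fluentd. The package currently supports Linux only.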
|
26
|
+
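2. Installing over the network suits users who are familiar with Ruby. Check the Ruby version first; if it is lower than 2.1.0, upgrade or install a newer Ruby, then install Fluentd through RubyGems.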
|
23
27
|
|
24
|
-
|
28
|
+
Notes:
|
29
|
+
* We recommend changing the RubyGems source to https://ruby.taobao.org/
|
30
|
+
* In a LAN environment, the plugin can be installed from a local gem file
|
31
|
+
```
|
32
|
+
gem install --local fluent-plugin-aliyun-odps-0.1.2.gem
|
33
|
+
```
|
25
34
|
|
35
|
+
#### Option 1: Install with the one-click package
|
36
|
+
1. Download and extract [fluentd_package.tar.gz](http://gitlab.alibaba-inc.com/aliopensource/aliyun-odps-fluentd-plugin/blob/master/package/fluentd_package.tar.gz)
|
37
|
+
2. Optionally change $DIR in install_agent.sh to the path where you want Ruby installed; by default it is installed under the current path
|
38
|
+
3. Run the following command; the message "Success" indicates that the installation succeeded
|
26
39
|
```
|
27
|
-
|
40
|
+
bash install_agent.sh
|
28
41
|
```
|
42
|
+
4. The fluentd program is installed in the bin directory under the current directory
|
29
43
|
|
30
|
-
|
44
|
+
#### Option 2: Install over the network
|
45
|
+
1. Install Ruby (skip this step if Ruby 2.1.0 or later is already present):
|
46
|
+
```
|
47
|
+
wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.0.tar.gz
|
48
|
+
tar xzvf ruby-2.3.0.tar.gz
|
49
|
+
cd ruby-2.3.0
|
50
|
+
./configure --prefix=DIR
|
51
|
+
make
|
52
|
+
make install
|
53
|
+
```
|
54
|
+
2. Install Fluentd and the plugin
|
55
|
+
```
|
56
|
+
$ gem install fluent-plugin-aliyun-odps
|
57
|
+
```
|
31
58
|
|
59
|
+
### Plugin Usage Examples
|
60
|
+
#### Example 1: Upload data from a CSV file
|
61
|
+
1. First prepare a table in ODPS. Here we assume the table is named students and has three fields, id, name, and score, of types string, string, and bigint respectively
|
62
|
+
2. Prepare the CSV data file. Assume its content is as follows
|
63
|
+
```
|
64
|
+
1, jack ma, 90
|
65
|
+
2, pony zhang, 85
|
66
|
+
3, lucy wang, 88
|
67
|
+
```
|
68
|
+
3. Prepare the Fluentd configuration file by saving the following content as fluentd.conf.
|
32
69
|
```
|
33
70
|
<source>
|
34
71
|
type tail
|
35
|
-
path /
|
36
|
-
|
37
|
-
|
38
|
-
tag in.log
|
39
|
-
format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/
|
40
|
-
time_format %Y%b%d %H:%M:%S %z
|
72
|
+
path /path/to/students.csv
|
73
|
+
tag input.csv
|
74
|
+
format csv
|
41
75
|
</source>
|
76
|
+
<match input.*>
|
77
|
+
type aliyun_odps
|
78
|
+
aliyun_access_id ************
|
79
|
+
aliyun_access_key *********
|
80
|
+
aliyun_odps_endpoint http://service.odps.aliyun.com/api
|
81
|
+
aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
|
82
|
+
buffer_chunk_limit 2m
|
83
|
+
buffer_queue_limit 128
|
84
|
+
flush_interval 5s
|
85
|
+
project your_projectName #name of the project to import data into
|
86
|
+
enable_fast_crc true
|
87
|
+
<table input.csv>
|
88
|
+
table students
|
89
|
+
fields id,name,score
|
90
|
+
shard_number 1
|
91
|
+
retry_time 3
|
92
|
+
retry_interval 1
|
93
|
+
abandon_mode true
|
94
|
+
</table>
|
95
|
+
</match>
|
96
|
+
```
|
97
|
+
4. Run the fluentd command, specifying the configuration file with -c
|
98
|
+
```
|
99
|
+
fluentd -c fluentd.conf
|
100
|
+
```
|
101
|
+
5. When it finishes, query the data with the following SQL command
|
42
102
|
```
|
103
|
+
select * from students;
|
43
104
|
```
|
44
|
-
|
105
|
+
|
106
|
+
#### Example 2: Capture and upload a live nginx log file
|
107
|
+
1. For nginx log files, Fluentd can parse the data with a regular expression.
|
108
|
+
2. Use the following configuration file as a reference; run the same command as in Example 1.
|
109
|
+
```
|
110
|
+
<source>
|
111
|
+
type tail
|
112
|
+
path /home/admin/nginx/logs/access.log #path to the nginx log
|
113
|
+
pos_file /tmp/nginx.access.pos
|
114
|
+
refresh_interval 5s
|
115
|
+
tag nginx.access
|
116
|
+
format /^(?<remote>[^ ]*) - \[(?<dt>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/ #regular expression used to parse the log
|
117
|
+
time_format %d/%b/%Y:%H:%M:%S %z
|
118
|
+
</source>
|
119
|
+
<match nginx.access>
|
45
120
|
type aliyun_odps
|
46
121
|
aliyun_access_id ************
|
47
122
|
aliyun_access_key *********
|
@@ -52,9 +127,10 @@ $ gem install fluent-plugin-aliyun-odps
|
|
52
127
|
flush_interval 5s
|
53
128
|
project your_projectName
|
54
129
|
enable_fast_crc true
|
55
|
-
<table
|
56
|
-
|
57
|
-
|
130
|
+
<table nginx.access>
|
131
|
+
table nginx_logs #ODPS table the logs are written to
|
132
|
+
fields remote,method,path,code,size,agent,requesttime
|
133
|
+
shard_number 5
|
58
134
|
partition ctime=${datetime.strftime('%Y%m%d')}
|
59
135
|
time_format %d/%b/%Y:%H:%M:%S %z
|
60
136
|
shard_number 1
|
@@ -64,68 +140,68 @@ $ gem install fluent-plugin-aliyun-odps
|
|
64
140
|
</table>
|
65
141
|
</match>
|
66
142
|
```
|
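The nginx example relies on the `format` regular expression to turn each log line into named fields. A minimal sketch, assuming a made-up log line (`SAMPLE_LINE`) and a hypothetical helper `parse_nginx_line`, shows what the pattern from the example captures and how the `dt` field could be turned into a `%Y%m%d` partition value with the example's `time_format`. It only illustrates the pattern and is not how Fluentd itself is implemented.

```ruby
require 'time'

# Regular expression copied from the `format` line of the nginx example.
NGINX_FORMAT = /^(?<remote>[^ ]*) - \[(?<dt>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/

# Hypothetical log line; note the trailing space that the pattern requires.
SAMPLE_LINE = '10.0.0.1 - [29/Aug/2015:11:10:16 +0800] "GET /index.html HTTP/1.1" 200 612 "curl/7.40" "0.005" '

# Return a Hash of field name => captured text, or nil when the line does not match.
def parse_nginx_line(line)
  match = NGINX_FORMAT.match(line)
  return nil unless match

  match.names.zip(match.captures).to_h
end

record = parse_nginx_line(SAMPLE_LINE)

# Same time_format as in the example configuration.
day = Time.strptime(record['dt'], '%d/%b/%Y:%H:%M:%S %z').strftime('%Y%m%d')

puts record
puts day
```

Lines that do not match return `nil`, so a caller can decide whether to skip or log them.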
67
|
-
### Parameters
|
68
|
-
|
69
|
-
- type(Fixed): fixed value aliyun_odps.
|
70
|
-
- aliyun_access_id(Required): your aliyun access_id.
|
71
|
-
- aliyun_access_key(Required): your aliyun access key.
|
72
|
-
- aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com; otherwise set it to http://dh.odps.aliyun.com.
|
73
|
-
- aliyun_odps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api; otherwise set it to http://service.odps.aliyun.com/api .
|
74
|
-
- buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB). Default 8MB, recommended value 2MB, current maximum 20MB.
|
75
|
-
- buffer_queue_limit(Optional): length of the chunk queue; together with buffer_chunk_limit it determines the total buffer size.
|
76
|
-
- flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway. Default 60s.
|
77
|
-
- abandon_mode(Optional): drop the pack after three built-in retries.
|
78
|
-
- project(Required): project name.
|
79
|
-
- table(Required): table name.
|
80
|
-
- fields(Required): corresponds to the source; every field name must exist in the source.
|
81
|
-
- partition(Optional): set this if the table is partitioned.
|
82
|
-
- Supported ways to set the partition name:
|
83
|
-
- Fixed value: partition ctime=20150804
|
84
|
-
- Keyword: partition ctime=${remote} (where remote is a field in the source)
|
85
|
-
- Time-format keyword: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; the output in %Y%m%d format is used as the partition name)
|
86
|
-
- time_format(Optional):
|
87
|
-
- If <partition> uses a time-format keyword, set this parameter. For example, if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
|
88
|
-
- shard_number(Optional): number of shards; data is written to a randomly chosen shard in the range [0, shard_number-1]. Must be an integer greater than 0 and less than the table's shard limit.
|
89
|
-
- enable_fast_crc(Optional): use fast CRC calculation, which greatly improves performance. Because it relies on an externally loaded dynamic library, it currently supports only 64-bit Linux and Windows systems.
|
90
|
-
- retry_time(Optional): number of built-in retries when sending each pack; default 3.
|
91
|
-
- retry_interval(Optional): retry interval; default 1s.
|
92
|
-
- abandon_mode(Optional): default false. When set to true, the pack is dropped after retry_time retries; otherwise the exception is passed to Fluentd and Fluentd's own retry mechanism takes over, which may lead to duplicate data.
|
93
|
-
|
94
|
-
## Official Website
|
95
|
-
---
|
96
|
-
|
97
|
-
- [Fluentd User Guide](http://docs.fluentd.org/)
|
98
|
-
|
99
|
-
## Authors
|
100
|
-
---
|
101
143
|
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
110
|
-
|
111
|
-
|
112
|
-
|
144
|
+
#### Example 3: Import data from MySQL
|
145
|
+
1. mysql
|
146
|
+
|
147
|
+
### Parameters
|
148
|
+
|
149
|
+
- type(Fixed): fixed value aliyun_odps.
|
150
|
+
- aliyun_access_id(Required): your aliyun access_id.
|
151
|
+
- aliyun_access_key(Required): your aliyun access key.
|
152
|
+
- aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com; otherwise set it to http://dh.odps.aliyun.com.
|
153
|
+
- aliyun_odps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api; otherwise set it to http://service.odps.aliyun.com/api .
|
154
|
+
- buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB). Default 8MB, recommended value 2MB, current maximum 20MB.
|
155
|
+
- buffer_queue_limit(Optional): length of the chunk queue; together with buffer_chunk_limit it determines the total buffer size.
|
156
|
+
- flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway. Default 60s.
|
157
|
+
- abandon_mode(Optional): drop the pack after three built-in retries.
|
158
|
+
- project(Required): project name.
|
159
|
+
- table(Required): table name.
|
160
|
+
- fields(Required): corresponds to the source; every field name must exist in the source.
|
161
|
+
- partition(Optional): set this if the table is partitioned.
|
162
|
+
- Supported ways to set the partition name:
|
163
|
+
- Fixed value: partition ctime=20150804
|
164
|
+
- Keyword: partition ctime=${remote} (where remote is a field in the source)
|
165
|
+
- Time-format keyword: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; the output in %Y%m%d format is used as the partition name)
|
166
|
+
- time_format(Optional):
|
167
|
+
- If <partition> uses a time-format keyword, set this parameter. For example, if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
|
168
|
+
- shard_number(Optional): number of shards; data is written to a randomly chosen shard in the range [0, shard_number-1]. Must be an integer greater than 0 and less than the table's shard limit.
|
169
|
+
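- enable_fast_crc(Optional): use fast CRC calculation, which greatly improves performance. Because it relies on an externally loaded dynamic library, it currently supports only 64-bit Linux and Windows systems.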
|
170
|
+
- retry_time(Optional): number of built-in retries when sending each pack; default 3.
|
171
|
+
- retry_interval(Optional): retry interval; default 1s.
|
172
|
+
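- abandon_mode(Optional): default false. When set to true, the pack is dropped after retry_time retries; otherwise the exception is passed to Fluentd and Fluentd's own retry mechanism takes over, which may lead to duplicate data.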
|
173
|
+
|
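The three partition modes described in the parameter list (fixed value, keyword, and time-format keyword) can be illustrated with a small resolver. This is a simplified sketch, not the plugin's actual code: the function name `resolve_partition` and its regular expressions are assumptions made for illustration, and, like the comma-separated partition syntax itself, it does not support commas inside the strftime pattern.

```ruby
require 'time'

# Resolve a partition spec such as
#   "ctime=${datetime.strftime('%Y%m%d')},host=${remote},src=fixed"
# against one record (a Hash with String keys).
# time_format is optional; without it Time.parse guesses the input format.
def resolve_partition(spec, record, time_format = nil)
  spec.split(',').map do |part|
    column, expr = part.split('=', 2)
    case expr
    when /\A\$\{(\w+)\.strftime\('([^']*)'\)\}\z/
      # Time-format keyword: read the field, parse it, re-format it.
      key = Regexp.last_match(1)
      out_format = Regexp.last_match(2)
      raise ArgumentError, "no source key #{key}" unless record.key?(key)

      time = time_format ? Time.strptime(record[key], time_format) : Time.parse(record[key])
      "#{column}=#{time.strftime(out_format)}"
    when /\A\$\{(\w+)\}\z/
      # Keyword: copy the field value as the partition value.
      key = Regexp.last_match(1)
      raise ArgumentError, "no source key #{key}" unless record.key?(key)

      "#{column}=#{record[key]}"
    else
      # Fixed value, for example ctime=20150804.
      part
    end
  end.join(',')
end

record = { 'datetime' => '29/Aug/2015:11:10:16 +0800', 'remote' => '10.0.0.1' }
puts resolve_partition("ctime=${datetime.strftime('%Y%m%d')}", record, '%d/%b/%Y:%H:%M:%S %z')
puts resolve_partition('ctime=${remote}', record)
puts resolve_partition('ctime=20150804', record)
```

A missing source key raises an error in this sketch, which mirrors the "partition has no corresponding source key" exception the plugin reports.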
174
|
+
## FAQ and Exception Descriptions
|
113
175
|
---
|
114
|
-
*
|
115
|
-
-
|
116
|
-
- fluentd
|
117
|
-
-
|
118
|
-
* enable_fast_crc
|
119
|
-
-
|
120
|
-
* retry_time/retry_interval
|
121
|
-
- fluentd
|
122
|
-
* Warning
|
123
|
-
-
|
124
|
-
* Fluent::BufferQueueLimitError error="queue size exceeds limit"
|
125
|
-
- fluentd
|
126
|
-
*
|
127
|
-
-
|
128
|
-
* partition has no corresponding source key or the partition expression is wrong
|
129
|
-
-
|
130
|
-
* Failed to format the data
|
131
|
-
-
|
176
|
+
* What causes the InvalidShardId/ShardNotReady exceptions?
|
177
|
+
- The system may be upgrading; the problem appears briefly and recovers within a short time;
|
178
|
+
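- If several fluentd processes are running, check that shard_number is set to the same value in all of them (or left unset in all). Different values cause this problem, because the process with the smaller shard_number unloads the extra shards;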
|
179
|
+
- Another client, for example one using the SDK, may have performed loadshard/unloadshard operations.
|
180
|
+
* How do I check whether enable_fast_crc is compatible?
|
181
|
+
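- Enable the option and then start the fluentd process. Compatibility is verified at startup and the reason is reported on failure (a reload does not run the check). Alternatively, go to the plugin directory and inspect aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so with ldd.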
|
182
|
+
* How do retry_time/retry_interval differ from Fluentd's built-in retry?
|
183
|
+
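- Fluentd's built-in retry lasts 36 hours by default and resends the entire buffer_chunk; with dynamic partitions, resending all the data may produce duplicates. Setting these two options enables the plugin's internal retry. If that retry fails, abandon_mode decides whether the pack is dropped or the whole buffer is handed back to Fluentd for resending.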
|
184
|
+
* What does "Warning:ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist." mean?
|
185
|
+
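- When the plugin catches NoSuchPartition from ODPS it creates the partition itself. This warning means the partition for the data does not exist in the ODPS table yet; it is created automatically, and a message is logged when creation succeeds.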
|
186
|
+
* What causes Fluent::BufferQueueLimitError error="queue size exceeds limit"?
|
187
|
+
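- While reading and sending data, Fluentd first reads into a buffer whose size is determined by buffer_chunk_limit and buffer_queue_limit together. This error most likely means accumulated data has exhausted the buffer; try increasing buffer_queue_limit.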
|
188
|
+
* How do I start a separate fluentd process for each of several config files?
|
189
|
+
- If there are several config files, the in_multiprocess plugin can start a separate process for each.
|
190
|
+
* What causes the exception "partition has no corresponding source key or the partition expression is wrong."?
|
191
|
+
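- This exception means the value configured in partition cannot be found in the source data, for example partition ctime=${remote} when remote does not appear in the source. Check the configuration.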
|
192
|
+
* What causes the exception "Failed to format the data."?
|
193
|
+
- This error means something went wrong while parsing the partition. Check the partition configuration; dirty data in the input can also trigger it.
|
194
|
+
* How do I switch to the Taobao RubyGems source?
|
195
|
+
- RubyGems mirror [https://ruby.taobao.org/]
|
196
|
+
|
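The interaction of retry_time, retry_interval, and abandon_mode described in the FAQ can be sketched as follows. This is only an illustration under stated assumptions: `send_with_retry` and its `sleeper` argument are hypothetical names, the block stands in for the real upload call, and the sketch assumes that retry_time counts retries after the first attempt, which the README does not state explicitly.

```ruby
# Send one pack with in-plugin retries.
# Returns :sent on success, or :abandoned when abandon_mode drops the pack.
# Re-raises the last error when abandon_mode is false, so that the caller
# (Fluentd, in the real plugin) can retry the whole chunk.
def send_with_retry(pack, retry_time: 3, retry_interval: 1, abandon_mode: false,
                    sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    yield pack
    :sent
  rescue StandardError => e
    attempts += 1
    if attempts <= retry_time
      sleeper.call(retry_interval)
      retry
    end
    raise e unless abandon_mode

    warn "Retry failed, abandon this pack. Msg:#{e.message}"
    :abandoned
  end
end

# Usage: a sender that always fails is tried 1 + retry_time times, then dropped.
calls = 0
result = send_with_retry(%w[a b], retry_interval: 0, abandon_mode: true) do |_pack|
  calls += 1
  raise 'upload failed'
end
puts "#{result} after #{calls} attempts"
```

With abandon_mode left at false the same failure propagates to the caller, which is the case where duplicated data may appear after a chunk-level resend.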
197
|
+
## Official Website
|
198
|
+
- [Fluentd User Guide](http://docs.fluentd.org/)
|
199
|
+
|
200
|
+
## Authors
|
201
|
+
- [Sun Zongtao]()
|
202
|
+
- [Cai Ying]()
|
203
|
+
- [Dong Xiao](https://github.com/dongxiao1198)
|
204
|
+
- [Yang Hongbo](https://github.com/hongbosoftware)
|
205
|
+
|
206
|
+
## License
|
207
|
+
licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
|
data/README.md
CHANGED
@@ -69,14 +69,14 @@ $ gem install fluent-plugin-aliyun-odps
|
|
69
69
|
- aliyun_access_key(Required):your aliyun access key.
|
70
70
|
- aliyun_odps_hub_endpoint(Required):if you are using ECS, set it as http://dh-ext.odps.aliyun-inc.com, otherwise using http://dh.odps.aliyun.com.
|
71
71
|
- aliyun_odps_endpoint(Required):if you are using ECS, set it as http://odps-ext.aiyun-inc.com/api, otherwise using http://service.odps.aliyun.com/api .
|
72
|
-
- buffer_chunk_limit(Optional):chunk size
|
73
|
-
- buffer_queue_limit(Optional):buffer chunk size
|
72
|
+
- buffer_chunk_limit(Optional): chunk size; accepts the units "k" (KB), "m" (MB), and "g" (GB). The default is 8MB, the recommended value is 2MB, and the maximum is 20MB.
|
73
|
+
- buffer_queue_limit(Optional): length of the chunk queue. For example, with buffer_chunk_limit 2m and buffer_queue_limit 128, the total buffer size is 2MB*128 = 256MB.
|
74
74
|
- flush_interval(Optional):interval to flush data buffer, default 60s.
|
75
75
|
- abandon_mode(Optional):drop pack after retry 3 times.
|
76
76
|
- project(Required):your project name.
|
77
77
|
- table(Required):your table name.
|
78
78
|
- fields(Required): must match the keys in source.
|
79
|
-
- partition(Optional)
|
79
|
+
- partition(Optional):set this if your table is partitioned.
|
80
80
|
- partition format:
|
81
81
|
- fixed string: partition ctime=20150804
|
82
82
|
- keyword: partition ctime=${remote}
|
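The relationship between buffer_chunk_limit and buffer_queue_limit is simple arithmetic, sketched here for convenience. The helper names `size_to_bytes` and `total_buffer_bytes` are hypothetical and are not part of Fluentd or the plugin; the unit suffixes are the ones listed in the parameter description.

```ruby
# Multipliers for the size suffixes mentioned in the README.
UNITS = { 'k' => 1024, 'm' => 1024**2, 'g' => 1024**3 }.freeze

# Convert a size string such as "2m" or "512k" into bytes.
def size_to_bytes(text)
  match = /\A(\d+)([kmg])?\z/i.match(text.strip)
  raise ArgumentError, "unrecognized size: #{text}" unless match

  match[1].to_i * (match[2] ? UNITS.fetch(match[2].downcase) : 1)
end

# Upper bound of buffered data: chunk size times queue length.
def total_buffer_bytes(chunk_limit, queue_limit)
  size_to_bytes(chunk_limit) * queue_limit
end

# Values from the README example: 2MB chunks, queue of 128 chunks.
puts total_buffer_bytes('2m', 128) / 1024**2 # total in MB
```

When the BufferQueueLimitError appears, raising buffer_queue_limit increases this upper bound without changing the size of each chunk.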
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.1.
|
1
|
+
0.1.3
|
data/build.sh
ADDED
@@ -0,0 +1,10 @@
|
|
1
|
+
#!/bin/bash
|
2
|
+
mkdir package
|
3
|
+
mkdir package/temp
|
4
|
+
mkdir package/temp/fluentd
|
5
|
+
gem build fluent-plugin-aliyun-odps.gemspec
|
6
|
+
cp ext/* ./package/temp/fluentd/ -r
|
7
|
+
cp README.cn.md ./package/temp/fluentd/README.cn.md
|
8
|
+
cp example.conf ./package/temp/fluentd/example.conf
|
9
|
+
cp fluent-plugin-aliyun-odps-*.gem ./package/temp/fluentd/dependency_gem/
|
10
|
+
tar zcvf ./package/fluentd_package.tar.gz -C ./package/temp/ .
|
data/example.conf
ADDED
@@ -0,0 +1,26 @@
|
|
1
|
+
<source>
|
2
|
+
type tail
|
3
|
+
path /path/to/students.csv
|
4
|
+
tag input.csv
|
5
|
+
format csv
|
6
|
+
</source>
|
7
|
+
<match input.*>
|
8
|
+
type aliyun_odps
|
9
|
+
aliyun_access_id ************
|
10
|
+
aliyun_access_key *********
|
11
|
+
aliyun_odps_endpoint http://service.odps.aliyun.com/api
|
12
|
+
aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
|
13
|
+
buffer_chunk_limit 2m
|
14
|
+
buffer_queue_limit 128
|
15
|
+
flush_interval 5s
|
16
|
+
project your_projectName #name of the project to import data into
|
17
|
+
enable_fast_crc true
|
18
|
+
<table input.csv>
|
19
|
+
table students
|
20
|
+
fields id,name,score
|
21
|
+
shard_number 1
|
22
|
+
retry_time 3
|
23
|
+
retry_interval 1
|
24
|
+
abandon_mode true
|
25
|
+
</table>
|
26
|
+
</match>
|
data/fluent-plugin-aliyun-odps.gemspec
CHANGED
@@ -12,16 +12,15 @@ Gem::Specification.new do |gem|
|
|
12
12
|
gem.email = "dongxiao.dx@alibaba-inc.com"
|
13
13
|
gem.has_rdoc = false
|
14
14
|
#gem.platform = Gem::Platform::RUBY
|
15
|
-
gem.files = `git ls-files`.split("\n")
|
15
|
+
gem.files = `git ls-files | grep -v ext | grep -v package`.split("\n")
|
16
16
|
gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
|
17
17
|
gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
|
18
18
|
gem.require_paths = ['lib']
|
19
19
|
|
20
20
|
gem.add_dependency "fluentd", [">= 0.10.49", "< 2"]
|
21
|
-
gem.add_dependency "protobuf",
|
21
|
+
gem.add_dependency "protobuf", '~> 3.5', '>= 3.5.1'
|
22
22
|
gem.add_dependency "yajl-ruby", "~> 1.0"
|
23
|
-
gem.
|
24
|
-
gem.add_development_dependency "
|
25
|
-
gem.add_development_dependency "
|
26
|
-
gem.add_development_dependency "test-unit", ">= 3.0.8"
|
23
|
+
gem.add_development_dependency "rake", '~> 0.9', '>= 0.9.2'
|
24
|
+
gem.add_development_dependency "flexmock", '~> 1.2', '>= 1.2.0'
|
25
|
+
gem.add_development_dependency "test-unit", '~> 3.0', '>= 3.0.8'
|
27
26
|
end
|
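The new dependency lines combine a pessimistic constraint with a minimum version, for example `'~> 3.5', '>= 3.5.1'` for protobuf. A short sketch using RubyGems' own `Gem::Requirement` class shows which versions such a pair accepts; the version numbers tested are arbitrary examples, not a list of real protobuf releases.

```ruby
require 'rubygems'

# Same constraint pair as the protobuf dependency in the gemspec:
# "~> 3.5" allows 3.5 up to (but not including) 4.0, ">= 3.5.1" raises the floor.
requirement = Gem::Requirement.new('~> 3.5', '>= 3.5.1')

%w[3.5.0 3.5.1 3.6.2 4.0.0].each do |candidate|
  accepted = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{accepted}"
end
```

Under these rules only 3.5.1 and 3.6.2 from the sample list are accepted, so the change mainly adds an upper bound below the next major version.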
data/lib/fluent/plugin/out_aliyun_odps.rb
CHANGED
@@ -110,66 +110,73 @@ module Fluent
|
|
110
110
|
begin
|
111
111
|
#if partition is not empty
|
112
112
|
unless @partition.blank? then
|
113
|
-
|
114
|
-
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
if
|
127
|
-
|
113
|
+
begin
|
114
|
+
#if partition has params in it
|
115
|
+
if @partition.include? "=${"
|
116
|
+
#split partition
|
117
|
+
partition_arrays=@partition.split(',')
|
118
|
+
partition_name=''
|
119
|
+
i=1
|
120
|
+
for p in partition_arrays do
|
121
|
+
#if partition is time formatted
|
122
|
+
if p.include? "strftime"
|
123
|
+
key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
|
124
|
+
partition_column=p[0, p.index("=")]
|
125
|
+
timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
|
126
|
+
if data.has_key?(key)
|
127
|
+
if time_format == nil
|
128
|
+
partition_value=Time.parse(data[key]).strftime(timeFormat)
|
129
|
+
else
|
130
|
+
partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
|
131
|
+
end
|
132
|
+
if i==1
|
133
|
+
partition_name+=partition_column+"="+partition_value
|
134
|
+
else
|
135
|
+
partition_name+=","+partition_column+"="+partition_value
|
136
|
+
end
|
128
137
|
else
|
129
|
-
|
138
|
+
raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
|
130
139
|
end
|
131
|
-
|
132
|
-
|
140
|
+
elsif p.include? "=${"
|
141
|
+
key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
|
142
|
+
partition_column=p[0, p.index("=")]
|
143
|
+
if data.has_key?(key)
|
144
|
+
partition_value=data[key]
|
145
|
+
if i==1
|
146
|
+
partition_name+=partition_column+"="+partition_value
|
147
|
+
else
|
148
|
+
partition_name+=","+partition_column+"="+partition_value
|
149
|
+
end
|
133
150
|
else
|
134
|
-
|
151
|
+
raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
|
135
152
|
end
|
136
153
|
else
|
137
|
-
raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
|
138
|
-
end
|
139
|
-
elsif p.include? "=${"
|
140
|
-
key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
|
141
|
-
partition_column=p[0, p.index("=")]
|
142
|
-
if data.has_key?(key)
|
143
|
-
partition_value=data[key]
|
144
154
|
if i==1
|
145
|
-
partition_name+=
|
155
|
+
partition_name+=p
|
146
156
|
else
|
147
|
-
partition_name+=","+
|
157
|
+
partition_name+=","+p
|
148
158
|
end
|
149
|
-
else
|
150
|
-
raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
|
151
|
-
end
|
152
|
-
else
|
153
|
-
if i==1
|
154
|
-
partition_name+=p
|
155
|
-
else
|
156
|
-
partition_name+=","+p
|
157
159
|
end
|
160
|
+
i+=1
|
158
161
|
end
|
159
|
-
|
162
|
+
else
|
163
|
+
partition_name=@partition
|
164
|
+
end
|
165
|
+
if partitions[partition_name]==nil
|
166
|
+
partitions[partition_name]=[]
|
167
|
+
end
|
168
|
+
partitions[partition_name] << @format_proc.call(data)
|
169
|
+
rescue => ex
|
170
|
+
if (@abandon_mode)
|
171
|
+
@log.error "Format partition failed, abandon this record. Msg:" +ex.message + " Table:" + @table
|
172
|
+
@log.error "Drop data:" + data.to_s
|
173
|
+
else
|
174
|
+
raise ex
|
160
175
|
end
|
161
|
-
else
|
162
|
-
partition_name=@partition
|
163
|
-
end
|
164
|
-
if partitions[partition_name]==nil
|
165
|
-
partitions[partition_name]=[]
|
166
176
|
end
|
167
|
-
partitions[partition_name] << @format_proc.call(data)
|
168
|
-
|
169
177
|
else
|
170
178
|
records << @format_proc.call(data)
|
171
179
|
end
|
172
|
-
|
173
180
|
rescue => e
|
174
181
|
raise "Failed to format the data:"+ e.message + " " +e.backtrace.inspect.to_s
|
175
182
|
end
|
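The behavior introduced in this hunk, dropping a single record when its partition cannot be resolved and abandon_mode is on, can be shown in isolation. The following is a reduced sketch, not the plugin's code: `group_by_partition`, its `logger` argument, and the resolver block are hypothetical stand-ins for the partition parsing and `@log` used in the diff.

```ruby
# Group records by partition name. When resolving the partition of a record
# fails, either drop that record and log it (abandon_mode) or re-raise the
# error so the caller can retry the whole chunk.
def group_by_partition(records, abandon_mode:, logger: $stderr, &resolver)
  partitions = Hash.new { |hash, name| hash[name] = [] }
  records.each do |data|
    begin
      partitions[resolver.call(data)] << data
    rescue StandardError => ex
      raise ex unless abandon_mode

      logger.puts "Format partition failed, abandon this record. Msg:#{ex.message}"
      logger.puts "Drop data:#{data}"
    end
  end
  partitions
end

records = [{ 'remote' => 'a' }, { 'other' => 'x' }, { 'remote' => 'b' }]
grouped = group_by_partition(records, abandon_mode: true) do |data|
  "ctime=#{data.fetch('remote')}" # raises KeyError when the key is missing
end
puts grouped.keys.inspect
```

Note that the sketch re-raises the variable bound by its own `rescue` clause (`ex`), so a failure with abandon_mode off surfaces the original error rather than a secondary one.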
@@ -214,6 +221,7 @@ module Fluent
|
|
214
221
|
else
|
215
222
|
if (@abandon_mode)
|
216
223
|
@log.error "Retry failed, abandon this pack. Msg:" + e.message + " partitions:" + k.to_s + " table:" + @table
|
224
|
+
@log.error v[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
|
217
225
|
else
|
218
226
|
raise e
|
219
227
|
end
|
@@ -252,6 +260,7 @@ module Fluent
|
|
252
260
|
else
|
253
261
|
if (@abandon_mode)
|
254
262
|
@log.error "Retry failed, abandon this pack. Msg:" + e.message + " Table:" + @table
|
263
|
+
@log.error records[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
|
255
264
|
else
|
256
265
|
raise e
|
257
266
|
end
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: fluent-plugin-aliyun-odps
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Xiao Dong
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2016-03-
|
12
|
+
date: 2016-03-16 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: fluentd
|
@@ -36,6 +36,9 @@ dependencies:
|
|
36
36
|
requirement: !ruby/object:Gem::Requirement
|
37
37
|
requirements:
|
38
38
|
- - "~>"
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '3.5'
|
41
|
+
- - ">="
|
39
42
|
- !ruby/object:Gem::Version
|
40
43
|
version: 3.5.1
|
41
44
|
type: :runtime
|
@@ -43,6 +46,9 @@ dependencies:
|
|
43
46
|
version_requirements: !ruby/object:Gem::Requirement
|
44
47
|
requirements:
|
45
48
|
- - "~>"
|
49
|
+
- !ruby/object:Gem::Version
|
50
|
+
version: '3.5'
|
51
|
+
- - ">="
|
46
52
|
- !ruby/object:Gem::Version
|
47
53
|
version: 3.5.1
|
48
54
|
- !ruby/object:Gem::Dependency
|
@@ -59,24 +65,13 @@ dependencies:
|
|
59
65
|
- - "~>"
|
60
66
|
- !ruby/object:Gem::Version
|
61
67
|
version: '1.0'
|
62
|
-
- !ruby/object:Gem::Dependency
|
63
|
-
name: fluent-mixin-config-placeholders
|
64
|
-
requirement: !ruby/object:Gem::Requirement
|
65
|
-
requirements:
|
66
|
-
- - ">="
|
67
|
-
- !ruby/object:Gem::Version
|
68
|
-
version: '0'
|
69
|
-
type: :runtime
|
70
|
-
prerelease: false
|
71
|
-
version_requirements: !ruby/object:Gem::Requirement
|
72
|
-
requirements:
|
73
|
-
- - ">="
|
74
|
-
- !ruby/object:Gem::Version
|
75
|
-
version: '0'
|
76
68
|
- !ruby/object:Gem::Dependency
|
77
69
|
name: rake
|
78
70
|
requirement: !ruby/object:Gem::Requirement
|
79
71
|
requirements:
|
72
|
+
- - "~>"
|
73
|
+
- !ruby/object:Gem::Version
|
74
|
+
version: '0.9'
|
80
75
|
- - ">="
|
81
76
|
- !ruby/object:Gem::Version
|
82
77
|
version: 0.9.2
|
@@ -84,6 +79,9 @@ dependencies:
|
|
84
79
|
prerelease: false
|
85
80
|
version_requirements: !ruby/object:Gem::Requirement
|
86
81
|
requirements:
|
82
|
+
- - "~>"
|
83
|
+
- !ruby/object:Gem::Version
|
84
|
+
version: '0.9'
|
87
85
|
- - ">="
|
88
86
|
- !ruby/object:Gem::Version
|
89
87
|
version: 0.9.2
|
@@ -91,6 +89,9 @@ dependencies:
|
|
91
89
|
name: flexmock
|
92
90
|
requirement: !ruby/object:Gem::Requirement
|
93
91
|
requirements:
|
92
|
+
- - "~>"
|
93
|
+
- !ruby/object:Gem::Version
|
94
|
+
version: '1.2'
|
94
95
|
- - ">="
|
95
96
|
- !ruby/object:Gem::Version
|
96
97
|
version: 1.2.0
|
@@ -98,6 +99,9 @@ dependencies:
|
|
98
99
|
prerelease: false
|
99
100
|
version_requirements: !ruby/object:Gem::Requirement
|
100
101
|
requirements:
|
102
|
+
- - "~>"
|
103
|
+
- !ruby/object:Gem::Version
|
104
|
+
version: '1.2'
|
101
105
|
- - ">="
|
102
106
|
- !ruby/object:Gem::Version
|
103
107
|
version: 1.2.0
|
@@ -105,6 +109,9 @@ dependencies:
|
|
105
109
|
name: test-unit
|
106
110
|
requirement: !ruby/object:Gem::Requirement
|
107
111
|
requirements:
|
112
|
+
- - "~>"
|
113
|
+
- !ruby/object:Gem::Version
|
114
|
+
version: '3.0'
|
108
115
|
- - ">="
|
109
116
|
- !ruby/object:Gem::Version
|
110
117
|
version: 3.0.8
|
@@ -112,6 +119,9 @@ dependencies:
|
|
112
119
|
prerelease: false
|
113
120
|
version_requirements: !ruby/object:Gem::Requirement
|
114
121
|
requirements:
|
122
|
+
- - "~>"
|
123
|
+
- !ruby/object:Gem::Version
|
124
|
+
version: '3.0'
|
115
125
|
- - ">="
|
116
126
|
- !ruby/object:Gem::Version
|
117
127
|
version: 3.0.8
|
@@ -129,6 +139,8 @@ files:
|
|
129
139
|
- README.md
|
130
140
|
- Rakefile
|
131
141
|
- VERSION
|
142
|
+
- build.sh
|
143
|
+
- example.conf
|
132
144
|
- fluent-plugin-aliyun-odps.gemspec
|
133
145
|
- lib/fluent/plugin/conf/config.rb
|
134
146
|
- lib/fluent/plugin/crc/crc.rb
|
@@ -139,7 +151,6 @@ files:
|
|
139
151
|
- lib/fluent/plugin/crc/origin/crc32c.rb
|
140
152
|
- lib/fluent/plugin/crc/src/crc32c.c
|
141
153
|
- lib/fluent/plugin/crc/src/crc32c.h
|
142
|
-
- lib/fluent/plugin/crc/src/extconf.rb
|
143
154
|
- lib/fluent/plugin/exceptions.rb
|
144
155
|
- lib/fluent/plugin/http/http_connection.rb
|
145
156
|
- lib/fluent/plugin/http/http_flag.rb
|
@@ -154,7 +165,6 @@ files:
|
|
154
165
|
- lib/fluent/plugin/stream_client.rb
|
155
166
|
- lib/fluent/plugin/stream_reader.rb
|
156
167
|
- lib/fluent/plugin/stream_writer.rb
|
157
|
-
- odps_example.conf
|
158
168
|
homepage: https://github.com/aliyun/aliyun-odps-fluentd-plugin
|
159
169
|
licenses:
|
160
170
|
- Apache-2.0
|
data/odps_example.conf
DELETED
@@ -1,31 +0,0 @@
|
|
1
|
-
####
|
2
|
-
## Output descriptions:
|
3
|
-
##
|
4
|
-
|
5
|
-
<source>
|
6
|
-
type tail
|
7
|
-
path /opt/log/in/in.log
|
8
|
-
refresh_interval 5s
|
9
|
-
tag in.log
|
10
|
-
format csv
|
11
|
-
keys dt,week,r1,r2,r3,r4,r5,r6,r7,blue
|
12
|
-
</source>
|
13
|
-
|
14
|
-
<match in.**>
|
15
|
-
type aliyun_odps
|
16
|
-
aliyun_access_id ************
|
17
|
-
aliyun_access_key *********
|
18
|
-
aliyun_odps_endpoint http://service.odps.aliyun.com/api
|
19
|
-
aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
|
20
|
-
buffer_chunk_limit 2m
|
21
|
-
buffer_queue_limit 128
|
22
|
-
flush_interval 5s
|
23
|
-
project your_projectName
|
24
|
-
enable_fast_crc false
|
25
|
-
<table in.log>
|
26
|
-
table your_tableName
|
27
|
-
fields r1,r2,r3,r4,r5,r6,blue
|
28
|
-
partition ctime=${dt.strftime('%Y%m%d')}
|
29
|
-
shard_number 1
|
30
|
-
</table>
|
31
|
-
</match>
|