fluent-plugin-aliyun-odps 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: f583712920902f51d0f4b2d50f1350e73ef43c37
-   data.tar.gz: 420ac14c5256bd3aae8be1693b856c7f18d6520d
+   metadata.gz: 5b96ab3b2194318e749ee184cd5bdf8bb4a4875c
+   data.tar.gz: a846f80a43b87c5551491a5c8e4c3d387b8e003f
  SHA512:
-   metadata.gz: cd2b20d9faa6e79f3ea1d8f5e61f2aab0f3f2ccdb7e2ceee83974a70d64abe30dcad0e352ff42479e499bf43ae21f0553cd997ef4c7025797f3e1f9445740fd5
-   data.tar.gz: b7cea3a104ec7a849741738977f6f164621637a4e9e3196639eefb56a10ee5eb8b3519dcd5b00a07487d5ed5d5193c2e4b2e267482b4adac2aa2f93f60d7e309
+   metadata.gz: 592b3a66fa676ecd783e2e999c82a6757d1a387fe3a0731c5f195e8d44ba2065f55df0b90535611040ff6bbdfda0449d4d16f9c6e6c61db6153e8b7fe5076403
+   data.tar.gz: 56164832ec7b335daf343ba0a02a464e89830028dbad1430177fa19bf153177e31d81938da81708fb59a18e1374249cd013451061c8594f528b651a9fa4dcb80
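The digests above can be checked locally. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` file (which is a tar archive). The helper names `digests_for` and `checksum_matches?` are hypothetical and not part of the package.

```ruby
require 'digest'

# Hypothetical helper: compute the two digests that checksums.yaml records
# for one extracted gem component (metadata.gz or data.tar.gz).
def digests_for(path)
  bytes = File.binread(path)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: compare a component against the published values.
def checksum_matches?(path, expected_sha1, expected_sha512)
  actual = digests_for(path)
  actual['SHA1'] == expected_sha1 && actual['SHA512'] == expected_sha512
end
```

Calling `checksum_matches?` with the path to an extracted `data.tar.gz` and the two 0.1.3 values from the hunk would confirm that a downloaded copy corresponds to this release.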
data/.gitignore CHANGED
@@ -4,5 +4,6 @@
  *.iml
  .idea
  target/
+ package/
  .DS_Store
  *.pyc
data/CHANGELOG.md CHANGED
@@ -3,9 +3,9 @@ Fix datetime format bug, support String, DateTime, Time type when write to a dat
  0.0.5
  Add reload shard when import fails, and remove unload shard operation when shut down.
  0.0.6
- Add decimal support,fix string input while setting double and int.
+ Add decimal support, fix string input while setting double and int.
  0.0.7
- Add error msg when add partition fail, support fast crc,remove pack size limit.
+ Add error msg when add partition fail, support fast crc, remove pack size limit.
  0.0.8
  Add abandon mode, fix fluent retry bug, fix partition mixed mode bug.
  0.0.9
@@ -15,4 +15,6 @@ Add partition when catch NoSuchPartition.
  0.1.1
  Fix some log format.
  0.1.2
- Use XStreamPack.
+ Use XStreamPack.
+ 0.1.3
+ Drop record with error log when parse partition failed.
data/README.cn.md CHANGED
@@ -1,47 +1,122 @@
  # Aliyun ODPS Plugin for Fluentd

- ## Getting Started
+ ## Getting Started
  ---

- ### Introduction
+ ### Introduction

- - Open Data Processing Service (ODPS) is a massive-scale data processing platform developed in-house by Alibaba. It mainly serves the storage and computation of batch structured data, and provides data warehouse solutions as well as analysis and modeling services for big data.
- - ODPS DataHub Service (DHS) is a built-in ODPS service that provides users with publish and subscribe functionality for real-time data.
+ - Open Data Processing Service (ODPS) is a massive-scale data processing platform developed in-house by Alibaba. It mainly serves the storage and computation of batch structured data, and provides data warehouse solutions as well as analysis and modeling services for big data.
+ - ODPS DataHub Service (DHS) is a built-in ODPS service that provides users with publish and subscribe functionality for real-time data. Published data is automatically written to ODPS tables, so DHS can also serve as an entry point for importing data into ODPS.
+ - This plugin writes data to ODPS tables through the DataHub service, and can automatically create partitions in the format the user specifies.


- ### Requirements
+ ### Requirements

- To use this plugin, you need the following environment:
+ To use this plugin, you need the following environment:

- 1. Ruby 2.1.0 or later
- 2. Gem 2.4.5 or later
- 3. Fluentd-0.10.49 or later (*[Home Page](http://www.fluentd.org/)*)
- 4. Protobuf-3.5.1 or later (Ruby protobuf)
+ 1. Ruby 2.1.0 or later
+ 2. Gem 2.4.5 or later
+ 3. Fluentd-0.10.49 or later (*[Home Page](http://www.fluentd.org/)*)
+ 4. Protobuf-3.5.1 or later (Ruby protobuf)
  5. Ruby-devel

- ### Installing with Gem
+ ### Installation and Deployment
+ Fluentd can be installed and deployed in either of the following two ways.
+ 1. The one-click installation package suits users installing Ruby and Fluentd for the first time, and users on an isolated LAN. It bundles the required Ruby environment and Fluentd. It currently supports Linux only.
+ 2. Network installation suits users familiar with Ruby. Check the Ruby version first; if it is lower than 2.1.0, upgrade or install a newer Ruby, then install Fluentd through RubyGems.

- Install with RubyGems:
+ Notes:
+ * We recommend changing the RubyGems source to https://ruby.taobao.org/
+ * In a LAN environment, you can install from a local gem file
+ ```
+ gem install --local fluent-plugin-aliyun-odps-0.1.2.gem
+ ```

+ #### Installation method 1: one-click installation package
+ 1. Download and extract [fluentd_package.tar.gz](http://gitlab.alibaba-inc.com/aliopensource/aliyun-odps-fluentd-plugin/blob/master/package/fluentd_package.tar.gz)
+ 2. Optionally change $DIR in install_agent.sh to the path where you want Ruby installed; by default it is installed under the current path
+ 3. Run the following command; the message "Success" indicates a successful installation
  ```
- $ gem install fluent-plugin-aliyun-odps
+ bash install_agent.sh
  ```
+ 4. The fluentd program is installed in the bin directory under the current directory

- ### Plugin Usage Example
+ #### Installation method 2: network installation
+ 1. Install Ruby (skip this step if Ruby 2.1.0 or later is already present):
+ ```
+ wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.0.tar.gz
+ tar xzvf ruby-2.3.0.tar.gz
+ cd ruby-2.3.0
+ ./configure --prefix=DIR
+ make
+ make install
+ ```
+ 2. Install Fluentd and the plugin
+ ```
+ $ gem install fluent-plugin-aliyun-odps
+ ```

+ ### Plugin Usage Examples
+ #### Example 1: upload data from a CSV file
+ 1. First prepare a table in ODPS. Here we assume the table is named students and has three fields, id, name, and score, of types string, string, and bigint respectively
+ 2. Prepare the CSV data file, assuming it contains the following
+ ```
+ 1, jack ma, 90
+ 2, pony zhang, 85
+ 3, lucy wang, 88
+ ```
+ 3. Prepare the Fluentd configuration file by saving the following as fluentd.conf.
  ```
  <source>
    type tail
-   path /opt/log/in/in.log
-   pos_file /opt/log/in/in.log.pos
-   refresh_interval 5s
-   tag in.log
-   format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/
-   time_format %Y%b%d %H:%M:%S %z
+   path /path/to/students.csv
+   tag input.csv
+   format csv
  </source>
+ <match input.*>
+   type aliyun_odps
+   aliyun_access_id ************
+   aliyun_access_key *********
+   aliyun_odps_endpoint http://service.odps.aliyun.com/api
+   aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
+   buffer_chunk_limit 2m
+   buffer_queue_limit 128
+   flush_interval 5s
+   project your_projectName # name of the project to import data into
+   enable_fast_crc true
+   <table input.csv>
+     table students
+     fields id,name,score
+     shard_number 1
+     retry_time 3
+     retry_interval 1
+     abandon_mode true
+   </table>
+ </match>
+ ```
+ 4. Run fluentd, specifying the configuration file with -c
+ ```
+ fluentd -c fluentd.conf
+ ```
+ 5. Afterwards, query the data with the following SQL command
  ```
+ select * from students;
  ```
- <match in.**>
+
+ #### Example 2: collect and upload a live nginx log file
+ 1. For nginx log files, Fluentd can parse the data with a regular expression.
+ 2. Use the following configuration file as a reference; the command to run is the same as in example 1.
+ ```
+ <source>
+   type tail
+   path /home/admin/nginx/logs/access.log # path to the nginx log
+   pos_file /tmp/nginx.access.pos
+   refresh_interval 5s
+   tag nginx.access
+   format /^(?<remote>[^ ]*) - \[(?<dt>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/ # regular expression that parses the log
+   time_format %d/%b/%Y:%H:%M:%S %z
+ </source>
+ <match nginx.access>
    type aliyun_odps
    aliyun_access_id ************
    aliyun_access_key *********
@@ -52,9 +127,10 @@ $ gem install fluent-plugin-aliyun-odps
    flush_interval 5s
    project your_projectName
    enable_fast_crc true
-   <table in.log>
-     table your_tableName
-     fields remote,method,path,code,size,agent
+   <table nginx.access>
+     table nginx_logs # ODPS table the log is written to
+     fields remote,method,path,code,size,agent,requesttime
+     shard_number 5
      partition ctime=${datetime.strftime('%Y%m%d')}
      time_format %d/%b/%Y:%H:%M:%S %z
      shard_number 1
@@ -64,68 +140,68 @@ $ gem install fluent-plugin-aliyun-odps
    </table>
  </match>
  ```
- ### Parameter Description
-
- - type(Fixed): fixed value aliyun_odps.
- - aliyun_access_id(Required): your Aliyun access_id.
- - aliyun_access_key(Required): your Aliyun access key.
- - aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com, otherwise set it to http://dh.odps.aliyun.com.
- - aliyunodps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api, otherwise set it to http://service.odps.aliyun.com/api .
- - buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB); default 8MB, recommended 2MB, current maximum 20MB.
- - buffer_queue_limit(Optional): chunk queue length; together with buffer_chunk_limit it determines the total buffer size.
- - flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway. Default 60s.
- - abandon_mode(Optional): drop the pack after three built-in retries.
- - project(Required): project name.
- - table(Required): table name.
- - fields(Required): corresponds to the source; every field name must exist in the source.
- - partition(Optional): set this if the table is partitioned.
- - Supported formats for the partition name:
- - Fixed value: partition ctime=20150804
- - Keyword: partition ctime=${remote} (where remote is a field in the source)
- - Time-format keyword: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; its value rendered as %Y%m%d becomes the partition name)
- - time_format(Optional):
- - If <partition> uses a time-format keyword, set this parameter. For example, if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
- - shard_number(Optional): number of shards; data is written randomly to shards in the range [0, shard_number-1]. Must be an integer greater than 0 and less than the table's shard limit.
- - enable_fast_crc(Optional): use fast CRC computation, which greatly improves performance. Because it loads an external dynamic library, it currently supports only 64-bit Linux and Windows.
- - retry_time(Optional): number of built-in retries when sending each pack; default 3.
- - retry_interval(Optional): retry interval; default 1s.
- - abandon_mode(Optional): default false. When true, the pack is dropped after retry_time retries; otherwise the exception is passed to Fluentd and Fluentd's own retry mechanism takes over, which may cause duplicate data.
-
- ## Official Website
- ---
-
- - [Fluentd User Guide](http://docs.fluentd.org/)
-
- ## Authors
- ---

- - [Sun Zongtao]()
- - [Cai Ying]()
- - [Dong Xiao](https://github.com/dongxiao1198)
- - [Yang Hongbo](https://github.com/hongbosoftware)
-
- ## License
- ---
-
- licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
-
- ## FAQ and Exception Descriptions
+ #### Example 3: import data from MySQL
+ 1. mysql
+
+ ### Parameter Description
+
+ - type(Fixed): fixed value aliyun_odps.
+ - aliyun_access_id(Required): your Aliyun access_id.
+ - aliyun_access_key(Required): your Aliyun access key.
+ - aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com, otherwise set it to http://dh.odps.aliyun.com.
+ - aliyunodps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api, otherwise set it to http://service.odps.aliyun.com/api .
+ - buffer_chunk_limit(Optional): chunk size; supports the units "k" (KB) and "m" (MB); default 8MB, recommended 2MB, current maximum 20MB.
+ - buffer_queue_limit(Optional): chunk queue length; together with buffer_chunk_limit it determines the total buffer size.
+ - flush_interval(Optional): forced flush interval; when it elapses, a chunk that is not yet full is sent anyway. Default 60s.
+ - abandon_mode(Optional): drop the pack after three built-in retries.
+ - project(Required): project name.
+ - table(Required): table name.
+ - fields(Required): corresponds to the source; every field name must exist in the source.
+ - partition(Optional): set this if the table is partitioned.
+ - Supported formats for the partition name:
+ - Fixed value: partition ctime=20150804
+ - Keyword: partition ctime=${remote} (where remote is a field in the source)
+ - Time-format keyword: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source; its value rendered as %Y%m%d becomes the partition name)
+ - time_format(Optional):
+ - If <partition> uses a time-format keyword, set this parameter. For example, if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
+ - shard_number(Optional): number of shards; data is written randomly to shards in the range [0, shard_number-1]. Must be an integer greater than 0 and less than the table's shard limit.
+ - enable_fast_crc(Optional): use fast CRC computation, which greatly improves performance. Because it loads an external dynamic library, it currently supports only 64-bit Linux and Windows.
+ - retry_time(Optional): number of built-in retries when sending each pack; default 3.
+ - retry_interval(Optional): retry interval; default 1s.
+ - abandon_mode(Optional): default false. When true, the pack is dropped after retry_time retries; otherwise the exception is passed to Fluentd and Fluentd's own retry mechanism takes over, which may cause duplicate data.
+
+ ## FAQ and Exception Descriptions
  ---
- * What causes the program to throw InvalidShardId/ShardNotReady?
- - The system may be upgrading; the problem appears briefly and recovers shortly;
- - If several Fluentd processes are running, check that shard_num is set to the same value in all of them (or left unset in all). Different values cause this problem, because the process with the smaller shard_number unloads the extra shards;
- - Something else, such as code using the SDK, may have performed loadshard/unloadshard operations.
- * How do I check whether enable_fast_crc is compatible?
- - Enable the option and then start the Fluentd process: it is verified at startup, and a failure reports the cause (a reload does not verify). Alternatively, go to the plugin directory and inspect aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so with ldd.
- * How do retry_time/retry_interval differ from Fluentd's built-in retry?
- - Fluentd's built-in retry lasts 36 hours by default and resends the whole buffer_chunk; with dynamic partitions, resending all the data may cause duplicates. Setting these two options enables the plugin's internal retry. If that retry fails, abandon_mode decides whether to drop the pack's data or hand it to Fluentd to resend the whole buffer.
- * What does the warning "ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist." mean?
- - When the plugin catches the ODPS NoSuchPartition error, it creates the partition itself. This warning means the partition for the data does not exist in the ODPS table and will be created automatically; a message is logged when creation succeeds.
- * What causes Fluent::BufferQueueLimitError error="queue size exceeds limit"?
- - While reading and sending data, Fluentd first reads into a buffer whose size is determined by buffer_chunk_limit and buffer_queue_limit together. This error most likely means that accumulated data has exhausted the buffer; try increasing buffer_queue_limit.
- * How do I start a separate Fluentd process for each of several config files?
- - If there are several config files, use the in_multiprocess plugin to start separate processes at the same time.
- * What causes the exception "partition has no corresponding source key or the partition expression is wrong."?
- - This exception means a value configured in the partition field cannot be found in the source data. For example, with partition ctime=${remote}, remote does not appear in the source. Check the configuration.
- * What causes the exception "Failed to format the data."?
- - This error means that parsing the partition failed. Check the partition configuration; dirty data can also trigger it.
+ * What causes the program to throw InvalidShardId/ShardNotReady?
+ - The system may be upgrading; the problem appears briefly and recovers shortly;
+ - If several Fluentd processes are running, check that shard_num is set to the same value in all of them (or left unset in all). Different values cause this problem, because the process with the smaller shard_number unloads the extra shards;
+ - Something else, such as code using the SDK, may have performed loadshard/unloadshard operations.
+ * How do I check whether enable_fast_crc is compatible?
+ - Enable the option and then start the Fluentd process: it is verified at startup, and a failure reports the cause (a reload does not verify). Alternatively, go to the plugin directory and inspect aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so with ldd.
+ * How do retry_time/retry_interval differ from Fluentd's built-in retry?
+ - Fluentd's built-in retry lasts 36 hours by default and resends the whole buffer_chunk; with dynamic partitions, resending all the data may cause duplicates. Setting these two options enables the plugin's internal retry. If that retry fails, abandon_mode decides whether to drop the pack's data or hand it to Fluentd to resend the whole buffer.
+ * What does the warning "ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist." mean?
+ - When the plugin catches the ODPS NoSuchPartition error, it creates the partition itself. This warning means the partition for the data does not exist in the ODPS table and will be created automatically; a message is logged when creation succeeds.
+ * What causes Fluent::BufferQueueLimitError error="queue size exceeds limit"?
+ - While reading and sending data, Fluentd first reads into a buffer whose size is determined by buffer_chunk_limit and buffer_queue_limit together. This error most likely means that accumulated data has exhausted the buffer; try increasing buffer_queue_limit.
+ * How do I start a separate Fluentd process for each of several config files?
+ - If there are several config files, use the in_multiprocess plugin to start separate processes at the same time.
+ * What causes the exception "partition has no corresponding source key or the partition expression is wrong."?
+ - This exception means a value configured in the partition field cannot be found in the source data. For example, with partition ctime=${remote}, remote does not appear in the source. Check the configuration.
+ * What causes the exception "Failed to format the data."?
+ - This error means that parsing the partition failed. Check the partition configuration; dirty data can also trigger it.
+ * How do I switch RubyGems to the Taobao source?
+ - RubyGems mirror [https://ruby.taobao.org/]
+
+ ## Official Website
+ - [Fluentd User Guide](http://docs.fluentd.org/)
+
+ ## Authors
+ - [Sun Zongtao]()
+ - [Cai Ying]()
+ - [Dong Xiao](https://github.com/dongxiao1198)
+ - [Yang Hongbo](https://github.com/hongbosoftware)
+
+ ## License
+ licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
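The nginx `format` expression added to README.cn.md in example 2 can be tried in isolation before it goes into a Fluentd config. The sketch below copies the pattern from the hunk; the sample log line is hypothetical. As written, the pattern ends in a space followed by `$`, so it only matches lines that end with a trailing space. Note also that the pattern names its timestamp capture `dt`, while the `partition` line in the same example reads `datetime`; whether those are meant to line up is not clear from the diff.

```ruby
# Pattern copied from the nginx example's `format` line in README.cn.md.
NGINX_FORMAT = /^(?<remote>[^ ]*) - \[(?<dt>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/

# Hypothetical sample line; the trailing space is required by the pattern.
sample = '1.2.3.4 - [29/Aug/2015:11:10:16 +0800] "GET /index.html HTTP/1.1" 200 512 "curl/7.0" "0.003" '

match = NGINX_FORMAT.match(sample)

# Collect the named captures into a Hash, or an empty Hash when the line does not parse.
fields = match ? match.names.map { |name| [name, match[name]] }.to_h : {}

fields.each { |name, value| puts "#{name}: #{value}" }
```

Running the same check against a few real lines from the target access log is a quick way to find out whether the log format needs the trailing space removed from the pattern.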
data/README.md CHANGED
@@ -69,14 +69,14 @@ $ gem install fluent-plugin-aliyun-odps
  - aliyun_access_key(Required):your aliyun access key.
  - aliyun_odps_hub_endpoint(Required):if you are using ECS, set it as http://dh-ext.odps.aliyun-inc.com, otherwise using http://dh.odps.aliyun.com.
  - aliyunodps_endpoint(Required):if you are using ECS, set it as http://odps-ext.aiyun-inc.com/api, otherwise using http://service.odps.aliyun.com/api .
- - buffer_chunk_limit(Optional):chunk size,“k” (KB), “m” (MB), and “g” (GB) ,default 8MB,recommended number is 2MB, max size is 20MB.
- - buffer_queue_limit(Optional):buffer chunk size,example: buffer_chunk_limit2m,buffer_queue_limit 128,then the total buffer size is 2*128MB.
+ - buffer_chunk_limit(Optional):chunk size, "k" (KB), "m" (MB), and "g" (GB), default 8MB, recommended number is 2MB, max size is 20MB.
+ - buffer_queue_limit(Optional):buffer chunk size, example: buffer_chunk_limit2m, buffer_queue_limit 128, then the total buffer size is 2*128MB.
  - flush_interval(Optional):interval to flush data buffer, default 60s.
  - abandon_mode(Optional):drop pack after retry 3 times.
  - project(Required):your project name.
  - table(Required):your table name.
  - fields(Required): must match the keys in source.
- - partition(Optional):set this if your table is partitioned.
+ - partition(Optional):set this if your table is partitioned.
  - partition format:
  - fix string: partition ctime=20150804
  - key words: partition ctime=${remote}
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.2
+ 0.1.3
data/build.sh ADDED
@@ -0,0 +1,10 @@
+ #!/bin/bash
+ mkdir package
+ mkdir package/temp
+ mkdir package/temp/fluentd
+ gem build fluent-plugin-aliyun-odps.gemspec
+ cp ext/* ./package/temp/fluentd/ -r
+ cp README.cn.md ./package/temp/fluentd/README.cn.md
+ cp example.conf ./package/temp/fluentd/example.conf
+ cp fluent-plugin-aliyun-odps-*.gem ./package/temp/fluentd/dependency_gem/
+ tar zcvf ./package/fluentd_package.tar.gz -C ./package/temp/ .
data/example.conf ADDED
@@ -0,0 +1,26 @@
+ <source>
+   type tail
+   path /path/to/students.csv
+   tag input.csv
+   format csv
+ </source>
+ <match input.*>
+   type aliyun_odps
+   aliyun_access_id ************
+   aliyun_access_key *********
+   aliyun_odps_endpoint http://service.odps.aliyun.com/api
+   aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
+   buffer_chunk_limit 2m
+   buffer_queue_limit 128
+   flush_interval 5s
+   project your_projectName # name of the project to import data into
+   enable_fast_crc true
+   <table input.csv>
+     table students
+     fields id,name,score
+     shard_number 1
+     retry_time 3
+     retry_interval 1
+     abandon_mode true
+   </table>
+ </match>
data/fluent-plugin-aliyun-odps.gemspec CHANGED
@@ -12,16 +12,15 @@ Gem::Specification.new do |gem|
    gem.email = "dongxiao.dx@alibaba-inc.com"
    gem.has_rdoc = false
    #gem.platform = Gem::Platform::RUBY
-   gem.files = `git ls-files`.split("\n")
+   gem.files = `git ls-files | grep -v ext | grep -v package`.split("\n")
    gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
    gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
    gem.require_paths = ['lib']

    gem.add_dependency "fluentd", [">= 0.10.49", "< 2"]
-   gem.add_dependency "protobuf", "~> 3.5.1"
+   gem.add_dependency "protobuf", '~> 3.5', '>= 3.5.1'
    gem.add_dependency "yajl-ruby", "~> 1.0"
-   gem.add_dependency "fluent-mixin-config-placeholders"
-   gem.add_development_dependency "rake", ">= 0.9.2"
-   gem.add_development_dependency "flexmock", ">= 1.2.0"
-   gem.add_development_dependency "test-unit", ">= 3.0.8"
+   gem.add_development_dependency "rake", '~> 0.9', '>= 0.9.2'
+   gem.add_development_dependency "flexmock", '~> 1.2', '>= 1.2.0'
+   gem.add_development_dependency "test-unit", '~> 3.0', '>= 3.0.8'
  end
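The protobuf constraint change in this hunk is easy to misread, because `~> 3.5.1` and `'~> 3.5', '>= 3.5.1'` look similar but admit different versions. A small sketch using RubyGems' own `Gem::Requirement` shows the effect; the helper `allowed?` and the list of probe versions are illustrative only.

```ruby
require 'rubygems'

# Constraint from 0.1.2 and the replacement from 0.1.3, as written in the gemspec.
old_req = Gem::Requirement.new('~> 3.5.1')
new_req = Gem::Requirement.new('~> 3.5', '>= 3.5.1')

# Illustrative helper: does the requirement accept this version string?
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# Probe a few versions around the boundaries of both constraints.
%w[3.5.0 3.5.1 3.5.9 3.6.0 4.0.0].each do |version|
  puts format('%-6s old=%-5s new=%s', version, allowed?(old_req, version), allowed?(new_req, version))
end
```

The old form pins protobuf to the 3.5.x patch series, while the new form accepts any 3.x release from 3.5.1 upward. The same pattern tightens the development dependencies in the other direction, since a bare `>= 0.9.2` had no upper bound.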
@@ -17,7 +17,7 @@
  #under the License.
  #
  module OdpsDatahub
-   $SDK_UA_STR = "ODPS Ruby SDK v0.1"
+   $SDK_UA_STR = "ODPS Ruby SDK v0.1.2"
    $MAX_PACK_SIZE = 2048*10*1024
    class HttpHeaders
      $AUTHORIZATION = "Authorization"
@@ -110,66 +110,73 @@ module Fluent
  begin
    #if partition is not empty
    unless @partition.blank? then
-     #if partition has params in it
-     if @partition.include? "=${"
-       #split partition
-       partition_arrays=@partition.split(',')
-       partition_name=''
-       i=1
-       for p in partition_arrays do
-         #if partition is time formated
-         if p.include? "strftime"
-           key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
-           partition_column=p[0, p.index("=")]
-           timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
-           if data.has_key?(key)
-             if time_format == nil
-               partition_value=Time.parse(data[key]).strftime(timeFormat)
+     begin
+       #if partition has params in it
+       if @partition.include? "=${"
+         #split partition
+         partition_arrays=@partition.split(',')
+         partition_name=''
+         i=1
+         for p in partition_arrays do
+           #if partition is time formated
+           if p.include? "strftime"
+             key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
+             partition_column=p[0, p.index("=")]
+             timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
+             if data.has_key?(key)
+               if time_format == nil
+                 partition_value=Time.parse(data[key]).strftime(timeFormat)
+               else
+                 partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
+               end
+               if i==1
+                 partition_name+=partition_column+"="+partition_value
+               else
+                 partition_name+=","+partition_column+"="+partition_value
+               end
              else
-               partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
+               raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
              end
-             if i==1
-               partition_name+=partition_column+"="+partition_value
+           elsif p.include? "=${"
+             key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
+             partition_column=p[0, p.index("=")]
+             if data.has_key?(key)
+               partition_value=data[key]
+               if i==1
+                 partition_name+=partition_column+"="+partition_value
+               else
+                 partition_name+=","+partition_column+"="+partition_value
+               end
              else
-               partition_name+=","+partition_column+"="+partition_value
+               raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
              end
            else
-             raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
-           end
-         elsif p.include? "=${"
-           key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
-           partition_column=p[0, p.index("=")]
-           if data.has_key?(key)
-             partition_value=data[key]
              if i==1
-               partition_name+=partition_column+"="+partition_value
+               partition_name+=p
              else
-               partition_name+=","+partition_column+"="+partition_value
+               partition_name+=","+p
              end
-           else
-             raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
-           end
-         else
-           if i==1
-             partition_name+=p
-           else
-             partition_name+=","+p
            end
+           i+=1
          end
-         i+=1
+       else
+         partition_name=@partition
+       end
+       if partitions[partition_name]==nil
+         partitions[partition_name]=[]
+       end
+       partitions[partition_name] << @format_proc.call(data)
+     rescue => ex
+       if (@abandon_mode)
+         @log.error "Format partition failed, abandon this record. Msg:" +ex.message + " Table:" + @table
+         @log.error "Drop data:" + data.to_s
+       else
+         raise e
        end
-     else
-       partition_name=@partition
-     end
-     if partitions[partition_name]==nil
-       partitions[partition_name]=[]
      end
-     partitions[partition_name] << @format_proc.call(data)
-
    else
      records << @format_proc.call(data)
    end
-
  rescue => e
    raise "Failed to format the data:"+ e.message + " " +e.backtrace.inspect.to_s
  end
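The behavior this hunk introduces, dropping a single record when its partition cannot be computed and `abandon_mode` is set, can be sketched outside the plugin. The code below is a simplified stand-alone illustration under stated assumptions, not the plugin's implementation: `expand_partition` and `group_by_partition` are hypothetical names, records are plain Hashes with String keys, and regular expressions replace the index arithmetic used in the hunk. One detail worth checking in the hunk itself: the inner handler binds its exception as `ex` but the non-abandon branch re-raises `e`; the sketch re-raises the exception it actually caught.

```ruby
require 'time'

# Hypothetical stand-alone version of the partition expansion.
# expr:        partition spec, e.g. "ctime=${datetime.strftime('%Y%m%d')},host=${remote}"
# record:      one parsed event (Hash with String keys)
# time_format: optional strptime format for time-valued fields
def expand_partition(expr, record, time_format = nil)
  expr.split(',').map do |part|
    column, value = part.split('=', 2)
    if (m = /\A\$\{(\w+)\.strftime\('([^']*)'\)\}\z/.match(value.to_s))
      # Time-format keyword: parse the source field, then render it.
      raw = record.fetch(m[1]) { raise KeyError, "no source key #{m[1]}" }
      time = time_format ? Time.strptime(raw, time_format) : Time.parse(raw)
      "#{column}=#{time.strftime(m[2])}"
    elsif (m = /\A\$\{(\w+)\}\z/.match(value.to_s))
      # Plain keyword: substitute the source field as-is.
      "#{column}=#{record.fetch(m[1]) { raise KeyError, "no source key #{m[1]}" }}"
    else
      part # fixed value such as "ctime=20150804"
    end
  end.join(',')
end

# Group records by partition name. With abandon set, a record whose partition
# cannot be computed is collected in `dropped` instead of failing the batch.
def group_by_partition(records, expr, time_format: nil, abandon: true, dropped: [])
  records.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |record, groups|
    begin
      groups[expand_partition(expr, record, time_format)] << record
    rescue StandardError => ex
      raise ex unless abandon
      dropped << [record, ex.message]
    end
  end
end
```

With `abandon: false` the first bad record aborts the whole batch, which corresponds to the pre-0.1.3 behavior where any formatting error surfaced as "Failed to format the data".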
@@ -214,6 +221,7 @@ module Fluent
  else
    if (@abandon_mode)
      @log.error "Retry failed, abandon this pack. Msg:" + e.message + " partitions:" + k.to_s + " table:" + @table
+     @log.error v[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
    else
      raise e
    end
@@ -252,6 +260,7 @@ module Fluent
  else
    if (@abandon_mode)
      @log.error "Retry failed, abandon this pack. Msg:" + e.message + " Table:" + @table
+     @log.error records[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
    else
      raise e
    end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fluent-plugin-aliyun-odps
  version: !ruby/object:Gem::Version
-   version: 0.1.2
+   version: 0.1.3
  platform: ruby
  authors:
  - Xiao Dong
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-04 00:00:00.000000000 Z
+ date: 2016-03-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: fluentd
@@ -36,6 +36,9 @@ dependencies:
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.5'
+     - - ">="
        - !ruby/object:Gem::Version
          version: 3.5.1
    type: :runtime
@@ -43,6 +46,9 @@ dependencies:
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.5'
+     - - ">="
        - !ruby/object:Gem::Version
          version: 3.5.1
  - !ruby/object:Gem::Dependency
@@ -59,24 +65,13 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '1.0'
- - !ruby/object:Gem::Dependency
-   name: fluent-mixin-config-placeholders
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :runtime
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
  - !ruby/object:Gem::Dependency
    name: rake
    requirement: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.9'
      - - ">="
        - !ruby/object:Gem::Version
          version: 0.9.2
@@ -84,6 +79,9 @@ dependencies:
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.9'
      - - ">="
        - !ruby/object:Gem::Version
          version: 0.9.2
@@ -91,6 +89,9 @@ dependencies:
    name: flexmock
    requirement: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
      - - ">="
        - !ruby/object:Gem::Version
          version: 1.2.0
@@ -98,6 +99,9 @@ dependencies:
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.2'
      - - ">="
        - !ruby/object:Gem::Version
          version: 1.2.0
@@ -105,6 +109,9 @@ dependencies:
    name: test-unit
    requirement: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
      - - ">="
        - !ruby/object:Gem::Version
          version: 3.0.8
@@ -112,6 +119,9 @@ dependencies:
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
      - - ">="
        - !ruby/object:Gem::Version
          version: 3.0.8
@@ -129,6 +139,8 @@ files:
  - README.md
  - Rakefile
  - VERSION
+ - build.sh
+ - example.conf
  - fluent-plugin-aliyun-odps.gemspec
  - lib/fluent/plugin/conf/config.rb
  - lib/fluent/plugin/crc/crc.rb
@@ -139,7 +151,6 @@ files:
  - lib/fluent/plugin/crc/origin/crc32c.rb
  - lib/fluent/plugin/crc/src/crc32c.c
  - lib/fluent/plugin/crc/src/crc32c.h
- - lib/fluent/plugin/crc/src/extconf.rb
  - lib/fluent/plugin/exceptions.rb
  - lib/fluent/plugin/http/http_connection.rb
  - lib/fluent/plugin/http/http_flag.rb
@@ -154,7 +165,6 @@ files:
  - lib/fluent/plugin/stream_client.rb
  - lib/fluent/plugin/stream_reader.rb
  - lib/fluent/plugin/stream_writer.rb
- - odps_example.conf
  homepage: https://github.com/aliyun/aliyun-odps-fluentd-plugin
  licenses:
  - Apache-2.0
data/lib/fluent/plugin/crc/src/extconf.rb DELETED
@@ -1,3 +0,0 @@
- require 'mkmf'
- extension_name = 'crc32c'
- create_makefile(extension_name)
data/odps_example.conf DELETED
@@ -1,31 +0,0 @@
- ####
- ## Output descriptions:
- ##
-
- <source>
-   type tail
-   path /opt/log/in/in.log
-   refresh_interval 5s
-   tag in.log
-   format csv
-   keys dt,week,r1,r2,r3,r4,r5,r6,r7,blue
- </source>
-
- <match in.**>
-   type aliyun_odps
-   aliyun_access_id ************
-   aliyun_access_key *********
-   aliyun_odps_endpoint http://service.odps.aliyun.com/api
-   aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
-   buffer_chunk_limit 2m
-   buffer_queue_limit 128
-   flush_interval 5s
-   project your_projectName
-   enable_fast_crc false
-   <table in.log>
-     table your_tableName
-     fields r1,r2,r3,r4,r5,r6,blue
-     partition ctime=${dt.strftime('%Y%m%d')}
-     shard_number 1
-   </table>
- </match>