fluent-plugin-aliyun-odps 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: f583712920902f51d0f4b2d50f1350e73ef43c37
4
- data.tar.gz: 420ac14c5256bd3aae8be1693b856c7f18d6520d
3
+ metadata.gz: 5b96ab3b2194318e749ee184cd5bdf8bb4a4875c
4
+ data.tar.gz: a846f80a43b87c5551491a5c8e4c3d387b8e003f
5
5
  SHA512:
6
- metadata.gz: cd2b20d9faa6e79f3ea1d8f5e61f2aab0f3f2ccdb7e2ceee83974a70d64abe30dcad0e352ff42479e499bf43ae21f0553cd997ef4c7025797f3e1f9445740fd5
7
- data.tar.gz: b7cea3a104ec7a849741738977f6f164621637a4e9e3196639eefb56a10ee5eb8b3519dcd5b00a07487d5ed5d5193c2e4b2e267482b4adac2aa2f93f60d7e309
6
+ metadata.gz: 592b3a66fa676ecd783e2e999c82a6757d1a387fe3a0731c5f195e8d44ba2065f55df0b90535611040ff6bbdfda0449d4d16f9c6e6c61db6153e8b7fe5076403
7
+ data.tar.gz: 56164832ec7b335daf343ba0a02a464e89830028dbad1430177fa19bf153177e31d81938da81708fb59a18e1374249cd013451061c8594f528b651a9fa4dcb80
data/.gitignore CHANGED
@@ -4,5 +4,6 @@
4
4
  *.iml
5
5
  .idea
6
6
  target/
7
+ package/
7
8
  .DS_Store
8
9
  *.pyc
data/CHANGELOG.md CHANGED
@@ -3,9 +3,9 @@ Fix datetime format bug, support String, DateTime, Time type when write to a dat
3
3
  0.0.5
4
4
  Add reload shard when import fails, and remove unload shard operation when shut down.
5
5
  0.0.6
6
- Add decimal support,fix string input while setting double and int.
6
+ Add decimal support, fix string input while setting double and int.
7
7
  0.0.7
8
- Add error msg when add partition fail, support fast crc, remove pack size limit.
8
+ Add error msg when add partition fail, support fast crc, remove pack size limit.
9
9
  0.0.8
10
10
  Add abandon mode, fix fluent retry bug, fix partition mixed mode bug.
11
11
  0.0.9
@@ -15,4 +15,6 @@ Add partition when catch NoSuchPartition.
15
15
  0.1.1
16
16
  Fix some log format.
17
17
  0.1.2
18
- Use XStreamPack.
18
+ Use XStreamPack.
19
+ 0.1.3
20
+ Drop the record and log an error when partition parsing fails.
data/README.cn.md CHANGED
@@ -1,47 +1,122 @@
1
1
  # Aliyun ODPS Plugin for Fluentd
2
2
 
3
- ## ��ʼʹ��
3
+ ## Getting Started
4
4
  ---
5
5
 
6
- ### ����
6
+ ### Introduction
7
7
 
8
- - �������ݴ�������(Open Data Processing Service�����ODPS)�ǰ���Ͱ������з��ĺ������ݴ���ƽ̨����Ҫ�����������ṹ�����ݵĴ洢�ͼ��㣬�����ṩ�������ݲֿ�Ľ�������Լ���Դ����ݵķ�����ģ����
9
- - ODPS DataHub Service(DHS)��һ��ODPS���ڽ��������û��ṩʵʱ���ݵķ���(Publish)�Ͷ���(Subscribe)�Ĺ��ܡ�
8
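+ - Open Data Processing Service (ODPS) is a massive-scale data processing platform developed in-house by Alibaba. It mainly serves the storage and computation of batch structured data, and provides data warehouse solutions as well as analysis and modeling services for big data.
9
+ - ODPS DataHub Service (DHS) is a built-in ODPS service that provides users with publish and subscribe functions for real-time data. Published data is written to ODPS tables automatically, so DHS can also serve as an entry point for importing data into ODPS.
10
+ - This plugin writes data to ODPS tables through the DataHub service and can automatically create partitions in the format the user specifies.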
10
11
 
11
12
 
12
- ### ����Ҫ��
13
+ ### Requirements
13
14
 
14
- ʹ�ô˲������Ҫ�߱����»���:
15
+ To use this plugin, you need the following environment:
15
16
 
16
- 1. Ruby 2.1.0 �����
17
- 2. Gem 2.4.5 �����
18
- 3. Fluentd-0.10.49 ����� (*[Home Page](http://www.fluentd.org/)*)
19
- 4. Protobuf-3.5.1 �����(Ruby protobuf)
17
+ 1. Ruby 2.1.0 or later
18
+ 2. Gem 2.4.5 or later
19
+ 3. Fluentd-0.10.49 or later (*[Home Page](http://www.fluentd.org/)*)
20
+ 4. Protobuf-3.5.1 or later (Ruby protobuf)
20
21
  5. Ruby-devel
21
22
 
22
- ### GEM��װ
23
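+ ### Installation
24
+ Fluentd can be installed in either of the following two ways.
25
+ 1. The one-click installation package suits users installing the Ruby & Fluentd environment for the first time, or users on a local network. It bundles the required Ruby environment and Fluentd. The one-click package currently supports Linux only.
26
+ 2. Installation over the network suits users familiar with Ruby. Check the Ruby version first; if it is lower than 2.1.0, upgrade or install a newer Ruby environment, then install Fluentd through RubyGems.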
23
27
 
24
- ��ruby gem��װʹ��:
28
+ Notes:
29
+ * Changing the RubyGems source to https://ruby.taobao.org/ is recommended
30
+ * In a local-network environment, the plugin can be installed from a local gem file
31
+ ```
32
+ gem install --local fluent-plugin-aliyun-odps-0.1.2.gem
33
+ ```
25
34
 
35
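+ #### Installation method 1: one-click installation package
36
+ 1. Download and extract [fluentd_package.tar.gz](http://gitlab.alibaba-inc.com/aliopensource/aliyun-odps-fluentd-plugin/blob/master/package/fluentd_package.tar.gz)
37
+ 2. Optionally change $DIR in install_agent.sh to the path where you want Ruby installed; by default it is installed under the current path
38
+ 3. Run the following command; the message "Success" indicates a successful installation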
26
39
  ```
27
- $ gem install fluent-plugin-aliyun-odps
40
+ bash install_agent.sh
28
41
  ```
42
+ 4. The fluentd program is installed in the bin directory under the current directory
29
43
 
30
- ### ���ʹ��ʾ��
44
+ #### Installation method 2: installation over the network
45
+ 1. Install Ruby (skip this step if Ruby 2.1.0 or later is already present):
46
+ ```
47
+ wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.0.tar.gz
48
+ tar xzvf ruby-2.3.0.tar.gz
49
+ cd ruby-2.3.0
50
+ ./configure --prefix=DIR
51
+ make
52
+ make install
53
+ ```
54
+ 2. Install Fluentd and the plugin
55
+ ```
56
+ $ gem install fluent-plugin-aliyun-odps
57
+ ```
31
58
 
59
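+ ### Plugin usage examples
60
+ #### Example 1: uploading data from a csv file
61
+ 1. First prepare a table in ODPS. Here we assume the table is named students and contains three fields id, name, score, of types string, string, bigint respectively
62
+ 2. Prepare the csv data file, assuming it contains the following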
63
+ ```
64
+ 1, jack ma, 90
65
+ 2, pony zhang, 85
66
+ 3, lucy wang, 88
67
+ ```
68
+ 3. Prepare the fluentd configuration file by saving the following content as fluentd.conf.
32
69
  ```
33
70
  <source>
34
71
  type tail
35
- path /opt/log/in/in.log
36
- pos_file /opt/log/in/in.log.pos
37
- refresh_interval 5s
38
- tag in.log
39
- format /^(?<remote>[^ ]*) - - \[(?<datetime>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "-" "(?<agent>[^\"]*)"$/
40
- time_format %Y%b%d %H:%M:%S %z
72
+ path /path/to/students.csv
73
+ tag input.csv
74
+ format csv
41
75
  </source>
76
+ <match input.*>
77
+ type aliyun_odps
78
+ aliyun_access_id ************
79
+ aliyun_access_key *********
80
+ aliyun_odps_endpoint http://service.odps.aliyun.com/api
81
+ aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
82
+ buffer_chunk_limit 2m
83
+ buffer_queue_limit 128
84
+ flush_interval 5s
85
+ project your_projectName #name of the project to import data into
86
+ enable_fast_crc true
87
+ <table input.csv>
88
+ table students
89
+ fields id,name,score
90
+ shard_number 1
91
+ retry_time 3
92
+ retry_interval 1
93
+ abandon_mode true
94
+ </table>
95
+ </match>
96
+ ```
97
+ 4. Run the fluentd command, specifying the configuration file with -c
98
+ ```
99
+ fluentd -c fluentd.conf
100
+ ```
101
+ 5. When finished, the data can be queried with the following sql command
42
102
  ```
103
+ select * from students;
43
104
  ```
44
- <match in.**>
105
+
106
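+ #### Example 2: collecting and uploading a live nginx log file
107
+ 1. For nginx log files, fluentd can parse the data with a regular expression.
108
+ 2. Use the following configuration file as a reference; the command to run is the same as in Example 1.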
109
+ ```
110
+ <source>
111
+ type tail
112
+ path /home/admin/nginx/logs/access.log #nginx log path
113
+ pos_file /tmp/nginx.access.pos
114
+ refresh_interval 5s
115
+ tag nginx.access
116
+ format /^(?<remote>[^ ]*) - \[(?<datetime>[^\]]*)\] "(?<method>\S+) ((?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<agent>[^\"]*)" "(?<requesttime>[^\"]*)"? $/ #regular expression for parsing the log
117
+ time_format %d/%b/%Y:%H:%M:%S %z
118
+ </source>
119
+ <match nginx.access>
45
120
  type aliyun_odps
46
121
  aliyun_access_id ************
47
122
  aliyun_access_key *********
@@ -52,9 +127,10 @@ $ gem install fluent-plugin-aliyun-odps
52
127
  flush_interval 5s
53
128
  project your_projectName
54
129
  enable_fast_crc true
55
- <table in.log>
56
- table your_tableName
57
- fields remote,method,path,code,size,agent
130
+ <table nginx.access>
131
+ table nginx_logs #ODPS table the logs are written to
132
+ fields remote,method,path,code,size,agent,requesttime
133
+ shard_number 5
58
134
  partition ctime=${datetime.strftime('%Y%m%d')}
59
135
  time_format %d/%b/%Y:%H:%M:%S %z
60
136
  shard_number 1
@@ -64,68 +140,68 @@ $ gem install fluent-plugin-aliyun-odps
64
140
  </table>
65
141
  </match>
66
142
  ```
67
- ### ����˵��
68
-
69
- - type(Fixed): �̶�ֵ aliyun_odps.
70
- - aliyun_access_id(Required):������access_id.
71
- - aliyun_access_key(Required):������access key.
72
- - aliyun_odps_hub_endpoint(Required):�����ķ�������ECS�ϣ���ѱ�ֵ�趨Ϊ http://dh-ext.odps.aliyun-inc.com, ��������Ϊ http://dh.odps.aliyun.com.
73
- - aliyunodps_endpoint(Required):�����ķ�������ECS�ϣ���ѱ�ֵ�趨Ϊ http://odps-ext.aiyun-inc.com/api, ��������Ϊ http://service.odps.aliyun.com/api .
74
- - buffer_chunk_limit(Optional): ���С��֧�֡�k��(KB),��m��(MB)��λ��Ĭ�� 8MB������ֵ2MB, Ŀǰ���֧��20MB.
75
- - buffer_queue_limit(Optional): ����д�С����ֵ��buffer_chunk_limit��ͬ����������������С��
76
- - flush_interval(Optional): ǿ�Ʒ��ͼ�����ﵽʱ��������δ����ǿ�Ʒ���, Ĭ�� 60s.
77
- - abandon_mode(Optional):�����������κ�������pack���ݡ�
78
- - project(Required): project����.
79
- - table(Required): table����.
80
- - fields(Required): ��source��Ӧ���ֶ������������source֮��.
81
- - partition(Optional)����Ϊ�������������ô���.
82
- - ������֧�ֵ�����ģʽ:
83
- - �̶�ֵ: partition ctime=20150804
84
- - �ؼ���: partition ctime=${remote} ������remoteΪsource��ij�ֶΣ�
85
- - ʱ���ʽ�ؼ���: partition ctime=${datetime.strftime('%Y%m%d')} ������datetimeΪsource��ijʱ���ʽ�ֶΣ����Ϊ%Y%m%d��ʽ��Ϊ�������ƣ�
86
- - time_format(Optional):
87
- - ���ʹ��ʱ���ʽ�ؼ���Ϊ<partition>, �����ñ�����. ����: source[datetime]="29/Aug/2015:11:10:16 +0800",������<time_format>Ϊ"%d/%b/%Y:%H:%M:%S %z"
88
- - shard_number(Optional):ָ��shard���������������shard[0,shard_number-1]��Χ�ڵ�shard��д�����ݣ�����Ϊ����0��С��table��Ӧshard�������޵�����.
89
- - enable_fast_crc(Optional): ʹ�ÿ���crc���㣬�⽫�����������ܣ���������ʹ�����ⲿ���صĶ�̬���ӿ⣬Ŀǰ��֧��64λlinux��windowsϵͳ.
90
- - retry_time(Optional): ����ÿ��pack����ʱ�������Դ�����Ĭ��3��.
91
- - retry_interval(Optional): ���Լ����Ĭ��1s.
92
- - abandon_mode(Optional): Ĭ��Ϊfalse�����ó�true��������retry_time�����������ݰ�������Ὣ�쳣���͸�fluentd������fluentd�����Ի������ԣ�����������ܻᵼ�������ظ�.
93
-
94
- ## �ٷ���վ
95
- ---
96
-
97
- - [Fluentd User Guide](http://docs.fluentd.org/)
98
-
99
- ## ����
100
- ---
101
143
 
102
- - [Sun Zongtao]()
103
- - [Cai Ying]()
104
- - [Dong Xiao](https://github.com/dongxiao1198)
105
- - [Yang Hongbo](https://github.com/hongbosoftware)
106
-
107
- ## License
108
- ---
109
-
110
- licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
111
-
112
- ## ����ʹ�������Լ��쳣����
144
+ #### Example 3: importing data from MySQL
145
+ 1. mysql
146
+
147
+ ### Parameter description
148
+
149
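+ - type(Fixed): fixed value aliyun_odps.
150
+ - aliyun_access_id(Required): your Aliyun access_id.
151
+ - aliyun_access_key(Required): your Aliyun access key.
152
+ - aliyun_odps_hub_endpoint(Required): if your service is deployed on ECS, set this to http://dh-ext.odps.aliyun-inc.com, otherwise set it to http://dh.odps.aliyun.com.
153
+ - aliyun_odps_endpoint(Required): if your service is deployed on ECS, set this to http://odps-ext.aiyun-inc.com/api, otherwise set it to http://service.odps.aliyun.com/api .
154
+ - buffer_chunk_limit(Optional): chunk size, supports the units "k" (KB) and "m" (MB); default 8MB, recommended value 2MB, current maximum 20MB.
155
+ - buffer_queue_limit(Optional): chunk queue size; together with buffer_chunk_limit this value determines the total buffer size.
156
+ - flush_interval(Optional): forced flush interval; once the interval elapses, a chunk that is not yet full is sent anyway. Default 60s.
157
+ - abandon_mode(Optional): drop the pack after three built-in retries.
158
+ - project(Required): project name.
159
+ - table(Required): table name.
160
+ - fields(Required): corresponds to the source; every field name must exist in the source.
161
+ - partition(Optional): set this if the table is partitioned.
162
+ - Supported ways to set the partition name:
163
+ - Fixed value: partition ctime=20150804
164
+ - Key word: partition ctime=${remote} (where remote is a field in the source)
165
+ - Time-format key word: partition ctime=${datetime.strftime('%Y%m%d')} (where datetime is a time-formatted field in the source, output in %Y%m%d format as the partition name)
166
+ - time_format(Optional):
167
+ - If <partition> uses a time-format key word, set this parameter. For example: if source[datetime]="29/Aug/2015:11:10:16 +0800", set <time_format> to "%d/%b/%Y:%H:%M:%S %z"
168
+ - shard_number(Optional): number of shards; data is written randomly to shards in the range [0, shard_number-1]. Must be an integer greater than 0 and less than the shard limit of the table.
169
+ - enable_fast_crc(Optional): use fast crc calculation, which greatly improves performance; because it relies on an externally loaded dynamic library, only 64-bit linux and windows systems are currently supported.
170
+ - retry_time(Optional): number of built-in retries when sending each pack, default 3.
171
+ - retry_interval(Optional): retry interval, default 1s.
172
+ - abandon_mode(Optional): default false. When set to true, the pack is dropped after retry_time retries; otherwise the exception is passed to fluentd and fluentd's own retry mechanism is used, which may cause duplicate data.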
173
+
174
+ ## Common usage questions and exception descriptions
113
175
  ---
114
- * �����׳��쳣InvalidShardId/ShardNotReady��ʲôԭ���£�
115
- - ����ϵͳ��������������ݳ���������⣬���ڶ�ʱ���ڻָ���
116
- - fluentd������ڶ��������鿴������shard_num�Ƿ����ó���һ����ֵ���򶼲����ã���������ò�һ���ǻᵼ���������ģ�shard_number�ٵĽ��̻�Ѷ���shard Unload����
117
- - ���ܴ��������ʹ��sdk�ȷ�ʽ������loadshard/unloadshard�Ȳ�����
118
- * enable_fast_crc��μ���Ƿ���ݣ�
119
- - ���������ú�������fluentd���̣�����ʱ����֤�����ʧ�ܻ��׳�����ԭ��reload���������֤�����������Ŀ¼������ldd�鿴aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so��
120
- * retry_time/retry_interval��fluentd�Դ���retry�к�����
121
- - fluentd�Դ�retryĬ�ϳ���36Сʱ���Ὣ����buffer_chunk�ط������ö�̬partition������ط�ȫ�����ݿ�����������ظ�������������ͻ�ʹ�ò���ڲ����ԣ��������ʧ�ܣ����ٸ���abandon_mode��ֵ�ж�������pack�����ݻ��ǽ���fluentd�ط�����buffer��
122
- * Warning��ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist.��ʲô��˼��
123
- - ���������catch��Odps��NoSuchPartitionʱ��������������������������warn��ʾOdps���в��������ݶ�Ӧ���������Զ���������������ɹ�������Ϣ��ʾ��
124
- * Fluent::BufferQueueLimitError error="queue size exceeds limit"��ʲôԭ��
125
- - fluentd�ڶ�ȡ����-�������ݹ����У����ȶ�ȡ��һ��buffer�У������С����������buffer_chunk_limit��buffer_queue_limit��ͬ�������������������󣬺ܿ�������Ϊ�ѻ����ݵ���buffer���㣬���Գ�������buffer_queue_limit���������⡣
126
- * ���config�ļ���ηֱ�����һ��fluentd���̣�
127
- - ������ڶ��config�ļ�������ʹ��in_multiprocess������ͬʱ������ͬ�Ľ���������
128
- * partition has no corresponding source key or the partition expression is wrong.����쳣��ʲôԭ��
129
- - ����쳣��ʾ��source data���Ҳ���������partition�ֶ��е�ֵ������partition ctime=${remote}����remoteû�г�����source�У��������á�
130
- * Failed to format the data.����쳣��ʲôԭ��
131
- - ���������Ϣ�׳���������partition���̳������⣬����partition���ã���������д���������Ҳ��������������⡣
176
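+ * Why does the program throw the exception InvalidShardId/ShardNotReady?
177
+ - The system may be upgrading; the problem appears briefly and recovers within a short time;
178
+ - If there are multiple fluentd processes, check whether shard_number is configured with the same value in all of them (or left unset in all of them). Different values cause this problem, because the process with the smaller shard_number unloads the extra shards;
179
+ - Something else, such as an sdk client, may have performed loadshard/unloadshard operations.
180
+ * How do I check whether enable_fast_crc is compatible?
181
+ - Start the fluentd process with this option enabled. It is verified at startup, and a failure throws the cause of the error (reload does not run the verification). Alternatively, go to the plugin directory and use ldd to inspect aliyun-odps-fluentd-plugin/lib/fluent/plugin/crc/lib/linux/crc32c.so
182
+ * How do retry_time/retry_interval differ from fluentd's built-in retry?
183
+ - fluentd's built-in retry lasts 36 hours by default and resends the entire buffer_chunk; with a dynamic partition configured, resending all the data may cause duplicates. Configuring these two options enables the plugin's internal retry. If that retry fails, the value of abandon_mode decides whether the pack is dropped or handed to fluentd to resend the whole buffer
184
+ * What does the warning "ErrorCode: NoSuchPartition, Message: write failed because The specified partition does not exist." mean?
185
+ - The plugin creates the partition itself when it catches NoSuchPartition from Odps. This warn means the partition for the data does not exist in the Odps table and will be created automatically; a message is shown if creation succeeds.
186
+ * What causes Fluent::BufferQueueLimitError error="queue size exceeds limit"?
187
+ - While reading and sending data, fluentd first reads into a buffer whose size is determined jointly by buffer_chunk_limit and buffer_queue_limit in the configuration. This error most likely means accumulated data has exhausted the buffer; try increasing buffer_queue_limit.
188
+ * How can multiple config files each start their own fluentd process?
189
+ - If there are multiple config files, the in_multiprocess plugin can start separate processes to serve them at the same time.
190
+ * What causes the exception "partition has no corresponding source key or the partition expression is wrong."?
191
+ - This exception means the value configured in the partition field cannot be found in the source data, for example partition ctime=${remote} when remote does not appear in the source. Check the configuration.
192
+ * What causes the exception "Failed to format the data."?
193
+ - This error message means a problem occurred while parsing the partition. Check the partition configuration; dirty data in the input can also trigger it.
194
+ * How do I switch to the Taobao RubyGems source?
195
+ - RubyGems mirror [https://ruby.taobao.org/]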
196
+
197
+ ## Official website
198
+ - [Fluentd User Guide](http://docs.fluentd.org/)
199
+
200
+ ## Authors
201
+ - [Sun Zongtao]()
202
+ - [Cai Ying]()
203
+ - [Dong Xiao](https://github.com/dongxiao1198)
204
+ - [Yang Hongbo](https://github.com/hongbosoftware)
205
+
206
+ ## License
207
+ licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
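
The time-format partition described in the parameter list above can be tried outside the plugin. What follows is only a minimal sketch, using the sample value and formats quoted in the README; `partition_value` is a hypothetical helper name and not part of the plugin's API.

```ruby
require 'time'

# Hypothetical helper (not a plugin method): turn a source time string into
# a partition value. When time_format is nil the string is parsed
# heuristically with Time.parse, otherwise strictly with Time.strptime.
def partition_value(raw, time_format, partition_format)
  time = time_format.nil? ? Time.parse(raw) : Time.strptime(raw, time_format)
  time.strftime(partition_format)
end

value = partition_value("29/Aug/2015:11:10:16 +0800", "%d/%b/%Y:%H:%M:%S %z", "%Y%m%d")
puts "ctime=#{value}" # prints "ctime=20150829"
```

If the source string does not match time_format, `Time.strptime` raises an ArgumentError, which is the kind of failure the FAQ entry about "Failed to format the data." refers to.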
data/README.md CHANGED
@@ -69,14 +69,14 @@ $ gem install fluent-plugin-aliyun-odps
69
69
  - aliyun_access_key(Required):your aliyun access key.
70
70
  - aliyun_odps_hub_endpoint(Required):if you are using ECS, set it as http://dh-ext.odps.aliyun-inc.com, otherwise using http://dh.odps.aliyun.com.
71
71
  - aliyun_odps_endpoint(Required):if you are using ECS, set it as http://odps-ext.aiyun-inc.com/api, otherwise use http://service.odps.aliyun.com/api .
72
- - buffer_chunk_limit(Optional):chunk size,“k” (KB), “m” (MB), and “g” (GB) ,default 8MB,recommended number is 2MB, max size is 20MB.
73
- - buffer_queue_limit(Optional):buffer chunk size,example: buffer_chunk_limit2m,buffer_queue_limit 128,then the total buffer size is 2*128MB.
72
+ - buffer_chunk_limit(Optional):chunk size, "k" (KB), "m" (MB), and "g" (GB); default 8MB, recommended number is 2MB, max size is 20MB.
73
+ - buffer_queue_limit(Optional):buffer chunk size, example: buffer_chunk_limit 2m, buffer_queue_limit 128, then the total buffer size is 2*128MB.
74
74
  - flush_interval(Optional):interval to flush data buffer, default 60s.
75
75
  - abandon_mode(Optional):drop pack after retry 3 times.
76
76
  - project(Required):your project name.
77
77
  - table(Required):your table name.
78
78
  - fields(Required): must match the keys in source.
79
- - partition(Optional):set this if your table is partitioned.
79
+ - partition(Optional):set this if your table is partitioned.
80
80
  - partition format:
81
81
  - fix string: partition ctime=20150804
82
82
  - key words: partition ctime=${remote}
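
The buffer_queue_limit entry in this hunk states that the total buffer size is the product of the two settings. A small sketch of that arithmetic follows; `parse_size` and `total_buffer_bytes` are hypothetical helpers, and the suffix handling is an assumption based on the k/m/g units the README lists, not fluentd's own parser.

```ruby
# Hypothetical helpers, not part of fluentd or the plugin.
UNIT_BYTES = { "k" => 1024, "m" => 1024**2, "g" => 1024**3 }.freeze

# Convert a size string such as "2m" into a number of bytes.
def parse_size(text)
  match = /\A(\d+)([kmg])?\z/i.match(text.strip)
  raise ArgumentError, "unrecognized size: #{text}" unless match
  match[1].to_i * (match[2] ? UNIT_BYTES.fetch(match[2].downcase) : 1)
end

# Total buffer = size of one chunk multiplied by the number of queued chunks.
def total_buffer_bytes(chunk_limit, queue_limit)
  parse_size(chunk_limit) * queue_limit
end

total = total_buffer_bytes("2m", 128)
puts total / 1024**2 # prints 256
```

With the recommended 2MB chunk and a queue of 128, the plugin may therefore hold up to 256MB of buffered data.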
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.2
1
+ 0.1.3
data/build.sh ADDED
@@ -0,0 +1,10 @@
1
+ #!/bin/bash
2
+ mkdir package
3
+ mkdir package/temp
4
+ mkdir package/temp/fluentd
5
+ gem build fluent-plugin-aliyun-odps.gemspec
6
+ cp ext/* ./package/temp/fluentd/ -r
7
+ cp README.cn.md ./package/temp/fluentd/README.cn.md
8
+ cp example.conf ./package/temp/fluentd/example.conf
9
+ cp fluent-plugin-aliyun-odps-*.gem ./package/temp/fluentd/dependency_gem/
10
+ tar zcvf ./package/fluentd_package.tar.gz -C ./package/temp/ .
data/example.conf ADDED
@@ -0,0 +1,26 @@
1
+ <source>
2
+ type tail
3
+ path /path/to/students.csv
4
+ tag input.csv
5
+ format csv
6
+ </source>
7
+ <match input.*>
8
+ type aliyun_odps
9
+ aliyun_access_id ************
10
+ aliyun_access_key *********
11
+ aliyun_odps_endpoint http://service.odps.aliyun.com/api
12
+ aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
13
+ buffer_chunk_limit 2m
14
+ buffer_queue_limit 128
15
+ flush_interval 5s
16
+ project your_projectName #name of the project to import data into
17
+ enable_fast_crc true
18
+ <table input.csv>
19
+ table students
20
+ fields id,name,score
21
+ shard_number 1
22
+ retry_time 3
23
+ retry_interval 1
24
+ abandon_mode true
25
+ </table>
26
+ </match>
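
The `fields id,name,score` line in example.conf only works if every record carries those keys. As a rough illustration of the mapping, assuming the csv columns arrive in the order id, name, score, the sketch below turns one line of the sample file into such a record. `row_to_record` is a hypothetical helper; it splits on commas naively and ignores csv quoting, unlike a real csv parser.

```ruby
# Hypothetical illustration, not plugin code: map one csv line onto the
# field names listed in the <table> section of example.conf.
FIELDS = %w[id name score].freeze

def row_to_record(line, fields = FIELDS)
  values = line.split(",").map(&:strip)
  unless values.size == fields.size
    raise ArgumentError, "expected #{fields.size} columns, got #{values.size}"
  end
  fields.zip(values).to_h
end

record = row_to_record("1, jack ma, 90")
puts record["name"] # prints "jack ma"
```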
@@ -12,16 +12,15 @@ Gem::Specification.new do |gem|
12
12
  gem.email = "dongxiao.dx@alibaba-inc.com"
13
13
  gem.has_rdoc = false
14
14
  #gem.platform = Gem::Platform::RUBY
15
- gem.files = `git ls-files`.split("\n")
15
+ gem.files = `git ls-files | grep -v ext | grep -v package`.split("\n")
16
16
  gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
17
17
  gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
18
18
  gem.require_paths = ['lib']
19
19
 
20
20
  gem.add_dependency "fluentd", [">= 0.10.49", "< 2"]
21
- gem.add_dependency "protobuf", "~> 3.5.1"
21
+ gem.add_dependency "protobuf", '~> 3.5', '>= 3.5.1'
22
22
  gem.add_dependency "yajl-ruby", "~> 1.0"
23
- gem.add_dependency "fluent-mixin-config-placeholders"
24
- gem.add_development_dependency "rake", ">= 0.9.2"
25
- gem.add_development_dependency "flexmock", ">= 1.2.0"
26
- gem.add_development_dependency "test-unit", ">= 3.0.8"
23
+ gem.add_development_dependency "rake", '~> 0.9', '>= 0.9.2'
24
+ gem.add_development_dependency "flexmock", '~> 1.2', '>= 1.2.0'
25
+ gem.add_development_dependency "test-unit", '~> 3.0', '>= 3.0.8'
27
26
  end
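
The protobuf constraint in this hunk changes from `~> 3.5.1` to `~> 3.5` plus `>= 3.5.1`, which widens the accepted range. A short sketch using RubyGems' own `Gem::Requirement` shows the difference; the version strings probed here are arbitrary samples, not releases known to exist.

```ruby
require 'rubygems'

old_req = Gem::Requirement.new("~> 3.5.1")         # >= 3.5.1 and < 3.6
new_req = Gem::Requirement.new("~> 3.5", ">= 3.5.1") # >= 3.5.1 and < 4.0

%w[3.5.0 3.5.1 3.5.9 3.6.0 4.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: old=#{old_req.satisfied_by?(version)} new=#{new_req.satisfied_by?(version)}"
end
```

Only the 3.6.0 probe differs between the two requirements: the old constraint rejects it and the new one accepts it.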
@@ -17,7 +17,7 @@
17
17
  #under the License.
18
18
  #
19
19
  module OdpsDatahub
20
- $SDK_UA_STR = "ODPS Ruby SDK v0.1"
20
+ $SDK_UA_STR = "ODPS Ruby SDK v0.1.2"
21
21
  $MAX_PACK_SIZE = 2048*10*1024
22
22
  class HttpHeaders
23
23
  $AUTHORIZATION = "Authorization"
@@ -110,66 +110,73 @@ module Fluent
110
110
  begin
111
111
  #if partition is not empty
112
112
  unless @partition.blank? then
113
- #if partition has params in it
114
- if @partition.include? "=${"
115
- #split partition
116
- partition_arrays=@partition.split(',')
117
- partition_name=''
118
- i=1
119
- for p in partition_arrays do
120
- #if partition is time formated
121
- if p.include? "strftime"
122
- key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
123
- partition_column=p[0, p.index("=")]
124
- timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
125
- if data.has_key?(key)
126
- if time_format == nil
127
- partition_value=Time.parse(data[key]).strftime(timeFormat)
113
+ begin
114
+ #if partition has params in it
115
+ if @partition.include? "=${"
116
+ #split partition
117
+ partition_arrays=@partition.split(',')
118
+ partition_name=''
119
+ i=1
120
+ for p in partition_arrays do
121
+ #if partition is time formatted
122
+ if p.include? "strftime"
123
+ key=p[p.index("{")+1, p.index(".strftime")-1-p.index("{")]
124
+ partition_column=p[0, p.index("=")]
125
+ timeFormat=p[p.index("(")+2, p.index(")")-3-p.index("(")]
126
+ if data.has_key?(key)
127
+ if time_format == nil
128
+ partition_value=Time.parse(data[key]).strftime(timeFormat)
129
+ else
130
+ partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
131
+ end
132
+ if i==1
133
+ partition_name+=partition_column+"="+partition_value
134
+ else
135
+ partition_name+=","+partition_column+"="+partition_value
136
+ end
128
137
  else
129
- partition_value=Time.strptime(data[key], time_format).strftime(timeFormat)
138
+ raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
130
139
  end
131
- if i==1
132
- partition_name+=partition_column+"="+partition_value
140
+ elsif p.include? "=${"
141
+ key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
142
+ partition_column=p[0, p.index("=")]
143
+ if data.has_key?(key)
144
+ partition_value=data[key]
145
+ if i==1
146
+ partition_name+=partition_column+"="+partition_value
147
+ else
148
+ partition_name+=","+partition_column+"="+partition_value
149
+ end
133
150
  else
134
- partition_name+=","+partition_column+"="+partition_value
151
+ raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
135
152
  end
136
153
  else
137
- raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
138
- end
139
- elsif p.include? "=${"
140
- key=p[p.index("{")+1, p.index("}")-1-p.index("{")]
141
- partition_column=p[0, p.index("=")]
142
- if data.has_key?(key)
143
- partition_value=data[key]
144
154
  if i==1
145
- partition_name+=partition_column+"="+partition_value
155
+ partition_name+=p
146
156
  else
147
- partition_name+=","+partition_column+"="+partition_value
157
+ partition_name+=","+p
148
158
  end
149
- else
150
- raise "partition has no corresponding source key or the partition expression is wrong,"+data.to_s
151
- end
152
- else
153
- if i==1
154
- partition_name+=p
155
- else
156
- partition_name+=","+p
157
159
  end
160
+ i+=1
158
161
  end
159
- i+=1
162
+ else
163
+ partition_name=@partition
164
+ end
165
+ if partitions[partition_name]==nil
166
+ partitions[partition_name]=[]
167
+ end
168
+ partitions[partition_name] << @format_proc.call(data)
169
+ rescue => ex
170
+ if (@abandon_mode)
171
+ @log.error "Format partition failed, abandon this record. Msg:" +ex.message + " Table:" + @table
172
+ @log.error "Drop data:" + data.to_s
173
+ else
174
+ raise ex
160
175
  end
161
- else
162
- partition_name=@partition
163
- end
164
- if partitions[partition_name]==nil
165
- partitions[partition_name]=[]
166
176
  end
167
- partitions[partition_name] << @format_proc.call(data)
168
-
169
177
  else
170
178
  records << @format_proc.call(data)
171
179
  end
172
-
173
180
  rescue => e
174
181
  raise "Failed to format the data:"+ e.message + " " +e.backtrace.inspect.to_s
175
182
  end
@@ -214,6 +221,7 @@ module Fluent
214
221
  else
215
222
  if (@abandon_mode)
216
223
  @log.error "Retry failed, abandon this pack. Msg:" + e.message + " partitions:" + k.to_s + " table:" + @table
224
+ @log.error v[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
217
225
  else
218
226
  raise e
219
227
  end
@@ -252,6 +260,7 @@ module Fluent
252
260
  else
253
261
  if (@abandon_mode)
254
262
  @log.error "Retry failed, abandon this pack. Msg:" + e.message + " Table:" + @table
263
+ @log.error records[sendCount*threadId..sendCount*(threadId+1)+restCount-1]
255
264
  else
256
265
  raise e
257
266
  end
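
The plugin hunks above resolve the partition expression per record and, in abandon mode, log and drop a record whose partition cannot be built. The following is a simplified sketch of that behaviour and not the plugin's code: it swaps the index arithmetic for regular expressions, and `resolve_partition` and `group_by_partition` are hypothetical names.

```ruby
require 'time'
require 'logger'

# Hypothetical stand-in for the partition handling in the diff.
# Supports fixed values, ${key}, and ${key.strftime('format')}.
def resolve_partition(expression, record, time_format = nil)
  expression.split(",").map do |part|
    column, value = part.split("=", 2)
    if (m = /\A\$\{(\w+)\.strftime\('([^']*)'\)\}\z/.match(value))
      raw = record.fetch(m[1]) { raise "partition has no corresponding source key: #{m[1]}" }
      time = time_format ? Time.strptime(raw, time_format) : Time.parse(raw)
      "#{column}=#{time.strftime(m[2])}"
    elsif (m = /\A\$\{(\w+)\}\z/.match(value))
      raw = record.fetch(m[1]) { raise "partition has no corresponding source key: #{m[1]}" }
      "#{column}=#{raw}"
    else
      part # fixed value such as ctime=20150804
    end
  end.join(",")
end

# Group records by partition name. In abandon mode a record that fails is
# logged and skipped; otherwise the error is re-raised to the caller.
def group_by_partition(records, expression, time_format, abandon_mode, log)
  records.each_with_object(Hash.new { |h, k| h[k] = [] }) do |record, groups|
    begin
      groups[resolve_partition(expression, record, time_format)] << record
    rescue => ex
      raise ex unless abandon_mode
      log.error("Format partition failed, abandon this record. Msg:#{ex.message}")
      log.error("Drop data:#{record}")
    end
  end
end

log = Logger.new($stderr)
records = [
  { "remote" => "10.0.0.1", "datetime" => "29/Aug/2015:11:10:16 +0800" },
  { "remote" => "10.0.0.2" } # no datetime key, so it is dropped in abandon mode
]
groups = group_by_partition(records, "ctime=${datetime.strftime('%Y%m%d')}",
                            "%d/%b/%Y:%H:%M:%S %z", true, log)
puts groups.keys.inspect # prints ["ctime=20150829"]
```

Passing `false` for abandon_mode makes the same input raise instead, which corresponds to handing the failure back to fluentd's own retry.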
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fluent-plugin-aliyun-odps
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Xiao Dong
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2016-03-04 00:00:00.000000000 Z
12
+ date: 2016-03-16 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: fluentd
@@ -36,6 +36,9 @@ dependencies:
36
36
  requirement: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '3.5'
41
+ - - ">="
39
42
  - !ruby/object:Gem::Version
40
43
  version: 3.5.1
41
44
  type: :runtime
@@ -43,6 +46,9 @@ dependencies:
43
46
  version_requirements: !ruby/object:Gem::Requirement
44
47
  requirements:
45
48
  - - "~>"
49
+ - !ruby/object:Gem::Version
50
+ version: '3.5'
51
+ - - ">="
46
52
  - !ruby/object:Gem::Version
47
53
  version: 3.5.1
48
54
  - !ruby/object:Gem::Dependency
@@ -59,24 +65,13 @@ dependencies:
59
65
  - - "~>"
60
66
  - !ruby/object:Gem::Version
61
67
  version: '1.0'
62
- - !ruby/object:Gem::Dependency
63
- name: fluent-mixin-config-placeholders
64
- requirement: !ruby/object:Gem::Requirement
65
- requirements:
66
- - - ">="
67
- - !ruby/object:Gem::Version
68
- version: '0'
69
- type: :runtime
70
- prerelease: false
71
- version_requirements: !ruby/object:Gem::Requirement
72
- requirements:
73
- - - ">="
74
- - !ruby/object:Gem::Version
75
- version: '0'
76
68
  - !ruby/object:Gem::Dependency
77
69
  name: rake
78
70
  requirement: !ruby/object:Gem::Requirement
79
71
  requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '0.9'
80
75
  - - ">="
81
76
  - !ruby/object:Gem::Version
82
77
  version: 0.9.2
@@ -84,6 +79,9 @@ dependencies:
84
79
  prerelease: false
85
80
  version_requirements: !ruby/object:Gem::Requirement
86
81
  requirements:
82
+ - - "~>"
83
+ - !ruby/object:Gem::Version
84
+ version: '0.9'
87
85
  - - ">="
88
86
  - !ruby/object:Gem::Version
89
87
  version: 0.9.2
@@ -91,6 +89,9 @@ dependencies:
91
89
  name: flexmock
92
90
  requirement: !ruby/object:Gem::Requirement
93
91
  requirements:
92
+ - - "~>"
93
+ - !ruby/object:Gem::Version
94
+ version: '1.2'
94
95
  - - ">="
95
96
  - !ruby/object:Gem::Version
96
97
  version: 1.2.0
@@ -98,6 +99,9 @@ dependencies:
98
99
  prerelease: false
99
100
  version_requirements: !ruby/object:Gem::Requirement
100
101
  requirements:
102
+ - - "~>"
103
+ - !ruby/object:Gem::Version
104
+ version: '1.2'
101
105
  - - ">="
102
106
  - !ruby/object:Gem::Version
103
107
  version: 1.2.0
@@ -105,6 +109,9 @@ dependencies:
105
109
  name: test-unit
106
110
  requirement: !ruby/object:Gem::Requirement
107
111
  requirements:
112
+ - - "~>"
113
+ - !ruby/object:Gem::Version
114
+ version: '3.0'
108
115
  - - ">="
109
116
  - !ruby/object:Gem::Version
110
117
  version: 3.0.8
@@ -112,6 +119,9 @@ dependencies:
112
119
  prerelease: false
113
120
  version_requirements: !ruby/object:Gem::Requirement
114
121
  requirements:
122
+ - - "~>"
123
+ - !ruby/object:Gem::Version
124
+ version: '3.0'
115
125
  - - ">="
116
126
  - !ruby/object:Gem::Version
117
127
  version: 3.0.8
@@ -129,6 +139,8 @@ files:
129
139
  - README.md
130
140
  - Rakefile
131
141
  - VERSION
142
+ - build.sh
143
+ - example.conf
132
144
  - fluent-plugin-aliyun-odps.gemspec
133
145
  - lib/fluent/plugin/conf/config.rb
134
146
  - lib/fluent/plugin/crc/crc.rb
@@ -139,7 +151,6 @@ files:
139
151
  - lib/fluent/plugin/crc/origin/crc32c.rb
140
152
  - lib/fluent/plugin/crc/src/crc32c.c
141
153
  - lib/fluent/plugin/crc/src/crc32c.h
142
- - lib/fluent/plugin/crc/src/extconf.rb
143
154
  - lib/fluent/plugin/exceptions.rb
144
155
  - lib/fluent/plugin/http/http_connection.rb
145
156
  - lib/fluent/plugin/http/http_flag.rb
@@ -154,7 +165,6 @@ files:
154
165
  - lib/fluent/plugin/stream_client.rb
155
166
  - lib/fluent/plugin/stream_reader.rb
156
167
  - lib/fluent/plugin/stream_writer.rb
157
- - odps_example.conf
158
168
  homepage: https://github.com/aliyun/aliyun-odps-fluentd-plugin
159
169
  licenses:
160
170
  - Apache-2.0
@@ -1,3 +0,0 @@
1
- require 'mkmf'
2
- extension_name = 'crc32c'
3
- create_makefile(extension_name)
data/odps_example.conf DELETED
@@ -1,31 +0,0 @@
1
- ####
2
- ## Output descriptions:
3
- ##
4
-
5
- <source>
6
- type tail
7
- path /opt/log/in/in.log
8
- refresh_interval 5s
9
- tag in.log
10
- format csv
11
- keys dt,week,r1,r2,r3,r4,r5,r6,r7,blue
12
- </source>
13
-
14
- <match in.**>
15
- type aliyun_odps
16
- aliyun_access_id ************
17
- aliyun_access_key *********
18
- aliyun_odps_endpoint http://service.odps.aliyun.com/api
19
- aliyun_odps_hub_endpoint http://dh.odps.aliyun.com
20
- buffer_chunk_limit 2m
21
- buffer_queue_limit 128
22
- flush_interval 5s
23
- project your_projectName
24
- enable_fast_crc false
25
- <table in.log>
26
- table your_tableName
27
- fields r1,r2,r3,r4,r5,r6,blue
28
- partition ctime=${dt.strftime('%Y%m%d')}
29
- shard_number 1
30
- </table>
31
- </match>