kaggle 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 3ad655835decf29a7a46e8b9c1d62a91bf05975bb8a694e3fac92b9f5f141eb7
4
+ data.tar.gz: '0803369672874a9a8f275a53fca15baeb66b1f2a121b2d5bdf91b39349a3bab1'
5
+ SHA512:
6
+ metadata.gz: 947d6474751ade9122c0ec9fcb7d7f533a1b23dc9903ba6f59d3e88f20c4b1ec38ff70c53bb242d3a9cafca7f374bee0b307c2fb74ccc52add31fa7555906864
7
+ data.tar.gz: ced4244587280c337dbab8455ffb133db8181631afed0de72bb2b0c773ea7f32b8eff6f30c0d7a6037d5d5ee59cd245e975f7c5f9bdf436da6c9ed9a490707cb
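The `checksums.yaml` entries above are digests of two archives inside the published `.gem` file. A `.gem` file is a plain tar archive that contains `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`. The following is a minimal sketch for checking an extracted archive against one of these values. `checksum_matches?` is a hypothetical helper and is not part of the gem. It uses only Ruby's standard `digest` library.

```ruby
require 'digest'

# Hypothetical helper (not part of the kaggle gem): returns true when the
# file's hex digest equals the expected value copied from checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm.inspect}"
    end

  # Digest::Class.file reads the file in chunks and returns a digest object;
  # hexdigest is lowercase, so the expected value is normalised to match.
  digest_class.file(path).hexdigest == expected_hex.to_s.downcase
end

# Usage, assuming data.tar.gz was extracted with `tar -xf kaggle-0.0.1.gem`:
# checksum_matches?('data.tar.gz',
#                   '0803369672874a9a8f275a53fca15baeb66b1f2a121b2d5bdf91b39349a3bab1')
```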
@@ -0,0 +1,13 @@
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "Read(//Users/trex22/development/url_categorise/**)",
5
+ "Bash(chmod:*)",
6
+ "Bash(mkdir:*)",
7
+ "Bash(bundle exec rake:*)",
8
+ "Bash(rake test)"
9
+ ],
10
+ "deny": [],
11
+ "ask": []
12
+ }
13
+ }
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ 3.4.5
data/CLAUDE.md ADDED
@@ -0,0 +1,154 @@
1
+ # Claude Assistant Documentation
2
+
3
+ This file documents how Claude helped develop this Ruby gem and provides guidance for future development.
4
+
5
+ ## Development History
6
+
7
+ This Kaggle Ruby gem was created with assistance from Claude (Sonnet 4) on 2025-08-23. The development process followed established Ruby gem conventions and best practices.
8
+
9
+ ## Architecture Decisions
10
+
11
+ ### 1. Gem Structure
12
+ - Followed standard Ruby gem conventions based on successful gems like `url_categorise` and `luno`
13
+ - Used modular architecture with separate modules for constants, client logic, and error handling
14
+ - Implemented comprehensive test coverage using Minitest
15
+
16
+ ### 2. API Design
17
+ - Used HTTParty for HTTP client functionality due to its simplicity and Ruby idioms
18
+ - Implemented authentication via basic auth with username/API key
19
+ - Added configurable paths for downloads and caching to support different deployment scenarios
20
+
21
+ ### 3. Error Handling
22
+ - Created specific error classes inheriting from base `Kaggle::Error`
23
+ - Implemented graceful degradation for network and parsing failures
24
+ - Added comprehensive validation for user inputs
25
+
26
+ ### 4. Testing Strategy
27
+ - Used Minitest with WebMock for HTTP request stubbing
28
+ - Included test coverage reporting with SimpleCov (70% minimum)
29
+ - Added comprehensive test helpers and fixtures
30
+
31
+ ## Key Implementation Notes
32
+
33
+ ### Authentication
34
+ In addition to a `kaggle.json` credentials file (see the README), the gem supports two authentication methods:
35
+ 1. Environment variables (`KAGGLE_USERNAME`, `KAGGLE_KEY`)
36
+ 2. Explicit parameters during client initialization
37
+
38
+ ### Caching Strategy
39
+ - Simple file-based caching for parsed CSV data
40
+ - Cache keys generated from dataset paths
41
+ - Optional cache usage controlled by method parameters
42
+
43
+ ### CSV Parsing
44
+ - Uses Ruby's built-in CSV library for reliability
45
+ - Converts each CSV row to a hash, producing a JSON-serializable array of row hashes
46
+ - Includes comprehensive error handling for malformed files
47
+
48
+ ## Commands for Development
49
+
50
+ ### Setup
51
+ ```bash
52
+ bin/setup # Install dependencies
53
+ ```
54
+
55
+ ### Testing
56
+ ```bash
57
+ rake test # Run all tests
58
+ rake test TEST=test/path/to/file_test.rb # Run a specific test file
59
+ ```
60
+
61
+ ### Console
62
+ ```bash
63
+ bin/console # Interactive Ruby console with gem loaded
64
+ ```
65
+
66
+ ### Linting
67
+ ```bash
68
+ # Note: No linter is configured yet; consider adding RuboCop in a future version
69
+ ```
70
+
71
+ ## Future Development Guidelines
72
+
73
+ ### 1. API Expansion
74
+ When adding new Kaggle API endpoints:
75
+ - Add constants to `lib/kaggle/constants.rb`
76
+ - Add methods to `lib/kaggle/client.rb`
77
+ - Follow existing error handling patterns
78
+ - Add comprehensive tests in `test/kaggle/`
79
+
80
+ ### 2. CLI Enhancement
81
+ The CLI tool (`bin/kaggle`) can be expanded with:
82
+ - More command options and flags
83
+ - Better output formatting
84
+ - Progress indicators for large downloads
85
+ - Configuration file support
86
+
87
+ ### 3. Performance Optimizations
88
+ Consider adding:
89
+ - Concurrent downloads for multiple files
90
+ - Streaming for large files
91
+ - More sophisticated caching strategies
92
+ - Connection pooling for API requests
93
+
94
+ ### 4. Error Handling Improvements
95
+ - Retry logic for transient failures
96
+ - Better error messages with suggested actions
97
+ - Logging capabilities for debugging
98
+
99
+ ## Common Development Patterns
100
+
101
+ ### Adding New API Methods
102
+ 1. Add endpoint to constants
103
+ 2. Implement client method with proper error handling
104
+ 3. Add comprehensive tests
105
+ 4. Update CLI if user-facing
106
+ 5. Document in README
107
+
108
+ ### Testing Network Interactions
109
+ - Always use WebMock to stub HTTP requests
110
+ - Test both success and failure scenarios
111
+ - Include edge cases like timeouts and malformed responses
112
+ - Verify authentication headers are sent correctly
113
+
114
+ ### Code Style Guidelines
115
+ - Follow Ruby community style conventions
116
+ - Use descriptive method and variable names
117
+ - Keep methods focused and single-purpose
118
+ - Include inline documentation for complex logic
119
+
120
+ ## Troubleshooting
121
+
122
+ ### Common Issues
123
+ 1. **Authentication Failures**: Verify credentials and API key permissions
124
+ 2. **Download Failures**: Check network connectivity and dataset availability
125
+ 3. **Parsing Errors**: Verify file format and encoding
126
+ 4. **Path Issues**: Ensure download/cache directories are writable
127
+
128
+ ### Debugging
129
+ - Use `bin/console` for interactive debugging
130
+ - Use `WebMock.disable_net_connect!(allow_localhost: true)` to permit localhost connections in integration tests
131
+ - Use Pry for breakpoint debugging in tests
132
+
133
+ ## Contributing Guidelines
134
+
135
+ When contributing to this gem:
136
+ 1. Follow existing code patterns and conventions
137
+ 2. Add tests for new functionality
138
+ 3. Update documentation (README, CLAUDE.md)
139
+ 4. Ensure backward compatibility
140
+ 5. Consider performance implications
141
+
142
+ ## Claude-Specific Notes
143
+
144
+ This gem was developed iteratively with Claude assistance. The AI helper was particularly useful for:
145
+ - Analyzing existing gem patterns and conventions
146
+ - Implementing comprehensive test coverage
147
+ - Creating consistent error handling patterns
148
+ - Structuring CLI tools and bin scripts
149
+
150
+ Future Claude interactions should:
151
+ - Reference this documentation for context
152
+ - Maintain existing architectural decisions
153
+ - Follow established patterns for new features
154
+ - Update this file with new insights or changes
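The Authentication and API Design notes above describe two things. Credentials come from `KAGGLE_USERNAME`/`KAGGLE_KEY` or from explicit parameters, and they are sent using HTTP basic auth. The following sketch shows both in core Ruby. `CredentialSketch`, its method names, and the rule that explicit arguments take precedence over environment variables are illustrative assumptions. They are not the gem's actual API. The sketch also leaves out the `kaggle.json` option that the README describes.

```ruby
# Hypothetical sketch; not the kaggle gem's actual API.
module CredentialSketch
  class MissingCredentialsError < StandardError; end

  # Assumption: explicit arguments win over environment variables.
  # `env` defaults to ENV but accepts any Hash-like object (handy for tests).
  def self.resolve(username: nil, api_key: nil, env: ENV)
    user = username || env['KAGGLE_USERNAME']
    key  = api_key  || env['KAGGLE_KEY']

    if user.nil? || user.empty? || key.nil? || key.empty?
      raise MissingCredentialsError,
            'Set KAGGLE_USERNAME and KAGGLE_KEY, or pass username: and api_key:'
    end

    [user, key]
  end

  # HTTP basic auth: "Basic " + base64("username:key") without line breaks.
  # Array#pack('m0') is core Ruby, so the base64 gem is not required.
  def self.basic_auth_header(username, api_key)
    "Basic #{["#{username}:#{api_key}"].pack('m0')}"
  end
end
```

With HTTParty, the resolved pair would normally go in `basic_auth: { username: user, password: key }`. HTTParty then builds the same `Authorization` header, so you do not have to construct it yourself.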
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at contact@jasonchalom.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Jason Chalom
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,210 @@
1
+ # Kaggle
2
+ A Ruby client for the Kaggle API with support for datasets, competitions, models, and more. See: https://www.kaggle.com/docs/api
3
+
4
+ This is an unofficial project and still a work in progress (WIP); more features are coming soon.
5
+
6
+ ## Features
7
+
8
+ - 📊 Download Kaggle datasets programmatically
9
+ - 📄 Parse CSV datasets to JSON format
10
+ - 💾 Configurable caching to avoid re-downloading
11
+ - 🔧 Flexible download and cache paths
12
+ - ⚡ Built-in error handling and validation
13
+ - 🛠️ Command-line interface for quick operations
14
+
15
+ ## Installation
16
+
17
+ Add this line to your application's Gemfile:
18
+
19
+ ```ruby
20
+ gem 'kaggle'
21
+ ```
22
+
23
+ And then execute:
24
+
25
+ $ bundle
26
+
27
+ Or install it yourself as:
28
+
29
+ $ gem install kaggle
30
+
31
+ ## Setup
32
+
33
+ You'll need Kaggle API credentials to use this gem. There are three ways to authenticate:
34
+
35
+ ### Option 1: JSON File (Recommended)
36
+ 1. Go to your [Kaggle account page](https://www.kaggle.com/account)
37
+ 2. Click "Create New API Token" to download `kaggle.json`
38
+ 3. Place the file in your project directory (the default path is `./kaggle.json`) or pass a custom path via `credentials_file:`
39
+
40
+ ### Option 2: Environment Variables
41
+ ```bash
42
+ export KAGGLE_USERNAME="yourusername"
43
+ export KAGGLE_KEY="your_api_key"
44
+ ```
45
+
46
+ ### Option 3: Direct Credentials
47
+ Pass credentials directly when initializing the client.
48
+
49
+ ### Kaggle JSON File Format
50
+ The `kaggle.json` file downloaded from Kaggle should have this format:
51
+ ```json
52
+ {
53
+ "username": "yourusername",
54
+ "key": "your_api_key"
55
+ }
56
+ ```
57
+
58
+ ## Usage
59
+
60
+ ### Basic Usage
61
+
62
+ ```ruby
63
+ require 'kaggle'
64
+
65
+ # Option 1: Use kaggle.json file (automatically detected)
66
+ client = Kaggle::Client.new
67
+
68
+ # Option 1b: Use custom JSON file path
69
+ client = Kaggle::Client.new(credentials_file: '/path/to/kaggle.json')
70
+
71
+ # Option 2: Use environment variables (KAGGLE_USERNAME and KAGGLE_KEY)
72
+ client = Kaggle::Client.new
73
+
74
+ # Option 3: Use explicit credentials
75
+ client = Kaggle::Client.new(
76
+ username: 'your_username',
77
+ api_key: 'your_api_key'
78
+ )
79
+ ```
80
+
81
+ ### List Datasets
82
+
83
+ ```ruby
84
+ # List all datasets
85
+ datasets = client.list_datasets
86
+
87
+ # Search datasets
88
+ datasets = client.list_datasets(search: 'housing')
89
+
90
+ # Paginate results
91
+ datasets = client.list_datasets(page: 2, page_size: 10)
92
+ ```
93
+
94
+ ### Download Datasets
95
+
96
+ ```ruby
97
+ # Basic download
98
+ file_path = client.download_dataset('zillow', 'zecon')
99
+
100
+ # Download and parse CSV to JSON
101
+ data = client.download_dataset('zillow', 'zecon', parse_csv: true)
102
+
103
+ # Use caching to avoid re-downloading
104
+ data = client.download_dataset('zillow', 'zecon',
105
+ parse_csv: true,
106
+ use_cache: true)
107
+ ```
108
+
109
+ ### Custom Paths
110
+
111
+ ```ruby
112
+ client = Kaggle::Client.new(
113
+ credentials_file: '/path/to/kaggle.json',
114
+ download_path: '/custom/downloads',
115
+ cache_path: '/custom/cache'
116
+ )
117
+ ```
118
+
119
+ ### Dataset Information
120
+
121
+ ```ruby
122
+ # Get dataset files list
123
+ files = client.dataset_files('zillow', 'zecon')
124
+
125
+ # Parse existing CSV file
126
+ data = client.parse_csv_to_json('/path/to/file.csv')
127
+ ```
128
+
129
+ ## Command Line Interface
130
+
131
+ The gem includes a command-line interface:
132
+
133
+ ```bash
134
+ # List datasets
135
+ kaggle list
136
+
137
+ # Search datasets
138
+ kaggle list "housing"
139
+
140
+ # Download dataset
141
+ kaggle download zillow zecon
142
+
143
+ # Download and parse CSV
144
+ kaggle download zillow zecon --parse-csv
145
+
146
+ # Use custom credentials file
147
+ kaggle download zillow zecon --credentials-file /path/to/kaggle.json
148
+
149
+ # Use custom paths
150
+ kaggle download zillow zecon --download-path /custom --cache-path /custom/cache
151
+
152
+ # Show dataset files
153
+ kaggle files zillow zecon
154
+
155
+ # Show version
156
+ kaggle --version
157
+ ```
158
+
159
+ ## Configuration Options
160
+
161
+ | Option | Default | Description |
162
+ |--------|---------|-------------|
163
+ | `credentials_file` | `./kaggle.json` | Path to Kaggle credentials JSON file |
164
+ | `download_path` | `./downloads` | Where to save downloaded files |
165
+ | `cache_path` | `./cache` | Where to cache parsed data |
166
+ | `timeout` | `30` | HTTP request timeout in seconds |
167
+ | `use_cache` | `false` | Per-call `download_dataset` option: use cached parsed data when available |
168
+ | `parse_csv` | `false` | Per-call `download_dataset` option: parse downloaded CSV files to JSON |
169
+
170
+ ## Error Handling
171
+
172
+ The gem includes specific error types:
173
+
174
+ ```ruby
175
+ begin
176
+ client.download_dataset('invalid', 'dataset')
177
+ rescue Kaggle::AuthenticationError
178
+ puts "Invalid credentials"
179
+ rescue Kaggle::DatasetNotFoundError
180
+ puts "Dataset not found"
181
+ rescue Kaggle::DownloadError
182
+ puts "Download failed"
183
+ rescue Kaggle::ParseError
184
+ puts "Failed to parse data"
185
+ end
186
+ ```
187
+
188
+ ## Development
189
+
190
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
191
+
192
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `lib/kaggle/version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
193
+
194
+ ### Tests
195
+
196
+ To run the tests, execute:
197
+
198
+ $ rake test
199
+
200
+ ## Contributing
201
+
202
+ Bug reports and pull requests are welcome on GitHub at https://github.com/yourusername/kaggle. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
203
+
204
+ ## License
205
+
206
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
207
+
208
+ ## Code of Conduct
209
+
210
+ Everyone interacting in the Kaggle project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/yourusername/kaggle/blob/main/CODE_OF_CONDUCT.md).
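The README's `parse_csv: true` and `use_cache: true` options, together with the caching notes in CLAUDE.md, point to a likely flow. The CSV is parsed into an array of row hashes, and the result is stored as JSON under a key derived from the dataset path. The following sketch implements that flow with Ruby's `csv`, `json`, and `fileutils` libraries. The module name, the cache-file naming scheme, and the `ParseError` class are hypothetical. They do not reflect the gem's actual implementation.

```ruby
require 'csv'
require 'json'
require 'fileutils'

# Hypothetical sketch; names and cache layout are assumptions, not the gem's API.
module CsvCacheSketch
  class ParseError < StandardError; end

  # Parse a CSV with a header row into an array of hashes (one hash per row).
  # Values stay as strings because no CSV converters are enabled.
  def self.parse_csv_to_rows(csv_path)
    CSV.read(csv_path, headers: true).map(&:to_h)
  rescue CSV::MalformedCSVError => e
    raise ParseError, "Failed to parse #{csv_path}: #{e.message}"
  end

  # Assumed cache key: "owner/dataset" becomes "owner_dataset.json" in cache_dir.
  def self.cache_file(cache_dir, dataset_path)
    File.join(cache_dir, "#{dataset_path.tr('/', '_')}.json")
  end

  # Returns cached rows when use_cache is true and a cache file exists;
  # otherwise parses the CSV and (re)writes the cache file.
  def self.load_rows(csv_path, dataset_path:, cache_dir:, use_cache: false)
    cached = cache_file(cache_dir, dataset_path)
    return JSON.parse(File.read(cached)) if use_cache && File.exist?(cached)

    rows = parse_csv_to_rows(csv_path)
    FileUtils.mkdir_p(cache_dir)
    File.write(cached, JSON.generate(rows))
    rows
  end
end
```

The cache stores plain JSON, so a cache hit returns hashes with string keys, just as freshly parsed CSV rows do. Callers therefore see the same shape either way.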
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new(:test) do |t|
5
+ t.libs << 'test'
6
+ t.libs << 'lib'
7
+ t.test_files = FileList['test/**/*_test.rb']
8
+ end
9
+
10
+ task default: :test
data/kaggle.gemspec ADDED
@@ -0,0 +1,46 @@
1
+ require_relative 'lib/kaggle/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = 'kaggle'
5
+ spec.version = Kaggle::VERSION
6
+ spec.authors = ['Your Name']
7
+ spec.email = ['your.email@example.com']
8
+
9
+ spec.summary = 'Ruby client for the Kaggle API'
10
+ spec.description = 'A Ruby gem for interacting with the Kaggle API, including dataset downloads with caching support'
11
+ spec.homepage = 'https://github.com/yourusername/kaggle'
12
+ spec.license = 'MIT'
13
+ spec.required_ruby_version = '>= 3.0.0'
14
+
15
+ spec.metadata['allowed_push_host'] = 'https://rubygems.org'
16
+ spec.metadata['homepage_uri'] = spec.homepage
17
+ spec.metadata['source_code_uri'] = spec.homepage
18
+ spec.metadata['changelog_uri'] = "#{spec.homepage}/blob/main/CHANGELOG.md"
19
+
20
+ spec.files = Dir.chdir(__dir__) do
21
+ `git ls-files -z`.split("\x0").reject do |f|
22
+ (File.expand_path(f) == __FILE__) ||
23
+ f.start_with?(*%w[bin/ test/ spec/ features/ .git .circleci appveyor Gemfile])
24
+ end
25
+ end
26
+
27
+ spec.bindir = 'exe'
28
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
29
+ spec.require_paths = ['lib']
30
+
31
+ spec.add_dependency 'httparty', '>= 0.23'
32
+ spec.add_dependency 'csv', '>= 3.3'
33
+ spec.add_dependency 'oj', '3.16.11'
34
+ spec.add_dependency 'fileutils', '>= 1.7'
35
+ spec.add_dependency 'rubyzip', '>= 2.0'
36
+
37
+ spec.add_development_dependency 'rake', '~> 13.3.0'
38
+ spec.add_development_dependency 'minitest', '~> 5.25.5'
39
+ spec.add_development_dependency 'minitest-focus', '~> 1.4.0'
40
+ spec.add_development_dependency 'minitest-reporters', '~> 1.7.1'
41
+ spec.add_development_dependency 'webmock', '~> 3.24.0'
42
+ spec.add_development_dependency 'mocha', '~> 2.4.5'
43
+ spec.add_development_dependency 'pry', '~> 0.15.2'
44
+ spec.add_development_dependency 'simplecov', '~> 0.22.0'
45
+ spec.add_development_dependency 'timecop', '~> 0.9.10'
46
+ end
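The `spec.files` and `spec.executables` blocks in the gemspec above decide what ships in the `.gem`. The following sketch reproduces only their filtering logic on a plain list of paths, so it can be checked without running `git ls-files`. The helper names are hypothetical. The `__FILE__` check that excludes the gemspec itself is omitted. The example paths are illustrative and not taken from the repository.

```ruby
# Prefixes copied from the gemspec's reject block.
EXCLUDED_PREFIXES = %w[bin/ test/ spec/ features/ .git .circleci appveyor Gemfile].freeze

# Hypothetical helper: keep paths that do not start with any excluded prefix,
# as the gemspec does (String#start_with? accepts several prefixes at once).
def packaged_files(paths)
  paths.reject { |path| path.start_with?(*EXCLUDED_PREFIXES) }
end

# Hypothetical helper: with spec.bindir = 'exe', only files under exe/
# become executables, named by their basename.
def executable_names(files)
  files.grep(%r{\Aexe/}) { |path| File.basename(path) }
end

# Illustrative paths only; not the gem's actual file list.
example = %w[lib/kaggle.rb lib/kaggle/client.rb bin/kaggle bin/setup exe/kaggle
             test/kaggle/client_test.rb Gemfile Gemfile.lock .gitignore README.md]
files = packaged_files(example)
p files                   # => ["lib/kaggle.rb", "lib/kaggle/client.rb", "exe/kaggle", "README.md"]
p executable_names(files) # => ["kaggle"]
```

Because `bin/` is excluded and executables come only from `exe/`, a command-line script kept only at `bin/kaggle` would not be installed with the gem.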