flatfish 0.3.1 → 0.3.2
- data/Gemfile +2 -1
- data/README.md +9 -10
- data/lib/flatfish.rb +1 -1
- metadata +1 -1
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -1,30 +1,29 @@
-# Flatfish
+# Flatfish
 Bottom-feeding fun!
 
-## Description
-Flatfish is a lib to scrape HTML based on a CSV
+## Description
+Flatfish is a lib to scrape HTML based on a CSV with CSS selectors and configurable attributes (eg, page titles).
 The ultimate goal of Flatfish is to prep and load the HTML into Drupal.
 
 ## INSTALLATION
 Flatfish is still in development, so it's not on Rubygems just yet. You'll need to build and install the gem manually, this is really pretty easy. Assuming you're starting from scratch:
 
 1. We're using Ruby 1.9.3, so install that with RVM, rbenv+ruby-build, or on your own.
-2. Flatfish has a few dependencies, which are listed in the Gemfile
-3. We've set up a quick Rake task to build and install the Flatfish gem, so if you're using RVM (system-wide flavor) just run
+2. Flatfish has a few dependencies, which are listed in the Gemfile. You can install the `bundler` gem and then use it to grab the rest of the gems at the versions specified in the Gemfile.lock by running `bundle install`--this is probably a good idea. The gems can also be installed by hand--there are only a few.
+3. We've set up a quick Rake task to build and install the Flatfish gem, so if you're using RVM (system-wide flavor) just run `rake install_gem`. Otherwise, you can just `gem build flatfish.gemspec` and `gem install flatfish-VERSION.gem` according to your setup.
 
 ## NOTES
-As Flatfish scrapes the HTML over-the-wire, it can be a bit slow (say 10 minutes for 500 pages), you can speed things up by pointing to a local copy of your site.
+As Flatfish scrapes the HTML over-the-wire, it can be a bit slow (say 10 minutes for 500 pages), but you can speed things up by pointing to a local copy of your site by entering a value for `local_source` in the config.yml file (see the example directory).
 
 ## USAGE INSTRUCTIONS
 1. Create a MySQL database
 2. Make a directory for you CSV and configuration file
-3. Create CSVs of URLs
-4. Configure your yaml
-5. Run
+3. Create CSVs of URLs with CSS selectors (see the example directory), one for each Drupal Content Type
+4. Configure your yaml with project specifics (see config.yml in the example directory)
+5. Run `flatfish` in your project directory (with the CSV and the config file)
 6. Additional Flatfish runs will update (IE, overwrite) database content based on URL.
 
 ## License
-
 (The MIT License)
 
 Copyright (c) 2012 Tim Loudon
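The updated NOTES entry names a `local_source` value in config.yml but does not show how it might be read. The sketch below is one way a script could check for that setting using Ruby's standard YAML library. Only the key name `local_source` comes from the README. The example path, the helper name `local_source_from`, and the assumption that the key sits at the top level of the file are hypothetical, and this is not Flatfish's own code.

```ruby
require "yaml"

# Hypothetical config contents; only the `local_source` key is named in the README.
EXAMPLE_CONFIG = <<~YAML
  local_source: /var/www/mysite/
YAML

# Returns the configured local source as a String, or nil when the key is
# missing or blank (meaning pages would be fetched over the wire).
def local_source_from(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  value = config["local_source"]
  return nil if value.nil? || value.to_s.strip.empty?

  value.to_s
end

puts local_source_from(EXAMPLE_CONFIG).inspect
```

Returning nil for a missing or blank value lets a caller fall back to over-the-wire fetching without treating an unset option as an error.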
data/lib/flatfish.rb
CHANGED