skyfall 0.4.1 → 0.5.0

Files changed (36)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +15 -0
  3. data/README.md +182 -29
  4. data/example/block_tracker.rb +1 -1
  5. data/example/{monitor_phrases.rb → jet_monitor_phrases.rb} +3 -2
  6. data/example/print_all_posts.rb +1 -1
  7. data/example/push_notifications.rb +1 -1
  8. data/lib/skyfall/collection.rb +26 -0
  9. data/lib/skyfall/{messages → firehose}/account_message.rb +3 -1
  10. data/lib/skyfall/{messages → firehose}/commit_message.rb +9 -3
  11. data/lib/skyfall/firehose/handle_message.rb +14 -0
  12. data/lib/skyfall/firehose/identity_message.rb +9 -0
  13. data/lib/skyfall/{messages → firehose}/info_message.rb +3 -1
  14. data/lib/skyfall/{messages → firehose}/labels_message.rb +2 -2
  15. data/lib/skyfall/{messages/websocket_message.rb → firehose/message.rb} +13 -11
  16. data/lib/skyfall/firehose/operation.rb +58 -0
  17. data/lib/skyfall/firehose/tombstone_message.rb +11 -0
  18. data/lib/skyfall/firehose/unknown_message.rb +6 -0
  19. data/lib/skyfall/firehose.rb +79 -0
  20. data/lib/skyfall/jetstream/account_message.rb +19 -0
  21. data/lib/skyfall/jetstream/commit_message.rb +16 -0
  22. data/lib/skyfall/jetstream/identity_message.rb +15 -0
  23. data/lib/skyfall/jetstream/message.rb +50 -0
  24. data/lib/skyfall/jetstream/operation.rb +58 -0
  25. data/lib/skyfall/jetstream/unknown_message.rb +6 -0
  26. data/lib/skyfall/jetstream.rb +121 -0
  27. data/lib/skyfall/stream.rb +39 -59
  28. data/lib/skyfall/version.rb +1 -1
  29. data/lib/skyfall.rb +4 -2
  30. metadata +21 -14
  31. data/example/follower_tracker.rb +0 -84
  32. data/lib/skyfall/messages/handle_message.rb +0 -12
  33. data/lib/skyfall/messages/identity_message.rb +0 -7
  34. data/lib/skyfall/messages/tombstone_message.rb +0 -9
  35. data/lib/skyfall/messages/unknown_message.rb +0 -4
  36. data/lib/skyfall/operation.rb +0 -74
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 1fa2648926fdc913472ec8fca4d6bdf3bbac188dbb0d9c82816538ca0c446d23
4
- data.tar.gz: f35348c4834fb9dc26544105074bdc4f9ba07c36d595e5c248651d47e5dc188e
3
+ metadata.gz: d272120c19a97451a6df364e1a7dbbb8191dc0247fd929129ecab0cf05902599
4
+ data.tar.gz: 242ebbbbf44aee36a883c2fc78be327d29c14fa27168fc1eac66ef38ad16cc16
5
5
  SHA512:
6
- metadata.gz: df775bdf85f5fe73f5fc5f27868b4ad8755219e98dfae72a32e7a9bb66295b463f65a2ebd9a8c956d74f200f667a9f02c00310e665a506f69a950db03eec4ad6
7
- data.tar.gz: 6dfe5acf800a9765d0ad6f3ce78e7a6aae445a2dde4028ef3af39ff81a5fc3947080ecfb0ceb0dcf439a2db3e6d65e17376b187455fa1eaf56889c87d0137be1
6
+ metadata.gz: cebb5837133466638a7009d52be516af961644c55f91fe3f3e9a6fe120bc8875f614bbf849cea02fe23b81f3f1f4652226214d375adff12198faf3dd81c1bedd
7
+ data.tar.gz: f1c8c48d58c344a67f4df27afc792b6d38272ecb0aefdf7cf4336269290e36c788d356547c2af0520a9d11664e62fb872157da88951cb669f7081bb0414cc029
data/CHANGELOG.md CHANGED
@@ -1,3 +1,18 @@
1
+ ## [0.5.0] - 2024-11-15
2
+
3
+ Jetstream support! You can now connect to [Jetstream](https://github.com/bluesky-social/jetstream) sources using `Skyfall::Jetstream` (see readme).
4
+
5
+ This required some breaking changes in the existing API:
6
+
7
+ - `Skyfall::Stream` has been renamed to `Skyfall::Firehose`, `Skyfall::Stream` is now a base class of both `Firehose` and `Jetstream`; the existing `Skyfall::Stream` constructor works for now but will be removed soon
8
+ - `Skyfall::WebsocketMessage` and its subclasses have been separated into two parallel families under `Skyfall::Firehose` and `Skyfall::Jetstream`, with the base classes just named `Message`
9
+ - same thing happened with `Skyfall::Operation`
10
+ - `data_object` and `type_object` properties in `WebsocketMessage` are considered semi-private API now ("nodoc")
11
+
12
+ In most cases, you should only need to update the `Skyfall::Stream` class name in the constructor. If you've referenced message classes like `Skyfall::CommitMessage` directly, it's probably better to just check the `#type` property instead.
13
+
14
+ Also, small change to the user agent API: `Skyfall::Stream` now has an additional method `version_string`, which will always return `Skyfall/0.x.y` - it's recommended to use that instead of `default_user_agent` to build your own user agent string that includes the library version. `default_user_agent` now passes through to `version_string`, but it could be changed in future to return something else.
15
+
1
16
  ## [0.4.1] - 2024-10-04
2
17
 
3
18
  - performance fix - don't decode CAR sections which aren't needed, which is most of them; this cuts the amount of memory that GC has to free up by about one third, and should speed up processing by around ~10%
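The 0.5.0 notes above suggest checking the `#type` property rather than referencing message classes directly. A minimal sketch of why that survives the split into two class families, using hypothetical stand-in structs (`SketchFirehose`, `SketchJetstream`) rather than the gem's real classes:

```ruby
# Stand-ins for illustration only; the real classes live in the skyfall gem
# under Skyfall::Firehose and Skyfall::Jetstream.
module SketchFirehose
  CommitMessage = Struct.new(:type, :seq)
end

module SketchJetstream
  CommitMessage = Struct.new(:type, :seq)
end

# Checking the type symbol works for either family, while a class check
# would have to name both Firehose and Jetstream variants.
def commit?(msg)
  msg.type == :commit
end

messages = [
  SketchFirehose::CommitMessage.new(:commit, 1),
  SketchJetstream::CommitMessage.new(:commit, 1_731_000_000_000_000),
]

messages.each { |msg| puts "#{msg.class}: commit? #{commit?(msg)}" }
```

A handler written this way should not need to change when switching between a firehose and a Jetstream source.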
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Skyfall
2
2
 
3
- A Ruby gem for streaming data from the Bluesky/AtProto firehose 🦋
3
+ A Ruby gem for streaming data from the Bluesky/ATProto firehose 🦋
4
4
 
5
5
  > [!NOTE]
6
6
  > ATProto Ruby gems collection: [skyfall](https://github.com/mackuba/skyfall) | [blue_factory](https://github.com/mackuba/blue_factory) | [minisky](https://github.com/mackuba/minisky) | [didkit](https://github.com/mackuba/didkit)
@@ -8,57 +8,140 @@ A Ruby gem for streaming data from the Bluesky/AtProto firehose 🦋
8
8
 
9
9
  ## What does it do
10
10
 
11
- Skyfall is a Ruby library for connecting to the *"firehose"* of the Bluesky social network, i.e. a websocket which
12
- streams all new posts and everything else happening on the Bluesky network in real time. The code connects to the
13
- websocket endpoint, decodes the messages which are encoded in some binary formats like DAG-CBOR, and returns the data as Ruby objects, which you can filter and save to some kind of database (e.g. in order to create a custom feed).
11
+ Skyfall is a Ruby library for connecting to the *"[firehose](https://atproto.com/specs/event-stream)"* of the Bluesky social network, i.e. a websocket which streams all new posts and everything else happening on the Bluesky network in real time. The code connects to the websocket endpoint, decodes the messages which are encoded in some binary formats like DAG-CBOR, and returns the data as Ruby objects, which you can filter and save to some kind of database (e.g. in order to create a custom feed).
12
+
13
+ Since version 0.5, Skyfall also supports connecting to [Jetstream](https://github.com/bluesky-social/jetstream/) sources, which serve the same kind of stream, but as JSON messages instead of CBOR.
14
14
 
15
15
 
16
16
  ## Installation
17
17
 
18
+ From the command line:
19
+
18
20
  gem install skyfall
19
21
 
22
+ Or, add this to your `Gemfile`:
23
+
24
+ gem 'skyfall', '~> 0.5'
25
+
20
26
 
21
27
  ## Usage
22
28
 
23
- Start a connection to the firehose by creating a `Skyfall::Stream` object, passing the server hostname and endpoint name:
29
+ ### Standard ATProto firehose
30
+
31
+ To connect to the firehose, start by creating a `Skyfall::Firehose` object, specifying the server hostname and endpoint name:
24
32
 
25
33
  ```rb
26
34
  require 'skyfall'
27
35
 
28
- sky = Skyfall::Stream.new('bsky.network', :subscribe_repos)
36
+ sky = Skyfall::Firehose.new('bsky.network', :subscribe_repos)
29
37
  ```
30
38
 
31
- Add event listeners to handle incoming messages and get notified of errors:
39
+ The server name can be just a hostname, or a full URL with a `ws:` or `wss:` scheme, which is useful if you want to use a non-encrypted websocket connection, e.g. `"ws://localhost:8000"`. The endpoint can be either a full NSID string like `"com.atproto.sync.subscribeRepos"`, or one of the defined symbol shortcuts - you will almost always want to pass `:subscribe_repos` here.
40
+
41
+ Next, set up event listeners to handle incoming messages and get notified of errors. Here are all the available listeners (you will need at least either `on_message` or `on_raw_message`):
32
42
 
33
43
  ```rb
44
+ # this gives you a parsed message object, one of subclasses of Skyfall::Firehose::Message
45
+ sky.on_message { |msg| p msg }
46
+
47
+ # this gives you raw binary data as received from the websocket
48
+ sky.on_raw_message { |data| p data }
49
+
50
+ # lifecycle events
51
+ sky.on_connecting { |url| puts "Connecting to #{url}..." }
34
52
  sky.on_connect { puts "Connected" }
35
53
  sky.on_disconnect { puts "Disconnected" }
54
+ sky.on_reconnect { puts "Connection lost, trying to reconnect..." }
55
+ sky.on_timeout { puts "Connection stalled, triggering a reconnect..." }
36
56
 
37
- sky.on_message { |m| p m }
57
+ # handling errors (there's a default error handler that does exactly this)
38
58
  sky.on_error { |e| puts "ERROR: #{e}" }
39
59
  ```
40
60
 
61
+ You can also call these as setters accepting a `Proc` - e.g. to disable default error handling, you can do:
62
+
63
+ ```rb
64
+ sky.on_error = nil
65
+ ```
66
+
41
67
  When you're ready, open the connection by calling `connect`:
42
68
 
43
69
  ```rb
44
70
  sky.connect
45
71
  ```
46
72
 
73
+ The `#connect` method blocks until the connection is explicitly closed with `#disconnect` from an event or interrupt handler. Skyfall uses [EventMachine](https://github.com/eventmachine/eventmachine) under the hood, so in order to run some things in parallel, you can use e.g. `EM::PeriodicTimer`.
74
+
75
+
76
+ ### Using a Jetstream source
77
+
78
+ Alternatively, you can connect to a [Jetstream](https://github.com/bluesky-social/jetstream/) server. Jetstream is a firehose proxy that lets you stream data as simple JSON instead, which uses much less bandwidth, and allows you to pick only a subset of events that you're interested in, e.g. only posts or only from specific accounts. (See the [configuration section](#jetstream-filters) for more info on Jetstream filtering.)
79
+
80
+ Jetstream connections are made using a `Skyfall::Jetstream` instance, which has more or less the same API as `Skyfall::Firehose`, so it should be possible to switch between those by just changing the line that creates the client instance:
81
+
82
+ ```rb
83
+ sky = Skyfall::Jetstream.new('jetstream1.us-east.bsky.network')
84
+
85
+ sky.on_message { |msg| ... }
86
+ sky.on_error { |e| ... }
87
+ sky.on_connect { ... }
88
+ ...
89
+
90
+ sky.connect
91
+ ```
92
+
93
+ ### Cursors
94
+
95
+ ATProto websocket endpoints implement a "*cursor*" feature to help you make sure that you don't miss anything if your connection is down for a bit (because of a network issue, server restart, deploy etc.). Each message includes a `seq` field, which is the sequence number of the event. You can keep track of the last seq you've seen, and when you reconnect, you pass that number as a cursor parameter - the server will then "replay" all events you might have missed since that last one. (The `bsky.network` Relay firehose currently has a buffer of about 72 hours, though that's not something required by specification.)
96
+
97
+ To use a cursor when connecting to the firehose, pass it as the third parameter to `Skyfall::Firehose`. You should then regularly save the `seq` of the last event to some permanent storage, and then load it from there when reconnecting.
98
+
99
+ A full-network firehose sends many hundreds of events per second, so depending on your use case, it might be enough if you save it every n events (e.g. every 100 or 1000) and on clean shutdown:
100
+
101
+ ```rb
102
+ cursor = load_cursor
103
+
104
+ sky = Skyfall::Firehose.new('bsky.network', :subscribe_repos, cursor)
105
+ sky.on_message do |msg|
106
+ save_cursor(msg.seq) if msg.seq % 1000 == 0
107
+ process_message(msg)
108
+ end
109
+ ```
110
+
111
+ Jetstream has a similar mechanism, except the cursor is the event's timestamp in Unix time microseconds instead of just a number incrementing by 1. For `Skyfall::Jetstream`, pass the cursor as a key in an options hash:
112
+
113
+ ```rb
114
+ cursor = load_cursor
115
+
116
+ sky = Skyfall::Jetstream.new('jetstream1.us-east.bsky.network', { cursor: cursor })
117
+ sky.on_message do |msg|
118
+ save_cursor(msg.seq)
119
+ process_message(msg)
120
+ end
121
+ ```
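The "save every n events and on clean shutdown" advice can be sketched as a small helper. This is an illustrative sketch, not part of Skyfall: the `CursorCheckpoint` class and its `every:` parameter are hypothetical names. It counts events instead of testing `seq % 1000`, on the assumption that Jetstream cursors (microsecond timestamps) rarely land on round numbers, and it skips messages whose `seq` is `nil`.

```ruby
# Hypothetical helper: persists the latest cursor every `every` events.
class CursorCheckpoint
  def initialize(every: 1000, &save)
    @every = every
    @save = save      # block that writes the cursor to permanent storage
    @count = 0
    @last_seq = nil
  end

  # Call once per message with msg.seq.
  def record(seq)
    return if seq.nil?   # some message types may carry no seq

    @last_seq = seq
    @count += 1
    flush if (@count % @every).zero?
  end

  # Call on clean shutdown so the most recent cursor is not lost.
  def flush
    @save.call(@last_seq) if @last_seq
  end
end

saved = []
checkpoint = CursorCheckpoint.new(every: 3) { |seq| saved << seq }
[10, 11, nil, 12, 13].each { |seq| checkpoint.record(seq) }
checkpoint.flush
p saved
```

In a real client, `record` would be called from the `on_message` block and `flush` from a disconnect or interrupt handler.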
122
+
47
123
 
48
124
  ### Processing messages
49
125
 
50
- Each message passed to `on_message` is an instance of a subclass of `WebsocketMessage`, depending on the message type. The supported message types are:
126
+ Each message passed to `on_message` is an instance of a subclass of either `Skyfall::Firehose::Message` or `Skyfall::Jetstream::Message`, depending on the selected source. The supported message types are:
51
127
 
52
128
  - `CommitMessage` (`#commit`) - represents a change in a user's repo; most messages are of this type
53
- - `HandleMessage` (`#handle`) - when a different handle is assigned to a user's DID
54
- - `TombstoneMessage` (`#tombstone`) - when an account is deleted
129
+ - `IdentityMessage` (`#identity`) - notifies about a change in user's DID document, e.g. a handle change or a migration to a new PDS
130
+ - `AccountMessage` (`#account`) - notifies about a change of an account's status (de/activation, suspension, deletion)
131
+ - `HandleMessage` (`#handle` - deprecated) - when a different handle is assigned to a user's DID
132
+ - `TombstoneMessage` (`#tombstone` - deprecated) - when an account is deleted
133
+ - `LabelsMessage` (`#labels`) - only used in `subscribe_labels` endpoint
55
134
  - `InfoMessage` (`#info`) - a protocol error message, e.g. about an invalid cursor parameter
56
135
  - `UnknownMessage` is used for other unrecognized message types
57
136
 
58
- All message objects have the following properties:
137
+ `#handle` and `#tombstone` events are considered deprecated, replaced by `#identity` and `#account` respectively. They are still being emitted at the moment (in parallel with the newer event types), but they might stop being sent at any moment, so it's recommended that you don't rely on those.
138
+
139
+ `Skyfall::Firehose::Message` and `Skyfall::Jetstream::Message` variants of message classes should have more or less the same interface, except when a given field is not included in one of the formats.
140
+
141
+ All message objects have the following shared properties:
59
142
 
60
143
  - `type` (symbol) - the message type identifier, e.g. `:commit`
61
- - `seq` (integer) - a sequential index of the message
144
+ - `seq` (integer) - a sequential index of the message; Jetstream messages instead have a `time_us` value, which is a Unix timestamp in microseconds (also aliased as `seq` for compatibility)
62
145
  - `repo` or `did` (string) - DID of the repository (user account)
63
146
  - `time` (Time) - timestamp of the described action
64
147
 
@@ -67,13 +150,17 @@ All properties except `type` may be nil for some message types that aren't relat
67
150
  Commit messages additionally have:
68
151
 
69
152
  - `commit` - CID of the commit
70
- - `prev` - CID of the previous commit in that repo
71
153
  - `operations` - list of operations (usually one)
72
154
 
73
- Handle messages additionally have:
155
+ Handle and Identity messages additionally have:
74
156
 
75
157
  - `handle` - the new handle assigned to the DID
76
158
 
159
+ Account messages additionally have:
160
+
161
+ - `active?` - whether the account is active, or inactive for any reason
162
+ - `status` - if not active, shows the status of the account (`:deactivated`, `:deleted`, `:takendown`)
163
+
77
164
  Info messages additionally have:
78
165
 
79
166
  - `name` - identifier of the message/error
@@ -82,7 +169,7 @@ Info messages additionally have:
82
169
 
83
170
  ### Commit operations
84
171
 
85
- Operations are objects of type `Operation` and have such properties:
172
+ Operations are objects of type `Skyfall::Firehose::Operation` or `Skyfall::Jetstream::Operation` and have such properties:
86
173
 
87
174
  - `repo` or `did` (string) - DID of the repository (user account)
88
175
  - `collection` (string) - name of the relevant collection in the repository, e.g. `app.bsky.feed.post` for posts
@@ -91,7 +178,7 @@ Operations are objects of type `Operation` and have such properties:
91
178
  - `path` (string) - the path part of the at:// URI - collection name + ID (rkey) of the item
92
179
  - `uri` (string) - the complete at:// URI
93
180
  - `action` (symbol) - `:create`, `:update` or `:delete`
94
- - `cid` - CID of the operation/record (`nil` for delete operations)
181
+ - `cid` (CID) - CID of the operation/record (`nil` for delete operations)
95
182
 
96
183
  Create and update operations will also have an attached record (JSON object) with details of the post, like, etc. The record data is currently available as a Ruby hash via `raw_record` property (custom types will be added in future).
97
184
 
@@ -114,34 +201,33 @@ end
114
201
  For more examples, see the [example](https://github.com/mackuba/skyfall/blob/master/example) folder or the [bluesky-feeds-rb](https://github.com/mackuba/bluesky-feeds-rb/blob/master/app/firehose_stream.rb) project, which implements a feed generator service.
115
202
 
116
203
 
117
- ### Custom lexicons
204
+ ### Note on custom lexicons
118
205
 
119
- A note on custom lexicons: the `Skyfall::Operation` objects have two properties that tell you the kind of record they're about: `#collection`, which is a string containing the official name of the collection/lexicon, e.g. `"app.bsky.feed.post"`; and `#type`, which is a symbol meant to save you some typing, e.g. `:bsky_post`.
206
+ Note that the `Operation` objects have two properties that tell you the kind of record they're about: `#collection`, which is a string containing the official name of the collection/lexicon, e.g. `"app.bsky.feed.post"`; and `#type`, which is a symbol meant to save you some typing, e.g. `:bsky_post`.
120
207
 
121
208
  When Skyfall receives a message about a record type that's not on the list, whether in the `app.bsky` namespace or not, the operation `type` will be `:unknown`, while the `collection` will be the original string. So if an app like e.g. "Skygram" appears with a `zz.skygram.*` namespace that lets you share photos on ATProto, the operations will have a type `:unknown` and collection names like `zz.skygram.feed.photo`, and you can check the `collection` field for record types known to you and process them in some appropriate way, even if Skyfall doesn't recognize the record type.
122
209
 
123
210
  Do not however check if such operations have a `type` equal to `:unknown` first - just ignore the type and only check the `collection` string. The reason is that some next version of Skyfall might start recognizing those records and add a new `type` value for them like e.g. `:skygram_photo`, and then they won't match your condition anymore.
124
211
 
125
212
 
126
- ## Configuration
213
+ ## Reconnection logic
127
214
 
128
- ### User agent
215
+ In a perfect world, the websocket would never disconnect until you disconnect it, but unfortunately we don't live in a perfect world. The socket sometimes disconnects or stops responding, and Skyfall has some built-in protections to make sure it can operate without much oversight.
129
216
 
130
- `Skyfall::Stream` sends a user agent header when making a connection. This is set by default to `"Skyfall/0.x.y"`, but it's recommended that you override it using the `user_agent` field to something that identifies your app and its author – this will let the owner of the server you're connecting to know who to contact in case the client is causing some problems.
131
217
 
132
- You can also append your user agent info to the default value like this:
218
+ ### Broken connections
133
219
 
134
- ```rb
135
- sky.user_agent = "NewsBot (@news.bot) #{sky.default_user_agent}"
136
- ```
220
+ If the connection is randomly closed for some reason, Skyfall will by default try to reconnect automatically. If the reconnection fails (e.g. because the network is down), it will wait with an [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) up to 5 minute intervals and keep retrying forever until it connects again. The `on_reconnect` callback is triggered when the connection is closed (before the wait delay). This mechanism should generally solve most of the problem.
221
+
222
+ The auto reconnecting feature is enabled by default, but you can turn it off by setting `auto_reconnect` to `false`.
137
223
 
138
- ### Heartbeat and reconnecting
224
+ ### Stalled connections & heartbeat
139
225
 
140
226
  Occasionally, especially during times of very heavy traffic, the websocket can get into a stuck state where it stops receiving any data, but doesn't disconnect and just hangs like this forever. To work around this, there is a "heartbeat" feature which starts a background timer, which periodically checks how much time has passed since the last received event, and if the time exceeds a set limit, it manually disconnects and reconnects the stream.
141
227
 
142
- The option is not enabled by default, because there are some firehoses which will not be sending events often, possibly only once in a while – e.g. labellers and independent PDS firehoses – and in this case we don't want any heartbeat since it will be completely normal not to have any events for a long time. It's not really possible to detect easily if we're connecting to a full network relay or one of those, so in order to avoid false alarms, you need to enable this manually using the `check_heartbeat` property.
228
+ This feature is not enabled by default, because there are some firehoses which will not be sending events often, possibly only once in a while – e.g. labellers and independent PDS firehoses – and in this case we don't want any heartbeat since it will be completely normal not to have any events for a long time. It's not really possible to detect easily if we're connecting to a full network relay or one of those, so in order to avoid false alarms, you need to enable this manually using the `check_heartbeat` property.
143
229
 
144
- You can also change the `heartbeat_interval`, i.e. how often the timer is triggered (default: 10s), and the `heartbeat_timeout`, i.e. the amount of time passed without events when it reconnects (default: 5 min):
230
+ You can also change the `heartbeat_interval`, i.e. how often the timer is triggered (default: 10s), and the `heartbeat_timeout`, i.e. the amount of time passed without events needed to cause a reconnect (default: 5 min):
145
231
 
146
232
  ```rb
147
233
  sky.check_heartbeat = true
@@ -149,6 +235,73 @@ sky.heartbeat_interval = 5
149
235
  sky.heartbeat_timeout = 120
150
236
  ```
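The two mechanisms described in the reconnection sections (exponential backoff capped at 5 minutes, and a heartbeat check against a timeout) can be sketched roughly as follows. The base delay and doubling factor are assumptions made for illustration; the README only states the 5-minute cap, and Skyfall's actual implementation may differ.

```ruby
# Assumed values: only the 5-minute cap comes from the README.
BASE_DELAY = 1.0    # seconds before the first retry (assumption)
MAX_DELAY = 300.0   # 5 minutes

# Delay before retry number `attempt` (0-based), doubling each time up to the cap.
def backoff_delay(attempt)
  [BASE_DELAY * (2**attempt), MAX_DELAY].min
end

# Heartbeat check: has more than `timeout` seconds passed since the last event?
def stalled?(last_event_at, now, timeout)
  (now - last_event_at) > timeout
end

(0..10).each { |attempt| puts "retry #{attempt}: wait #{backoff_delay(attempt)}s" }
```

A periodic timer would call something like `stalled?` every `heartbeat_interval` seconds and trigger a reconnect when it returns true.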
151
237
 
238
+ ### Cursors when reconnecting
239
+
240
+ Skyfall keeps track of the last event's `seq` internally in the `cursor` property, so if the client reconnects for whatever reason, it will automatically use the latest cursor in the URL.
241
+
242
+ > [!NOTE]
243
+ > This only happens if you use the `on_message` callback and not `on_raw_message`, since the event is not parsed from binary data into a `Message` object if you use `on_raw_message`, so Skyfall won't have access to the `seq` field then.
244
+
245
+
246
+ ## Streaming from labellers
247
+
248
+ Apart from `subscribe_repos`, there is a second endpoint `subscribe_labels`, which is used to stream labels from [labellers](https://atproto.com/specs/label) (ATProto moderation services). This endpoint only sends `#labels` events (and possibly `#info`).
249
+
250
+ To connect to a labeller, pass `:subscribe_labels` as the endpoint name to `Skyfall::Firehose`. The `on_message` callback will get called with `Skyfall::Firehose::LabelsMessage` events, each of which includes one or more labels as `Skyfall::Label`:
251
+
252
+ ```rb
253
+ cursor = load_cursor(service)
254
+ sky = Skyfall::Firehose.new(service, :subscribe_labels, cursor)
255
+ sky.on_message do |msg|
256
+ if msg.type == :labels
257
+ msg.labels.each do |l|
258
+ puts "[#{l.created_at}] #{l.subject} => #{l.value}"
259
+ end
260
+ end
261
+ end
262
+ ```
263
+
264
+ See [ATProto label docs](https://atproto.com/specs/label) for info on what fields are included with each label - `Skyfall::Label` includes properties with these original names, and also more friendly aliases for each (e.g. `value` instead of `val`).
265
+
266
+
267
+ ## Other configuration
268
+
269
+ ### User agent
270
+
271
+ Skyfall sends a user agent header when making a connection. This is set by default to `"Skyfall/0.x.y"`, but it's recommended that you override it using the `user_agent` field to something that identifies your app and its author – this will let the owner of the server you're connecting to know who to contact in case the client is causing some problems.
272
+
273
+ You can also append your user agent info to the default value like this:
274
+
275
+ ```rb
276
+ sky.user_agent = "NewsBot (@news.bot) #{sky.version_string}"
277
+ ```
278
+
279
+ ### Jetstream filters
280
+
281
+ Jetstream allows you to specify [filters](https://github.com/bluesky-social/jetstream?tab=readme-ov-file#consuming-jetstream) of collection types and/or tracked DIDs when you connect, so it will send you only the events you're interested in. You can e.g. ask only for posts and ignore likes, or only profile events and ignore everything else, or only listen for posts from a few specific accounts.
282
+
283
+ To use these filters, pass the "wantedCollections" and/or "wantedDids" parameters in the options hash when initializing `Skyfall::Jetstream`. You can use the original JavaScript param names, or a more Ruby-like snake_case form:
284
+
285
+ ```rb
286
+ sky = Skyfall::Jetstream.new('jetstream1.us-east.bsky.network', {
287
+ wanted_collections: 'app.bsky.feed.post',
288
+ wanted_dids: @dids
289
+ })
290
+ ```
291
+
292
+ For collections, you can also use the symbol codes used in `Operation#type`, e.g. `:bsky_post`:
293
+
294
+ ```rb
295
+ sky = Skyfall::Jetstream.new('jetstream1.us-east.bsky.network', {
296
+ wanted_collections: [:bsky_post]
297
+ })
298
+ ```
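To illustrate how snake_case options could map to Jetstream's query parameters, here is a hedged sketch. The `camelize` and `jetstream_query` helpers are hypothetical and are not Skyfall's API; only the `wantedCollections` and `wantedDids` parameter names come from the text above.

```ruby
require 'uri'

# Hypothetical helper: converts :wanted_collections to "wantedCollections".
# Keys already in camelCase pass through unchanged.
def camelize(key)
  parts = key.to_s.split('_')
  parts.first + parts.drop(1).map(&:capitalize).join
end

# Builds a query string; array values become repeated parameters.
def jetstream_query(options)
  params = options.map { |key, value| [camelize(key), Array(value)] }
  URI.encode_www_form(params)
end

puts jetstream_query(wanted_collections: ['app.bsky.feed.post', 'app.bsky.feed.like'])
puts jetstream_query('wantedCollections' => 'app.bsky.feed.post')
```

Repeating the parameter once per value follows the query format in the Jetstream README. Whether Skyfall builds its URL the same way is not shown in this diff.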
299
+
300
+ See [Jetstream docs](https://github.com/bluesky-social/jetstream?tab=readme-ov-file#consuming-jetstream) for more info on available filters.
301
+
302
+ > [!NOTE]
303
+ > The `compress` and `requireHello` options (and zstd compression) are not available at the moment. Also the "subscriber sourced messages" aren't implemented yet.
304
+
152
305
 
153
306
  ## Credits
154
307
 
@@ -19,7 +19,7 @@ elsif ARGV[0] !~ /^did:plc:[a-z0-9]{24}$/
19
19
  exit 1
20
20
  end
21
21
 
22
- sky = Skyfall::Stream.new('bsky.network', :subscribe_repos)
22
+ sky = Skyfall::Firehose.new('bsky.network', :subscribe_repos)
23
23
 
24
24
  sky.on_connect { puts "Connected, monitoring #{$monitored_did}" }
25
25
  sky.on_disconnect { puts "Disconnected" }
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
3
  # Example: monitor new posts for mentions of one or more words or phrases (e.g. anyone mentioning your name or the name
4
- # of your company, project etc.).
4
+ # of your company, project etc.). This example uses a Jetstream connection.
5
5
 
6
6
  # load skyfall from a local folder - you normally won't need this
7
7
  $LOAD_PATH.unshift(File.expand_path('../lib', __dir__))
@@ -17,7 +17,8 @@ if terms.empty?
17
17
  exit 1
18
18
  end
19
19
 
20
- sky = Skyfall::Stream.new('bsky.network', :subscribe_repos)
20
+ # tell Jetstream to send us only post records
21
+ sky = Skyfall::Jetstream.new('jetstream1.us-east.bsky.network', { wanted_collections: [:bsky_post] })
21
22
 
22
23
  sky.on_message do |msg|
23
24
  # we're only interested in repo commit messages
@@ -7,7 +7,7 @@ $LOAD_PATH.unshift(File.expand_path('../lib', __dir__))
7
7
 
8
8
  require 'skyfall'
9
9
 
10
- sky = Skyfall::Stream.new('bsky.network', :subscribe_repos)
10
+ sky = Skyfall::Firehose.new('bsky.network', :subscribe_repos)
11
11
 
12
12
  sky.on_message do |msg|
13
13
  # we're only interested in repo commit messages
@@ -45,7 +45,7 @@ class NotificationEngine
45
45
  end
46
46
 
47
47
  def connect
48
- @sky = Skyfall::Stream.new('bsky.network', :subscribe_repos)
48
+ @sky = Skyfall::Firehose.new('bsky.network', :subscribe_repos)
49
49
 
50
50
  @sky.on_connect { puts "Connected, monitoring #{@user_did}" }
51
51
  @sky.on_disconnect { puts "Disconnected" }
@@ -16,5 +16,31 @@ module Skyfall
16
16
  BSKY_LABELER = "app.bsky.labeler.service"
17
17
 
18
18
  BSKY_CHAT_DECLARATION = "chat.bsky.actor.declaration"
19
+
20
+ SHORT_CODES = {
21
+ BSKY_BLOCK => :bsky_block,
22
+ BSKY_FEED => :bsky_feed,
23
+ BSKY_FOLLOW => :bsky_follow,
24
+ BSKY_LABELER => :bsky_labeler,
25
+ BSKY_LIKE => :bsky_like,
26
+ BSKY_LIST => :bsky_list,
27
+ BSKY_LISTBLOCK => :bsky_listblock,
28
+ BSKY_LISTITEM => :bsky_listitem,
29
+ BSKY_POST => :bsky_post,
30
+ BSKY_POSTGATE => :bsky_postgate,
31
+ BSKY_PROFILE => :bsky_profile,
32
+ BSKY_REPOST => :bsky_repost,
33
+ BSKY_STARTERPACK => :bsky_starterpack,
34
+ BSKY_THREADGATE => :bsky_threadgate,
35
+ BSKY_CHAT_DECLARATION => :bsky_chat_declaration,
36
+ }
37
+
38
+ def self.short_code(collection)
39
+ SHORT_CODES[collection] || :unknown
40
+ end
41
+
42
+ def self.from_short_code(code)
43
+ SHORT_CODES.detect { |k, v| v == code }&.first
44
+ end
19
45
  end
20
46
  end
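The two lookup helpers added in this hunk can be tried in isolation. The sketch below reproduces them in a stand-in module (`CollectionSketch`, a hypothetical name) with only two of the collections, and uses `Hash#key` for the reverse lookup, which should behave the same as the `detect { ... }&.first` form in the diff:

```ruby
# Reduced stand-in for Skyfall::Collection, for illustration only.
module CollectionSketch
  BSKY_POST = "app.bsky.feed.post"
  BSKY_LIKE = "app.bsky.feed.like"

  SHORT_CODES = {
    BSKY_POST => :bsky_post,
    BSKY_LIKE => :bsky_like,
  }.freeze

  # Collection name -> symbol; unrecognized collections map to :unknown.
  def self.short_code(collection)
    SHORT_CODES[collection] || :unknown
  end

  # Symbol -> collection name; returns nil for symbols not in the table.
  def self.from_short_code(code)
    SHORT_CODES.key(code)
  end
end

p CollectionSketch.short_code("app.bsky.feed.post")
p CollectionSketch.short_code("zz.skygram.feed.photo")
p CollectionSketch.from_short_code(:bsky_like)
p CollectionSketch.from_short_code(:unknown)
```

Note that the mapping is not symmetric for unrecognized records: `short_code` falls back to `:unknown`, but `:unknown` has no collection to map back to, so the reverse lookup returns `nil`.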
@@ -1,5 +1,7 @@
1
+ require_relative '../firehose'
2
+
1
3
  module Skyfall
2
- class AccountMessage < WebsocketMessage
4
+ class Firehose::AccountMessage < Firehose::Message
3
5
  def active?
4
6
  @data_object['active']
5
7
  end
@@ -1,14 +1,16 @@
1
1
  require_relative '../car_archive'
2
2
  require_relative '../cid'
3
- require_relative '../operation'
3
+ require_relative '../firehose'
4
+ require_relative 'operation'
4
5
 
5
6
  module Skyfall
6
- class CommitMessage < WebsocketMessage
7
+ class Firehose::CommitMessage < Firehose::Message
7
8
  def commit
8
9
  @commit ||= @data_object['commit'] && CID.from_cbor_tag(@data_object['commit'])
9
10
  end
10
11
 
11
12
  def prev
13
+ STDERR.puts "Warning: `prev` property has been deprecated and will be removed in a future version."
12
14
  @prev ||= @data_object['prev'] && CID.from_cbor_tag(@data_object['prev'])
13
15
  end
14
16
 
@@ -17,7 +19,11 @@ module Skyfall
17
19
  end
18
20
 
19
21
  def operations
20
- @operations ||= @data_object['ops'].map { |op| Operation.new(self, op) }
22
+ @operations ||= @data_object['ops'].map { |op| Firehose::Operation.new(self, op) }
23
+ end
24
+
25
+ def raw_record_for_operation(op)
26
+ op.cid && blocks.section_with_cid(op.cid)
21
27
  end
22
28
  end
23
29
  end
@@ -0,0 +1,14 @@
1
+ require_relative '../firehose'
2
+
3
+ module Skyfall
4
+
5
+ #
6
+ # Note: this event type is deprecated and will stop being emitted at some point.
7
+ # You should instead listen for 'identity' events (Skyfall::Firehose::IdentityMessage).
8
+ #
9
+ class Firehose::HandleMessage < Firehose::Message
10
+ def handle
11
+ @data_object['handle']
12
+ end
13
+ end
14
+ end
@@ -0,0 +1,9 @@
1
+ require_relative '../firehose'
2
+
3
+ module Skyfall
4
+ class Firehose::IdentityMessage < Firehose::Message
5
+ def handle
6
+ @data_object['handle']
7
+ end
8
+ end
9
+ end
@@ -1,5 +1,7 @@
1
+ require_relative '../firehose'
2
+
1
3
  module Skyfall
2
- class InfoMessage < WebsocketMessage
4
+ class Firehose::InfoMessage < Firehose::Message
3
5
  attr_reader :name, :message
4
6
 
5
7
  OUTDATED_CURSOR = "OutdatedCursor"
@@ -1,8 +1,8 @@
1
- require_relative 'websocket_message'
1
+ require_relative '../firehose'
2
2
  require_relative '../label'
3
3
 
4
4
  module Skyfall
5
- class LabelsMessage
5
+ class Firehose::LabelsMessage
6
6
  using Skyfall::Extensions
7
7
 
8
8
  attr_reader :type_object, :data_object
@@ -1,11 +1,12 @@
1
1
  require_relative '../errors'
2
2
  require_relative '../extensions'
3
+ require_relative '../firehose'
3
4
 
4
5
  require 'cbor'
5
6
  require 'time'
6
7
 
7
8
  module Skyfall
8
- class WebsocketMessage
9
+ class Firehose::Message
9
10
  using Skyfall::Extensions
10
11
 
11
12
  require_relative 'account_message'
@@ -17,23 +18,24 @@ module Skyfall
17
18
  require_relative 'tombstone_message'
18
19
  require_relative 'unknown_message'
19
20
 
20
- attr_reader :type_object, :data_object
21
21
  attr_reader :type, :did, :seq
22
-
23
22
  alias repo did
24
23
 
24
+ # :nodoc: - consider this as semi-private API
25
+ attr_reader :type_object, :data_object
26
+
25
27
  def self.new(data)
26
28
  type_object, data_object = decode_cbor_objects(data)
27
29
 
28
30
  message_class = case type_object['t']
29
- when '#account' then AccountMessage
30
- when '#commit' then CommitMessage
31
- when '#handle' then HandleMessage
32
- when '#identity' then IdentityMessage
33
- when '#info' then InfoMessage
34
- when '#labels' then LabelsMessage
35
- when '#tombstone' then TombstoneMessage
36
- else UnknownMessage
31
+ when '#account' then Firehose::AccountMessage
32
+ when '#commit' then Firehose::CommitMessage
33
+ when '#handle' then Firehose::HandleMessage
34
+ when '#identity' then Firehose::IdentityMessage
35
+ when '#info' then Firehose::InfoMessage
36
+ when '#labels' then Firehose::LabelsMessage
37
+ when '#tombstone' then Firehose::TombstoneMessage
38
+ else Firehose::UnknownMessage
37
39
  end
38
40
 
39
41
  message = message_class.allocate
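The pattern in this hunk, a base class whose `new` picks a subclass and then builds it with `allocate`, can be sketched as below. The hunk is cut off after `allocate`, so the step that initializes the allocated object is an assumption here. The class names and the `'t'`/`'data'` hash input are hypothetical; the real code decodes CBOR frames instead.

```ruby
# Stand-in base class: `new` acts as a factory that returns a subclass instance.
class MessageSketch
  attr_reader :type, :payload

  def self.new(raw)
    type_tag, payload = raw.values_at('t', 'data')

    message_class = case type_tag
      when '#commit' then CommitSketch
      when '#account' then AccountSketch
      else UnknownSketch
    end

    # allocate creates the object without calling `new` again,
    # which would otherwise re-enter this factory method.
    message = message_class.allocate
    message.send(:initialize, type_tag, payload)   # assumed follow-up step
    message
  end

  def initialize(type_tag, payload)
    @type = type_tag.to_s.delete_prefix('#').to_sym
    @payload = payload
  end
end

class CommitSketch < MessageSketch; end
class AccountSketch < MessageSketch; end
class UnknownSketch < MessageSketch; end

msg = MessageSketch.new('t' => '#commit', 'data' => { 'seq' => 1 })
puts "#{msg.class} type=#{msg.type.inspect}"
```

Callers only ever write `MessageSketch.new(...)`, yet receive the most specific subclass for each frame.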
@@ -0,0 +1,58 @@
1
+ require_relative '../collection'
2
+ require_relative '../firehose'
3
+
4
+ module Skyfall
5
+ class Firehose::Operation
6
+ def initialize(message, json)
7
+ @message = message
8
+ @json = json
9
+ end
10
+
11
+ def repo
12
+ @message.repo
13
+ end
14
+
15
+ alias did repo
16
+
17
+ def path
18
+ @json['path']
19
+ end
20
+
21
+ def action
22
+ @json['action'].to_sym
23
+ end
24
+
25
+ def collection
26
+ @json['path'].split('/')[0]
27
+ end
28
+
29
+ def rkey
30
+ @json['path'].split('/')[1]
31
+ end
32
+
33
+ def uri
34
+ "at://#{repo}/#{path}"
35
+ end
36
+
37
+ def cid
38
+ @cid ||= @json['cid'] && CID.from_cbor_tag(@json['cid'])
39
+ end
40
+
41
+ def raw_record
42
+ @raw_record ||= @message.raw_record_for_operation(self)
43
+ end
44
+
45
+ def type
46
+ Collection.short_code(collection)
47
+ end
48
+
49
+ def inspectable_variables
50
+ instance_variables - [:@message]
51
+ end
52
+
53
+ def inspect
54
+ vars = inspectable_variables.map { |v| "#{v}=#{instance_variable_get(v).inspect}" }.join(", ")
55
+ "#<#{self.class}:0x#{object_id} #{vars}>"
56
+ end
57
+ end
58
+ end
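As a quick illustration of how `collection`, `rkey` and `uri` are derived from the operation's `path`, here is a reduced stand-in (`OperationSketch`, a hypothetical name) that takes the repo DID and path directly instead of a message object. The DID and record key are made-up example values.

```ruby
# Reduced stand-in for the path handling in Firehose::Operation.
class OperationSketch
  attr_reader :repo, :path

  def initialize(repo, path)
    @repo = repo
    @path = path
  end

  # First path segment: the collection NSID.
  def collection
    path.split('/')[0]
  end

  # Second path segment: the record key.
  def rkey
    path.split('/')[1]
  end

  # Full at:// URI of the record.
  def uri
    "at://#{repo}/#{path}"
  end
end

op = OperationSketch.new('did:plc:example', 'app.bsky.feed.post/3kabc')
puts op.collection
puts op.rkey
puts op.uri
```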
@@ -0,0 +1,11 @@
1
+ require_relative '../firehose'
2
+
3
+ module Skyfall
4
+
5
+ #
6
+ # Note: this event type is deprecated and will stop being emitted at some point.
7
+ # You should instead listen for 'account' events (Skyfall::Firehose::AccountMessage).
8
+ #
9
+ class Firehose::TombstoneMessage < Firehose::Message
10
+ end
11
+ end
@@ -0,0 +1,6 @@
1
+ require_relative '../firehose'
2
+
3
+ module Skyfall
4
+ class Firehose::UnknownMessage < Firehose::Message
5
+ end
6
+ end