8f965b3d06
WIP frontend isn't completely broken now
...
There is still more work left on integrating apalis, and I need to
fully update it.
These changes are mostly for fixing the frontend I broke by eagerly
updating everything.
2025-02-10 00:52:26 -05:00
e41085425a
Upgrade apalis, add fred pool to state, start publishing in jobs
2024-09-22 13:49:24 -04:00
6912ef9017
Add crawl_entry job
2024-08-27 21:54:14 -04:00
65eac1975c
Move feed fetching to crawl_feed job, DomainRequestLimiter
...
`DomainRequestLimiter` is a distributed version of `DomainLocks` based
on redis.
2024-08-26 01:12:18 -04:00
9c75a88c69
Start of a crawl_feed job
2024-08-25 22:24:02 -04:00
a3450e202a
Working apalis cron and worker with 0.6.0-rc.5
...
Also renamed `pool` variables throughout codebase to `db` for clarity.
2024-08-21 01:21:45 -04:00
764d3f23b8
WIP add apalis & split up main process
2024-07-27 13:55:08 -04:00
ec394fc170
Implement entry and feed pagination
2023-09-02 14:01:18 -04:00
276f0e17a8
Remove self from Crawls and Imports in actors at end of task
...
This covers the case where the user never listens to the stream, so
that I do not create infinitely growing hashmaps in server memory.
2023-08-29 23:30:00 -04:00
d17f909312
Add CrawlScheduler actor, shared client w/ last modified headers
2023-07-15 21:40:31 -04:00
4837cbb903
Add crawl metadata to feed & improve model interface
2023-07-15 00:40:10 -04:00
3f028c3088
Store entry html content outside DB in file storage
...
Since the HTML content can get quite big and can have embedded images.
2023-07-05 23:45:49 -04:00
7e06d23bba
Replace argh with clap
...
Mostly for the more concise Config parsing and error handling.
2023-06-27 14:03:52 -04:00
abd540d2ff
Better database layout with uuid primary keys
...
Serialize and deserialize the uuid ids as base62 strings in the URLs.
2023-06-27 14:03:52 -04:00
758e644173
Add published_at to entries, begin to support pagination
...
Articles will be sorted by their published_at dates for now.
2023-06-08 01:20:21 -04:00
3f29138bd1
Fetch and save entry HTML content with metadata
...
And render the extracted HTML on the entry page in the frontend.
2023-06-07 01:06:03 -04:00
9059894021
Make titles optional on feeds and entries
2023-05-17 23:10:23 -04:00
bf40b803a9
Rename item to entry
2023-05-17 23:10:09 -04:00
6f364b4c44
Rename to crawlnicle
2023-05-10 00:00:48 -04:00
ae8f15f19b
Add very basic crawl job
...
Loops through feeds and adds items from each feed.
2023-05-09 23:55:42 -04:00
89fdf8f95a
Create cli binary
...
Just has `add-feed` command so far.
2023-05-09 00:08:55 -04:00