Commit Graph

21 Commits

Author SHA1 Message Date
8f965b3d06 WIP frontend isn't completely broken now
There is still more work left on integrating apalis, and it still needs
to be fully updated.

These changes are mostly for fixing the frontend I broke by eagerly
updating everything.
2025-02-10 00:52:26 -05:00
e41085425a Upgrade apalis, add fred pool to state, start publishing in jobs 2024-09-22 13:49:24 -04:00
6912ef9017 Add crawl_entry job 2024-08-27 21:54:14 -04:00
65eac1975c Move feed fetching to crawl_feed job, DomainRequestLimiter
`DomainRequestLimiter` is a distributed version of `DomainLocks` based
on Redis.
2024-08-26 01:12:18 -04:00
9c75a88c69 Start of a crawl_feed job 2024-08-25 22:24:02 -04:00
a3450e202a Working apalis cron and worker with 0.6.0-rc.5
Also renamed `pool` variables throughout codebase to `db` for clarity.
2024-08-21 01:21:45 -04:00
764d3f23b8 WIP add apalis & split up main process 2024-07-27 13:55:08 -04:00
ec394fc170 Implement entry and feed pagination 2023-09-02 14:01:18 -04:00
276f0e17a8 Remove self from Crawls and Imports in actors at end of task
This covers the case where the user never listens to the stream, so that
I do not create infinitely growing hashmaps in the server memory.
2023-08-29 23:30:00 -04:00
d17f909312 Add CrawlScheduler actor, shared client w/ last modified headers 2023-07-15 21:40:31 -04:00
4837cbb903 Add crawl metadata to feed & improve model interface 2023-07-15 00:40:10 -04:00
3f028c3088 Store entry html content outside DB in file storage
Since the HTML content can get quite big and can have embedded images.
2023-07-05 23:45:49 -04:00
7e06d23bba Replace argh with clap
Mostly for the more concise Config parsing and error handling.
2023-06-27 14:03:52 -04:00
abd540d2ff Better database layout with uuid primary keys
Serialize and deserialize the uuid ids as base62 strings in the URLs.
2023-06-27 14:03:52 -04:00
758e644173 Add published_at to entries, begin to support pagination
Articles will be sorted by their published_at dates for now.
2023-06-08 01:20:21 -04:00
3f29138bd1 Fetch and save entry HTML content with metadata
And render the extracted HTML on the entry page in the frontend.
2023-06-07 01:06:03 -04:00
9059894021 Make titles optional on feeds and entries 2023-05-17 23:10:23 -04:00
bf40b803a9 Rename item to entry 2023-05-17 23:10:09 -04:00
6f364b4c44 Rename to crawlnicle 2023-05-10 00:00:48 -04:00
ae8f15f19b Add very basic crawl job
Loops through feeds and adds items from each feed.
2023-05-09 23:55:42 -04:00
89fdf8f95a Create cli binary
Just has `add-feed` command so far.
2023-05-09 00:08:55 -04:00
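
Commit abd540d2ff mentions serializing uuid ids as base62 strings in the URLs. The following is a minimal sketch of how such an encoding could work, not the project's actual code. It treats the id as a plain `u128` (the numeric form of a UUID). The function names `encode_base62` and `decode_base62` and the alphabet ordering (digits, then uppercase, then lowercase) are assumptions made for illustration.

```rust
/// Assumed base62 alphabet: digits, then uppercase, then lowercase letters.
const ALPHABET: &[u8; 62] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Encode a 128-bit value (the numeric form of a UUID) as a base62 string.
/// Hypothetical helper, named for this sketch only.
fn encode_base62(mut n: u128) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    // Collect digits from least significant to most significant.
    while n > 0 {
        digits.push(ALPHABET[(n % 62) as usize]);
        n /= 62;
    }
    digits.reverse();
    // Every byte comes from ALPHABET, which is ASCII, so this cannot fail.
    String::from_utf8(digits).expect("alphabet is ASCII")
}

/// Decode a base62 string back into a 128-bit value.
/// Returns `None` for empty input, unknown characters, or overflow.
fn decode_base62(s: &str) -> Option<u128> {
    if s.is_empty() {
        return None;
    }
    let mut n: u128 = 0;
    for b in s.bytes() {
        let d = ALPHABET.iter().position(|&c| c == b)? as u128;
        n = n.checked_mul(62)?.checked_add(d)?;
    }
    Some(n)
}
```

A 128-bit id encodes to at most 22 base62 characters, compared with 36 for the hyphenated hex form, which is one plausible reason to prefer it in URLs.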