View accompanying code on GitHub
During my funemployment stint, I became inspired by the Great British Bake Off and picked up baking. During this GBBO frenzy, I discovered how adorable, yet impractical, their website was, so I felt a knead to recreate it! I'll outline some of the juicy bits here, but the finished results can be viewed at:
For those who want to follow along, the project is a Turbo monorepo with 3 packages:
Fun fact: it was initially set up as a straightforward Python app, but I used the Cursor IDE compose feature to completely refactor it, which was 🤯
On your marks... get set... scrape!
The project uses `recipes` as the core data model, with a series of complementary models represented as one-to-many and many-to-many tables (i.e. `bakers`, `diets`, `categories`, and `bake_types`). Luckily enough, all of the data for the models can be extracted from a single view. Since the steps were essentially the same for each model, I thought the best approach was to make a main `WebScraper` class that each model could inherit from.
`WebScraper` goes to a page, finds the nodes that hold the information, extracts meaningful data from each node, and then saves it to the DB. To support pagination, there's also a while-loop that will run, at most, 100 cycles.
Each subclass of `WebScraper` just needs to override `_generate_page_url`, `_extract_items`, and `_save_to_db`. Alternatively, I could have passed those in as arguments, but I found this solution easier to read, and it makes the scraper-to-model contract more apparent.
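To make the shape of that contract concrete, here is a minimal sketch of how such a base class could look. It is not the project's actual code: the injected `fetch` callable, the `scrape` method name, the `MAX_PAGES` constant, the `BakerScraper` subclass, and the example URLs are all assumptions made for illustration. Only the three overridable method names and the 100-cycle cap come from the description above.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

MAX_PAGES = 100  # upper bound on pagination cycles, as described in the post


class WebScraper(ABC):
    """Base scraper: subclasses supply URL, extraction, and persistence steps."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        # `fetch` is a hypothetical injected callable that returns a page's
        # raw text, or None when the page does not exist.
        self._fetch = fetch

    @abstractmethod
    def _generate_page_url(self, page: int) -> str: ...

    @abstractmethod
    def _extract_items(self, html: str) -> list[dict]: ...

    @abstractmethod
    def _save_to_db(self, items: list[dict]) -> None: ...

    def scrape(self) -> int:
        """Walk pages until one is missing or empty; return items saved."""
        total = 0
        page = 1
        while page <= MAX_PAGES:
            html = self._fetch(self._generate_page_url(page))
            if html is None:
                break
            items = self._extract_items(html)
            if not items:
                break
            self._save_to_db(items)
            total += len(items)
            page += 1
        return total


class BakerScraper(WebScraper):
    """Hypothetical subclass: treats each non-empty line as one baker name."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        super().__init__(fetch)
        self.saved: list[dict] = []  # in-memory stand-in for the real DB

    def _generate_page_url(self, page: int) -> str:
        return f"https://example.com/bakers?page={page}"

    def _extract_items(self, html: str) -> list[dict]:
        return [{"name": ln.strip()} for ln in html.splitlines() if ln.strip()]

    def _save_to_db(self, items: list[dict]) -> None:
        self.saved.extend(items)


# Fake two-page site so the sketch runs without network access.
PAGES = {
    "https://example.com/bakers?page=1": "Alice\nBob",
    "https://example.com/bakers?page=2": "Carol",
}
scraper = BakerScraper(PAGES.get)
count = scraper.scrape()
```

Because the base class is abstract, forgetting one of the three overrides fails at instantiation rather than halfway through a scrape, which is one way to keep the contract visible.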
The API is a fairly cut-and-dried FastAPI app. Each file has a distinct responsibility (`models`, `routes`, `services`, etc.) and within each file, there isn't much magic. `models` is a direct one-to-one of the SQL statements we saw in the startup script. `routes` is as dumb as dumb gets. It has 2 jobs: define the API and call a service. `services` generates and executes a SQL statement, then returns an HTTP response or exception.
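As a rough illustration of that route/service split, here is a sketch that uses only the standard library so it runs anywhere. Everything in it is an assumption rather than the project's code: the `recipes(id, title)` schema, the function names, and the `NotFoundError` class, which stands in for the HTTP 404 exception a FastAPI service would raise.

```python
import sqlite3


class NotFoundError(Exception):
    """Stand-in for the HTTP 404 exception a real service would raise."""


def get_recipe(conn: sqlite3.Connection, recipe_id: int) -> dict:
    """Service: build a parameterized SQL statement, run it, shape the result."""
    row = conn.execute(
        "SELECT id, title FROM recipes WHERE id = ?", (recipe_id,)
    ).fetchone()
    if row is None:
        raise NotFoundError(f"recipe {recipe_id} not found")
    return {"id": row[0], "title": row[1]}


def read_recipe(conn: sqlite3.Connection, recipe_id: int) -> dict:
    """Route handler: no logic of its own, it only delegates to the service."""
    return get_recipe(conn, recipe_id)


# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO recipes (id, title) VALUES (1, 'Lemon drizzle')")
found = read_recipe(conn, 1)
```

Keeping the route this thin means the SQL and the error handling live in one place, so the service can be exercised without going through HTTP at all.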
Because I have an API app and not a server, I thought we could roll a two-bird solution and use Next.js. Again, this is more cake and less gateau, but there are still a couple of pointer-outers:
I opted for the `useActionState` hook in the initial view to gracefully handle slow connectivity. The root page file is an RSC: it's an async function that renders a single time on the server and is then flown over to the client. The body of the return statement has a single slot, represented as a `<Form />` element, that is hydrated once it hits the client. The `Form` element executes a server action on submit. We invoke the server action through the action state hook because it provides a nice `isPending` state that lets us know the async function hasn't resolved yet.
There's also a simple Next.js cache layer on all outbound server requests. The Next.js fetch module provides an easy way to configure the Next.js Data Cache. In my case, I wanted to aggressively cache all requests for an hour:
And finally, the `/search` route contains a form to filter recipes. I thought it'd be nice to submit said form every time there's a state change. Since the filters are primarily checkboxes and dropdowns, there isn't much concern about causing a network traffic jam. With that said, there is a single text input that I needed to handle. I wanted to keep the form uncontrolled, so I applied a form-level change handler and conditionally debounced the network requests:
Well, that was fun! Let me know what you think and, more importantly, what you're baking 🎂