How Content Curation Tools Actually Work at Scale
Filtering out trash in noisy RSS feeds
There’s a weird tension with RSS tools. You want wide-net coverage, but not so wide that you end up with, like, 60% AI-generated Medium detritus. Most curation tools will let you hook up feeds, sure, but almost none of them are good at prioritizing human-authored, high-signal stuff over vague SEO slop.
Feedly’s Leo is maybe the closest thing to useful AI filtering, but I’ve found that if you don’t fine-tune the rankings weekly, it drifts hard. Like, one week it prioritized an outdated Python 3.6 tutorial from a spammy clipboard blog. Once I blacklisted one domain, it started pushing similar ones instead, almost like it was blind to content clusters.
Best workaround I’ve landed on: combine an OPML dump from old-school sources like Hacker News, Arxiv-sanity, and indie Substacks, and then use a local tool like newsboat
or an in-browser pinboard script. Most of the SaaS ones abstract so hard they lose that low-level control you’ll actually need to prune effectively.
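The OPML-merging step is the only part that needs real code. A minimal sketch (filenames hypothetical) that folds several OPML dumps into one file newsboat can ingest, de-duplicating feeds by their `xmlUrl`:

```python
import xml.etree.ElementTree as ET

def merge_opml(paths, title="merged feeds"):
    """Merge the <outline> feed entries from several OPML files into one,
    skipping any feed URL already seen. Nested folders get flattened."""
    root = ET.Element("opml", version="2.0")
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(root, "body")
    seen = set()
    for path in paths:
        for outline in ET.parse(path).getroot().iter("outline"):
            url = outline.get("xmlUrl")
            if url and url not in seen:
                seen.add(url)
                body.append(outline)
    return ET.ElementTree(root)

# merge_opml(["hn.opml", "substacks.opml"]).write("all.opml")
```

Point newsboat at the merged file and pruning becomes a one-file job instead of a per-service chore.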
Tagging systems that truly suck at scale
I made the mistake of using Diigo for about a year thinking I’d be a tagging nerd. It held up for maybe 300 links, then it became a misery factory. No nested tags. No real merging. And worst of all, fully case-sensitive, so “Web3” was treated differently from “web3”. That’s just a cruel UI decision.
Raindrop is better, but it syncs too aggressively. Once I had a mobile crash while tagging a batch, and it overwrote everything mid-process. My workaround now is using Raindrop like an inbox: dump freely, then batch-tag weekly with an external script connecting through their API. Yup, I had to build my own because their UI slows to molasses after 1,000 items.
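The batch-tagging script is mostly a loop over Raindrop’s REST API. A rough sketch, assuming a personal API token; the `PUT /rest/v1/raindrop/{id}` endpoint is from their public docs, and the batching exists so one failed request can’t clobber the whole run:

```python
import json
import urllib.request

API = "https://api.raindrop.io/rest/v1"

def chunks(items, size=100):
    """Yield fixed-size batches so a single bad request only loses one batch."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def tag_raindrop(token, raindrop_id, tags):
    """Overwrite the tags on one bookmark via the REST API."""
    req = urllib.request.Request(
        f"{API}/raindrop/{raindrop_id}",
        data=json.dumps({"tags": tags}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Run it on a weekly cron, one chunk at a time, and the mobile-crash failure mode above stops mattering.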
Undocumented edge case: if you export your collection to JSON and re-import it (say, to migrate another account), it creates duplicates that the search engine internally ranks higher than your originals. I only noticed this because all my curated tweets flipped to showing the wrong thumbnails.
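If you hit that duplicate situation, the cleanup boils down to keeping only the oldest copy per link. A sketch, assuming each exported item carries a `link` and an ISO-8601 `created` field (true of my export, but check yours — ISO-8601 strings sort correctly as plain strings):

```python
def dedupe(items):
    """Keep only the oldest copy of each link from a JSON export,
    assuming 'link' and ISO-8601 'created' fields on every item."""
    oldest = {}
    for it in items:
        key = it["link"]
        if key not in oldest or it["created"] < oldest[key]["created"]:
            oldest[key] = it
    return list(oldest.values())
```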
Buffer vs. Hypefury for scheduled insights dumping
Hypefury honestly surprised me. I expected another tweet-junking gimmick tool but it’s actually better laid out than Buffer for single-brain curation. Buffer is fine for consistent scheduling, multiple channels, all that agency cohort stuff. But Hypefury feels more designed for people working alone or as lean communicators.
- Hypefury lets you store non-published ideas in a curated bank
- Auto-retweets old winners without being annoying about it
- Buffer still doesn’t support Twitter threads cleanly; you have to fudge it
- Hypefury’s analytics are fake-granular (which is fine)
- Buffer sync errors with LinkedIn are still a thing occasionally
- You can import Notion pages as raw copy into Hypefury with less friction
Still, if you’re managing content across a newsletter plus social, Buffer might edge it out unless you bolt on something custom.
Where Notion-style databases break down for curation
Notion templates for thought organization are everywhere. And they’re seductive until you pass the first 40 pages or so. I had a tagging table for “Pacing Concepts in SaaS Writing” with hundreds of entries, each with source links, pulled quotes, authors, and little NLP flags. Gorgeous. Until I tried to sort by tag groups and noticed there’s no actual tag JOIN logic. You either manually group stuff with relations or give up on cross-topic filters.
The JSON export is disjointed, by the way. It doesn’t really preserve linked sources unless you hack in workaround fields. I lost tons of context when moving a knowledge base to Obsidian last fall. Also: database views don’t include archived pages, but they do count toward your workspace search scope. So you think something is gone or untagged — but turns out it’s just invisible in all your filtered views.
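Until Notion grows real tag JOINs, the workable substitute is filtering the CSV export yourself. A sketch, assuming a multi-select column literally named `Tags` (rename to match your database):

```python
import csv

def rows_matching_all(path, tags, tag_col="Tags"):
    """Emulate the tag JOIN Notion lacks: return rows whose multi-select
    column contains EVERY tag in `tags`. Notion exports multi-selects as
    a comma-separated string in the CSV."""
    wanted = set(tags)
    hits = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row_tags = {t.strip() for t in row.get(tag_col, "").split(",") if t.strip()}
            if wanted <= row_tags:
                hits.append(row)
    return hits
```

Cross-topic filtering becomes a one-liner: `rows_matching_all("db.csv", ["pacing", "saas"])`.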
Actual aha moment:
I searched for an old blog link and realized it didn’t appear in the “Published” view but still turned up in a global search, because it had been renamed and archived automatically by a Zapier trigger I’d forgotten existed six months ago.
Abusing Pocket’s “Recommended” tab to reverse-engineer network bias
Pocket looks harmless at first… until you realize the “Recommended” tab is a behavioral algorithm that adapts to what everyone in your loose Firefox account cloud has saved. One week I ran a test: I fed Pocket 20 articles from a known private equity newsletter feed and watched the recommended queue turn into absolute finance toilet soup. Then, over the next few days of starring tech anthropology pieces, it swung back to Wired/Quartz/high-brow Twitter fare so fast it gave me RSS whiplash.
This isn’t documented anywhere, but your Pocket bias footprint is deeply synced with what you read while logged into other Firefox-related services. I noticed my recommended queue included two articles I never saved, but had read while logged into my Firefox sync account on mobile.
TL;DR — I use Pocket in incognito now, mostly to avoid polluting its model. Then I scrape that Recommended tab weekly using a little Puppeteer scraper saved to a local HTML archive. Helps surface weird pieces no one else in my feed gets to see.
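My actual scraper is Puppeteer, but the extraction half looks roughly like this in Python, for anyone who’d rather not run headless Chrome against already-archived snapshots. Selectors are simplified to bare anchor tags; the real page needs more targeted ones:

```python
from html.parser import HTMLParser

class LinkGrabber(HTMLParser):
    """Pull outbound article links out of a saved HTML snapshot."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Keep only absolute http(s) links; relative paths are site chrome.
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("http"):
                self.links.append(href)

def extract_links(html_text):
    parser = LinkGrabber()
    parser.feed(html_text)
    return parser.links
```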
Glue-linting systems: Zotero, Obsidian, and accidental content treasure hunting
Zotero isn’t a “curation tool” per se, but if you force it into that role? You get some ridiculous power, mostly because of how well it collapses citation metadata. Obsidian is the opposite: no citation brain, pure anti-structure markdown anarchy. The magic lives in between.
Small pro-move stack I landed on by accident:
- Save highlight text via Hypothes.is
- Export daily via API dump to CSV
- Import to Zotero with the Altmetric plugin tagging topical interest
- Auto-sync that Zotero db with Obsidian
- Use the Dataview plugin to show only entries with more than 3 quotes AND a tag match
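The “export daily via API dump to CSV” step is the only part with real code in it. A sketch against Hypothes.is’s public API (`/api/search` and the `TextQuoteSelector` annotation shape are from their docs; the token, user string, and column names are my own assumptions):

```python
import csv
import json
import urllib.request

SEARCH = "https://api.hypothes.is/api/search"

def fetch_annotations(token, user, limit=200):
    """Pull recent annotations for one account from the Hypothes.is API.
    `user` looks like 'acct:name@hypothes.is'."""
    req = urllib.request.Request(
        f"{SEARCH}?user={user}&limit={limit}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["rows"]

def flatten(row):
    """Reduce one annotation to the columns my CSV import expects:
    source URI, my note, the highlighted quote, and joined tags."""
    quote = ""
    for target in row.get("target", []):
        for sel in target.get("selector", []):
            if sel.get("type") == "TextQuoteSelector":
                quote = sel.get("exact", "")
    return {"uri": row.get("uri", ""),
            "note": row.get("text", ""),
            "quote": quote,
            "tags": ";".join(row.get("tags", []))}

def dump_csv(rows, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["uri", "note", "quote", "tags"])
        w.writeheader()
        w.writerows(flatten(r) for r in rows)
```

From there the CSV goes into Zotero and the rest of the chain above takes over.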
Ended up surfacing a 2018 post from danah boyd that had been attribution-lost for years. The quote was in three different blogs but no one cited the source. Zotero cross-tied it via DOI metadata.
Only brutal part is syncing these together without creating loops. If anything outside Zotero touches the .bib files end-to-end, Zotero will sometimes overwrite them with null entries because it thinks they’re system-generated cruft. So now I run the whole pipeline only with a “daily-notes” flag to prevent feedback loops.
The one time I blew up my RSS graph using Inoreader’s rules engine
Inoreader’s custom rules system is powerful but badly under-documented. I tried building a rule that said “If any item contains the phrase ‘API deprecation’ and it’s older than 180 days, auto-star it.” Thought I’d be clever for trend lag tracking. What I accidentally did was create a recursive rule by using “starred items” as the trigger for another archive export, which caused a feedback crawl loop during sync.
I only realized this because my Gmail got a warning from Inoreader about email limits. Yes — it used email response notices as a backchannel to let me know my rule broke their backend sync cap.
Turns out, rules can’t filter by relative time unless you embed a regex into the article content structure… which is not available unless the feed supports full HTML body feeds (most don’t).
Had to delete everything and reconstruct it from a backup OPML I’d exported two weeks earlier. If you mess with rules: test them on minor feeds first. Anything high-volume explodes instantly.
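These days I do the relative-time check locally instead of inside Inoreader; outside their engine the rule itself is trivial. A sketch, assuming items you’ve already pulled down, each with a timezone-aware `published` datetime and the full `content` text (the field names are my own):

```python
from datetime import datetime, timedelta, timezone

def should_star(item, phrase="API deprecation", max_age_days=180, now=None):
    """The rule Inoreader couldn't express: match a phrase AND a relative
    age. Expects item['published'] as an aware datetime and item['content']
    as plain text."""
    now = now or datetime.now(timezone.utc)
    old_enough = now - item["published"] > timedelta(days=max_age_days)
    return old_enough and phrase.lower() in item["content"].lower()
```

Because it runs on data you’ve already exported, there’s nothing for a starred-items trigger to recurse on.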
What really makes a “thought leadership” curation stack
It’s not feeds. It’s not even the content bank. It’s recall speed. I realized this when someone sent me a six-month-old Mastodon post about OSS community governance, and I immediately remembered not only seeing it — but that I had linked to it in a Zettelkasten note about Rust moderation protocols.
But only because I’d typed a summary note when I first saved it into my Arc bookmarks with a #governance tag. Literally that simple. Thought leadership doesn’t come from abstract tools, it comes from navigability you trust.