New
Background scheduler: every paid user's feed refreshes every 6h, even if they don't open the site
Founder asked 'for all users their own taste does it work, also always updates and checked'. Audit answer: per-user taste IS isolated correctly (each user has independent ratings/vector/feed, verified across 5 real accounts). But 'always updates' was the gap — auto-scan only fired when the user opened /taste, so inactive users like jefabettmob@gmail.com (Pro, 9 ratings) and tesfamyohannes@gmail.com (Pro, 3 ratings) had feeds that were 14 days stale. /api/taste/scan-all endpoint existed for cron triggers but no external cron was wired. Now: an in-process background thread spawns at boot, runs every 6h, finds every paid user with ≥1 like whose feed is >24h stale, and refreshes via scan_for_user. Self-contained — survives Railway recycling, no external cron needed.
- Spawned at FastAPI boot via _start_background_scanner (alongside the existing model-prewarm thread)
- Cycle: every 6 hours, fetches up to 50 stale-feed paid users in one DB read, refreshes each via recalculate_taste_vector + scan_for_user + store_feed_items
- Stale = feed's newest item created >24h ago
- Min 1 like to qualify — brand-new users still hit the cold-start path on first /taste visit, not background-spammed
- First run delayed 5 min after boot to stagger against fresh-deploy resource pressure
- Per-user errors caught + logged; one bad scrape doesn't tank the cycle
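A minimal sketch of one scheduler cycle, assuming each user row is a dict with hypothetical keys `id`, `paid`, `likes`, and `newest_item_at`. The `refresh` callable stands in for the recalculate_taste_vector + scan_for_user + store_feed_items chain; treating a feed with no items as stale is also an assumption.

```python
import logging
import threading
import time
from datetime import datetime, timedelta, timezone

log = logging.getLogger("taste.background_scanner")

STALE_AFTER = timedelta(hours=24)   # newest feed item older than this => stale
MAX_USERS_PER_CYCLE = 50            # one DB read returns at most this many users


def pick_stale_users(users, now):
    """Paid users with >=1 like whose newest feed item is older than 24h."""
    stale = [
        u for u in users
        if u["paid"]
        and u["likes"] >= 1
        and (u["newest_item_at"] is None or now - u["newest_item_at"] > STALE_AFTER)
    ]
    return stale[:MAX_USERS_PER_CYCLE]


def run_cycle(users, refresh, now=None):
    """Refresh every stale user; a failure for one user never stops the cycle."""
    now = now or datetime.now(timezone.utc)
    refreshed, failed = [], []
    for user in pick_stale_users(users, now):
        try:
            refresh(user["id"])
            refreshed.append(user["id"])
        except Exception:
            log.exception("background refresh failed for user %s", user["id"])
            failed.append(user["id"])
    return refreshed, failed


def start_background_scanner(load_users, refresh, interval_s=6 * 3600, first_delay_s=300):
    """Daemon thread: wait 5 min after boot, then run a cycle every 6 hours."""
    def loop():
        time.sleep(first_delay_s)
        while True:
            try:
                run_cycle(load_users(), refresh)
            except Exception:
                log.exception("background scan cycle crashed")
            time.sleep(interval_s)

    thread = threading.Thread(target=loop, daemon=True, name="taste-bg-scanner")
    thread.start()
    return thread
```

Keeping the selection and the loop as separate functions lets the cycle be exercised with a fixed clock, without starting a thread.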
New
Taste scanner now hits 6 marketplaces, not just Grailed + Mercari
Founder asked 'why we have only grailed as marketplace it has to work around all marketplaces'. Audit: services/shopping.py had scrapers for Grailed, Mercari Japan, Depop, Poshmark, TheRealReal, eBay (text via SerpAPI), Google Shopping, plus Google Lens + eBay visual. The taste path (scan_for_user) was only invoking Grailed + Mercari Japan. Four marketplaces sat fully built but unused. Wired all six text-search marketplaces into the parallel scan. 3× the source diversity per query (6 marketplaces, up from 2).
- scan_for_user now invokes Grailed + Mercari Japan + Depop + Poshmark + TheRealReal + eBay in parallel for every query
- Workers: 12 (was 6) — 6 markets × top queries needs more parallelism. Scan budget extended 25s → 35s to give all markets a fair shot.
- Google Lens + eBay-visual stay out of this path because they take Image not query string — those run from /scout instead.
- eBay path gated by FSCOUT_SERPAPI_KEY presence — silently skipped if not configured.
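The fan-out described above could look roughly like this. It is a sketch under assumptions: `scrapers` is a hypothetical dict of marketplace name to a callable taking a query string, and the real scan_for_user may structure its pool differently. Only the worker count, the 35s budget, and the FSCOUT_SERPAPI_KEY gate come from the notes above.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait


def enabled_scrapers(all_scrapers, env=None):
    """Drop the eBay scraper silently when no SerpAPI key is configured."""
    env = os.environ if env is None else env
    return {
        name: fn for name, fn in all_scrapers.items()
        if name != "ebay" or env.get("FSCOUT_SERPAPI_KEY")
    }


def scan_markets(queries, scrapers, budget_s=35.0, max_workers=12):
    """Run every (marketplace, query) pair in parallel within a shared time budget."""
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(fn, q) for q in queries for fn in scrapers.values()]
    done, _not_done = wait(futures, timeout=budget_s)
    # Do not block on stragglers; queued tasks that never started are cancelled.
    pool.shutdown(wait=False, cancel_futures=True)
    results = []
    for fut in done:
        if fut.exception() is None:   # one failing marketplace does not sink the scan
            results.extend(fut.result())
    return results
```

Note that a scraper already running when the budget expires is not interrupted; its result is simply ignored.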
Fix
Recency decay on feed ranking — old items sink, fresh items surface
After fixing the scanner dedup deadlock + manually backfilling 180 fresh items into the founder's feed, the top-K STILL served the same March Levi's. Reason: legacy items had similarity_score=0.98 (an artifact from an old scoring era when scores ran hot) while fresh items get ~0.7. The ORDER BY blended_score query naturally favored the artificially-high old scores. Added exp(-days/14) recency decay to BOTH the personalized and fallback ranking branches. A 14-day-old item is now at 0.37× its score, 30-day at 0.12×, 55-day at 0.020×. Fresh items keep ~1.0×.
- Personalized branch: blended_score multiplied by exp(-days/14)
- Fallback branch: similarity_score multiplied by the same decay (renamed to decay_score for clarity)
- Verified against founder's feed: top-8 flipped from 5 same March Levi's denim → 8 fresh Yohji Yamamoto items from today's scan (Ground Y, AAR Bomber, Y'saccs, Y's At Work)
- 14-day decay constant (the time to fall to 1/e ≈ 0.37×; the half-life is 14·ln 2 ≈ 9.7 days) chosen for active resale-shopper cadence; tunable via the divisor
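The decay arithmetic can be checked with a few lines. This is a standalone sketch, not the ranking query itself; `recency_decay` and `decayed_score` are illustrative names, and the production code applies the factor inside SQL.

```python
import math

DECAY_DAYS = 14.0  # the divisor in exp(-days / 14)


def recency_decay(age_days, decay_days=DECAY_DAYS):
    """Multiplier in (0, 1]: 1.0 for a brand-new item, 1/e at `decay_days`."""
    return math.exp(-max(age_days, 0.0) / decay_days)


def decayed_score(score, age_days):
    return score * recency_decay(age_days)


for days in (0, 14, 30, 55):
    print(days, round(recency_decay(days), 3))

# A legacy 0.98 item from 55 days ago now ranks below a fresh 0.7 item.
print(decayed_score(0.98, 55) < decayed_score(0.7, 0))
```

With this form the score halves every 14·ln 2 ≈ 9.7 days, so a smaller divisor makes old items sink faster.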
Fix
ACTUAL FIX: scanner now rotates queries — was running same 8 brands → ON CONFLICT ate everything
Founder said 'nothing got fixed yet'. Drilled in. Their feed was 153 items from March 25, 55 days stale. Earlier today's auto-scan fix would have helped IF a scan-me had actually produced new items. Ran scan_for_user against admin@on9.fashion locally to debug: returned 0 candidates. extract_taste_profile produced sensible queries (Armani, Chrome Hearts, Alexander McQueen, etc.). Grailed/Mercari returned 200+ results. But store_feed_items inserted ZERO rows — every URL was already in taste_feed under (user_id, marketplace_url) UNIQUE. The scanner has been hitting Grailed with the same 8 brand keywords for 55 days, returning the same listings, all dedup'd at the database. The user's feed couldn't grow even with daily scans.
Rewrote build_search_queries to produce 16 ROTATED queries per scan: every brand bare, every brand × randomly-chosen style, every style as standalone, random style pairs. The query set shuffles. Different Grailed/Mercari pages hit each scan. Also manually ran a brute-force varied scan against the founder's account RIGHT NOW: 180 new items inserted, feed went 153 → 324. Their next /taste visit will see real variety.
- build_search_queries now produces 16 queries (was 10), all rotated randomly per scan
- Every liked brand gets a bare query AND a brand×style query (was: cap 8 brands, only top 3 got styles)
- Style-only queries ALWAYS fire (was: only when <3 brands — useless for users with broad taste)
- Random style pairs for cross-aesthetic discovery
- Result shuffled so consecutive scans hit different marketplace cache states
- Manually backfilled founder's feed: 153 → 324 items with 180 fresh from today's varied scan
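A rough sketch of the rotation idea, assuming `brands` and `styles` are plain lists of strings pulled from the taste profile. The cap of four style pairs and the way the 16-query limit truncates the shuffled list are assumptions; the real build_search_queries may combine the pieces differently.

```python
import random
from itertools import combinations


def build_search_queries(brands, styles, limit=16, rng=None):
    """Brand, brand x style, style, and style-pair queries in random order."""
    rng = rng or random.Random()
    queries = []
    for brand in brands:
        queries.append(brand)                                  # bare brand
        if styles:
            queries.append(f"{brand} {rng.choice(styles)}")    # brand x random style
    queries.extend(styles)                                     # style-only always fires
    pairs = list(combinations(styles, 2))
    rng.shuffle(pairs)
    queries.extend(f"{a} {b}" for a, b in pairs[:4])           # cross-aesthetic pairs
    unique = list(dict.fromkeys(queries))                      # dedupe, keep order
    rng.shuffle(unique)                                        # rotate between scans
    return unique[:limit]
```

Passing a seeded `random.Random` makes a scan reproducible for debugging, while the default unseeded generator gives a different query set on each scan.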
New
Taste engine now learns from saves, clicks, and search queries (not just thumbs-up)
Founder asked 'doesn't taste build on what items it likes during search and other interactions like Pinterest?' Investigation: only explicit /api/taste/rate likes fed the engine. Saves, clicks on chat-result cards, and search queries all flowed past the taste signal pipe. Three small wires fix that.
- Save → taste signal: /api/collection/save now fires _record_taste_signal_async(signal='save', weight=0.70) right after the row insert. Every save is now a strong soft positive — the engine learns from your collection.
- Click → taste signal: product-card's two <a> tags (image + body) wire onClick + onAuxClick to recordTasteClick. Every click on a result tile in /chat is a weak positive (weight=0.20). The recordTasteClick helper has been exported in src/lib/api.ts since v1 but no component called it — now it does.
- Search → taste signal: /chat/stream now fires _record_taste_signal_async(signal='search', weight=0.10) on every text query (≥3 chars, paid tier). Query text gets embedded by FashionSigLIP and merged into the taste_vector on the next recompute. Soft nudge — searching for 'rick owens' nudges taste toward that aesthetic without overwhelming explicit likes.
- New signal type 'search' added to SIGNAL_WEIGHTS dict (was: click/save/hover/dismiss). Synthetic query://<hash> URL key so identical searches upsert into one row instead of creating duplicates.
- All three writes go through a single _record_taste_signal_async helper that does INSERT + bg text embedding in a daemon thread — caller never waits.
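To illustrate the search-signal upsert key, here is a small sketch. The weights for save, click, and search come from the notes above; the hashing scheme, the query normalization, and the tier names in `PAID_TIERS` are assumptions, and hover/dismiss are omitted because their weights are not stated.

```python
import hashlib

SIGNAL_WEIGHTS = {"save": 0.70, "click": 0.20, "search": 0.10}
PAID_TIERS = {"plus", "pro"}   # assumed tier names
MIN_QUERY_CHARS = 3


def search_signal_key(query):
    """Synthetic URL so identical searches land on one row instead of many."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"query://{digest}"


def make_search_signal(user_id, query, tier):
    """Return the row to upsert, or None when the query does not qualify."""
    if tier not in PAID_TIERS or len(query.strip()) < MIN_QUERY_CHARS:
        return None
    return {
        "user_id": user_id,
        "url": search_signal_key(query),
        "signal": "search",
        "weight": SIGNAL_WEIGHTS["search"],
        "text": query.strip(),
    }
```

Normalizing case and whitespace before hashing means 'Rick Owens' and 'rick  owens' reinforce the same row.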
Fix
P0 taste-engine bugs: taste_summary wrong-column + taste_vector not refreshed on rate
Deep investigation after founder said 'ai taste engine is not working and is always static, need full investigation and be humble'. Found two real bugs in the learning chain (separate from the staleness/mark-seen fixes from 72501ba):
(1) taste_profiles PK is `id` (TEXT, set to user_id) — NO `user_id` column exists. taste_summary.py was running `WHERE user_id = %s` against it on every call. Every query threw UndefinedColumn, was caught by bare except, returned None/False. Result: needs_recompute() always False → recompute_summary() never ran → taste_summary stayed empty string for every user → taste_rerank.add_taste_match_score() no-op'd because get_summary returned empty → LLM-summary-driven ranking signal has been DEAD since the column rename.
(2) recalculate_taste_vector was only invoked from /api/taste/scan-me — never from /api/taste/rate. Users who rated dozens of items but didn't manually click Scan had a stuck taste_vector that didn't reflect their new likes.
Both fixed. Also wrote an honest architecture analysis (see commit body): the discovery side of the engine is keyword-based against a hardcoded 60-brand list; the embedding cosine only re-ranks within that keyword-filtered pool. The 'AI' label is generous.
- taste_summary.py: needs_recompute() + get_summary() + recompute_summary INSERT all changed from WHERE user_id = %s to WHERE id = %s (matches the actual schema). LLM summary signal is now wired through for real.
- /api/taste/rate background embedding thread now also calls recalculate_taste_vector after UPDATE-ing item_embedding. Taste vector now refreshes on every rating (~1s after the like), not just on manual scan.
- Honest architecture note in code: scanner is keyword-based on hardcoded brands/styles, not embedding-driven. Discovery breadth is bounded by the 60-brand KNOWN_BRANDS list, not by the taste_vector.
Fix
P0 taste-engine overhaul: stale-feed auto-scan + mark-seen on every load + /clusters merged into /taste
Founder reported: 'why the fuck taste engine still recommends the same items for a week after I asked to fix this every day and still the same thing also why matrix implemented as different page ... ranking on taste engine still doesn't work'. Pulled their actual data from prod, found three structural bugs. Founder's admin@on9.fashion account had 153 feed items but the most-recent insert was 2026-03-25 — 55 days stale. All 153 items were stuck at status='unseen' because mark-seen only ran on manual refresh click. The /taste/clusters page existed as a separate '10×10 matrix' surface when the founder just wants /taste itself to show ~100 clothes. Fixed all three.
- FIX A — /taste auto-scan triggers when feed is >24h stale, not just when warming_up empty. The original gate only fired for first-visit users with no data; founder's data made status='personalized' so the scan never ran and the same top-K kept surfacing. New gate: status='warming_up' AND no items, OR newest item's created_at > 24h ago, OR feed has <30 unique items. 5-min throttle (was 60s) so a refresh storm doesn't hammer scrapers.
- FIX B — mark-seen now runs on EVERY fetchFeed (not just manual refresh). Each served item gets status flipped 'unseen' → 'seen' immediately. Next fetch picks up fresh unseen items first. Was the actual root cause behind 'same items every visit'.
- FIX C — /taste/clusters page replaced with a redirect to /taste. The '10×10 matrix' surface is gone — founder explicitly didn't want it. /taste page (which already has 100+ item grid via infinite scroll) now serves the role. Filter-bar 'Clusters' discoverability link removed too.
- FIX D — personalized scans now pull 20/query (was 10), cold-start unchanged at 25. Doubles the chance each scan introduces actually-fresh items vs dedup-ing against the existing pool.
- Verifier updates: 'Clusters link visible' check inverted to 'Clusters link removed'; /taste/clusters probe now asserts redirect lands on /taste; /taste CDN-image health check moved here too.
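The new auto-scan gate from FIX A can be written as a single predicate. This is a sketch with hypothetical parameter names; where the throttle state lives (here `last_scan_at`) is an assumption.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)
THROTTLE = timedelta(minutes=5)
MIN_UNIQUE_ITEMS = 30


def should_auto_scan(status, unique_items, newest_created_at, last_scan_at, now):
    """True when /taste should kick off a background scan for this user."""
    if last_scan_at is not None and now - last_scan_at < THROTTLE:
        return False                       # refresh storm: scanned very recently
    if status == "warming_up" and unique_items == 0:
        return True                        # first visit, nothing to show yet
    if newest_created_at is not None and now - newest_created_at > STALE_AFTER:
        return True                        # feed exists but has gone stale
    return unique_items < MIN_UNIQUE_ITEMS  # feed too thin to rotate
```

Under this gate the founder's 153-item feed, last updated 55 days earlier, triggers a scan even though its status is 'personalized'.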
Polish
Copy fix: /taste/clusters warming_up subtitle
Captured a visual of /taste/clusters on prod and noticed the warming_up subtitle 'Like a few more pieces and refresh — the matrix sharpens with each signal' implied the user had pieces to like OUTSIDE this view. But cold-start users have nothing to like elsewhere (the chicken-and-egg trap the auto-scan was built to escape). New copy nudges action ON the tiles in front of them.
- Old: 'Like a few more pieces and refresh — the matrix sharpens with each signal.'
- New: 'Tap pieces you like below — the matrix sharpens with each signal.'
- Visual confirmed earlier today's auto-scan polish is delivering: real archive items rendering (Helmut Lang-style outerwear, Margiela-style coats, Rick Owens sneakers, CDG patterns), 45 images loaded across 5+ cluster rows.
Infra
Verifier: bump login wait_for_url timeout 35s → 60s (flake fix)
The verifier's login step had two recurring failure modes today, both transient: NextAuth credentials rate-limit pressure (mitigated by the existing 60s-retry-between-attempts) AND the wait_for_url after submit hitting its 35s ceiling when Vercel is cold + Railway is warming the model thread. The chain (cosmos 2-step → NextAuth authorize() → backend /api/auth/login → JWT mint → redirect to /chat) can take 30-45s end-to-end on a cold environment. Bumped the wait_for_url ceiling to 60s — generous enough to absorb the worst observed latency, tight enough to still surface a genuinely-hung login as a failure.
- _attempt_login_inner: wait_for_url timeout 35000 → 60000.
- Cluster CDN images held at 43/43 in the post-fix run (the 28 → 43 → ? trend continues to densify with each cold-start scan).
- Verifier still at 94/94 PASS — patch is verifier-only, no backend change.
Polish
Polish: cold-start scans pull 25/query (was 10) — denser cluster matrix
Yesterday's auto-scan polish (485beed) densified /taste/clusters from 14 → 28 tiles. Still half-empty vs the 100-tile spec. Root cause: each editorial seed query was pulling only 10 results from Grailed + 10 from Mercari = 20/seed × 8 seeds = 160 raw → ~30-50 after dedup. Bumping cold-start specifically to 25/query yields 400 raw → ~80-150 dedup, enough to fill most of the matrix on first visit. Personalized users keep at 10/query — their queries are narrower so each result is more relevant.
- Cold-start path (`not rated_titles`): per_query_limit = 25 (was 10).
- Personalized path: unchanged at 10/query.
- Expected effect on next /taste/clusters cold-start: matrix density jumps from ~28 tiles toward ~80-100.
- Cost: ~2.5x marketplace API hits per cold-start scan. Worth it since cold-start scans are 1-2 per user (auto-fires on /taste OR /taste/clusters first visit only).
Polish
Polish: /taste/clusters auto-scans on cold-start (was 15 tiles, promised 100)
Found while reviewing verifier output. The cluster page header promises a '10 × 10 matrix' but cold-start Pro users were getting ~15 sparse tiles — the Myntra filter from earlier this week cut 80% of the foundation pool, and the warming_up status's subtitle 'Like a few more pieces and refresh' is a chicken-and-egg trap (nothing to rate from). Mirror the /taste page pattern: when the cluster endpoint returns status=warming_up, auto-trigger /taste/scan-me in the background + refetch once it lands. The sparse 15 stays visible during the 5-15s scan so the user sees movement; the dense matrix swaps in when the scan returns.
- useCallback-wrapped fetchClusters helper.
- After first fetch: if status === 'warming_up', POST /taste/scan-me + refetch.
- Sparse matrix stays visible during the scan window — no spinner blocking.
- Same editorial-seed scan that already works on /taste populates the foundation pool, so the second fetch returns real archive items (Margiela, Raf Simons, etc).
Infra
Verifier: /api/auth/signup backend validation locked in (94/94)
/api/auth/signup had no backend regression coverage — only the frontend 3-step form was probed. Added 4 validation probes that exercise the rejection paths (missing email, missing password, bad email format, short password) using deliberately-bogus inputs so no real user gets created. Skipped: the 10/hr/IP rate limit (would burn the budget) + the XSS-in-name guard from d644e60 (would require an otherwise-valid body which creates a real user as a side effect).
- POST /auth/signup with missing email → 422 (Pydantic body validation)
- POST /auth/signup with missing password → 422
- POST /auth/signup with bad email format → 400 ('Invalid email format')
- POST /auth/signup with 1-char password → 400 ('at least 8 characters')
Infra
Verifier: lock /api/admin/internal/* secret guard (90/90) — payment-critical
The /api/admin/internal/set-tier + /set-tier-by-customer endpoints are called by the Next.js Stripe webhook handler to update user tiers — a missing/wrong guard would let anyone promote themselves to admin. They use _verify_internal (hmac.compare_digest, locked in via ddc514a) and uniform 'Forbidden' responses. Added 4 regression probes confirming both endpoints return 403 with no secret and 403 with bad secret.
- POST /admin/internal/set-tier with no x-internal-secret → 403
- POST /admin/internal/set-tier with bogus secret → 403
- POST /admin/internal/set-tier-by-customer with no secret → 403
- POST /admin/internal/set-tier-by-customer with bogus secret → 403
- These complement the existing internal-webhook probes (send-digest/check-prices/scan-trigger) — together they cover every _verify_internal-gated route.
Infra
Verifier: lock in /api/auth/forgot-password no-enumeration (86/86)
The endpoint deliberately returns 200 + 'if an account exists, a reset link was sent' regardless of whether the email is registered — common security practice to prevent attackers from enumerating valid email addresses. Locked in with a regression probe so a future commit that 'helpfully' returns 404 for unknown emails gets caught.
- POST /api/auth/forgot-password with no-such-user-xyz789@example.test → 200
- Response body MUST contain 'if an account exists' (the generic message).
- 5/hr/IP rate-limit NOT exercised here — would burn the budget for the verifier IP for the rest of the hour.
Fix
Rate-limit + scoping probe on /api/on9/market-check (SerpAPI cost protection)
Each /api/on9/market-check call fires shopping_service.search_marketplaces — SerpAPI + eBay + scraper hits at ~$0.01-0.05 per call. Endpoint was properly user-scoped (404 on missing/foreign item) but had no rate limit. A paid user could loop on this and rack up real costs against us. Added 30/hr/user rate limit (generous for honest workflow, expensive to abuse) + a scoping probe.
- 30/hr/user rate limit via _rate_limit_or_429 (key=f'on9-market-check:{user_id}').
- Audit confirmed /api/on9/stats + /api/on9/inventory/{id}/listing are also user-scoped (no IDOR).
- New probe: POST market-check on missing item → 404 (locks the scoping in).
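For illustration, a per-user limit like this can be built as an in-memory sliding window. The internals of _rate_limit_or_429 are not shown in these notes, so the function below is a stand-in with an assumed design; it raises a local exception where the endpoint would return 429.

```python
import time
from collections import defaultdict, deque


class RateLimitExceeded(Exception):
    """Stand-in for the HTTP 429 the endpoint would return."""


_hits = defaultdict(deque)   # key -> timestamps of recent calls


def rate_limit_or_raise(key, limit, window_s, now=None):
    """Allow at most `limit` calls per `window_s` seconds for each key."""
    now = time.monotonic() if now is None else now
    hits = _hits[key]
    while hits and now - hits[0] >= window_s:
        hits.popleft()                     # forget calls that left the window
    if len(hits) >= limit:
        raise RateLimitExceeded(key)
    hits.append(now)


def check_market_check_limit(user_id):
    rate_limit_or_raise(f"on9-market-check:{user_id}", limit=30, window_s=3600)
```

Keying on the user id rather than the IP means rotating IPs does not reset the budget for a logged-in caller.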
Infra
Verifier: lock in /api/saved-searches user-scoping (83/83)
Audited the 4 saved-searches endpoints. All properly user-scoped via composite (id, user_id) predicates — DELETE returns 404 instead of 403 to avoid leaking existence of other users' searches. No IDOR holes. Added 3 regression probes locking the behavior in.
- DELETE /api/saved-searches/<missing> → 404
- POST /api/saved-searches/<missing>/refresh → 404
- GET /api/saved-searches → 200 (endpoint reachable)
- Audit confirmed: all 4 endpoints (GET list, POST create, DELETE, POST refresh) scope by user_id at the SQL layer.
Fix
Sweep: migrate 3 webhook endpoints + cron to constant-time secret compare
Audited every x-internal-secret + x-api-key check site. Found 3 webhook endpoints (send-digest, check-prices, scan-trigger) and 1 cron endpoint (taste/scan-all) using inline `secret == header` — same timing-oracle pattern I already fixed in _verify_internal (ddc514a). Now all 3 webhook endpoints delegate to _verify_internal (hmac.compare_digest + uniform 403 Forbidden). The cron endpoint with its two-key fallback (cron_key OR webhook_secret) uses hmac.compare_digest directly. Plus 4 new regression probes locking in the 403 responses.
- /api/notifications/send-digest → _verify_internal (was inline `secret != header`).
- /api/collections/check-prices → _verify_internal (same).
- /api/taste/scan-trigger → _verify_internal (same).
- /api/taste/scan-all (cron, dual-key) → hmac.compare_digest for both cron_key and webhook_secret fallback paths.
- step_internal_webhook_guards added — 4 probes: send-digest (no secret + bad secret), check-prices (no secret), scan-trigger (bad secret). All MUST return 403.
- Audit confirmed /api/notifications GET/PATCH/read-all are all properly user-scoped (no IDOR).
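The dual-key comparison for the cron endpoint might look like the sketch below. `secret_matches` is an illustrative name, not the actual helper; only the use of hmac.compare_digest for both keys comes from the notes above.

```python
import hmac


def secret_matches(provided, *expected):
    """Constant-time check of a header value against one or more configured secrets."""
    if not provided:
        return False
    matched = False
    for secret in expected:
        # Skip unset secrets; compare bytes so non-ASCII input cannot raise.
        if secret and hmac.compare_digest(provided.encode("utf-8"), secret.encode("utf-8")):
            matched = True   # keep looping so every configured key is always compared
    return matched
```

An inline `secret == header` can return at the first differing byte, which is the timing signal this replaces.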
Fix
Harden /api/auth/verify-email — 1M-code brute force was possible
Audit found real exploitability. /api/auth/verify-email had NO rate limit AND would accept unlimited wrong codes against the same in-flight entry until its 15-minute expiry — letting an attacker who knows a target email try the full 1M 6-digit space in well under the window. /api/auth/send-verification had no rate limit either, so a session-cookie-thief could email-bomb the user's inbox.
- verify-email: 10/hr/IP rate limit (generous for real users mistyping, tight against brute-force).
- verify-email: 5-attempt-per-code lockout. After 5 wrong tries the code is burned and the user must request a new one — this is the real defense, because IP rate-limits can be defeated by cloud IP rotation but every code requires a fresh /send-verification call which is itself rate-limited per user.
- verify-email: constant-time hash compare via hmac.compare_digest (defense-in-depth, sha256 is fixed-length so timing leak is minimal but matches our ddc514a pattern).
- send-verification: 3/hr/user_id rate limit. A real user who deleted the first email can retry twice; an attacker can't grind the inbox.
- Backward-compat: _verification_codes entries are now 4-tuples (hash, expires_at, attempts, max_attempts) — verify-email accepts both old 2-tuples (in-flight at deploy time) and new 4-tuples.
- Regression probe: POST /verify-email with no-code-issued email → 400 (surface contract).
Fix
Harden /api/on9/inventory/{id}/list + lock existing price/sold guards
/api/on9/inventory/{id}/list (mark items as listed on marketplaces) accepted any string in the platforms array — no XSS strip, no length cap, no count cap, no dedupe. Sibling to the inventory POST + PUT XSS strips from yesterday. Hardened + added 7 regression probes covering price/list/sold input validation (some of these guards already existed from 2026-05-17 — never had probes).
- platforms[] now validated: at-least-1, max 20 entries, max 80 chars per platform name, XSS strip (javascript:/data:/vbscript:/file: prefixes, <script/<iframe substrings), case-insensitive dedupe, whitespace trim.
- Returns 400 for: type-mismatch, empty list, oversized list, XSS payload. Same error vocabulary as the inventory POST/PUT.
- 7 new regression probes in step_on9_action_guards covering: price (negative list, min > list), list (empty, XSS, too many), sold (negative price, empty platform).
- Existing 2026-05-17 guards on /price + /sold (negative-price rejections, empty-platform reject) had no test coverage — now locked in.
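The platforms[] rules can be sketched as one validator. Here ValueError stands in for the 400 response, and two details are assumptions: blank entries are dropped after trimming, and the 20-entry cap is applied after dedupe.

```python
BAD_PREFIXES = ("javascript:", "data:", "vbscript:", "file:")
BAD_SUBSTRINGS = ("<script", "<iframe")
MAX_PLATFORMS = 20
MAX_NAME_CHARS = 80


def validate_platforms(platforms):
    """Return a trimmed, case-insensitively deduped list or raise ValueError."""
    if not isinstance(platforms, list) or not all(isinstance(p, str) for p in platforms):
        raise ValueError("platforms must be a list of strings")
    cleaned, seen = [], set()
    for raw in platforms:
        name = raw.strip()
        lowered = name.lower()
        if not name:
            continue
        if len(name) > MAX_NAME_CHARS:
            raise ValueError("platform name too long")
        if lowered.startswith(BAD_PREFIXES) or any(s in lowered for s in BAD_SUBSTRINGS):
            raise ValueError("invalid platform name")
        if lowered not in seen:              # keep the first spelling seen
            seen.add(lowered)
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one platform is required")
    if len(cleaned) > MAX_PLATFORMS:
        raise ValueError("too many platforms")
    return cleaned
```

For example, `validate_platforms([" Grailed ", "grailed", "eBay"])` keeps one Grailed entry and one eBay entry.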
Fix
Harden /api/auth/reset-password — 8-char min + rate limit + probe
Found two real gaps while auditing the password-reset flow. (1) No minimum password length check on reset: signup requires 8+ chars but reset accepted anything, including a 1-char password. A user phished into a 'set to 1' reset would have a trivially-guessable account. (2) No rate limit on the reset endpoint, so an attacker could keep guessing reset tokens for the full hour each token stays valid. Both fixed; regression probe added.
- Minimum 8-char password enforced on reset-password (parity with signup).
- Rate limit: 10/hr/IP on /api/auth/reset-password — tight enough to make token brute-force expensive, loose enough that a real user's retry doesn't get blocked.
- New step_reset_password_api_guards probe: short password → 400, empty password → 400, bogus token (with valid password) → 400. All green on prod.
- Verifier total: pending re-run (was 65/65, expecting 68/68).
- Note: in-memory _reset_tokens dict still process-local — fine for single-replica Railway today, but a horizontal-scale day would need Redis-backed token store.
Infra
Verifier: /api/curator/* 403-on-non-admin coverage (65/65)
Audited the 6 /api/curator/* endpoints (approve, reject, profile, discover, score, reset). All cleanly gated via Depends(_require_admin) — same canonical mechanism used by other admin routes. Locked in via the existing step_admin_endpoints_reject_non_admin probe extended to sample 3 curator paths.
- Extended admin-probe to also check: GET /curator/profile, GET /curator/discover, POST /curator/reset. All MUST be 403 for the Pro test account.
- Verifier total: 65/65 PASS (was 62/62).
- Audit confirmed all 6 curator handlers use Depends(_require_admin) — clean baseline.
Infra
Verifier: schema-level regression probes for /api/billing/checkout
Payment-critical endpoint had no regression coverage. Three schema-level probes added — exercises the validation surface without ever calling stripe.Customer.create or stripe.checkout.Session.create (would burn Stripe credits + create real-side artifacts on every verifier run). Total verifier coverage: 62/62 PASS.
- Empty body → 422 (CheckoutRequest tier field is required).
- tier='admin' → 400 ('admin' not in price_id_map, no upgrade path — confirms the lookup-table doesn't accept arbitrary tiers).
- interval='eternal' → must not raise a Python exception (intentionally tolerant: interval falls back to 'monthly' for unknown values, which is the existing behavior; we just don't want a 5xx).
- The happy-path 200-with-Stripe-URL is NOT probed — explicitly skipped to avoid creating Stripe-side artifacts on every cycle.
Infra
Verifier: lock in /api/admin/* 403-on-non-admin behavior
Audited every /api/admin/* endpoint for IDOR/auth gaps. All 13 endpoints properly gated by _check_admin_inline (tier=='admin' OR email in ADMIN_EMAILS). Added a regression probe that POSTs/GETs to 5 sampled admin paths (stats, users, metrics/overview, foundation/status, refresh-user-tier) as the YC Pro test account and asserts every response is HTTP 403. Catches any future commit that accidentally removes the admin gate.
- step_admin_endpoints_reject_non_admin: 5 admin endpoints probed, all MUST return 403 for a Pro-tier (non-admin) caller.
- Verifier total: 59/59 PASS (was 54/54). The 13 internal endpoints (those gated by _verify_internal with the shared secret) are NOT covered here; their gate is a different mechanism and the test account has no internal secret.
- Audit confirmed no IDOR holes — all admin handlers call _check_admin_inline (or the equivalent inline check) before any other work.
Fix
Stripe webhook: fix silent-drop on DB failures + timing-safe internal compare
Audit of /api/webhooks/stripe found two issues. (1) customer.subscription.updated and customer.subscription.deleted caught DB exceptions silently and returned 200 to Stripe — Stripe saw success, didn't retry, user got stuck at the wrong tier on any transient DB blip. Now re-raises so Stripe's exponential-backoff retry kicks in (immediate, 5m, 1h, ... up to 3 days). (2) The internal-endpoint shared-secret check used `provided != secret`, which is a timing oracle. Switched to hmac.compare_digest. Attack surface was small (Vercel-edge-to-Railway is the only realistic caller) but the fix is one import + one line.
- subscription.updated: removed silent-catch. DB error → 500 → Stripe retries. 'No user found' (race with checkout-session-completed) still returns 200 + warn-logs, so Stripe doesn't retry forever for legitimately-not-our-customer events.
- subscription.deleted: same fix.
- _verify_internal: hmac.compare_digest + uniform 'Forbidden' response so missing-config and bad-secret can't be distinguished via body or timing.
- checkout.session.completed handler unchanged — it doesn't have the silent-catch pattern.
- Webhook signature verification + ISS-016 tier-resolution-from-prices already in place; not touched.
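The status-code contract for the subscription handlers can be sketched without any web framework. `UserNotFound`, `update_tier`, and the event keys are hypothetical names; the point is only which outcomes map to 200 and which to 500.

```python
import logging

log = logging.getLogger("stripe.webhook")


class UserNotFound(Exception):
    """No local user for this Stripe customer (for example a checkout race)."""


def handle_subscription_event(event, update_tier):
    """Apply the tier change; only a missing user is treated as success."""
    try:
        update_tier(event["customer"], event["tier"])
    except UserNotFound:
        log.warning("no user for customer %s; acknowledging", event["customer"])
    # Any other exception (such as a DB error) propagates to the caller.
    return 200


def webhook_status(event, update_tier):
    """What Stripe sees: 200 means done, 500 means please retry."""
    try:
        return handle_subscription_event(event, update_tier)
    except Exception:
        log.exception("subscription event failed; Stripe will retry")
        return 500
```

The earlier behavior corresponds to catching every exception inside the handler and returning 200, which hides transient failures from Stripe's retry logic.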
Infra
Housekeeping: removed dead TasteEngine.get_feed + add_to_feed methods
Completing the deprecation that started with fd92f81 (taste/monitor → scan-me). Both v1 methods wrote to / read from the legacy taste_feeds (plural) table that no user-facing query reads from. Confirmed via grep that nothing calls them anywhere — including services/, scripts/, and frontend. ~100 LOC of zombie code that ran every cold-start init_db only to sit unused. Now gone. The taste_feeds CREATE TABLE statements are intentionally left in init_db (destructive DB change deferred to a DBA pass).
- Removed TasteEngine.get_feed (read from taste_feeds plural)
- Removed TasteEngine.add_to_feed (write to taste_feeds plural)
- Comment block at the removal site explains the lineage for anyone confused why taste_feeds still exists.
- Net -97 lines.
Fix
Sweep: stop leaking raw exception text across 6 more endpoints
Continuation of the b3293d5 Stripe-leak fix. Grepped api.py for every `detail=f'...{exc}'` and `detail=str(exc)` pattern outside of intentional user-facing messages. Found 6 sites surfacing raw psycopg / SQLAlchemy / embedder exception text — column names, constraint detail, INSERT statement text. Each becomes a generic message + a server-side log with user_id context for triage.
- /api/auth/reset-password — 'Failed to update password. Please try again.' (was leaking SQL constraint detail).
- /api/auth/update — 'Failed to update profile. Please try again.'
- /api/admin/internal/set-tier-by-customer — 'set-tier-by-customer internal error' (matters for Stripe webhook retry logs which would otherwise echo the exception).
- /api/taste/rate — 'Failed to save rating. Please try again.' (was leaking INSERT INTO taste_ratings text).
- /api/price-alerts/create — 'Failed to create price alert. Please try again.'
- /api/admin/foundation/search — 'Embed failed' (admin-only but still cleaner not to surface embedder internals).
- Two intentional `str(exc)` paths kept: 409 duplicate-save messages (users SHOULD see 'Item with url X already saved') and the SSRF DNS 400 (caller-input-error class).
Fix
Deprecate /api/taste/monitor — was writing to the same dead taste_feeds (plural) table
Same family as the save_from_feed fix from earlier today. /api/taste/monitor scraped marketplaces and called taste_engine.add_to_feed which writes to taste_feeds (plural) — the legacy table that NO user-facing query reads from. So every call burned SerpAPI/scraper credits + database writes for zero user impact. Frontend exported a wrapper (runTasteMonitor) but no UI page calls it. Re-wired the endpoint to delegate to /api/taste/scan-me (the v2 endpoint that writes to taste_feed singular and has the editorial cold-start scan).
- /api/taste/monitor now delegates to taste_scan_me — same auth/tier/quota semantics, items land in the right table, cold-start gets editorial seeds.
- External API consumers (if any) keep working; they now get real items in their feed instead of orphan writes.
- Note: the legacy TasteEngine.add_to_feed + get_feed methods + the taste_feeds table itself still exist in store.py for backwards-compat; they should be removed in a follow-up housekeeping pass.
Fix
Billing: stop leaking raw Stripe exception text to the user
Both /api/billing/checkout and /api/billing/portal had an `except Exception as exc: raise HTTPException(500, f'Stripe error: {exc}')` pattern that surfaced the raw Stripe SDK exception to the browser. Those messages include internal IDs (customer cus_XXX, price price_XXX, request req_XXX) which are useless to the user and enable enumeration. Now log full detail server-side with the user_id + tier context, return a generic 'Could not start checkout. Please try again in a moment.' (502) to the client.
- /api/billing/checkout: 500 → 502 with generic message. Server log includes user_id/tier/interval + the original exc for triage.
- /api/billing/portal: same treatment.
- Existing 503 'Stripe is not configured' + 400 'Invalid tier' + 503 'price misconfigured' paths unchanged — those are deterministic + safe to keep.
Fix
Sweep: 10MB upload cap applied to ALL file endpoints (was missing on 6 of them)
Audited every `await file.read()` site in api.py after fixing detect-garments earlier. Of the 10 sites, only /api/scout and the just-fixed /api/chat/detect-garments enforced the 10MB cap correctly; 6 others (/api/shop, /api/search/hybrid, /api/on9/inventory, 3 curator endpoints) had NO cap at all, /api/chat regular had the cap but a bare-except was silently swallowing the 413, and /api/chat/stream was missing the cap. Now uniformly capped via a new _enforce_image_size(contents) helper and the MAX_IMAGE_BYTES constant.
- New MAX_IMAGE_BYTES = 10MB constant + _enforce_image_size() helper next to _safe_decode_image. Raises 413 with 'Image too large (max 10MB)'.
- Applied to: /api/shop, /api/search/hybrid, /api/on9/inventory (when file provided), /api/curator/approve and its sibling /api/curator/reject, /api/curator/score, /api/chat/stream.
- Fixed /api/chat regular: inline 413 raise was wrapped in a bare-except that downgraded it to a 200-with-no-image warning. Now properly re-raises HTTPException.
- /api/scout + /api/chat/detect-garments keep their existing inline checks (refactor to helper is a follow-up).
- Net effect: a paid user (or compromised session) can no longer POST a 100MB image to any endpoint and OOM a worker. The worst gap was probably /api/search/hybrid, which also runs FashionSigLIP embedding on the giant raster, doubling the cost.
Fix
Cap /api/chat/detect-garments at 10MB to match /api/scout
Found while sweeping file-upload endpoints for missing limits. /api/chat/detect-garments accepted arbitrary-size uploads — a 100MB image would balloon PIL memory, burn SegFormer + Moondream + VLM tokens scanning a giant raster, and eventually OOM the worker. /api/scout already caps at 10MB; this aligns the sibling endpoint.
- 10MB cap on detect-garments uploads (returns 413 with 'Image too large (max 10MB)').
- Endpoint already had auth + Plus/Pro paywall + 60/hr/user rate limit; this just bounds the worst-case payload per request.
- Refactored the except clause so the new HTTPException doesn't get swallowed by the existing catch-all 'Invalid image' handler.
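The except-clause refactor can be illustrated with a small sketch. The `load_image` function and its `decode` parameter are hypothetical, and the 422 status on the 'Invalid image' path is an assumption; the point is only the ordering of the handlers.

```python
try:
    from fastapi import HTTPException
except ImportError:  # stand-in with the same attributes, for running the sketch alone
    class HTTPException(Exception):
        def __init__(self, status_code: int, detail: str = ""):
            super().__init__(detail)
            self.status_code = status_code
            self.detail = detail

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: binary megabytes


def load_image(contents: bytes, decode):
    """Size-check then decode; `decode` is a hypothetical image decoder callable."""
    try:
        if len(contents) > MAX_IMAGE_BYTES:
            raise HTTPException(status_code=413, detail="Image too large (max 10MB)")
        return decode(contents)
    except HTTPException:
        # Must come before the catch-all, or the 413 is rewritten as 'Invalid image'.
        raise
    except Exception:
        # Catch-all for decoder failures only (status code assumed for illustration).
        raise HTTPException(status_code=422, detail="Invalid image")
```

Python matches `except` clauses top to bottom, so placing the `HTTPException` clause first is what keeps the 413 intact.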
Infra
Verifier: schema-level /api/scout contract probes (empty + garbage)
Adding regression coverage for the founder's core product (Scout — photo→detect→search). Running the full image-upload pipeline on every verifier cycle would burn SerpAPI credits, so this is a schema-level guard: prove the endpoint rejects bad inputs cleanly. The actual photo→marketplace flow is exercised manually + by real users; this catches accidentally widening the surface (e.g. losing the file-required guard or the PIL decode safety).
- step_scout_contract added — POSTs an empty multipart (asserts 422 from File(...) requirement) + 64 bytes of zeros (asserts 422 from _safe_decode_image PIL guard).
- Verifier total: 54/54 PASS (was 52/52). YC test account is Pro tier, so the 402 free-tier check is not exercised here.
- Future: when we wire a free test account, add a 402-on-free probe; when we add a stable test image fixture, add the full pipeline probe with marketplace assertions.
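A rough sketch of what a schema-level probe like step_scout_contract checks. The real verifier's HTTP client and reporting format are not shown in this entry, so the `post` callable (takes the file bytes or None, returns the HTTP status) and the failure-list return value are assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical transport: given file bytes (or None for an empty multipart),
# POST to /api/scout and return the HTTP status code.
PostFn = Callable[[Optional[bytes]], int]


def step_scout_contract(post: PostFn) -> List[str]:
    """Return a list of failure messages; an empty list means the probe passed."""
    failures: List[str] = []

    # Probe 1: no file part at all should be rejected by request validation.
    status = post(None)
    if status != 422:
        failures.append(f"empty multipart: expected 422, got {status}")

    # Probe 2: 64 zero bytes is not a decodable image and should also be rejected.
    status = post(bytes(64))
    if status != 422:
        failures.append(f"garbage bytes: expected 422, got {status}")

    return failures
```

Because neither probe sends a decodable image, the endpoint should reject both before any marketplace search runs, which is what keeps the check free of SerpAPI cost.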
Fix
Follow-up: save_from_feed wrote nonexistent boolean columns — really fixed now
Yesterday's a116912 fixed the table name (taste_feeds → taste_feed), but the helper was still writing kwargs `saved=True, seen=True` to a table whose schema doesn't have those columns. taste_feed has a TEXT `status` column with values 'unseen'/'seen'/'dismissed'; there is no boolean `saved`. So the UPDATE failed with `column 'saved' does not exist`, the exception was caught + logged, the function returned False, and the API returned 404. Verifier caught it. Now writing status='seen' (correct schema), and the saved-items mirror handles the 'saved' concept.
- _set_feed_flag now driven by status='seen' / status='dismissed' kwargs — matches the actual TEXT column on taste_feed.
- save_from_feed: flag the row as 'seen' + insert into saved_items (the canonical 'saved' marker, since /collection reads from saved_items).
- mark_seen + dismiss methods updated to use status='seen' / status='dismissed' instead of boolean kwargs.
- Will verify post-deploy that step_taste_save_lands_in_collection turns from FAIL → PASS.
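The status-driven write can be sketched as follows. This uses an in-memory SQLite database purely as a stand-in for the real store; the `id` and `user_id` column names, the `set_feed_status` function, and the demo schema are assumptions, with only the TEXT `status` column and its three values taken from this entry.

```python
import sqlite3

VALID_STATUSES = ("unseen", "seen", "dismissed")


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory taste_feed table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taste_feed ("
        " id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'unseen')"
    )
    conn.execute(
        "INSERT INTO taste_feed (id, user_id, url) "
        "VALUES (1, 'u1', 'https://example.com/item/1')"
    )
    conn.commit()
    return conn


def set_feed_status(conn: sqlite3.Connection, user_id: str, item_id: int, status: str) -> bool:
    """Update the TEXT status column; return True only if a row was actually changed."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    cur = conn.execute(
        "UPDATE taste_feed SET status = ? WHERE id = ? AND user_id = ?",
        (status, item_id, user_id),
    )
    conn.commit()
    # rowcount == 0 is the "silently affected nothing" case the API maps to 404.
    return cur.rowcount > 0
```

Validating the status value up front means a stray `saved` flag fails loudly in Python instead of surfacing as a caught-and-logged database error.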
Fix
CRITICAL FIX: /collection was permanently empty for everyone — bad table name in feed flag write
The actual root cause behind the founder's 'mytaste has to work on collection' complaint. The TasteEngine._set_feed_flag() helper (which save_from_feed, dismiss, mark_seen all delegate to) was running its UPDATE against `taste_feeds` (plural — an empty legacy table created in init_db but never populated). Every other query in the codebase uses `taste_feed` (singular). So every save / dismiss / mark_seen call has been silently affecting 0 rows for who knows how long — API returned 404, user saw 'failed to save' toast, /collection always empty. Fixed the table name + ALSO made save_from_feed mirror the feed row into saved_items so it actually shows up in /chat?view=collection.
- _set_feed_flag now writes to `taste_feed` (singular) — fixes save / dismiss / mark_seen for every user.
- save_from_feed now ALSO inserts a saved_items row (via save_collection_item) — the feed flag alone wasn't enough since /collection reads from saved_items.
- Duplicate-URL saves gracefully no-op via the existing UniqueViolation handler in save_collection_item.
- New verifier probe step_taste_save_lands_in_collection: saves first feed item, asserts POST returns 200 (was 404 before fix), then verifies it appears in /collection/saved by url match, then cleans up the test row.
- This bug was masking everything else. The XSS hardening, Myntra filter, cold-start scan, and routing fixes from the last 2 days were all real improvements, but none of them mattered until the user could actually SAVE something. Now they can.
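A small sketch of the saved_items mirror with duplicate-URL handling. SQLite stands in for the real database, and `sqlite3.IntegrityError` stands in for the UniqueViolation handler mentioned above; the schema, the unique constraint on (user_id, url), and the function names are assumptions for illustration.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """In-memory saved_items table with an assumed uniqueness rule per user + URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE saved_items ("
        " user_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " title TEXT,"
        " UNIQUE (user_id, url))"
    )
    return conn


def mirror_to_saved_items(conn: sqlite3.Connection, user_id: str, url: str, title: str) -> bool:
    """Insert the feed item into saved_items; return False if it was already saved."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO saved_items (user_id, url, title) VALUES (?, ?, ?)",
                (user_id, url, title),
            )
    except sqlite3.IntegrityError:
        # Duplicate URL for this user: treat as a graceful no-op.
        return False
    return True


def collection_urls(conn: sqlite3.Connection, user_id: str) -> List[str]:
    """What a /collection read would see for this user."""
    rows = conn.execute(
        "SELECT url FROM saved_items WHERE user_id = ? ORDER BY url", (user_id,)
    )
    return [row[0] for row in rows]
```

Saving the same URL twice leaves exactly one row in the collection, which mirrors the no-op behaviour described for duplicate saves.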
Polish
Polish: visible 'Scanning archive marketplaces…' state during cold-start auto-scan
After yesterday's editorial cold-start scan (2526bb1), new paid users hit /taste, the auto-scan fires invisibly, and they stare at the 'Nothing waiting yet — Scan to surface pieces' empty state for 10-15 seconds until items pop in. No feedback that something is happening. Wired the scanning UI flag into the auto-scan too: the button label flips to 'Scanning…' and the empty-state heading + body copy update to 'Scanning archive marketplaces… / Pulling editorial pieces from Grailed + Mercari. Takes ~10 seconds.' Real wait, real message.
- Auto-scan effect now calls setScanning(true) at start, setScanning(false) in a finally block.
- Empty-state copy in /taste page switches based on `scanning` flag — clear feedback during the network round-trip.
- Same 60s lastScanAtRef throttle remains so re-renders don't fire duplicate scans.
Polish
Polish: Collection icon updates URL in place instead of bouncing through /collection
Audited /collection after yesterday's icon-routing fix. Turns out /collection is a deprecated route that immediately redirects to /chat?view=collection (LegacyCollectionRedirect, see src/app/(app)/collection/page.tsx). So my fix made clicking the Collection icon do: navigate to /collection → URL changes → React unmounts /chat → /collection mounts → useEffect router.replace → /chat re-mounts from scratch. Wasteful. Now: stay in /chat, flip the view via setView + fetchCollection, AND router.replace the URL so the address bar reflects what the user clicked. That was the founder's original complaint anyway — that the URL didn't change.
- Collection icon → chat.setView('collection') + fetchCollection() + router.replace('/chat?view=collection', { scroll: false }). No full re-mount, address bar reflects state.
- Search-engine icon → setView('chat') + router.replace('/chat'). Same in-shell pattern.
- Taste + Settings continue to navigate() because they're real dedicated pages, not deprecated redirects.
- Updated step_chat_sidebar_routing assertion: accept either '/collection' (legacy) OR 'view=collection' (canonical) so both code paths verify.