<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss/styles.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Yihan Zhu</title><description>Yihan Zhu — Engineer building products. I&apos;m a machine learning engineer at StackAdapt. I write about software I&apos;m building, things I&apos;m learning, and ideas I notice along the way.</description><link>https://yihanzhu.com/</link><language>en-ca</language><item><title>AI Has No Needs</title><link>https://yihanzhu.com/blog/ai-has-no-needs/</link><guid isPermaLink="true">https://yihanzhu.com/blog/ai-has-no-needs/</guid><description>AI can generate a million combinations. What it cannot do is feel which one matters.</description><pubDate>Sat, 04 Apr 2026 18:30:00 GMT</pubDate><content:encoded>&lt;p&gt;The default way people talk about AI right now is replacement. Look at what a person does, measure it, hand it to a model. If the model can do it at eighty percent of the quality for five percent of the cost, that is the pitch. Replacement is legible. It fits in a slide deck. You can point at the before and the after and show the difference.&lt;/p&gt;
&lt;p&gt;It is also where trust breaks first.&lt;/p&gt;
&lt;p&gt;People lose confidence in black boxes fast. It does not take a pattern of failure — it takes one. One bad recommendation, one hallucinated answer, one decision that a person would not have made. After that, the whole system feels unreliable, even if it was right the other ninety-nine times. Replacement puts the algorithm in charge and the human on the sidelines, and when something goes wrong there is nobody in the loop who saw it coming.&lt;/p&gt;
&lt;p&gt;That is why augmentation sounds like the better frame. Instead of replacing what people already do, use AI to help them do things they could not do before. Keep the human in the loop. Let the model expand what a person can reach instead of substituting for them entirely. The argument makes sense, and I think it is probably right.&lt;/p&gt;
&lt;p&gt;But I have started noticing a problem with it.&lt;/p&gt;
&lt;h2&gt;The bottleneck nobody planned for&lt;/h2&gt;
&lt;p&gt;AI output is cheap. Absurdly cheap. A model can generate a detailed implementation plan, a full architecture proposal, a ten-page analysis — in seconds, for almost nothing. That speed is supposed to be the advantage of augmentation. The model does the heavy lifting, the human stays in control.&lt;/p&gt;
&lt;p&gt;In practice, the human becomes a bottleneck.&lt;/p&gt;
&lt;p&gt;I notice this in my own work. When an AI gives me a long implementation plan, I lose patience. I start skimming. Then I stop reading entirely and just move to implementation. The plan might be good. It might have problems. I do not know, because I did not actually engage with it. I ran out of attention before the model ran out of output.&lt;/p&gt;
&lt;p&gt;That is the uncomfortable part. The gravitational pull toward replacement is not philosophical. It is practical. You do not decide to remove the human from the loop. You just run out of patience, and the human quietly falls out of it. The faster and cheaper AI output becomes, the stronger that gravity gets.&lt;/p&gt;
&lt;h2&gt;Reviewing output is not augmentation&lt;/h2&gt;
&lt;p&gt;But I think the bottleneck reveals something important: we have been thinking about augmentation wrong.&lt;/p&gt;
&lt;p&gt;The bottleneck only exists when the human is cast as a reviewer of AI output. The model produces, the human checks. That sounds like augmentation, but it is really just automation with a human checkbox. A rubber stamp. The human is not contributing anything the model could not eventually do — they are just slowing down a pipeline that would run faster without them.&lt;/p&gt;
&lt;p&gt;That is not augmentation. That is replacement waiting to happen.&lt;/p&gt;
&lt;p&gt;So what would real augmentation mean? It would mean the human contributes something the model fundamentally cannot. Something that is not just slower or less efficient, but categorically different.&lt;/p&gt;
&lt;p&gt;To find what that thing is, it helps to look honestly at what AI actually does.&lt;/p&gt;
&lt;p&gt;AI recombines. It takes what exists on the public internet — text, code, ideas, patterns — and reassembles it in response to a prompt. It can surface things you had not encountered yet. It can connect pieces you had not thought to connect. But it is not generating anything that did not already exist in some form. The raw material was always there. The model just found it faster than you would have.&lt;/p&gt;
&lt;p&gt;That distinction matters more than it sounds. When an AI helps me plan a project, it is not inventing a new way to think about the problem. It is pulling from patterns that thousands of other developers have already written about, assembling them into something that fits my prompt. When it suggests an architecture, the architecture already existed. When it proposes a solution, the solution was already out there in blog posts, documentation, Stack Overflow threads, open-source codebases. The model is a very fast, very broad search engine with a conversational interface.&lt;/p&gt;
&lt;p&gt;That is generative, not creative. And it reframes what most &quot;augmentation&quot; actually is: the model is bringing you up to speed on knowledge that already existed, not pushing the frontier of what is known. That is valuable — genuinely valuable. But it is a ceiling, not an open sky.&lt;/p&gt;
&lt;h2&gt;The part that made me uncomfortable&lt;/h2&gt;
&lt;p&gt;Here is where the argument gets harder.&lt;/p&gt;
&lt;p&gt;If AI is just recombining existing knowledge, then it is not truly creative. But if I am honest, most of what passes for human innovation looks like recombination too.&lt;/p&gt;
&lt;p&gt;Camera plus glasses equals smart glasses. Taxis plus smartphones equals Uber. A trip ledger plus immigration rules equals MapleDays. These feel innovative, but the ingredients all existed before someone combined them. Is that fundamentally different from what a model does when it recombines text patterns into a new paragraph?&lt;/p&gt;
&lt;p&gt;I sat with that question for a while, and I think there is a difference — but it is not where I expected it to be.&lt;/p&gt;
&lt;p&gt;The difference is not in the recombination itself. It is in what drives it.&lt;/p&gt;
&lt;p&gt;AI has no needs. It does not feel friction. It does not get annoyed by a bad workflow, lose patience with a broken process, or notice that something should exist but does not. It can generate a million combinations. What it cannot do is feel which one matters.&lt;/p&gt;
&lt;p&gt;That feeling — this is annoying, this should be easier, this is worth fixing — is where every meaningful direction starts. The person who put a camera in glasses did not just combine two objects. They understood what it means to want information without using your hands. That understanding came from being a person with hands, in a world where hands are busy. The person who built Uber did not just combine taxis and smartphones. They stood on a street corner in the rain, unable to get a ride, and felt how broken that experience was. The combination was obvious. The frustration that made it matter was not.&lt;/p&gt;
&lt;p&gt;AI does not have hands. It does not stand in the rain. It does not have needs. It cannot identify what is worth doing until a human who feels the problem points it in a direction.&lt;/p&gt;
&lt;p&gt;That is what I think separates human recombination from machine recombination. Both can combine existing pieces. But the human is the one who knows which combination is worth making — because they felt the problem first. The model cannot want a better experience. It cannot be frustrated by a bad one. It does not care what gets built, because it does not care about anything.&lt;/p&gt;
&lt;p&gt;And once I thought about it that way, the bottleneck question dissolved. The human&apos;s job was never to review output. It was to feel the friction that the model cannot feel, and to aim the system at problems that matter. That is not a bottleneck. That is the irreplaceable input.&lt;/p&gt;
&lt;h2&gt;The thing that threatens the irreplaceable input&lt;/h2&gt;
&lt;p&gt;If the human&apos;s real contribution is judgment — the felt sense of what matters, the direction that comes from lived experience — then there is one specific threat worth taking seriously.&lt;/p&gt;
&lt;p&gt;AI is sycophantic. It agrees with you. Not always explicitly, but structurally. Models are trained to be helpful, and helpful usually means validating. When I brainstorm with AI, it makes me feel good about where I am heading. The ideas sound sharper after the model reflects them back. The plan feels more solid. The direction seems right.&lt;/p&gt;
&lt;p&gt;I have started wondering how much of that is the idea actually being good, and how much is the model telling me what I want to hear.&lt;/p&gt;
&lt;p&gt;Sycophancy inflates the one thing the human is supposed to uniquely contribute. You stop questioning your own direction because the tool keeps confirming it. You lose the habit of self-doubt — not because you got better at thinking, but because the feedback loop got warmer.&lt;/p&gt;
&lt;p&gt;And this does not stay contained to productivity.&lt;/p&gt;
&lt;p&gt;The model does not just agree with your opinions. It presents information with the same confident tone whether it is right or wrong. It does not hedge. It does not say &quot;you should verify this.&quot; It sounds like it knows, and that confidence is enough to make people stop checking for themselves. I have heard about people on employer-dependent work visas in the US who trusted AI for legal guidance instead of checking the official sources. The model gave them answers that sounded right — confident, specific, actionable — and they made decisions based on those answers. When the actual situation arrived and the information turned out to be wrong, the consequences were immediate and serious. Inability to work. Damage to their reputation in the industry. The kind of harm that does not reverse easily.&lt;/p&gt;
&lt;p&gt;Then there is the other direction — not over-trusting AI&apos;s information, but over-trusting its validation of your own thinking. I heard a story about an adult who had not graduated high school, convinced he had solved a mathematical problem that no one else in history had been able to solve — just by talking to an AI. The model did not tell him he was wrong. It hallucinated along with him. It validated his reasoning, filled in gaps with confident-sounding nonsense, and pulled him deeper into a rabbit hole where he genuinely believed he had made a breakthrough. He had not. The AI had no way to know that either, because it does not understand what solving a problem means. It just generates text that sounds right.&lt;/p&gt;
&lt;p&gt;And I have heard about cases that go darker still — people who lean on AI as a sounding board for their thinking, their emotions, their sense of self, and end up in places where nobody pushes back, nobody challenges, nobody says stop. The model follows your lead. If your lead goes off the edge, it follows you there too.&lt;/p&gt;
&lt;p&gt;That is probably the most uncomfortable tension in this whole line of thinking. The most popular tool for augmenting human judgment is quietly eroding the judgment itself.&lt;/p&gt;
&lt;h2&gt;Staying close to the ground&lt;/h2&gt;
&lt;p&gt;I do not have a clean answer for that. But I have the beginning of one.&lt;/p&gt;
&lt;p&gt;The times I have been most wrong about a product decision were not the times I lacked information. They were the times I felt most confident. The reasoning was sound. The logic was defensible. And if I had asked an AI, it would have agreed with every step. What actually corrected me was contact with reality — users who were confused, workflows that broke, constraints that forced me to confront what actually works instead of what should work in theory.&lt;/p&gt;
&lt;p&gt;Sycophancy is most dangerous when you are operating in the abstract, planning and ideating while the model keeps telling you the plan is great. It is least dangerous when you are grounded in something that does not flatter you. A user is confused or they are not. A product works or it does not. Reality does not agree with you to be polite.&lt;/p&gt;
&lt;p&gt;I do not think this resolves the tension entirely. The line between human recombination and AI recombination might be blurrier than any of us want to admit. And the tool we rely on to augment our thinking is not neutral — it flatters.&lt;/p&gt;
&lt;p&gt;But I keep coming back to one thing. AI has no needs. It cannot want something to be different. And wanting something to be different is where every meaningful thing starts.&lt;/p&gt;
&lt;p&gt;That is probably worth protecting.&lt;/p&gt;
</content:encoded></item><item><title>From Prompts to Harnesses</title><link>https://yihanzhu.com/blog/from-prompts-to-harnesses/</link><guid isPermaLink="true">https://yihanzhu.com/blog/from-prompts-to-harnesses/</guid><description>Prompt engineering, context engineering, harness engineering. Each stage exposed a bottleneck the last one was blind to. The progression tells us more than the names do.</description><pubDate>Fri, 27 Mar 2026 20:48:32 GMT</pubDate><content:encoded>&lt;p&gt;The way I work with code has changed faster than I have found words for it.&lt;/p&gt;
&lt;p&gt;A year ago, the skill everyone talked about was prompt engineering — how to phrase a question so the model would give you something useful. Six months later, that shifted to context engineering — how to manage what the model knows, what it sees, what it remembers. Now the conversation has moved again, to harness engineering — how to build the system around the model so it actually works reliably.&lt;/p&gt;
&lt;p&gt;Three names in roughly eighteen months. That is not just vocabulary churn. Each new name caught on because the previous one stopped explaining the part that mattered most. And I think the progression itself tells us something important about where AI tooling is heading and what the work of building software is becoming.&lt;/p&gt;
&lt;h2&gt;Each stage is a discovery about where the leverage actually is&lt;/h2&gt;
&lt;p&gt;The interesting thing about these transitions is not the definitions. It is what each one revealed about the limits of the stage before it.&lt;/p&gt;
&lt;p&gt;Prompt engineering assumed the model was smart enough — you just needed to unlock it with the right words. The hard problem was communication. If you could phrase your intent precisely, the model would deliver. And for a while, that was roughly true. Simple tasks — generate this text, summarize that document, translate this paragraph — responded well to better prompts. The lever was language.&lt;/p&gt;
&lt;p&gt;Then people started building more ambitious things: multi-step workflows, retrieval-augmented systems, agents that used tools. And prompt engineering hit a wall. It did not matter how well you phrased a question if the model was reasoning over the wrong information. The bottleneck was not communication. It was what the model could see.&lt;/p&gt;
&lt;p&gt;That is what context engineering made explicit. The lever moved from &quot;say the right thing&quot; to &quot;show the right thing.&quot; And the key insight was subtler than it sounds: context engineering is curation, not accumulation. Models have finite attention budgets. More context often makes things worse — accuracy drops as input length grows, important details get lost in the middle. The real discipline was deciding what to show, what to hide, what to summarize, and what to fetch on demand. Information architecture, not &quot;give it more background.&quot;&lt;/p&gt;
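&lt;p&gt;Curation under a budget is easy to picture as code. The sketch below is illustrative only (the token estimator and relevance score are crude stand-ins I invented, not any framework&apos;s API), but it shows the shape of the discipline: rank candidates, keep what fits, drop the rest.&lt;/p&gt;

```python
# A minimal sketch of context-as-curation: assemble a prompt under a fixed
# token budget by ranking candidate snippets, rather than concatenating
# everything the system knows.

def build_context(snippets, budget_tokens, estimate_tokens, score):
    """Pick the highest-value snippets that fit the budget."""
    ranked = sorted(snippets, key=score, reverse=True)
    chosen, used = [], 0
    for s in ranked:
        cost = estimate_tokens(s)
        if used + cost > budget_tokens:
            continue  # skip whole snippets rather than truncate mid-thought
        chosen.append(s)
        used += cost
    return "\n\n".join(chosen)

# Crude illustrative stand-ins: roughly four characters per token, and
# relevance scored by word overlap with the task description.
task = "fix the retry logic in the billing client"
estimate = lambda text: max(1, len(text) // 4)
relevance = lambda text: len(set(text.lower().split()).intersection(task.split()))

snippets = [
    "billing client retry logic lives in billing/client.py",
    "style guide: four-space indentation, no tabs",
    "the retry helper wraps httpx with exponential backoff",
]
# With a tight budget, only the most relevant snippet survives.
context = build_context(snippets, budget_tokens=20,
                        estimate_tokens=estimate, score=relevance)
```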
&lt;p&gt;Context engineering was a meaningful upgrade. But it was still fundamentally about optimizing a single inference — making one model call as good as possible. And once people started running agents at scale, a different wall appeared. You could feed a model perfect context and get a great result ninety percent of the time. But ninety percent reliability across a thousand actions means a hundred failures. Perfect context for one inference does not guarantee reliability across thousands.&lt;/p&gt;
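&lt;p&gt;The arithmetic behind that wall is worth making explicit. Using the numbers above, plus one step the paragraph implies about chained actions:&lt;/p&gt;

```python
# Per-action reliability vs. system reliability, using the numbers above.
per_action_success = 0.90
actions = 1000

# Ninety percent reliability across a thousand actions: a hundred failures.
expected_failures = actions * (1 - per_action_success)

# And when actions depend on each other, reliability compounds: a workflow
# that needs ten dependent steps to all succeed finishes only about a third
# of the time, even though each step looks fine in isolation.
chain_success = per_action_success ** 10
```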
&lt;p&gt;The bottleneck was not the information. It was the system.&lt;/p&gt;
&lt;p&gt;That is the transition we are living through now. Each stage&apos;s practitioners believed they were solving the whole problem. Each transition revealed it was a subproblem.&lt;/p&gt;
&lt;p&gt;If the pattern feels familiar, it should. Software engineering has been climbing the same ladder for decades — machine code to assembly to high-level languages to frameworks to declarative systems. Each step moved the developer further from the machine and closer to intent. The AI evolution is replaying that arc, compressed into months instead of decades.&lt;/p&gt;
&lt;h2&gt;Harness engineering is a fundamentally different kind of problem&lt;/h2&gt;
&lt;p&gt;The reason harness engineering feels different from what came before is not just that it operates at a higher level. It is that the nature of the work changes.&lt;/p&gt;
&lt;p&gt;Prompt engineering and context engineering are both about optimizing a single interaction. You craft the input, you evaluate the output, you adjust. The feedback loop is immediate — seconds, maybe minutes. The unit of work is one inference. This is craft. You are making individual artifacts well.&lt;/p&gt;
&lt;p&gt;Harness engineering operates on a different timescale and a different unit. You design a system, run it across hundreds or thousands of agent actions, observe aggregate behavior, adjust constraints, and repeat. The question stops being &quot;was this output good?&quot; and starts becoming &quot;does this system produce acceptable outputs reliably over time?&quot;&lt;/p&gt;
&lt;p&gt;That is the difference between a potter making one excellent bowl and an engineer designing a factory that produces ten thousand acceptable bowls. The skills are related, but the disciplines are not the same.&lt;/p&gt;
&lt;p&gt;And most of harness engineering is about what happens after the model produces output — not before. That distinction matters because it is easy to confuse harness engineering with context engineering. Context engineering is about making things clearer for the AI. Harness engineering is mostly about what you do with the AI&apos;s output: verification steps, guardrails that prevent dangerous actions, automated validation, feedback loops that catch a category of failure and prevent it from recurring. You stop fixing individual outputs and start fixing the system.&lt;/p&gt;
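&lt;p&gt;The pattern is simple enough to sketch. Everything below is hypothetical (the validators, the fake model), but it captures the loop: generate, validate against a list of checks, and when a new failure category appears, make it a permanent check rather than a better prompt.&lt;/p&gt;

```python
# A toy harness: the model's output is untrusted, and a growing list of
# validators decides whether it ships. Fixing the system means adding a
# check, not editing an individual output.

def harness(generate, validators, max_attempts=3):
    """Run the generator until its output passes every validator."""
    for attempt in range(max_attempts):
        output = generate()
        failures = [name for name, check in validators if not check(output)]
        if not failures:
            return output, attempt + 1
    raise RuntimeError(f"no acceptable output after {max_attempts} attempts: {failures}")

# Hypothetical checks for a code-writing agent.
validators = [
    ("non-empty",       lambda out: bool(out.strip())),
    ("no TODO left",    lambda out: "TODO" not in out),
    ("stays in module", lambda out: "../" not in out),  # crude path guardrail
]

# A fake, deterministic "model" so the sketch is runnable: the first
# attempt fails a check, the second passes.
outputs = iter(["# TODO: implement", "def add(a, b):\n    return a + b"])
result, attempts = harness(lambda: next(outputs), validators)
```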
&lt;p&gt;This is what made OpenAI&apos;s Codex story interesting when it came out earlier this year. Their team built roughly one million lines of production code over five months with zero manually written code — about 1,500 merged pull requests from a team of seven engineers. The breakthrough was not the model. It was the harness. Custom linters caught structural drift. An AGENTS.md file gave agents a table of contents for the codebase. Structural tests enforced architectural constraints without manual review. Verification loops validated changes before they merged.&lt;/p&gt;
&lt;p&gt;No single output had to be perfect. The harness turned unreliable parts into a reliable whole.&lt;/p&gt;
&lt;p&gt;Mitchell Hashimoto, who co-founded HashiCorp, named this stage in a way that stuck: &quot;Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.&quot; That is not prompt optimization. That is systems engineering. That is building for expected failure.&lt;/p&gt;
&lt;h2&gt;The roles literally inverted&lt;/h2&gt;
&lt;p&gt;Once you internalize that harness engineering is about building for expected failure, something uncomfortable comes into focus about what the developer&apos;s job has actually become.&lt;/p&gt;
&lt;p&gt;In traditional software engineering, the relationship between human and machine is clear. The human authors the code. The machine verifies correctness — through tests, type checkers, linters. You write a specification, implement it, and run automated checks to confirm the implementation meets the spec. If the tests pass, the code is correct. Authorship is human. Verification is mechanical.&lt;/p&gt;
&lt;p&gt;Harness engineering inverts that.&lt;/p&gt;
&lt;p&gt;The machine does the authoring. It writes the code, generates the output, produces the pull request. The human designs the quality constraints — the guardrails, the validation boundaries, the acceptability criteria. You cannot test for correctness in the traditional sense because the output is stochastic. There is no deterministic specification for what the model will produce. Instead, you test for acceptability boundaries. Outputs must stay within constraints, must pass validations, must not violate safety conditions.&lt;/p&gt;
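&lt;p&gt;Concretely, an acceptability test looks less like an equality assertion and more like a set of boundary checks. The properties below are invented examples, not a standard, but they show the flavor: the exact output is unknown in advance, yet it must parse, stay under a size bound, and avoid forbidden operations.&lt;/p&gt;

```python
# Testing for acceptability, not correctness: you cannot assert the model's
# output equals a known string, but you can assert it stays inside bounds.
import ast

def is_acceptable(generated_code):
    """Boundary checks for model-written Python, independent of its exact text."""
    if len(generated_code) > 10_000:      # size bound
        return False
    try:
        tree = ast.parse(generated_code)  # must at least be valid Python
    except SyntaxError:
        return False
    forbidden = {"os", "subprocess"}      # crude safety condition
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name in forbidden for alias in node.names):
                return False
    return True

assert is_acceptable("def f(x):\n    return x + 1")
assert not is_acceptable("import subprocess")
assert not is_acceptable("def f(:")  # invalid syntax fails the boundary
```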
&lt;p&gt;The developer used to be the author of the code. Now the developer is the editor of the system&apos;s output.&lt;/p&gt;
&lt;p&gt;That is more specific and more disorienting than the generic observation that &quot;the developer&apos;s role is changing.&quot; The thing most developers were best at — writing code — got automated. The thing they now do — quality engineering of stochastic systems — did not exist as a discipline eighteen months ago.&lt;/p&gt;
&lt;p&gt;I notice this in my own work. The productive sessions are not the ones where the model gets everything right on the first try. They are the ones where the system around the model makes it easy to recover when it does not. That shift — from authoring to editing, from writing to verifying — is already measurable. Claude Code now authors roughly four percent of all public GitHub commits, about 135,000 per day. The inversion is already the default at scale.&lt;/p&gt;
&lt;h2&gt;What comes after harnesses&lt;/h2&gt;
&lt;p&gt;If the pattern holds, something will eventually do the same to harnesses.&lt;/p&gt;
&lt;p&gt;I think there are two candidates for the next bottleneck.&lt;/p&gt;
&lt;p&gt;The first is emergent behavior in multi-agent systems. A single agent with a well-designed harness is predictable. Multiple agents with well-designed harnesses interacting with each other can produce emergent behavior that none of the individual harnesses anticipated. This is the distributed systems problem — individual components can each be correct while the system as a whole fails in unexpected ways. We have faced that problem before in software engineering and partially solved it with consensus protocols, event sourcing, and saga patterns. We will need analogous patterns for multi-agent AI systems.&lt;/p&gt;
&lt;p&gt;Anthropic is already building in this direction — Claude Opus 4.6 shipped with &quot;agent teams&quot; that split tasks and coordinate in parallel, and the Model Context Protocol has become the standard interface for connecting agents to external tools. The infrastructure for multi-agent coordination is arriving before the discipline for managing it.&lt;/p&gt;
&lt;p&gt;The second candidate is governance. At some point, the harness gets reliable enough that the question stops being technical — &quot;can we make this work?&quot; — and becomes institutional — &quot;how much should we let this run without human oversight?&quot; That is not an engineering question. It is a design question about the boundary between human and machine authority. And it follows a historical pattern: every time we automated a meaningful category of human work — manufacturing, transportation, finance — we eventually had to build regulatory and governance frameworks around the automation. AI agent systems are on the same trajectory. The harness is the technical layer. Governance will be the institutional layer above it.&lt;/p&gt;
&lt;p&gt;Both candidates point to the same thing. The bottleneck is moving from technical systems to sociotechnical systems. The next era probably will not be solved by engineers alone.&lt;/p&gt;
&lt;p&gt;I do not know which one arrives first, or whether it will be something else entirely that nobody is naming yet. But I have noticed that each stage only becomes visible in retrospect. Whatever comes after harnesses will probably make harness engineering look the way prompt engineering looks now — necessary, but insufficient. A subproblem we mistook for the whole thing.&lt;/p&gt;
&lt;p&gt;That seems to be how this works.&lt;/p&gt;
</content:encoded></item><item><title>The Work After the Work</title><link>https://yihanzhu.com/blog/the-work-after-the-work/</link><guid isPermaLink="true">https://yihanzhu.com/blog/the-work-after-the-work/</guid><description>I shipped MapleDays in a week. Then the real product work started.</description><pubDate>Sat, 21 Mar 2026 19:12:39 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago I wrote about shipping MapleDays in seven days. That post ended with a line I believed at the time: &quot;my part is done.&quot;&lt;/p&gt;
&lt;p&gt;That turned out to be only partially true. My part was done the way a rough draft is done. The shape existed. The core worked. But the product had not met a single real user yet, and real users have a way of rearranging what you thought you understood.&lt;/p&gt;
&lt;p&gt;Building MapleDays was a focused sprint with a clear finish line. Everything after it — user feedback, monetization, marketing, App Store review cycles — runs on a completely different clock and demands a different skill set. The first week was about control. Everything since has been about listening.&lt;/p&gt;
&lt;h2&gt;Building mode has a finish line. Operating mode does not.&lt;/h2&gt;
&lt;p&gt;When I was building v1, the work had a natural structure. Scope the product, cut aggressively, ship. Every decision moved toward one goal: get this thing submitted. The constraints were tight, but that made them useful. I always knew what to do next.&lt;/p&gt;
&lt;p&gt;After the app went live, that structure disappeared. There was no single goal anymore. There were users writing in with feedback. There were feature ideas that sounded reasonable. There were marketing questions I had no framework for. There were review cycles I could not speed up.&lt;/p&gt;
&lt;p&gt;Building felt like steering. Operating feels more like a conversation that does not end. You are still making decisions, but they arrive on someone else&apos;s schedule, and the feedback loop is slower and less predictable.&lt;/p&gt;
&lt;p&gt;I think that shift caught me off guard because I had been optimizing for velocity. Ship fast, ship clean, move on. But &quot;move on&quot; is not really an option once the product is live and people are relying on it.&lt;/p&gt;
&lt;h2&gt;Users find the problems you did not build for&lt;/h2&gt;
&lt;p&gt;The most useful thing that happened after launch was that users started telling me things I could not have figured out on my own.&lt;/p&gt;
&lt;p&gt;One early example: MapleDays originally required a &quot;first entered Canada&quot; date for permanent residents, and I made the PR date depend on it. You could not set a PR date earlier than your first entry date. That seemed logical to me, but it confused users who wanted to set their PR date first and fill in the entry date later. The constraint blocked them with no clear explanation.&lt;/p&gt;
&lt;p&gt;So I removed the field entirely for PRs and citizens. Problem solved, I thought.&lt;/p&gt;
&lt;p&gt;Then a user told me about Old Age Security. OAS applications require travel history even for citizens and permanent residents, and the form asks when you first entered Canada — if you were born outside the country. I looked up the actual OAS form and confirmed it. The field I had just removed was still needed, just not in the way I had originally built it.&lt;/p&gt;
&lt;p&gt;So I added it back, but differently. It became an optional field, shared across all statuses, decoupled from the PR date constraint. That version is better than either of the previous two, and it only exists because of three rounds of feedback from people who actually use the app for things I had not anticipated.&lt;/p&gt;
&lt;p&gt;That OAS discovery also changed how I think about the product&apos;s audience. The core of MapleDays was always travel history — a clean record of trips in and out of Canada, managed in one place instead of scattered across spreadsheets, note apps, email confirmations, and half-remembered boarding passes. PR renewal and citizenship eligibility are results derived from that history, not the other way around. But I had been thinking about the audience narrowly: people actively navigating immigration. OAS showed me that older Canadians — people well past their immigration journey — still need the same ledger for entirely different paperwork. The travel history is the product. The use cases are broader than I expected.&lt;/p&gt;
&lt;p&gt;The biggest surprise was about a design assumption I did not even realize I had made. Multiple users asked to manage travel records for their whole family in one place. I had designed the app around one person, one device, one ledger. Privacy felt like the obvious default — why would you put someone else&apos;s immigration data on your phone?&lt;/p&gt;
&lt;p&gt;But that is not how families work. A lot of households have one person who handles the paperwork for everyone. They are already tracking trips for their spouse, their parents, their kids. They do not want to juggle four phones. They want one place to manage it all.&lt;/p&gt;
&lt;p&gt;That feedback did not just suggest a new feature. It revealed a whole usage pattern I had not considered, and it ended up reshaping how I think about the product&apos;s premium tier, in a way I had not planned at all.&lt;/p&gt;
&lt;h2&gt;The premium split came from users, not a spreadsheet&lt;/h2&gt;
&lt;p&gt;Before launch, I had a vague idea that iCloud sync would be the premium upgrade. That was it. I had not thought much harder about monetization because it felt like a problem for later.&lt;/p&gt;
&lt;p&gt;Then users handed me a much clearer answer. The family management pattern — one person tracking trips for a household — was not just a feature request. It was a natural dividing line between two kinds of users. Most people need MapleDays for themselves. Some need it for everyone they are responsible for. That second group is doing meaningfully more with the product, and paying for that feels fair.&lt;/p&gt;
&lt;p&gt;What surprised me about that process is that I did not sit down and design a monetization strategy. It emerged. Users showed me how they use the app differently from each other, and the premium boundary followed from that difference. I want MapleDays to stay free for the majority of its users — the core product should never cost anything. Premium is for the heavier use case, and the revenue is mostly about keeping the app maintained and developed over time.&lt;/p&gt;
&lt;p&gt;I think that is probably how monetization should work for a small app. Not as a pricing exercise you do before launch, but as a decision that gets clearer once you see how real people actually use the product.&lt;/p&gt;
&lt;h2&gt;Marketing with no audience&lt;/h2&gt;
&lt;p&gt;This is the part I am least qualified to write about, which is probably why it matters.&lt;/p&gt;
&lt;p&gt;MapleDays solves a real problem for a specific group of people. But those people do not know the app exists, and I do not have an audience to announce it to. There is no mailing list, no social following large enough to matter, no existing distribution channel.&lt;/p&gt;
&lt;p&gt;I have tried a few things: posting in relevant communities, reaching out to people who write about Canadian immigration topics, making the marketing site clearer, improving App Store keywords. Some of it has generated a trickle of installs. None of it has felt like a repeatable engine.&lt;/p&gt;
&lt;p&gt;I do not think I have cracked this yet. What I have learned is that marketing a solo app is a fundamentally different skill from building one, and it rewards consistency more than cleverness. The temptation is to treat it like a launch — one big push, then back to building. But it is really more like maintenance. You have to keep showing up, keep finding the right people, keep explaining why the product matters.&lt;/p&gt;
&lt;p&gt;That is still a work in progress for me.&lt;/p&gt;
&lt;h2&gt;Patience is a product skill&lt;/h2&gt;
&lt;p&gt;One thing I underestimated is how much of post-launch work is just waiting.&lt;/p&gt;
&lt;p&gt;Waiting for App Store review. Waiting for the IAP review on top of that. Waiting for user feedback to accumulate enough to see patterns. Waiting for installs to grow. There is always a gap between submitting something and seeing the result, and during that gap the temptation is strong to start something new instead of finishing what is in front of you.&lt;/p&gt;
&lt;p&gt;I think patience is a product skill in the same way that cutting scope is a product skill. It does not feel productive, but it is often the right move. The product needs time to meet its users, and you need time to learn from what they tell you.&lt;/p&gt;
&lt;h2&gt;What I know now that I did not know at the end of that first week&lt;/h2&gt;
&lt;p&gt;The shipping post was about learning to cut a product down to its center. That lesson held up. But the post-launch phase taught me something the build sprint could not: what happens when the product meets people who did not help design it.&lt;/p&gt;
&lt;p&gt;Building MapleDays was a conversation with myself. I decided what mattered, I cut what did not, and I shipped something I believed in. Operating MapleDays is a conversation with users. That is a harder mode to be good at, because you cannot plan your way through it. You have to stay open to being wrong about things you were confident about, and willing to change the product in directions you did not anticipate.&lt;/p&gt;
&lt;p&gt;I think the hardest part of this phase is accepting that the product is no longer entirely yours. You built the center, but users are shaping the edges. The version of MapleDays that exists today is better than the one I submitted at the end of that first week, and most of that improvement came from people who use it telling me things I could not have figured out alone.&lt;/p&gt;
&lt;p&gt;That is probably the real work after the work. Not just fixing bugs and adding features, but learning to hold the product loosely enough that it can become what it needs to be.&lt;/p&gt;
</content:encoded></item><item><title>How I Decide What to Build</title><link>https://yihanzhu.com/blog/how-i-decide-what-to-build/</link><guid isPermaLink="true">https://yihanzhu.com/blog/how-i-decide-what-to-build/</guid><description>Most ideas are interesting at first. The real filter is whether they get clearer as you force specificity and tradeoffs.</description><pubDate>Wed, 18 Mar 2026 02:56:44 GMT</pubDate><content:encoded>&lt;p&gt;Coming up with ideas is usually not the hard part for me. There are almost always a few things I could build, and most of them seem interesting at first. The harder part is deciding which ones are actually ready.&lt;/p&gt;
&lt;p&gt;I used to think that decision was mostly about taste. Pick the best idea. Pick the most interesting problem. Pick the thing that feels most useful. But I do not think that is really the decision anymore.&lt;/p&gt;
&lt;p&gt;What I am usually deciding now is whether an idea has become clear enough to survive tradeoffs.&lt;/p&gt;
&lt;p&gt;That sounds narrower, but it explains a lot more. Some ideas sound good because the problem is real. Some sound good because the frustration is obvious. Some sound good because they are still surrounded by possibility. But a product only really starts once the idea becomes specific enough that you can make hard decisions without the whole thing falling apart.&lt;/p&gt;
&lt;p&gt;I think that is the part I understand better now.&lt;/p&gt;
&lt;h2&gt;A real problem can still resist productization&lt;/h2&gt;
&lt;p&gt;One idea I keep coming back to is a kind of personal CRM. The basic idea is simple: help me keep track of how often I interact with someone, and remind me when I have not reached out in a while.&lt;/p&gt;
&lt;p&gt;The problem feels real. Relationships usually do not drift because of one big event. Most of the time, people just get busy. Time passes quietly, and then one day you realize it has been months since you last talked to someone you actually care about.&lt;/p&gt;
&lt;p&gt;That seems worth solving.&lt;/p&gt;
&lt;p&gt;But every time I think about building it, the idea starts expanding. You begin with reminders. Then you want notes. Then some way to group people. Then maybe context, follow-ups, priorities, maybe even messaging or calendar integrations. Pretty quickly it stops feeling like one product and starts feeling like several possible products layered on top of each other.&lt;/p&gt;
&lt;p&gt;At first I treated that as a scoping problem. Now I think it is a product problem.&lt;/p&gt;
&lt;p&gt;The issue is not just that the idea gets bigger. The issue is that I still do not know what remains after I cut it down. I can see the problem clearly, but I cannot yet see the boundary that turns the problem into a product. I do not know what the smallest version is that still feels whole.&lt;/p&gt;
&lt;p&gt;That matters more than I used to think. A product boundary is not just about keeping scope manageable. It is what makes the product legible. It tells you what belongs, what is extra, and what should be left out even if it sounds useful. Without that boundary, every feature discussion becomes vague because there is no center strong enough to decide against.&lt;/p&gt;
&lt;p&gt;That is usually when an idea is still too early.&lt;/p&gt;
&lt;h2&gt;Seeing friction clearly is still not enough&lt;/h2&gt;
&lt;p&gt;The other idea gets stuck in a different way.&lt;/p&gt;
&lt;p&gt;It is a fitness app for the Apple Watch. The goal is simple: I do not want to pull out my phone during a workout, and I do not want the workout to turn into a logging session after every set.&lt;/p&gt;
&lt;p&gt;That is the part I dislike about a lot of existing flows. You decide the workout ahead of time, start it on your phone or watch, and then keep logging or confirming things as you go. Even if that flow works, the interaction still competes with the workout. The workout becomes something you keep interrupting so you can record it.&lt;/p&gt;
&lt;p&gt;What I want instead is something more invisible. Ideally, the watch would identify the motion I am doing, predict the exercise, and log it in the background. Then at the end of the session, I could correct anything it got wrong and add details like weight.&lt;/p&gt;
&lt;p&gt;This idea taught me a different lesson.&lt;/p&gt;
&lt;p&gt;Here, the problem is not that the product wants to become too many things. The problem is that I still do not know what the right replacement experience actually is. I can describe what feels wrong about the current flow, but that is not the same as knowing what to make instead.&lt;/p&gt;
&lt;p&gt;I think that gap is easy to underestimate. It feels like progress to diagnose the friction clearly. It feels like insight to say exactly what is annoying. But critique is still much cheaper than product shape. Knowing what is wrong is not the same as knowing what should replace it.&lt;/p&gt;
&lt;p&gt;That is why this idea still feels unresolved. The direction feels right, but the product is still too hand-wavy. I can point at the problem. I still cannot make enough decisions about the solution.&lt;/p&gt;
&lt;h2&gt;MapleDays changed what I pay attention to&lt;/h2&gt;
&lt;p&gt;I think MapleDays made this easier for me to notice.&lt;/p&gt;
&lt;p&gt;What stayed with me from that project was not really the speed. It was how much easier the project became once it had one clear job. Before that, a lot of things probably could have belonged. After that, the tradeoffs became much more obvious. It became easier to tell what was core, what was adjacent, and what needed to wait.&lt;/p&gt;
&lt;p&gt;That changed how I think about stalled ideas.&lt;/p&gt;
&lt;p&gt;Before that, it was easy to treat momentum like a motivation problem. If something was not moving, maybe I was distracted, inconsistent, or not pushing hard enough. After MapleDays, I think momentum often comes from clarity. Once the product has a center, decisions start getting easier instead of harder.&lt;/p&gt;
&lt;p&gt;That is a much more useful signal.&lt;/p&gt;
&lt;p&gt;When an idea is not ready, every option feels half-right. Every feature sounds reasonable. Every path seems plausible for a while. Nothing settles anything because there is no product center strong enough to absorb the tradeoff.&lt;/p&gt;
&lt;p&gt;When an idea is ready, the opposite starts happening. Constraints become helpful. Cutting things makes the product sharper instead of weaker. The next decision still might be hard, but it does not feel arbitrary anymore.&lt;/p&gt;
&lt;p&gt;That is probably the threshold I care about now.&lt;/p&gt;
&lt;h2&gt;What I think I am really deciding&lt;/h2&gt;
&lt;p&gt;I do not think deciding what to build is mostly about picking the most exciting idea anymore. I think it is about deciding whether an idea has become clear enough that the tradeoffs start telling the truth.&lt;/p&gt;
&lt;p&gt;With the personal CRM idea, the problem is real, but the boundary is still unstable. I still do not know what the product becomes once I strip it down.&lt;/p&gt;
&lt;p&gt;With the watch idea, the frustration is obvious, but the replacement is still unresolved. I still do not know what the right interaction model is.&lt;/p&gt;
&lt;p&gt;Those are different failures, but they point to the same thing. The idea has not become concrete enough to hold decisions.&lt;/p&gt;
&lt;p&gt;That is probably why this matters to me now. A lot of ideas sound promising while they are still protected by vagueness. They only really reveal themselves once you start forcing specificity on them. Some get stronger. Some collapse. Some stay interesting, but only as problems, not as products.&lt;/p&gt;
&lt;p&gt;I think that is the real filter.&lt;/p&gt;
&lt;p&gt;Not just whether the problem is real. Not just whether I care about it. Not just whether the idea sounds smart when I explain it.&lt;/p&gt;
&lt;p&gt;The better question is whether the product gets clearer as I narrow it down.&lt;/p&gt;
&lt;h2&gt;Maybe I will revisit both of them&lt;/h2&gt;
&lt;p&gt;After building MapleDays, I think I see both of these ideas a little differently.&lt;/p&gt;
&lt;p&gt;I have not let go of them, but I am also less tempted to romanticize them. The personal CRM idea still needs a real boundary. The watch idea still needs a real product shape. Those are not small details. They are probably the whole thing.&lt;/p&gt;
&lt;p&gt;I may revisit both of them.&lt;/p&gt;
&lt;p&gt;If I do, I think the test will be simple. Not whether the ideas are still interesting, but whether they hold up once I force them to become specific. Whether cutting them down makes them clearer. Whether the next decisions start getting easier instead of harder.&lt;/p&gt;
&lt;p&gt;That is usually when an idea stops being just an idea.&lt;/p&gt;
&lt;p&gt;That is when it starts becoming a product.&lt;/p&gt;
</content:encoded></item><item><title>Shipping MapleDays in 7 Days</title><link>https://yihanzhu.com/blog/shipping-mapledays-in-7-days/</link><guid isPermaLink="true">https://yihanzhu.com/blog/shipping-mapledays-in-7-days/</guid><description>I took MapleDays from first prototype to App Store submission in a week. The real lesson was not speed. It was focus.</description><pubDate>Thu, 12 Mar 2026 23:38:30 GMT</pubDate><content:encoded>&lt;p&gt;On March 5, 2026, I started MapleDays. Six days later, I submitted version &lt;code&gt;1.0&lt;/code&gt; for App Review.&lt;/p&gt;
&lt;p&gt;MapleDays is a local-first iPhone app for tracking trips outside Canada and turning them into a usable physical-presence ledger. For permanent residents, that means PR renewal planning and citizenship timing. For temporary residents, it means a cleaner renewal-oriented travel record. For citizens, it means a simple travel archive for personal history, future paperwork, and official-history cross-checks, without pretending to be a full immigration platform.&lt;/p&gt;
&lt;p&gt;The interesting part is not that it happened fast. It is that it only happened once I cut the product down to one clear job.&lt;/p&gt;
&lt;p&gt;The week itself was pretty compressed: first a prototype and rules engine, then a product reset that cut the app back to three screens, then the less glamorous back half of the week spent on tests, screenshots, support pages, App Store copy, and submission. That arc matters more than the calendar.&lt;/p&gt;
&lt;h2&gt;The problem was real enough to matter&lt;/h2&gt;
&lt;p&gt;Physical-presence tracking in Canada is one of those problems that is both important and annoying.&lt;/p&gt;
&lt;p&gt;If you are trying to stay on top of PR residency, think ahead about citizenship timing, manage temporary-resident renewal planning, or just keep a reliable travel record, the raw information you need is pretty simple: when you left Canada, when you came back, and how that affects your days.&lt;/p&gt;
&lt;p&gt;But the actual experience of tracking it is messy. People end up with notes app entries, spreadsheets, calendar hacks, screenshots, or half-remembered trip histories spread across email and airline apps. The stakes are high enough that you do not want to be sloppy, but the tooling is often either too manual, too broad, or not very trustworthy.&lt;/p&gt;
&lt;p&gt;I wanted something I would actually trust myself: not another spreadsheet, not a bloated case tracker, just a clean trip ledger that stays on-device, recalculates automatically, and helps answer the question people actually care about: am I on track?&lt;/p&gt;
&lt;p&gt;That framing mattered. MapleDays was never supposed to be an official Government of Canada tool or a replacement for IRCC guidance. It was supposed to be a clean personal system.&lt;/p&gt;
&lt;h2&gt;My first instinct was to build too much&lt;/h2&gt;
&lt;p&gt;The funny part is that I knew this, and still started by drifting broader.&lt;/p&gt;
&lt;p&gt;Very early on, MapleDays had more moving parts than the final product needed. There was onboarding. There were reminder flows. There was a forecast simulator. There was a timeline tracker. There were extra product branches that made sense individually, but together started pushing the app toward a general immigration manager.&lt;/p&gt;
&lt;p&gt;That is a pattern I fall into pretty easily. When a problem has a lot of adjacent pain points, it is tempting to treat every adjacent pain point as part of v1. It feels ambitious. It also feels productive because there is always one more useful thing to add.&lt;/p&gt;
&lt;p&gt;But that kind of ambition usually makes the product worse before it makes it better.&lt;/p&gt;
&lt;p&gt;The more features I added, the harder the app became to explain. The data model got fuzzier. The UI had more competing priorities. The question stopped being &quot;what is MapleDays?&quot; and started becoming &quot;what else should MapleDays do?&quot;&lt;/p&gt;
&lt;p&gt;That is usually a sign that the center of the product is weak.&lt;/p&gt;
&lt;h2&gt;The cut that made the app work&lt;/h2&gt;
&lt;p&gt;The turning point was a product reset:&lt;/p&gt;
&lt;p&gt;MapleDays is a Canada physical-presence ledger.&lt;/p&gt;
&lt;p&gt;That one sentence cleared up almost everything.&lt;/p&gt;
&lt;p&gt;Trips outside Canada became the core user input. Days in Canada became derived output. PR readiness became the primary value, but the same ledger still had to stay useful for temporary residents and citizens too. Citizenship timing stayed in the app as a secondary layer built on top of that ledger. Everything else had to justify itself against that loop.&lt;/p&gt;
&lt;p&gt;That also meant being explicit about what did &lt;em&gt;not&lt;/em&gt; make v1. No document vault. No case tracker. No milestone timeline. No broad reminder system. Not because those are bad ideas, but because they weaken the center of the product.&lt;/p&gt;
&lt;p&gt;Once I accepted that, a lot of decisions got easier.&lt;/p&gt;
&lt;p&gt;The app shrank to three screens: &lt;code&gt;Dashboard&lt;/code&gt;, &lt;code&gt;Trips&lt;/code&gt;, and &lt;code&gt;Settings&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The dashboard became PR-first for permanent residents instead of trying to be a generic home for every possible workflow. Temporary residents got a narrower renewal-oriented view. Citizens got a ledger-only travel archive instead of a dashboard that implied more certainty than the product could safely support.&lt;/p&gt;
&lt;p&gt;The settings flow got smaller too. It stopped trying to feel like a whole setup wizard and became a compact place for the few inputs that actually matter. The trips screen became the center of gravity, which is where it should have been from the start.&lt;/p&gt;
&lt;p&gt;This is the part I want to carry into every future app I build. Most products do not need more features. They need a sharper center. If you can define one core loop clearly enough, the right features become obvious and the wrong ones start looking expensive.&lt;/p&gt;
&lt;h2&gt;The engineering got easier once the product got smaller&lt;/h2&gt;
&lt;p&gt;Once the product stopped trying to be everything, the technical choices got cleaner too.&lt;/p&gt;
&lt;p&gt;I kept the rules logic in a separate domain package instead of burying it in SwiftUI views. That made it easier to test the parts that actually carry trust: rolling windows, trip boundaries, same-day trips, future trips affecting projections but not today&apos;s totals, and pre-PR credit edge cases.&lt;/p&gt;
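&lt;p&gt;As a rough illustration of what a pure, UI-free rules function looks like (MapleDays itself is Swift; this is a Python sketch with hypothetical names, assuming the common convention that departure and return days count as days in Canada, so only the days strictly between them count as days outside):&lt;/p&gt;

```python
from datetime import date, timedelta

def days_outside(trips, window_start, window_end):
    """Count whole days outside Canada within [window_start, window_end].

    trips is a list of (departure_day, return_day) date pairs. Departure and
    return days are treated as days in Canada, so a same-day trip contributes
    zero days outside. This is a sketch, not the app's actual rules engine.
    """
    total = 0
    for departure, ret in trips:
        # Days fully outside are the days strictly between departure and return.
        first_out = departure + timedelta(days=1)
        last_out = ret - timedelta(days=1)
        # Clamp the trip to the rolling window before counting.
        start = max(first_out, window_start)
        end = min(last_out, window_end)
        if start <= end:
            total += (end - start).days + 1
    return total
```

&lt;p&gt;Because a function like this takes plain values and touches no UI, the edge cases mentioned above (same-day trips, trips straddling the window boundary) become one-line unit tests.&lt;/p&gt;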
&lt;p&gt;I stored user-entered dates as normalized day values instead of treating them like arbitrary timestamps. That sounds minor until you remember how many date bugs come from timezone assumptions. For a product built around calendar days, &quot;close enough&quot; is not good enough.&lt;/p&gt;
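&lt;p&gt;The failure mode is easy to reproduce. A minimal Python illustration (the app is Swift, but the same trap exists in any language; the fixed -5 offset stands in for Toronto in winter):&lt;/p&gt;

```python
from datetime import datetime, timezone, timedelta, date

# A trip entered as "March 5" at 11 p.m. Toronto time...
toronto = timezone(timedelta(hours=-5))
entered = datetime(2026, 3, 5, 23, 0, tzinfo=toronto)

# ...stored as a timestamp and later read back in UTC becomes March 6:
as_utc_day = entered.astimezone(timezone.utc).date()
assert as_utc_day == date(2026, 3, 6)  # off by one

# ...stored as a normalized day value, it stays the day the user meant:
as_day = entered.date()
assert as_day == date(2026, 3, 5)
```

&lt;p&gt;For an app whose whole output is a count of calendar days, an off-by-one like this silently corrupts every total downstream.&lt;/p&gt;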
&lt;p&gt;I treated the trip records and profile data as the source of truth, then cached dashboard snapshots as derived data. That kept the app fast without making the cache itself authoritative. It also made migration and cleanup logic much easier to reason about.&lt;/p&gt;
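&lt;p&gt;The shape of that split, sketched in Python with hypothetical names (the real app is Swift, and the real snapshot holds far more than a trip count):&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    trips: list = field(default_factory=list)  # source of truth
    _snapshot: dict = field(default=None)      # derived cache, never authoritative

    def add_trip(self, trip):
        self.trips.append(trip)
        self._snapshot = None  # invalidate; never patch the cache in place

    def dashboard(self):
        # Recompute lazily from the authoritative records when the cache is cold.
        if self._snapshot is None:
            self._snapshot = {"trip_count": len(self.trips)}
        return self._snapshot
```

&lt;p&gt;Because the cache can always be thrown away and rebuilt, migrations only ever have to preserve the trip records, not the snapshots derived from them.&lt;/p&gt;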
&lt;p&gt;I ended up spending a meaningful amount of time on things that are not flashy at all: integrity cleanup, migration from older prototype data, validation around trip overlaps, keeping projections current without blocking the UI, and writing tests for policy-shaped edge cases.&lt;/p&gt;
&lt;p&gt;That is probably the bigger lesson. A lot of the hard engineering in small apps is not about cleverness. It is about removing ambiguity. Once the product had a sharper center, the code had fewer opportunities to lie.&lt;/p&gt;
&lt;h2&gt;Shipping was not the coding part&lt;/h2&gt;
&lt;p&gt;By the end of the week, I had a working app: trip CRUD, status-aware dashboards for permanent residents, temporary residents, and citizens, export, PR card expiry planning, a calmer visual system, tests, and the rest of the product core.&lt;/p&gt;
&lt;p&gt;But that still was not &quot;shipped.&quot;&lt;/p&gt;
&lt;p&gt;Shipping also meant doing all the work around the app:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;App Store copy&lt;/li&gt;
&lt;li&gt;privacy policy&lt;/li&gt;
&lt;li&gt;support page&lt;/li&gt;
&lt;li&gt;screenshot sets for iPhone and iPad&lt;/li&gt;
&lt;li&gt;app icon iterations&lt;/li&gt;
&lt;li&gt;review notes&lt;/li&gt;
&lt;li&gt;metadata and release checklist&lt;/li&gt;
&lt;li&gt;the marketing site and legal pages for &lt;code&gt;mapledays.app&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That work is easy to postpone because it does not feel like building. It feels administrative. But it is actually where a project crosses the line from &quot;code that works on my machine&quot; to &quot;something another person can install and trust.&quot;&lt;/p&gt;
&lt;p&gt;I am trying to get better at respecting that part of the process. A product is not half-built just because the code is good. If the store page, support surface, privacy posture, and release details are missing, it is not done.&lt;/p&gt;
&lt;h2&gt;What building MapleDays in a week taught me&lt;/h2&gt;
&lt;p&gt;The big lesson is that speed came from constraint. The app fit into a week because I stopped asking it to be many things at once. It also reinforced something I want to keep applying to future projects: &quot;source of truth&quot; is a product decision before it is an engineering decision. For MapleDays, the trip ledger shaped everything else, from the interface to the calculations to the export story. Local-first helped too. Without accounts, analytics, or a backend in the way, the app was easier to explain, easier to trust, and easier to ship. And because the core stayed small, version 1.0 could feel complete without pretending to be final.&lt;/p&gt;
&lt;h2&gt;How I want to build apps now&lt;/h2&gt;
&lt;p&gt;I do not think the main takeaway from MapleDays is &quot;try to build everything in seven days.&quot; It is this playbook:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;start with one core job&lt;/li&gt;
&lt;li&gt;define a clear source of truth&lt;/li&gt;
&lt;li&gt;separate trust-heavy logic from UI&lt;/li&gt;
&lt;li&gt;cut adjacent features aggressively&lt;/li&gt;
&lt;li&gt;finish the unglamorous shipping work too&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MapleDays is not live yet. Apple still has to approve it. But my part is done, and I am happy with what &quot;done&quot; included: not just code, but a product shape, a technical foundation, and a real submission.&lt;/p&gt;
&lt;p&gt;There is still a lot I want to build on top of this v1. But I wanted the first version to be a complete core product before I earned the right to expand it.&lt;/p&gt;
&lt;p&gt;MapleDays taught me that the goal is not to build apps faster. It is to build them small enough, clear enough, and complete enough that they can actually ship.&lt;/p&gt;
</content:encoded></item><item><title>What I Run on My Homelab</title><link>https://yihanzhu.com/blog/what-i-run-on-my-homelab/</link><guid isPermaLink="true">https://yihanzhu.com/blog/what-i-run-on-my-homelab/</guid><description>A tour of what I run at home — and why I&apos;d rather own my data than rent it from the cloud.</description><pubDate>Sun, 01 Mar 2026 23:22:01 GMT</pubDate><content:encoded>&lt;p&gt;I run most of my digital life from a single server at home. Photos, passwords, media, personal finance — it all lives at my place instead of in someone else&apos;s data center. The reasons are control and cost. I&apos;d rather own my data than rent it from Google, Apple, or whoever else. And I&apos;m tired of subscriptions — iCloud, Netflix, password managers, cloud storage. Homelabbing lets me replace most of that with a one-time setup plus a bit of maintenance.&lt;/p&gt;
&lt;p&gt;Here&apos;s how it&apos;s set up.&lt;/p&gt;
&lt;h2&gt;Hardware&lt;/h2&gt;
&lt;p&gt;The server is a PC rebuilt from old parts. I bought a NAS-style case so it has multiple drive bays in a compact form factor. Total cost was low, mostly parts I already had. It sits in a corner and runs 24/7. I forget it&apos;s there most of the time.&lt;/p&gt;
&lt;h2&gt;Infrastructure&lt;/h2&gt;
&lt;p&gt;Everything runs on Proxmox. One host, LXC containers for each service, a couple VMs. WireGuard runs on the host — when I&apos;m away I VPN in and hit the same domains. Nothing&apos;s exposed to the internet; local network or VPN only. Nginx Proxy Manager handles the reverse proxy so I type &lt;code&gt;immich.local&lt;/code&gt; or &lt;code&gt;jellyfin.local&lt;/code&gt; instead of remembering ports. A UPS keeps it running through power blips — it&apos;s saved me a couple times.&lt;/p&gt;
&lt;p&gt;For backups, Proxmox Backup Server handles the stack. For the stuff I care about most — photos, finance, configs — I follow the 3-2-1 rule: three copies, two storage types, one off-site. Primary on the server, second on another drive, third somewhere else. A friend&apos;s place, safe deposit box, or encrypted cloud.&lt;/p&gt;
&lt;p&gt;Prometheus and Grafana for metrics. When something breaks, I can usually see why before I start guessing.&lt;/p&gt;
&lt;h2&gt;Services&lt;/h2&gt;
&lt;p&gt;Photos go to Immich — my phone backs up automatically, no iCloud. Passwords in Vaultwarden — it syncs over VPN when I&apos;m away.&lt;/p&gt;
&lt;p&gt;Firefly III for personal finance. Homebox for inventory, which is useful when I need to find that cable I swore I had.&lt;/p&gt;
&lt;p&gt;Media: Jellyfin plus the *arr stack. I add something I want to watch, a few hours later it&apos;s there. qBittorrent, FlareSolverr, MeTube, and Bazarr handle the rest. It was a rabbit hole to set up, but now I rarely touch it.&lt;/p&gt;
&lt;p&gt;Privacy layer: AdGuard for network-wide ad blocking. SearXNG when I don&apos;t want to hit Google directly. RedLib for Reddit without the tracking.&lt;/p&gt;
&lt;p&gt;Home Assistant for smart home stuff — lights, sensors, automations. Stirling-PDF when I need to merge or split a PDF without uploading to some random site.&lt;/p&gt;
&lt;h2&gt;What I&apos;ve learned&lt;/h2&gt;
&lt;p&gt;Running this has made me think about what actually matters with AI. Everyone can chat with a model now. But the skill I&apos;m trying to build is managing it — running models, allocating compute, orchestrating systems that work for you. The homelab is where I practice that. Containers, VMs, networking, backups — it&apos;s all resource management.&lt;/p&gt;
&lt;p&gt;I&apos;ve been thinking about the one-person company: someone who can run everything because they know how to manage AI, infrastructure, and their own stack.&lt;/p&gt;
&lt;h2&gt;What&apos;s next&lt;/h2&gt;
&lt;p&gt;I&apos;m looking at running an AI agent locally — a model I can experiment with without sending everything to the cloud. Proxmox is perfect for this. Spin up a dedicated VM or container, give it its own resources — if something goes wrong, it&apos;s isolated. Won&apos;t mess with the rest.&lt;/p&gt;
&lt;p&gt;Sometimes something breaks and I spend an evening fixing it. Sometimes I add a service and realize I didn&apos;t need it. I&apos;m okay with that. A few hours a month is cheaper than the subscriptions. I enjoy the tinkering, I learn from it, and it&apos;s a practical skill that pays off. Plus I get to keep my data.&lt;/p&gt;
&lt;p&gt;If you&apos;re curious about homelabbing, start with one thing — a password manager or photo backup. You don&apos;t need new hardware — an old PC or laptop works fine.&lt;/p&gt;
</content:encoded></item><item><title>Fixing My Logitech Scroll Wheel in 30 Seconds</title><link>https://yihanzhu.com/blog/fixing-my-logitech-scroll-wheel-in-30-seconds/</link><guid isPermaLink="true">https://yihanzhu.com/blog/fixing-my-logitech-scroll-wheel-in-30-seconds/</guid><description>My G703 scroll wheel was jumping all over the place. The fix was embarrassingly simple.</description><pubDate>Sat, 28 Feb 2026 02:28:15 GMT</pubDate><content:encoded>&lt;p&gt;I have two mice on my desk — a Logitech MX Master 3 for work and a G703 for gaming. The G703&apos;s scroll wheel has been acting possessed for months. Scrolling down randomly jumps up, scrolling up stutters and reverses. I gave up on the wheel entirely and started left-click dragging the scrollbar like it was 2005.&lt;/p&gt;
&lt;p&gt;Since it&apos;s my gaming mouse and not my work mouse, I just kept living with it. I&apos;d get annoyed, reach for the Master 3, and move on. I even started browsing for a replacement before ever trying to fix it. For months.&lt;/p&gt;
&lt;p&gt;Today I finally snapped and looked it up. Opened YouTube expecting a full teardown — pop the mouse open, pull out the encoder, clean the contacts, maybe replace the part. I was mentally prepping for a 45-minute project with a spudger and isopropyl alcohol.&lt;/p&gt;
&lt;p&gt;Then I found a Reddit post. The fix: flip the mouse upside down and scroll the wheel hard in both directions for 30 seconds. Apparently dust gets trapped in the rotary encoder over time, and flipping it while scrolling aggressively shakes the gunk loose.&lt;/p&gt;
&lt;p&gt;I tried it. Flipped it back over. Scroll wheel works perfectly — no stutter, no jumping, no phantom direction changes. Like new. So if your scroll wheel is acting up — especially on a Logitech mouse — try that before reaching for a screwdriver.&lt;/p&gt;
&lt;p&gt;But the part that stuck with me wasn&apos;t the fix itself. It&apos;s that I spent months working around a problem I assumed would be hard to solve. I built a whole habit around dragging scrollbars instead of spending two minutes looking it up. And the longer I waited, the bigger the fix felt in my head, which made me put it off even more.&lt;/p&gt;
&lt;p&gt;That happens a lot more than I&apos;d like to admit — not just with hardware, but with all kinds of things. Something is slightly broken, not broken enough to force your hand, so you work around it. You build habits around the dysfunction. Then one day you finally sit down to deal with it, and it takes 30 seconds.&lt;/p&gt;
&lt;p&gt;It&apos;s probably easier than you think. It usually is.&lt;/p&gt;
</content:encoded></item><item><title>Why I Finally Started This Blog</title><link>https://yihanzhu.com/blog/why-i-finally-started-this-blog/</link><guid isPermaLink="true">https://yihanzhu.com/blog/why-i-finally-started-this-blog/</guid><description>I had a personal website for years — a single-page resume I built in college and never touched again. It did its job. But it never really felt like me.</description><pubDate>Sat, 28 Feb 2026 00:33:09 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve had a personal website for a while. It was a single-page resume — the kind you build once in college and never touch again. Dark theme, terminal-inspired design, sections for experience, education, skills. It did its job. But it never really felt like &lt;em&gt;me&lt;/em&gt;. It felt like a job application someone accidentally published.&lt;/p&gt;
&lt;p&gt;So I tore it down and started over.&lt;/p&gt;
&lt;h2&gt;Who I am&lt;/h2&gt;
&lt;p&gt;I&apos;m Yihan. I&apos;m an ML engineer at &lt;a href=&quot;https://www.stackadapt.com&quot;&gt;StackAdapt&lt;/a&gt; in Toronto, where I just started in January. My background is a bit all over the place, which I think makes it interesting.&lt;/p&gt;
&lt;p&gt;I spent over a year at Qualcomm working on graphics drivers for Snapdragon chips — low-level C and C++ work, optimizing how frames get rendered in automotive infotainment systems. It taught me a lot about how hardware and software actually meet.&lt;/p&gt;
&lt;p&gt;After that, I worked at Zafin building cloud infrastructure and data pipelines for core banking systems — Kubernetes, Terraform, Go. Learned what it means to build things that can&apos;t go down.&lt;/p&gt;
&lt;p&gt;On the side, I co-founded &lt;a href=&quot;https://www.snowoverflowfoundation.org&quot;&gt;SnowOverflow&lt;/a&gt; — an app for finding people to ride with, planning trips, and carpooling across ski resorts. It grew into a foundation focused on making winter sports more accessible. That was never the plan, but that&apos;s kind of how the best things happen.&lt;/p&gt;
&lt;p&gt;I studied electrical and computer engineering at the University of Toronto — both my undergrad and my master&apos;s, where I focused on data analytics and machine learning.&lt;/p&gt;
&lt;p&gt;Outside of work, I&apos;m usually on the mountain, at the gym, or on a basketball court. I like to keep moving — and when I&apos;m not, I&apos;m probably gaming, tinkering with my homelab, or 3D printing something I&apos;ll convince myself I needed. Lately I&apos;ve been trying to read and write more too, which is partly why this blog exists.&lt;/p&gt;
&lt;h2&gt;Why I rebuilt my site&lt;/h2&gt;
&lt;p&gt;The old site said what I &lt;em&gt;did&lt;/em&gt; but nothing about how I &lt;em&gt;think&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I wanted something simpler and more honest. A short intro, a place to write, and nothing else. I rebuilt the whole thing in a day, went from 500+ lines of resume content to about 50 lines of actual writing, and I&apos;m happier with this version.&lt;/p&gt;
&lt;h2&gt;Why I&apos;m writing now&lt;/h2&gt;
&lt;p&gt;I&apos;ve been meaning to start writing for years. The usual excuses — too busy, nothing interesting to say, who&apos;s going to read it anyway. But I&apos;ve come to realize that writing isn&apos;t really about having an audience. It&apos;s about organizing your own thoughts. Every time I try to explain something in writing, I understand it a little better than I did before.&lt;/p&gt;
&lt;p&gt;So this is me starting. I&apos;ll write about what I&apos;m learning — at work, in side projects, in life. Some posts will be technical, some won&apos;t. No schedule, no pressure. Just slowing down and putting thoughts into words.&lt;/p&gt;
&lt;p&gt;If you&apos;ve made it this far, thanks for reading — stay tuned for more.&lt;/p&gt;
</content:encoded></item></channel></rss>