Building VoiceAnki: A Voice-First Study App That Kept Growing
What This Project Is
VoiceAnki started as a pretty simple idea: what if flashcard review felt more like a conversation and less like tapping through tiny buttons?
The core goal was to make studying possible in a more hands-free, audio-first way. Instead of treating voice as a gimmick layered on top of a normal flashcard app, the project pushed toward something more opinionated:
- speak the prompt
- listen for the answer
- evaluate the response
- keep the review loop moving without constant screen interaction
Over time, that turned into a much larger app than the original idea suggested. What exists now is not just a voice button on a flashcard screen. It is a full Android app with a session runtime, deck import pipeline, history, settings, AnkiWeb integration, and an increasingly serious answer-evaluation system.
This post is a look back at the work that went into it, what changed along the way, and what turned out to be harder than expected.
The Starting Point
At the beginning, the product shape was intentionally narrow:
- Android only
- local deck storage
- spoken prompts
- spoken answers
- deterministic grading
- lightweight study history
That focus mattered. It kept the project from immediately collapsing into a vague “AI tutor” idea. The first real work was not around machine learning at all. It was around building a dependable study loop:
- a card queue
- review scheduling
- a reducer-driven session state machine
- text-to-speech
- Android speech recognition
- foreground session behavior so the app could survive longer interactions
That part of the app is still the backbone of everything else. Even the newer AI and semantic work only makes sense because there is already a deterministic study engine underneath it.
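As a rough illustration of what "reducer-driven" means here, the session loop can be sketched as a pure function over explicit states and events. The names below are hypothetical, not the app's actual types:

```kotlin
// Hypothetical sketch of a reducer-driven session state machine.
// State and event names are illustrative, not the app's real API.
sealed interface SessionState {
    data object Idle : SessionState
    data class Prompting(val cardId: Long) : SessionState
    data class Listening(val cardId: Long) : SessionState
    data class Revealed(val cardId: Long, val heard: String?) : SessionState
}

sealed interface SessionEvent {
    data class StartCard(val cardId: Long) : SessionEvent
    data object PromptFinished : SessionEvent
    data class AnswerHeard(val transcript: String) : SessionEvent
    data object ListenTimeout : SessionEvent
}

// A pure reducer: the only place transitions happen, which keeps
// speech callbacks from mutating session state directly.
fun reduce(state: SessionState, event: SessionEvent): SessionState =
    when (event) {
        is SessionEvent.StartCard -> SessionState.Prompting(event.cardId)
        SessionEvent.PromptFinished ->
            (state as? SessionState.Prompting)
                ?.let { SessionState.Listening(it.cardId) } ?: state
        is SessionEvent.AnswerHeard ->
            (state as? SessionState.Listening)
                ?.let { SessionState.Revealed(it.cardId, event.transcript) } ?: state
        SessionEvent.ListenTimeout ->
            (state as? SessionState.Listening)
                ?.let { SessionState.Revealed(it.cardId, null) } ?: state
    }
```

The payoff of this shape is that an out-of-order event (a timeout arriving after the answer was already heard, say) simply leaves the state unchanged instead of corrupting it.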
Turning It Into a Real App
Once the core loop existed, the app started growing in the more familiar directions any real product eventually has to grow.
The project gained:
- a home screen that lists decks
- deck detail views
- a settings screen for answer mode, speech rate, listening window, and grading behavior
- session history
- a persistent Room-backed database
- DataStore-backed settings
That was the moment it stopped feeling like a prototype and started feeling like an app with real internal structure.
One theme that kept coming up was that nearly every “simple” feature touched more systems than expected. A new setting was never just a toggle. It usually had to travel through:
- settings storage
- view models
- UI state
- runtime configuration
- sometimes the session reducer itself
That kind of wiring is not glamorous, but it is what makes later experimentation possible without the whole app turning into spaghetti.
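A stripped-down sketch of that travel path, with hypothetical names and none of the real DataStore/Flow machinery, might look like:

```kotlin
// Illustrative sketch of one setting travelling through layers.
// In the real app this would be DataStore-backed and Flow-driven;
// all names here are hypothetical.

// What the settings store persists.
data class StoredSettings(val listenWindowMs: Long = 8_000, val speechRate: Float = 1.0f)

// What the settings screen renders.
data class SettingsUiState(val listenWindowSeconds: Int, val speechRate: Float)

// What the session runtime actually consumes.
data class SessionConfig(val listenWindowMs: Long)

fun toUiState(s: StoredSettings) =
    SettingsUiState((s.listenWindowMs / 1000).toInt(), s.speechRate)

fun toSessionConfig(s: StoredSettings) = SessionConfig(s.listenWindowMs)
```

Three representations of one value is exactly the "travel" cost described above, and the mapping functions are where a new toggle usually has to be threaded through.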
Importing Decks Instead of Pretending
One of the biggest shifts in the project was deciding that the app should not live forever on a demo deck.
That meant building a real import path.
There are two different import stories in the app now:
- importing from files
- importing from AnkiWeb
The file import work led to a full import pipeline:
- parse a deck file
- turn it into an internal draft
- preview the import
- commit it into the local database
That draft step turned out to be especially useful. It created a clean boundary between “we successfully fetched or parsed something” and “we are ready to persist it as a real deck.” That became important later when the app started pulling content from the web rather than only from local files.
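The draft boundary can be sketched roughly like this; the types are hypothetical, and a trivial tab-separated parser stands in for the real one:

```kotlin
// Sketch of the parse -> draft -> commit boundary.
// Type and function names are illustrative, not the app's real API.
data class DeckDraft(val title: String, val cards: List<Pair<String, String>>)

// Parsing produces a draft but persists nothing yet.
fun parseTsv(text: String): DeckDraft {
    val cards = text.lines()
        .filter { '\t' in it }
        .map { line ->
            val (front, back) = line.split('\t', limit = 2)
            front to back
        }
    return DeckDraft(title = "Imported deck", cards = cards)
}

// Committing is a separate, explicit step: only here does the draft
// become a real deck in the local database.
interface DeckStore { fun insert(draft: DeckDraft): Long }

fun commit(store: DeckStore, draft: DeckDraft): Long = store.insert(draft)
```

Because the preview step sits between `parseTsv` and `commit`, a bad fetch or parse never touches the database.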
The .apkg path was also a turning point. Anki package import sounds straightforward until you actually have to do it on-device:
- unzip the package
- extract and read the SQLite content
- resolve media references
- map notes, cards, models, and templates into something your own app understands
That is the kind of work that is easy to underestimate from a distance. It is not especially flashy, but it is exactly the sort of feature that makes an app useful in the real world.
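As a hedged sketch of the first two steps: an .apkg is a zip archive containing a SQLite database (conventionally named collection.anki2 in older packages; newer ones use different entry names) plus a media manifest. Using only the JDK's zip support:

```kotlin
import java.util.zip.ZipFile

// Sketch of the first steps of on-device .apkg import. Reading the
// notes themselves would additionally need a SQLite connection; this
// only extracts the raw pieces. Checks the legacy entry name only.
class ApkgContents(
    val collectionBytes: ByteArray,   // collection.anki2 — the SQLite database
    val mediaManifest: String,        // JSON map of zip entry name -> real filename
)

fun readApkg(path: String): ApkgContents =
    ZipFile(path).use { zip ->
        val collection = zip.getEntry("collection.anki2")
            ?: error("not an Anki package: collection.anki2 missing")
        val media = zip.getEntry("media")
        ApkgContents(
            collectionBytes = zip.getInputStream(collection).readBytes(),
            mediaManifest = media?.let {
                zip.getInputStream(it).readBytes().decodeToString()
            } ?: "{}",
        )
    }
```

Resolving media references then means walking that manifest and renaming the numbered zip entries back to their real filenames.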
AnkiWeb: From Scraping to a Better Product Decision
AnkiWeb support was one of the most iterative parts of the project.
The first instinct was what many apps would try first: scrape the shared-deck pages and build a native search/detail flow on top of that. That approach looked promising at first, but it ran straight into the reality of the modern web:
- JavaScript-heavy pages
- Cloudflare-style challenge behavior
- markup that is not stable enough to treat as a public API
The project went through several rounds of trying to make that scraper path more resilient, including:
- improving network setup and headers
- hardening HTML parsing
- using a WebView to render pages instead of assuming static HTML
That work was valuable, but it also taught an important product lesson: sometimes the best engineering move is to change the shape of the feature.
The eventual direction became much better:
- use a visible in-app browser activity for AnkiWeb
- let the user browse the real site
- intercept .apkg downloads in-app
- store the download privately
- create an import draft
- jump straight into the existing preview/import flow
That was a much more honest solution. It stopped fighting the site and started using the app’s own strengths: import, preview, and local persistence.
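The interception decision itself is plain logic that, in the real app, would sit inside the browser activity's WebView download listener. A hypothetical version of that check:

```kotlin
// Hypothetical predicate deciding whether an in-app browser download
// should be routed into the import pipeline instead of the system
// downloader. In production this would run in a DownloadListener.
fun isApkgDownload(url: String, contentDisposition: String?): Boolean {
    // Filename hinted by the server, e.g. attachment; filename="Spanish.apkg"
    val hinted = contentDisposition
        ?.substringAfter("filename=", "")
        ?.trim('"', ';', ' ')
        .orEmpty()
    return url.substringBefore('?').endsWith(".apkg", ignoreCase = true) ||
        hinted.endsWith(".apkg", ignoreCase = true)
}
```

Everything after a positive check reuses machinery the app already had: private storage, an import draft, and the preview screen.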
Making Voice Feel Like the Main Interface
The heart of the app is still the study session runtime.
A lot of the work here was not about adding more UI, but about making the voice loop feel coherent:
- when prompts are spoken
- when the app starts listening
- how long the listening window should last
- when partial recognition should be trusted
- when to stop early on a strong answer
- when to reveal the answer
- how self-grading and automatic grading fit together
On Android, speech is never just “call the speech API and you’re done.” There are always edge cases:
- permissions
- recognizer flavor differences
- partial results versus final results
- cancellation timing
- audio focus
- device quirks
A lot of this project became an exercise in being honest about those constraints and designing around them instead of pretending they do not exist.
That honesty also showed up in the app’s session state model. The runtime is not a pile of callbacks. It is built around explicit states and events, which makes it much easier to reason about what the app thinks is happening at any given moment.
That structure paid off again and again as more features got layered in.
Answer Evaluation: From Exact Matching to Something Smarter
The earliest evaluator was mostly deterministic:
- normalize text
- compare against accepted answers
- allow fuzzy matching where appropriate
That still works well for many cards. In fact, it is still the right answer for:
- arithmetic
- spelling
- short identifiers
- cases where a near miss should absolutely not pass
But as soon as the app started touching longer answers and more natural language, the limits became obvious. A strict string-oriented evaluator can be technically consistent while still feeling wrong to a human being.
That led to the semantic grading work.
The first step was not “let AI handle grading.” It was a more conservative plan:
- keep deterministic matching first
- add a semantic fallback only when lexical matching is not enough
- use on-device embeddings rather than a cloud-first model
That design choice mattered. It kept the project grounded. Semantic grading was not supposed to replace the rest of the evaluator. It was supposed to rescue reasonable answers that were being unfairly rejected.
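The layering can be sketched as follows; the Embedder interface stands in for the on-device model, and the threshold is illustrative:

```kotlin
// Sketch of the layered design: lexical match first, semantic fallback
// only when it fails. The Embedder interface stands in for the
// on-device model; names and the threshold are hypothetical.
interface Embedder { fun embed(text: String): FloatArray }

fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

fun grade(
    answer: String,
    accepted: List<String>,
    embedder: Embedder,
    threshold: Double = 0.8,
): Boolean {
    // Layer 1: cheap deterministic check.
    if (accepted.any { it.equals(answer.trim(), ignoreCase = true) }) return true
    // Layer 2: semantic fallback, reached only on lexical failure.
    val v = embedder.embed(answer)
    return accepted.any { cosine(v, embedder.embed(it)) >= threshold }
}
```

Note the asymmetry: the semantic layer can only rescue an answer, never overrule a lexical accept, which is what keeps it a fallback rather than a replacement.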
Semantic Grading Turned Out to Be Harder Than the Idea
The semantic work brought some of the most interesting engineering problems in the whole project.
The app now includes:
- a semantic evaluator
- an embedding cache
- a decision policy with accept / unsure / reject bands
- a bundled sentence-embedding model
But the path there was not smooth.
One of the first real blockers was that the MediaPipe dependency used for text embeddings was simply too old. On-device initialization was crashing natively on the target phone. The fix was not a clever code workaround; it was dependency modernization. Once the library was upgraded to a current version, the embedder could initialize successfully.
That was a good reminder that “AI bugs” are often just normal software engineering bugs wearing a more dramatic outfit.
The second challenge was more subtle: just because semantic scoring works does not mean it should be trusted blindly.
This showed up especially clearly on a command-heavy CS50-style deck. Some answers that felt obviously related were accepted. Some answers that felt obviously wrong were also accepted. Other short command answers that a human would probably allow were rejected.
That forced a more nuanced policy:
- semantic scoring is useful
- but command-like and syntax-heavy answers need lexical anchors
- shorthand answers like "tail" for "tail &lt;file&gt;" should still be allowed
- vague phrases like "not sure" should never pass just because an embedding score looks high
That is exactly the kind of product problem that makes this sort of project interesting. The challenge is not just “can the model produce a number?” The challenge is whether the resulting behavior matches what a real learner would expect.
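One way to express that policy in code, with hypothetical names and thresholds:

```kotlin
// Hypothetical guard for command-style cards: the answer needs a
// lexical anchor (the command word itself), and stock hedging phrases
// never pass on embedding score alone. Thresholds are illustrative.
val VAGUE = setOf("not sure", "i don't know", "maybe", "no idea")

fun passesCommandPolicy(answer: String, accepted: String, semanticScore: Double): Boolean {
    val a = answer.trim().lowercase()
    if (a in VAGUE) return false  // never rescued by embeddings
    // The anchor is the first token of the accepted answer, e.g. "tail".
    val anchor = accepted.trim().lowercase().substringBefore(' ')
    val hasAnchor = anchor in a.split(Regex("\\s+"))
    // Shorthand like "tail" for "tail <file>" passes via the anchor;
    // a high semantic score alone is not enough for command answers.
    return hasAnchor && semanticScore >= 0.5
}
```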
AI Mode and the Difference Between “Plumbing” and “Experience”
Another large branch of work explored a fuller AI mode using Gemini live audio and tool-calling ideas.
This part of the project went through multiple milestones:
- plumbing mode flags through settings, navigation, and runtime state
- adding a live client shell
- integrating bidirectional audio
- wiring tool calls into the existing reducer-driven session logic
- adding fallback behavior when live transport fails
This was useful work, but it also created a good internal standard for honesty. It became important to distinguish between:
- a feature being “wired through the app”
- a feature being “technically alive”
- a feature being “good enough to present honestly as a user-facing experience”
A lot of AI product work gets fuzzy on that distinction. This project benefited from repeatedly pulling those apart.
The result is a codebase that now has real AI-related infrastructure and experiments, but still treats deterministic study behavior as the stable center of the app.
That turned out to be the right posture.
A Better Product Through Better Constraints
One of the more surprising themes in the project was that constraints improved the product.
Examples:
- trying to scrape AnkiWeb forced a rethink that led to a better in-app browser + import handoff
- a crashing on-device semantic path forced a proper dependency upgrade instead of magical thinking
- overly broad semantic grading on command decks forced a more human grading policy
- navigation crashes around import preview forced a more correct SavedStateHandle setup
None of those were “fun” problems in the moment, but they each moved the project toward something sturdier and more coherent.
The app is better because it had to survive those collisions with reality.
What Exists Now
At this point, the project includes a meaningful amount of real functionality:
- voice-first study sessions
- spoken prompts and spoken answers
- persistent review scheduling
- settings and history
- deck import from local files
- .apkg import support
- AnkiWeb browsing and direct import handoff
- bundled starter decks
- semantic grading infrastructure
- on-device text embeddings for semantic evaluation
- experimental AI/live-session infrastructure
There is also a growing body of product and platform planning around where the app could go next:
- Gemini-assisted study features
- stronger semantic grading policies
- Wear OS companion support
- car-aware or Android Auto-adjacent ideas
Not all of those are finished products, but they represent something important: the project is no longer just a pile of features. It has a direction.
What I Learned From Building It
The biggest lesson is that “voice-first study app” sounds smaller than it really is.
You are not just building:
- a UI
- a speech recognizer
- a deck importer
You are building the glue between all of them, and the glue is where most of the actual engineering lives.
Another lesson is that good product behavior often comes from restraint, not ambition.
The best parts of this project are not the ones where the app tries to be magical. They are the parts where it:
- stays deterministic when it should
- uses ML as support rather than theater
- preserves clear state boundaries
- avoids pretending unstable integrations are already polished product experiences
That kind of discipline is not always flashy, but it is what makes a project feel trustworthy.
What Comes Next
The next stage of work is less about piling on new surfaces and more about sharpening the judgment of the app.
The biggest open question is not “can we add more AI?” It is:
how do we make the app accept the right answers, reject the wrong ones, and feel fair to the learner?
That likely means:
- better semantic policies
- deck-sensitive grading behavior
- clearer settings around evaluation style
- more real-world testing across different kinds of decks
There is still plenty of room to grow, but the project is now at an interesting point: it already does a lot, and the challenge is no longer proving that the idea can exist. The challenge is making it consistently good.
That is a much better problem to have.