From Sketch to UI with Code on the Go: Create instant app layouts from hand drawings


In app design, the UI is the primary bridge between the user’s needs and a digital solution. A great app may begin as a collaborative spark, a sketch on a whiteboard during a brainstorming session. Too often, momentum dies on the path from that creative burst to a working prototype. Creating and revising an XML-based UI is a slog, even with an IDE that provides a visual layout editor.

Code on the Go’s new Sketch to UI feature closes that gap. Draw a rough layout on paper (even a napkin), photograph it with your phone, and the app generates the XML layout for you. This happens entirely on the phone, without needing internet access or external APIs. Here’s how we did it.

Challenges we faced

An obvious approach to recognizing elements in a sketch is to run the image through a general-purpose vision model or send it to a remote API. We ruled both of those out right away.

First, Code on the Go works without an internet connection. The recognition pipeline has to run locally, even on older hardware (such as 32-bit ARM devices). That rules out computationally heavy models and models that require a server round-trip.

The second constraint is signal quality. Hand-drawn sketches are messy, with irregular strokes and lines that cross where they shouldn’t and don’t where they should. Handwritten letters are inconsistent: they change shape depending on context and often resemble other figures and shapes. Standard optical character recognition (OCR) often fails on handwriting, especially when the handwriting sits near drawings that represent the visual part of the UI.

The third constraint involves missing metadata. A hand-drawn rectangle doesn’t tell you whether it’s a button, a TextView, or an ImageView. It needs a reference ID and probably size, color, and other properties. To generate useful XML, the system must recognize shapes and their properties.

Code on the Go’s “three-zone” approach

The solution starts with structure. Instead of trying to interpret a sketch as a single unstructured image, Sketch to UI asks developers to organize their sketches into three zones:

  • The canvas (center): where you draw the UI widgets themselves
  • The left and right margins: where you write the metadata for each widget (type, properties, and values)
  • Tags: short labels that link each widget drawing on the canvas to its corresponding metadata in the margins

The image below shows an example of a hand-drawn UI sketch containing the canvas area and the margin areas on the left and right.

Figure 1 - Hand-drawn sketch showing Code on the Go’s “three-zone” model for communicating UI sketches and metadata. Note the difference between the center “canvas” and the left/right margins with critical metadata for the 3 UI inputs.

This is similar to how engineers annotate technical drawings. The drawing communicates shape and position. Annotations in the margin communicate specification. By separating these concerns spatially, the system can apply different recognition strategies to each zone rather than trying to do everything at once.
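To make the role of tags concrete, here is a minimal Python sketch of how a tag could link a canvas widget to its margin note. It is an illustration only, not the Code on the Go implementation: the `Widget` class, the function names, and the `tag: value` note format (modeled on the “B-1: submit” example later in this post) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """A widget found on the canvas (hypothetical structure for illustration)."""
    tag: str                      # label written next to the drawing, e.g. "B-1"
    box: tuple                    # (left, top, right, bottom) in pixels
    properties: dict = field(default_factory=dict)


def parse_margin_line(line):
    """Split a margin note such as 'B-1: submit' into (tag, value)."""
    tag, sep, value = line.partition(":")
    if not sep:
        return None  # no colon, so this is not a 'tag: value' note
    return tag.strip().upper(), value.strip()


def attach_metadata(widgets, margin_lines):
    """Attach each margin note to the widget with the same tag.

    Returns the notes that could not be linked to any widget.
    """
    by_tag = {w.tag.upper(): w for w in widgets}
    unmatched = []
    for line in margin_lines:
        parsed = parse_margin_line(line)
        if parsed is None or parsed[0] not in by_tag:
            unmatched.append(line)
            continue
        tag, value = parsed
        by_tag[tag].properties.setdefault("notes", []).append(value)
    return unmatched
```

Keeping the unmatched notes, instead of silently dropping them, would let a tool show the developer which annotations it could not place.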

Figure 2 - The phone UI generated from the sketch in Figure 1.

Finding the boundaries

The first task is to identify where the canvas ends and the margins begin. That boundary will vary from sketch to sketch, depending on how much the creator draws and how much they write.

Sketch to UI relies on vertical projection analysis to find the boundaries. The algorithm scans the image column by column, looking for vertical bands of white space where the canvas and margin naturally separate. The boundary is set at the midpoint of the largest gap on each side.

Camera distortion and paper edges tend to create false signals, so the outer 5% of the image on each side is ignored. This makes boundary detection more reliable. Vertical Sobel filtering is applied to emphasize vertical lines while suppressing horizontal ruled lines, which would otherwise confuse the analysis. And if the handwriting sits too close to the drawings for a clean gap to appear, the algorithm adjusts its threshold rather than simply failing. The boundary detector adapts to each sketch instead of requiring the developer to draw on a fixed template.

Figure 3 - This graph shows the boundary analysis applied to the sketch in Figure 1 to determine which elements represent the user-visible UI and which are metadata associated with the UI elements.
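The column scan can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production code: the image is assumed to be already binarized into rows of 0/1 values, the function names are hypothetical, each side is searched from the ignored edge to the image center, and the Sobel filtering and adaptive threshold described above are left out.

```python
def column_ink(image):
    """Count ink pixels per column. `image` is a list of rows of 0/1 values (1 = ink)."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def largest_gap_midpoint(profile, start, stop, max_ink=0):
    """Midpoint of the longest run of (near-)empty columns in profile[start:stop]."""
    best_len, best_mid = 0, None
    run_start = None
    # Iterate one step past the range so a run touching `stop` is closed too.
    for x in range(start, stop + 1):
        empty = x < stop and profile[x] <= max_ink
        if empty and run_start is None:
            run_start = x
        elif not empty and run_start is not None:
            run_len = x - run_start
            if run_len > best_len:
                best_len, best_mid = run_len, run_start + run_len // 2
            run_start = None
    return best_mid


def find_boundaries(image, edge_fraction=0.05):
    """Return (left, right) boundary columns, ignoring the outer 5% on each side."""
    profile = column_ink(image)
    width = len(profile)
    skip = int(width * edge_fraction)
    center = width // 2  # assumption: the canvas always covers the image center
    left = largest_gap_midpoint(profile, skip, center)
    right = largest_gap_midpoint(profile, center, width - skip)
    return left, right
```

A real sketch would also need a guard against wide blank areas inside the canvas itself, which this simplified version could mistake for a margin boundary.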

Recognizing the widgets: YOLO on Android

For the canvas zone, Sketch to UI identifies the types of UI components and their locations. The system uses a YOLOv8 (“You Only Look Once”) object detection model converted to LiteRT for on-device inference.

Training the YOLOv8 model required solving a data problem first. Different contributors draw widgets differently. Worse, existing public datasets for hand-drawn UI sketches use inconsistent visual conventions. As a result, training on those datasets produced a model that generalized poorly.

The solution was synthetic data generation. Rather than collecting and hand-annotating real sketches, we built a code-driven pipeline that programmatically generates artificial sketches. Each widget type has its own generative script that produces varied but controlled examples of how that widget might be drawn. Because the same code that generates the images also emits the labels, every training image comes with “exact ground truth” (the verified correct answer) at essentially zero annotation cost.
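The idea that one piece of code produces both the drawing and its label can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function name, the size ranges, the jitter amount, and the choice of a button as the widget. The label line follows the common YOLO text format of class index, box center, and box size, all normalized to the image dimensions.

```python
import random


def make_button_sample(rng, img_w=640, img_h=640, class_id=0):
    """Return (strokes, label) for one synthetic hand-drawn button.

    strokes: four ((x1, y1), (x2, y2)) line segments, one per rectangle edge
    label:   'class cx cy w h' with coordinates normalized to 0..1
    """
    w = rng.randint(120, 300)
    h = rng.randint(40, 90)
    left = rng.randint(10, img_w - w - 10)
    top = rng.randint(10, img_h - h - 10)
    corners = [(left, top), (left + w, top), (left + w, top + h), (left, top + h)]

    def wobble(point, amount=4):
        # Shift a corner by a few pixels so the edges do not meet cleanly,
        # imitating an unsteady hand.
        return (point[0] + rng.randint(-amount, amount),
                point[1] + rng.randint(-amount, amount))

    strokes = [(wobble(corners[i]), wobble(corners[(i + 1) % 4])) for i in range(4)]

    # The ground-truth box is computed from the points actually drawn,
    # so the label always matches the image exactly.
    xs = [p[0] for stroke in strokes for p in stroke]
    ys = [p[1] for stroke in strokes for p in stroke]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    bw = (x_max - x_min) / img_w
    bh = (y_max - y_min) / img_h
    label = f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"
    return strokes, label
```

Passing a seeded `random.Random` instance keeps each generated dataset reproducible, which helps when comparing one training run against another.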

This allowed rapid iteration. When the model struggled with a particular widget type, the team could adjust the corresponding generative script and retrain the model instead of collecting and annotating more real-world examples.

The current model performs well on common widgets like buttons and image placeholders. More visually ambiguous widgets, like sliders, dropdown menus, and text entry boxes, are challenging. Our goal is to get accuracy above 90%.

Reading the margins: OCR and fuzzy matching

Once the widget positions are identified from the canvas, the system turns to the margins to extract their metadata (types, IDs, and property values).

OCR is applied only to the margin zones, not the full image, because running OCR on the entire image produces too much noise from the sketch content. Limiting OCR to the margins, where only text metadata should appear, improved accuracy greatly. When text-like content is detected inside a widget bounding box on the canvas, OCR is run on that specific region.

Even with that filtering, handwriting recognition is still imperfect: a letter can be misread, or a word may be slightly wrong. The system handles this with fuzzy string matching using Levenshtein distance, which measures how many single-character edits it would take to turn one string into another. When OCR produces something close to a known Android XML attribute or widget type, the system substitutes the closest valid match. For example, a handwritten “txtVeiw” will become TextView in the output rather than an unrecognized token that would break the generated XML.
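Here is a minimal Python sketch of that correction step. The Levenshtein function is the standard dynamic-programming version. The vocabulary, the case-insensitive comparison, and the `max_distance` cutoff are assumptions chosen for this example, not values taken from Code on the Go.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical vocabulary; a real one would list every supported widget and attribute.
KNOWN_TOKENS = ["TextView", "Button", "ImageView", "EditText",
                "layout_width", "layout_height"]


def closest_token(word, vocabulary=KNOWN_TOKENS, max_distance=3):
    """Return the nearest known token, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda t: levenshtein(word.lower(), t.lower()))
    if levenshtein(word.lower(), best.lower()) <= max_distance:
        return best
    return None
```

With this vocabulary, `closest_token("txtVeiw")` returns `"TextView"` (three edits away when case is ignored). The cutoff matters: without it, every scribble would be forced onto some valid token, however unrelated.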

What the output looks like

The pipeline produces valid Android XML layout code that can be built directly in Code on the Go. A sketch of a screen with a top app bar, a scrollable list, and a floating action button in the corner becomes a runnable scaffold in a few seconds, without typing a single line of XML.

Figure 4 - The XML generated by Code on the Go’s Sketch to UI from the sketch in Figure 1. This XML produces the interface shown in Figure 2.
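For readers curious about the final step, the Python sketch below shows one way recognized widgets could be turned into layout XML. It is an illustration under assumptions, not the generator used in Code on the Go: the widget dictionaries, the `build_layout` name, the vertical `LinearLayout` root, and the `wrap_content` sizes are all hypothetical. Only the Android XML namespace and the standard-library `xml.etree.ElementTree` calls are real.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
ET.register_namespace("android", ANDROID_NS)  # serialize attributes as android:...


def attr(name):
    """Qualified attribute name, e.g. attr('id') serializes as android:id."""
    return f"{{{ANDROID_NS}}}{name}"


def build_layout(widgets):
    """Emit a vertical LinearLayout with one child per widget, ordered top to bottom.

    Each widget is a dict with 'type', 'id', and 'top' (pixel position in the sketch).
    """
    root = ET.Element("LinearLayout", {
        attr("layout_width"): "match_parent",
        attr("layout_height"): "match_parent",
        attr("orientation"): "vertical",
    })
    for widget in sorted(widgets, key=lambda w: w["top"]):
        ET.SubElement(root, widget["type"], {
            attr("id"): f"@+id/{widget['id']}",
            attr("layout_width"): "wrap_content",
            attr("layout_height"): "wrap_content",
        })
    return ET.tostring(root, encoding="unicode")
```

Building the tree with an XML library rather than string concatenation means attribute values are escaped automatically, so an odd OCR result cannot produce malformed markup.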

Why this is important

Sketch to UI is a productivity multiplier. Instead of fighting with a virtual keyboard to type out <Button android:id="@+id/submit" ... />, the developer simply draws a rectangle, writes “B-1” next to it, and writes “B-1: submit” in the margin. Code on the Go handles the rest.

This system supports rapid prototyping in environments where traditional tools simply won’t run. Streamlining the creative workflow gets you to “Done!” faster.

What’s coming next

Our goal is to reach a precision score of over 90% for the standard Android widgets. And we’re working to remove even more friction. For example:

  • Direct image import: For an “image placeholder” widget in your sketch, you will be able to tap the box on your screen to immediately select a photo from your phone’s gallery.
  • Custom models: Power users will be able to upload their own YOLO models into Code on the Go, supporting their own specialized UI components.

The future of software development isn’t just about bigger screens and more cloud power. It’s about creating tools that work wherever the developer is… even if their “office” is just a piece of paper, a pen, and a phone. (Coffee is optional.)

We gratefully acknowledge the help of Tony Santangelo in the initial graphic design and visual vocabulary of Sketch to UI.