FYP23081 — VisionFlow
The University of Hong Kong · Department of Computer Science
COMP4804 Final Year Project · 2023–2024

A cloud-based visual scripting IDE for computer vision

VisionFlow is a cross-platform, collaborative IDE that streams real-time computer vision pipelines over WebRTC — build, run and share CV programs from any browser.

Justus Ip · Supervised by T.W. Chim & C.K. Chui

Motivations

Computer vision is powerful — but hard to adopt

CV is transforming manufacturing, healthcare, transport and surveillance. Yet three problems still hold teams back from putting it to work.

Hardware constraint

Many advanced CV libraries and models require powerful GPUs or TPUs to run smoothly, putting them out of reach for small organisations on tight budgets.

User friendliness

Building CV programs requires intermediate coding skills and complex environment setup. Node-based IDEs exist for general programming, but not for CV tasks.

Black-box pipelines

OpenCV pipelines lack real-time feedback at intermediate stages, making it hard to evaluate performance or pinpoint where things break.

Introducing VisionFlow

Build computer vision programs visually

VisionFlow is a web-based visual scripting IDE for CV. Drag, drop and connect ready-made blocks instead of writing code — and watch every stage of the pipeline render live.

From code to canvas

Recreate the same OpenCV program by wiring nodes — no Python, no boilerplate, no environment setup. The pipeline runs in real time as you build.

Build advanced pipelines

Compose object detection, tracking, OCR and custom logic into a single visual program. Chain dozens of nodes without losing observability.
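As a rough illustration of what "chaining nodes without losing observability" can mean, the sketch below evaluates a small node graph in dependency order and keeps every intermediate output. It is not VisionFlow's engine: `run_pipeline`, the node names and the use of plain Python lists in place of video frames are all assumptions made for the example.

```python
from graphlib import TopologicalSorter


def run_pipeline(nodes, source):
    """Run a node graph and return the output of every stage.

    nodes maps a node name to (function, [upstream node names]).
    The reserved name "source" refers to the input data.
    """
    graph = {name: deps for name, (_, deps) in nodes.items()}
    outputs = {"source": source}
    # static_order() yields each node only after all of its upstream nodes.
    for name in TopologicalSorter(graph).static_order():
        if name == "source":
            continue
        func, deps = nodes[name]
        outputs[name] = func(*(outputs[d] for d in deps))
    return outputs


# Hypothetical three-stage pipeline on a "frame" of four pixel intensities.
nodes = {
    "invert": (lambda px: [255 - p for p in px], ["source"]),
    "threshold": (lambda px: [255 if p >= 128 else 0 for p in px], ["invert"]),
    "count": (lambda px: sum(1 for p in px if p == 255), ["threshold"]),
}

outputs = run_pipeline(nodes, [0, 100, 200, 255])
for stage, value in outputs.items():
    print(stage, value)
```

Because every stage's result stays in `outputs`, a UI could render each one next to its node, which is the kind of per-stage feedback described above.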

Integrate with anything

Pipe outputs into your own apps, databases or external services. VisionFlow is the orchestration layer; your stack stays yours.

Extend with the built-in code editor

Need a node that doesn't exist? Write it in pure Python with the embedded editor and use it like any other block.

Technology

Built on protocols that scale

Streaming, sync and compute are decoupled and built on standards — so VisionFlow stays low-latency even as pipelines, devices and collaborators grow.

Real-time video over WebRTC

Low-latency streaming via the same protocol used by video conferencing. Video is encoded with libx264 (H.264) and libvpx (VP8/VP9), and media is encrypted in transit with DTLS-SRTP.

Live edits over WebSocket

TLS-encrypted, bidirectional state sync. Project edits propagate instantly to compute nodes and back to clients — Google-Docs-style collaboration included.

Pythonic extension API

Write custom nodes in plain Python. Class introspection (à la Java reflection) infers I/O fields and types at runtime, so there's no special framework to learn.
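To make the introspection idea concrete, here is a minimal sketch of how I/O fields and types could be inferred from an ordinary Python class using only the standard library. The `Threshold` node, the `process` method convention and the `describe_node` helper are hypothetical names chosen for illustration, not VisionFlow's actual API.

```python
import inspect
from typing import get_type_hints


class Threshold:
    """Hypothetical custom node: binarise a list of pixel intensities."""

    def process(self, pixels: list, cutoff: int = 128) -> list:
        return [255 if p >= cutoff else 0 for p in pixels]


def describe_node(cls):
    """Infer a node's inputs, defaults and output type from its `process` method."""
    hints = get_type_hints(cls.process)
    signature = inspect.signature(cls.process)
    inputs = {}
    for name, param in signature.parameters.items():
        if name == "self":
            continue
        # Parameters without a default are reported with default None.
        has_default = param.default is not inspect.Parameter.empty
        inputs[name] = {
            "type": hints.get(name, object).__name__,
            "default": param.default if has_default else None,
        }
    output = hints.get("return", object).__name__
    return {"name": cls.__name__, "inputs": inputs, "output": output}


spec = describe_node(Threshold)
print(spec)
```

The node author writes only an annotated method; the editor can read `spec` to draw input and output ports, so no base class or registration decorator is strictly needed in this sketch.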

Distributed compute

Spread inference across multiple compute nodes to handle parallel video streams. Results aggregate back to the master via low-latency RPC.
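The fan-out and aggregate pattern described here can be sketched in a few lines. This example uses a local thread pool as a stand-in for remote compute nodes and RPC, which is an assumption for illustration only; `detect`, `run_distributed` and the stream names are hypothetical, and the "inference" is a trivial brightness check.

```python
from concurrent.futures import ThreadPoolExecutor


def detect(stream_id, frames):
    """Stand-in for inference on one compute node: count frames with a bright pixel."""
    return stream_id, sum(1 for frame in frames if max(frame) >= 200)


def run_distributed(streams, workers=2):
    """Dispatch one task per video stream and collect the results on the caller."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(detect, sid, frames) for sid, frames in streams.items()]
        # result() blocks until each worker has finished its stream.
        return dict(future.result() for future in futures)


# Two hypothetical camera streams, each a list of tiny "frames".
streams = {
    "cam-a": [[10, 220], [5, 7]],
    "cam-b": [[250], [201], [0]],
}

results = run_distributed(streams)
print(results)
```

In a real deployment the `pool.submit` call would be replaced by a request to a remote worker, but the shape stays the same: one task per stream, with results merged on the master.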

Roadmap

What's next

VisionFlow is a working prototype — these are the next bets that will turn it into a tool people outside the lab can rely on.

  1. Expand the node library

    Add new building blocks and higher-level abstractions so users can compose richer workflows with less wiring.

  2. Optimise performance

    Reduce execution overhead, sharpen resource utilisation and scale to larger pipelines and higher-resolution streams.

  3. Documentation & tutorials

    Ship guides, reference docs and walk-throughs so newcomers can ramp up and explore the platform's full surface area.

  4. Public release

    Open VisionFlow to the public: gather feedback, accept contributions, and explore sustainable monetisation paths.

Screenshots

A look inside

Selected views from the IDE, demo projects and the embedded code editor.

[Image gallery: Screenshots 1 to 11]