Expertise and services#
Refer to the homepage for my work experience and business verticals/horizontals.
High-throughput imaging system POCs and MVPs#
I build end-to-end giga-pixel, high-throughput imaging systems for research labs and early-stage startups. Unlike conventional R&D pipelines that advocate “make it work, make it right, then make it fast”, end-to-end design projects require a concurrent R&D workflow in which the entire technical stack is developed jointly to ensure timely project delivery.

An overview of a typical high-throughput, giga-pixel imaging system as a proof of concept (POC). (a) Plate loading and image acquisition steps overlap in time. (b-c) High-speed data links stream pixels from the 96 cameras to five GPU cards. (d) Front view of the 96 Eyes hardware, version 1.#
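As a back-of-envelope sketch of panel (a) in the caption above: when plate loading overlaps with image acquisition, the batch time is governed by the slower of the two stages rather than by their sum. The type name, function names, and timings below are illustrative assumptions, not measurements from the 96 Eyes instrument.

```cpp
#include <cstdint>

// Hypothetical timing model. All durations are in seconds.
struct StageTimes {
    std::uint32_t load_s;     // time to load one plate
    std::uint32_t acquire_s;  // time to image one plate
};

// Sequential workflow: each plate is loaded, then imaged, before the
// next plate is touched.
constexpr std::uint32_t sequential_total(StageTimes t, std::uint32_t plates) {
    return plates * (t.load_s + t.acquire_s);
}

// Overlapped workflow: plate k+1 is loaded while plate k is being imaged.
// Assumes the loader and the camera array share no resource, so the
// steady-state period equals the slower stage.
constexpr std::uint32_t overlapped_total(StageTimes t, std::uint32_t plates) {
    if (plates == 0) return 0;
    const std::uint32_t slower = t.load_s > t.acquire_s ? t.load_s : t.acquire_s;
    return t.load_s + (plates - 1) * slower + t.acquire_s;
}
```

With an assumed 30 s load and 90 s acquisition, ten plates take 1200 s sequentially and 930 s when overlapped; the gain grows as the two stage times approach each other.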

Minimum viable product (MVP) for the client. Compared to version 1, the entire instrument is rebuilt with aluminum enclosures. It also comes with more robust motion and thermal control electronics, mechanical interlocks, laser safety housing, and remote control interfaces. It is also equipped with auxiliary nano-positioning stages for self-calibration of consumables.#
Design and prototyping capabilities, assuming a cost-plus contract:
Physical layer: a choice of widefield fluorescence, structured-illumination, or light-sheet imaging setups, depending on the use case.
Analog layer: sourcing CMOS cameras, avalanche photodiodes, and/or illumination modules; thermal control via Peltier modules and custom heatsinks; motion PID control with custom PCBs.
Digital layer: AVR microcontroller units for low-latency instrument control; Nvidia Jetson system-on-board (SoB) for real-time embedded computer vision; multi-GPU compute workstation for hardware-accelerated image signal processing.
Tooling and DevOps: Ansible for continuous deployment (CD); Buildbot for continuous integration (CI).
Storage layer: ZFS for near-instant file garbage collection; ext4 for peak read/write throughput; HDF5 as a high-throughput data container; OME-TIFF for long-term storage.
Compute layer: instrument control software/firmware in either Go/C++ or Python/C++, depending on the pixel throughput; multithreaded CPU or single/multi-GPU accelerated algorithms for image reconstruction.
Presentation layer: Terminal UI for factory-floor QC testing; WebUI for z-stack image dashboards; or OpenGL/GLFW UI for real-time volume rendering.
Note
Learn about the Amgen-Caltech project from: A.C.S. Chan, J Kim, A Pan, H Xu, D Nojima, C Hale, S Wang, C Yang, “Parallel Fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 Eyes)” Scientific Reports 9, 11114 (2019).
Note
Also read the whitepaper on the multidisciplinary collaboration needed for a timely MVP launch: https://arxiv.org/html/2508.18512v1
System-level design and architecture review#

I offer services for system requirement capture and design, tracing the hardware/software design decisions back to the original design requirements and constraints. The deliverable can be a design-input tracking database hosted on the premises, or a static requirement analysis report with a simplified schema.
Read more about my work on end-to-end design input capture and analysis for early-stage startups (i.e. 5 to 30 stakeholders).
Cross-compilation and multi-platform support, software only, C++17#
Linux high-performance computing cluster/workstations#

Custom-designed Linux workstation. (Left) custom GPU liquid cooling heatsinks from OEMs. (Right) Fully assembled multi-GPU workstation for high-throughput image processing. Photo courtesy of Bitspower and Caltech.#
I primarily write multi-threaded, GPU-accelerated scientific code with the Clang/LLVM toolchain. I also offer a material sourcing service for high-performance computing (HPC) workstations with server-grade dual-CPU motherboards, USB/Ethernet hubs, and multiple GPU cards.
Read my code for multi-GPU workstations.
Nvidia GPU Jetson system-on-board (SoB)#
I offer Ansible-assisted, over-the-air (OTA) application deployment on top of Nvidia JetPack’s Linux base image. Compared to Yocto’s build-everything-from-scratch approach, the Ansible approach minimizes wear and tear on the boot sector of the UFS/eMMC chips.
Camera MIPI/USB3 Vision interface integration in userspace is also supported, as long as the client has already licensed the kernel-space drivers from the OEMs.
Examples of camera OEMs include ArduCAM and Allied Vision.
Windows 10/WSL native C++/Golang applications, with WebGUI#
I provide cross-compilation support and DevOps for testing C++/Golang applications on virtual machines hosted on Windows (i.e. Windows Subsystem for Linux, WSL), and for continuous deployment of end-user applications to Windows 10/11. I specialize in handling C++ name mangling to call MSVC-style symbols from the GNU/MinGW64 runtime, getting the best of both worlds.
Xilinx/AMD FPGA system-on-chip#

I customize Yocto build system directory structures to integrate user-provided Vivado hardware designs (e.g. as a black box). I also offer Meson build system integration to facilitate userspace application testing both off-target (i.e. no QEMU) and on-target.
Android support, native boot sector#
I no longer provide this service because of the cost-prohibitive EULAs and licenses that come with Snapdragon/Tensor’s hardware development kits (HDKs). Exceptions can be made if the customer has already licensed the kits from the SoC vendor. Learn about my contributions to Build systems.
GPU acceleration and domain-specific language design for scientists#

The hardware-accelerated compute landscape is more fragmented than ever, resulting in vendor lock-in of user algorithms. We are now witnessing CPUs (equipped with 512-bit SIMD units) outcompeting mainstream GPUs on specific algorithm pipelines. Even among GPUs from the same vendor, the micro-architecture changes drastically from one generation (e.g. Nvidia Maxwell) to another (e.g. Volta).
Therefore, instead of writing multiple versions of static, hand-optimized (CUDA or C++) code targeting specific system-on-chips (SoCs), it makes more sense to design high-throughput image processing algorithms in a portable language with zero-cost abstraction, similar to the decoupling between computer-aided design (CAD) and computer-aided manufacture (CAM).
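To make the decoupling concrete, here is a minimal C++17 sketch in which the algorithm (what to compute per pixel) is written once and the schedule (how to sweep the pixels) is a compile-time parameter. The names `FlatField`, `Sequential`, `Tiled`, and `apply` are hypothetical and chosen for illustration; this is not the syntax of the imaging problem formulation language itself.

```cpp
#include <array>
#include <cstddef>

// The "what": a per-pixel flat-field correction, stated once.
struct FlatField {
    float dark;  // dark-frame offset
    float gain;  // gain factor
    constexpr float operator()(float raw) const { return (raw - dark) * gain; }
};

// The "how", option 1: a plain sequential sweep.
struct Sequential {
    template <class Kernel, std::size_t N>
    static constexpr void run(const Kernel& k, const std::array<float, N>& in,
                              std::array<float, N>& out) {
        for (std::size_t i = 0; i < N; ++i) out[i] = k(in[i]);
    }
};

// The "how", option 2: the same sweep in fixed-size tiles, the loop shape
// a SIMD or GPU backend would typically start from.
template <std::size_t Tile>
struct Tiled {
    template <class Kernel, std::size_t N>
    static constexpr void run(const Kernel& k, const std::array<float, N>& in,
                              std::array<float, N>& out) {
        static_assert(Tile > 0, "tile size must be positive");
        for (std::size_t base = 0; base < N; base += Tile) {
            const std::size_t end = (base + Tile < N) ? base + Tile : N;
            for (std::size_t i = base; i < end; ++i) out[i] = k(in[i]);
        }
    }
};

// The schedule is a template parameter, so swapping it introduces no
// virtual calls and no run-time dispatch on the backend.
template <class Schedule, class Kernel, std::size_t N>
constexpr std::array<float, N> apply(const Kernel& k,
                                     const std::array<float, N>& in) {
    std::array<float, N> out{};
    Schedule::run(k, in, out);
    return out;
}
```

Calling `apply<Sequential>(FlatField{1.0f, 2.0f}, pixels)` and `apply<Tiled<2>>(FlatField{1.0f, 2.0f}, pixels)` produces the same values; only the traversal changes, which is the property that lets one algorithm description target several chips.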
Read more about my work on Imaging problem formulation language.
Low-latency computing, C++17/C++20#

(Left) procedural programming style, versus (Right) declarative programming style enabled by C++17/C++20 features.#
I specialize in bare-metal embedded systems programming in C++17/C++20, with stack-space optimization, on microcontroller unit (MCU) and CPU architectures ranging from AVR to ARMv8/NEON. I also offer tutorials that equip founding scientists/engineers for static-type-driven development (i.e. parse, don’t validate) and behavior-driven development (BDD) of early algorithms with a low-latency mindset.
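A minimal C++17 sketch of the static-type-driven style mentioned above: raw input is parsed once into a type that can only hold an in-range value, so downstream code carries no range checks and no error paths. The type `ExposureUs`, the function names, and the limits are hypothetical; a real sensor datasheet would set the bounds.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical exposure limits in microseconds, for illustration only.
inline constexpr std::uint32_t kMinExposureUs = 10;
inline constexpr std::uint32_t kMaxExposureUs = 1'000'000;

// An exposure time known to be in range. The constructor is private, so
// the only way to obtain a value is through parse_exposure_us().
class ExposureUs {
public:
    constexpr std::uint32_t value() const { return us_; }
    friend constexpr std::optional<ExposureUs> parse_exposure_us(std::uint32_t raw_us);

private:
    constexpr explicit ExposureUs(std::uint32_t us) : us_{us} {}
    std::uint32_t us_;
};

// The single place where untrusted input is checked.
constexpr std::optional<ExposureUs> parse_exposure_us(std::uint32_t raw_us) {
    if (raw_us < kMinExposureUs || raw_us > kMaxExposureUs) return std::nullopt;
    return ExposureUs{raw_us};
}

// Downstream code accepts only the parsed type. The result is widened to
// 64 bits so the multiplication cannot wrap.
constexpr std::uint64_t exposure_in_timer_ticks(ExposureUs e,
                                                std::uint32_t ticks_per_us) {
    return std::uint64_t{e.value()} * ticks_per_us;
}
```

Everything here is `constexpr` and allocation-free, so the same code can be exercised in compile-time tests on a workstation and then built unchanged for a small MCU target.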
Read about my contribution of a security-hardened smartphone camera to the Adobe Content Authenticity Initiative (CAI).
Illumination/imaging optics design#

Optical system design, as a business, has become very costly to run over the past few years for a freelance CDM (contracted development & manufacturing) provider, especially without purchasing authority and a host lab to support the R&D. Nowadays, I mostly compose and evaluate optical schematics, and verify them against OEM-provided specifications. If you know anyone who wishes to ship an instrument on loan to my office, I am more than willing to pick up the slack. Please feel free to contact me on LinkedIn.
Read more about my projects here.
Instrument control PCB design#

I used to build mixed-signal control PCBs for a living. Recently, I stopped offering this service amid rising BOM and PCBA fabrication costs. However, I am willing to pick it up again if my clients offer a cost-plus contract.
My specialties:
Single-axis motion control with PID feedback, implemented via biquads tuned from Z-transformed designs;
Biomedical signal preconditioning/amplification with second-order op-amp filters;
Analog PLL circuit design, GHz input bandwidth, ~10 MHz clock output;
Peltier thermal control;
Adapter boards for Zynq7000s and/or Nvidia Jetson system-on-boards (SoBs).
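As an illustration of the first item in the list above, a discrete PID controller can be written as a single biquad section. The sketch below assumes a backward-Euler mapping and an unfiltered derivative term; the function names and gains are hypothetical, and a production design would normally add a derivative filter, which makes the second feedback coefficient non-zero.

```cpp
#include <cstddef>

// One second-order section in direct form I:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
struct Biquad {
    double b0, b1, b2, a1, a2;
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;  // delay-line state

    constexpr double step(double x) {
        const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// Backward-Euler PID (assumed mapping s -> (1 - z^-1) / Ts):
//   C(z) = Kp + Ki*Ts / (1 - z^-1) + Kd * (1 - z^-1) / Ts
// Collecting terms over the common denominator (1 - z^-1) gives the
// biquad coefficients below. Ts must be non-zero.
constexpr Biquad make_pid_biquad(double kp, double ki, double kd, double ts) {
    return Biquad{kp + ki * ts + kd / ts,   // b0
                  -(kp + 2.0 * kd / ts),    // b1
                  kd / ts,                  // b2
                  -1.0,                     // a1: pole at z = 1 (integrator)
                  0.0};                     // a2
}

// Controller output after feeding n samples of a constant error signal.
constexpr double respond_to_constant_error(Biquad c, double error, std::size_t n) {
    double u = 0.0;
    for (std::size_t i = 0; i < n; ++i) u = c.step(error);
    return u;
}
```

With only a proportional gain, the output settles at Kp times the error; with only an integral gain, it grows by Ki*Ts times the error on every sample. Both are quick sanity checks before closing the loop around a real axis.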
Read my portfolio here.
Technical illustration#

Nowadays, I rarely offer technical illustration services except to friends at work; it is no longer a profitable service sector given the low hourly wage, and the emergence of text-to-figure generative AI, e.g. DALL-E. I still keep my own palette and pre-fab icons, in case I need to create and present novel ideas to customers and/or investors.
Read my portfolio here.