Expertise and services¶
Refer to the homepage for my work experience and business verticals/horizontals.
High-throughput imaging system POCs and MVPs¶
I build end-to-end giga-pixel, high-throughput imaging systems for research labs and early-stage startups. Unlike conventional R&D pipelines that advocate “make it work, make it right, then make it fast”, end-to-end design projects require a concurrent R&D workflow in which the entire technical stack is developed jointly to ensure timely project delivery.
An overview of a typical high-throughput, giga-pixel imaging system as a proof of concept (POC). (a) Plate loading and image acquisition steps overlap in time. (b-c) High-speed data links stream pixels from the 96 cameras to the five GPU cards. (d) Front view of the 96 Eyes hardware, version 1.¶
Minimum viable product (MVP) for the client. Compared to version 1, the entire instrument is rebuilt with aluminum enclosures. It also comes with more robust motion and thermal control electronics, mechanical interlocks, laser safety housing, and remote control interfaces. It is also equipped with auxiliary nano-positioning stages for self-calibration of consumables.¶
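The overlap of plate loading and image acquisition in panel (a) is a two-stage pipeline. The sketch below illustrates the idea with a bounded queue between two threads. It is not the instrument's actual control code: the function names, the placeholder "image" strings, and the queue depth of one are all assumptions made for illustration.

```python
import queue
import threading


def load_plates(plate_ids, loaded):
    """Stage 1 (hypothetical): hand over each plate once it is 'loaded'."""
    for plate_id in plate_ids:
        loaded.put(plate_id)  # blocks while the staging slot is still occupied
    loaded.put(None)          # sentinel: no more plates


def acquire_images(loaded, results):
    """Stage 2 (hypothetical): image each plate as soon as it is available."""
    while True:
        plate_id = loaded.get()
        if plate_id is None:
            break
        results.append(f"image-of-{plate_id}")  # placeholder for real acquisition


def run_pipeline(plate_ids):
    """Run both stages concurrently and return the results in plate order."""
    loaded = queue.Queue(maxsize=1)  # one plate staged while another is imaged
    results = []
    loader = threading.Thread(target=load_plates, args=(plate_ids, loaded))
    camera = threading.Thread(target=acquire_images, args=(loaded, results))
    loader.start()
    camera.start()
    loader.join()
    camera.join()
    return results
```

The bounded queue is the design choice worth noting: it lets the loader work one plate ahead of the camera, but no further, so a slow acquisition step applies back-pressure instead of piling up staged plates.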
Design and prototyping capabilities, assuming a cost-plus contract:
Physical layer: choose among widefield fluorescence imaging, structured illumination, or light-sheet setups, depending on the use case;
Analog layer: sourcing CMOS cameras, avalanche photodiodes, and/or illumination modules; thermal control via Peltier modules and custom heatsinks; motion PID control with custom PCBs.
Digital layer: AVR microcontroller units for low-latency instrument control; Nvidia Jetson system-on-board (SoB) for real-time embedded computer vision; multi-GPU compute workstation for hardware-accelerated image signal processing.
Tooling and DevOps: Ansible for continuous deployment (CD); Buildbot for continuous integration (CI).
Storage layer: ZFS for near-instant file garbage collection; EXT4 for peak read/write throughput; HDF5 as a high-throughput data container; OME-TIFF for long-term storage.
Compute layer: instrument control software/firmware in either Go/C++ or Python/C++ depending on the pixel throughput; multithreaded CPU or single/multi-GPU accelerated algorithms for image reconstruction.
Presentation layer: Terminal UI for factory-floor QC testing; WebUI for z-stack image dashboards; or OpenGL/GLFW UI for real-time volume rendering.
Note
Learn about the Amgen-Caltech project from: A.C.S. Chan, J Kim, A Pan, H Xu, D Nojima, C Hale, S Wang, C Yang, “Parallel Fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 Eyes)” Scientific Reports 9, 11114 (2019).
Note
Also, read the whitepaper on the multidisciplinary collaboration that facilitates a timely MVP launch: https://arxiv.org/html/2508.18512
System-level design and architecture review¶
I offer services for system requirement capture and design, tracing the hardware/software design decisions back to the original design requirements and constraints. The deliverable can be a design-input tracking database hosted on the premises, or a static requirement analysis report with a simplified schema.
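To make the idea of tracing design decisions back to requirements concrete, here is a minimal sketch of a simplified traceability schema. It is an illustration only, not the schema used in any delivered report: the class names, field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str


@dataclass(frozen=True)
class Decision:
    dec_id: str
    text: str
    traces_to: tuple = ()  # IDs of the requirements that justify this decision


def untraced_decisions(requirements, decisions):
    """Return IDs of decisions that cite no known requirement."""
    known = {r.req_id for r in requirements}
    return [d.dec_id for d in decisions if not known.intersection(d.traces_to)]


def uncovered_requirements(requirements, decisions):
    """Return IDs of requirements that no decision traces back to."""
    cited = {rid for d in decisions for rid in d.traces_to}
    return [r.req_id for r in requirements if r.req_id not in cited]


# Hypothetical entries, for illustration only.
reqs = [
    Requirement("R1", "Instrument must be laser-safe for operators"),
    Requirement("R2", "Instrument can be operated remotely"),
]
decs = [
    Decision("D1", "Add mechanical interlocks to the enclosure", ("R1",)),
    Decision("D2", "Anodize the enclosure black"),
]

print(untraced_decisions(reqs, decs))      # decisions with no justification
print(uncovered_requirements(reqs, decs))  # requirements nobody addressed
```

The two queries correspond to the two questions a design review usually asks: which decisions lack a stated reason, and which requirements have not yet been addressed by any decision.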
Read more about my work on end-to-end design input capture and analysis for early-stage startups (i.e. 5 to 30 stakeholders).
Cross-compilation and multi-platform support, software only, C++17¶
Linux high-performance computing cluster/workstations¶
Custom-designed Linux workstation. (Left) custom GPU liquid cooling heatsinks from OEMs. (Right) Fully assembled multi-GPU workstation for high-throughput image processing. Photo courtesy of Bitspower and Caltech.¶
I primarily write multi-threaded, GPU-accelerated scientific code with the Clang/LLVM toolchain. I also offer a material-sourcing service for high-performance computing (HPC) workstations with server-grade dual-CPU motherboards, USB/Ethernet hubs, and multiple GPU cards.
Read my code for multi-GPU workstations.
Nvidia GPU Jetson system-on-board (SoB)¶
I offer Ansible-assisted, over-the-air (OTA) application deployment on top of Nvidia JetPack’s Linux base image. Compared to Yocto’s build-everything-from-scratch approach, the Ansible approach minimizes wear and tear on the boot sector of the UFS/eMMC chips.
Camera MIPI/USB3 Vision interface integration in userspace is also supported, as long as the kernelspace drivers are already licensed from the OEMs to the clients.
Example camera OEMs include ArduCAM and Allied Vision.
Windows 10/WSL native C++/Golang applications, with WebGUI¶
I provide cross-compilation support and DevOps for testing C++/Golang applications on a virtual machine hosted on Windows (i.e. Windows Subsystem for Linux, WSL), and for continuous deployment of end-user applications to Windows 10/11. I specialize in handling C++ name mangling so that MSVC-style symbols can be called from the GNU/MinGW64 runtime, getting the best of both worlds.
Xilinx/AMD FPGA system-on-chip¶
I customize Yocto build-system directory structures to integrate user-provided Vivado hardware designs (e.g. as a black box). I also offer Meson build-system integration to facilitate userspace application testing both off-target (i.e. no QEMU) and on-target.
Android support, native boot sector¶
I no longer provide this service because of the cost-prohibitive EULAs and licenses that come with Snapdragon/Tensor hardware development kits (HDKs). Exceptions can be made if the customer has already obtained a license from the SoC vendor.
Learn about my contributions to Build systems.
GPU acceleration and domain-specific language design for scientists¶
The hardware-accelerated compute landscape is more fragmented than ever, resulting in vendor lock-in of user algorithms. We are now witnessing CPUs (equipped with 512-bit SIMD units) out-competing mainstream GPUs on specific algorithm pipelines. Even for GPUs from the same vendor, the micro-architecture changes drastically from one generation (e.g. Nvidia Maxwell) to another (e.g. Volta).
Therefore, instead of writing multiple versions of static, hand-optimized (CUDA or C++) code targeting specific system-on-chips (SoCs), it makes more sense to design high-throughput image processing algorithms in a portable language with zero-cost abstractions, similar to the decoupling between computer-aided design (CAD) and computer-aided manufacturing (CAM).
Read more about my work on Imaging problem formulation language.
Low-latency computing, C++17/C++20¶
(Left) procedural programming style, versus (Right) declarative programming style enabled by C++17/C++20 features.¶
I specialize in bare-metal embedded systems programming in C++17/C++20, with stack-space optimization, on microcontroller unit (MCU) and CPU architectures ranging from AVR to ARMv8/NEON. I also offer tutorials that equip founding scientists/engineers for static-type-driven development (i.e. parse, don’t validate) and behavior-driven development (BDD) of early algorithms with a low-latency mindset.
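Static-type-driven development means converting untrusted input into a dedicated type once, at the boundary, so that downstream code never has to re-check it. The sketch below shows the pattern in Python for brevity; it is not taken from the tutorials themselves, and the `ExposureMs` type, its limits, and the `EXP` command string are hypothetical.

```python
from dataclasses import dataclass

MIN_MS = 1       # hypothetical lower limit
MAX_MS = 10_000  # hypothetical upper limit


@dataclass(frozen=True)
class ExposureMs:
    """An exposure time that has already been checked against the limits."""
    value: int


def parse_exposure_ms(raw: str) -> ExposureMs:
    """Turn untrusted text into an ExposureMs, or raise ValueError."""
    value = int(raw.strip())  # raises ValueError on non-numeric text
    if not (MIN_MS <= value <= MAX_MS):
        raise ValueError(f"exposure {value} ms outside [{MIN_MS}, {MAX_MS}]")
    return ExposureMs(value)


def set_exposure(exposure: ExposureMs) -> str:
    """Downstream code accepts only the parsed type and does no re-checking."""
    return f"EXP {exposure.value}"
```

Python does not stop a caller from constructing `ExposureMs` directly, so here the guarantee rests on convention and a static type checker. In C++ the same pattern can be enforced by the compiler, for example with a private constructor and a single parsing factory function.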
Read my contribution of the security-hardened smartphone camera to Adobe Content Authenticity Initiative (CAI).
Illumination/imaging optics design¶
Optical system design, as a business, has become very costly to run over the past few years for a freelance CDM (contract development & manufacturing) provider, especially without purchasing authority and a host lab to support the R&D. Nowadays, I mostly compose/evaluate optical schematics, and verify them against OEM-provided specifications. If you know of anyone willing to ship an instrument on loan to my office, I am more than willing to take on the work. Please feel free to contact me on LinkedIn.
Read more about my projects here.
Instrument control PCB design¶
I used to build mixed-signal control PCBs for a living. Recently, I stopped offering this service amid rising BOM and PCBA fabrication costs. However, I am willing to take on such work if the client offers a cost-plus contract.
My specialties:
Single-axis motion control with PID feedback, implemented via biquads tuned from Z-transformed designs;
Biomedical signal preconditioning/amplification with second-order op-amp filters;
Analog PLL circuit design, GHz input bandwidth, ~10 MHz clock output;
Peltier thermal control;
Adapter boards for Zynq-7000 devices and/or Nvidia Jetson system-on-boards (SoBs).
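As an illustration of the first item, a PID controller can be written as a single biquad section. The sketch below assumes a backward-Euler discretization with sample period `dt`, which is only one of several possible Z-domain mappings and not necessarily the one used in my boards. The names `pid_to_biquad` and `Biquad` are hypothetical.

```python
def pid_to_biquad(kp, ki, kd, dt):
    """Map PID gains to biquad coefficients (b0, b1, b2, a1, a2).

    Assumes backward-Euler discretization:
      C(z) = Kp + Ki*dt / (1 - z^-1) + Kd * (1 - z^-1) / dt
           = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2)
    """
    b0 = kp + ki * dt + kd / dt
    b1 = -(kp + 2.0 * kd / dt)
    b2 = kd / dt
    return b0, b1, b2, -1.0, 0.0


class Biquad:
    """Direct-form-I biquad:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x):
        """Feed one error sample and return one control output sample."""
        b0, b1, b2 = self.b
        a1, a2 = self.a
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

Feeding a unit step of error into an integral-only controller (`ki=1`, `dt=0.5`) ramps the output by 0.5 per sample, while a derivative-only controller produces a single spike and then zero. A practical design would typically add a low-pass filter on the derivative term and anti-windup on the integrator, both omitted here.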
Read my portfolio here.
Technical illustration¶
Nowadays, I rarely offer technical illustration services except to friends at work; it is no longer a profitable service sector given the low hourly wage, and the emergence of text-to-figure generative AI, e.g. DALL-E. I still keep my own palette and pre-fab icons, in case I need to create and present novel ideas to customers and/or investors.
Read my portfolio here.