Expertise and services#

Refer to the homepage for my work experience and business verticals/horizontals.

High-throughput imaging system POCs and MVPs#

I build end-to-end giga-pixel, high-throughput imaging systems for research labs and early-stage startups. Unlike conventional R&D pipelines that advocate “make it work, make it right, then make it fast”, end-to-end design projects require a concurrent R&D workflow in which the entire technical stack is developed jointly to ensure timely project delivery.

_images/96eyes-data-flow.png

An overview of a typical high-throughput, giga-pixel imaging system as a proof of concept (POC). (a) Plate loading and image acquisition steps overlap in time. (b-c) High-speed data links stream pixels from 96 cameras to the five GPU cards. (d) Front view of the 96 Eyes hardware, version 1.#

_images/96eyes-mvp.jpg

Minimum viable product (MVP) for the client. Compared to version 1, the entire instrument is rebuilt with aluminum enclosures. It also comes with more robust motion and thermal control electronics, mechanical interlocks, laser safety housing, and remote control interfaces. It is also equipped with auxiliary nano-positioning stages for self-calibration of consumables.#

Design and prototyping capabilities, assuming a cost-plus contract:

  • Physical layer: choose among widefield fluorescence, structured-illumination, or light-sheet imaging setups, depending on the use case.

  • Analog layer: sourcing CMOS cameras, avalanche photodiodes, and/or illumination modules; thermal control via Peltier modules and custom heatsinks; PID motion control with custom PCBs.

  • Digital layer: AVR microcontroller units for low-latency instrument control; Nvidia Jetson system-on-board (SoB) for real-time embedded computer vision; multi-GPU compute workstations for hardware-accelerated image signal processing.

  • Tooling and DevOps: Ansible for continuous deployment (CD); Buildbot for continuous integration (CI).

  • Storage layer: ZFS for near-instant file garbage collection; EXT4 for peak read/write throughput; HDF5 as a high-throughput data container; OME-TIFF for long-term storage.

  • Compute layer: instrument control software/firmware in either Go/C++ or Python/C++, depending on the pixel throughput; multithreaded CPU or single/multi-GPU accelerated algorithms for image reconstruction.

  • Presentation layer: Terminal UI for factory-floor QC testing; WebUI for z-stack image dashboards; or OpenGL/GLFW UI for real-time volume rendering.

Note

Learn about the Amgen-Caltech project from: A.C.S. Chan, J Kim, A Pan, H Xu, D Nojima, C Hale, S Wang, C Yang, “Parallel Fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 Eyes)” Scientific Reports 9, 11114 (2019).

Note

Also, read the whitepaper on the multidisciplinary collaboration practices that facilitate a timely MVP launch: https://arxiv.org/html/2508.18512v1

System-level design and architecture review#

_images/design-thinking.png

I offer services for system requirement capture and design, tracing the hardware/software design decisions back to the original design requirements and constraints. The deliverable can be a design-input tracking database hosted on the premises, or a static requirement analysis report with a simplified schema.

Read more about my work on end-to-end design input capture and analysis for early-stage startups (i.e. 5 to 30 stakeholders).

Cross-compilation and multi-platform support, software only, C++17#

Linux high-performance computing cluster/workstations#

_images/gpu-cooling.jpg

Custom-designed Linux workstation. (Left) custom GPU liquid cooling heatsinks from OEMs. (Right) Fully assembled multi-GPU workstation for high-throughput image processing. Photo courtesy of Bitspower and Caltech.#

I primarily write multi-threaded, GPU-accelerated scientific code with the Clang/LLVM toolchain. I also offer a material sourcing service for high-performance computing (HPC) workstations with server-grade dual-CPU motherboards, USB/Ethernet hubs, and multiple GPU cards.

Read my code for multi-GPU workstations.

Nvidia GPU Jetson system-on-board (SoB)#

I offer Ansible-assisted, over-the-air (OTA) application deployment on top of the Nvidia JetPack Linux base image. Compared to Yocto’s build-everything-from-scratch approach, the Ansible approach minimizes wear and tear on the boot sector of the UFS/eMMC chips.

Camera MIPI/USB3 Vision interface integration in userspace is also supported, as long as the kernel-space drivers are already licensed from the OEMs to the clients.

Examples of camera OEMs: Arducam and Allied Vision.

Windows 10/WSL native C++/Golang applications, with WebGUI#

I provide cross-compilation support and DevOps for testing C++/Golang applications on virtual machines hosted on Windows (i.e. Windows Subsystem for Linux, WSL), and for continuous deployment of end-user applications to Windows 10/11. I specialize in C++ name mangling to call MSVC-style symbols from the GNU/MinGW64 runtime, getting the best of both worlds.
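For context, MSVC and GCC/MinGW-w64 (Itanium ABI) mangle the same C++ declaration differently: int add(int, int) becomes ?add@@YAHHH@Z under MSVC and _Z3addii under GCC. The sketch below does not reproduce the symbol-level technique mentioned above. It shows a more common alternative, assumed here purely for illustration, in which a C++ class is exposed through an extern "C" facade so that neither toolchain has to understand the other's mangling. All names (Accumulator, acc_create, and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

namespace imaging {
// A C++ class whose member symbols would be mangled differently
// by MSVC and by GCC/MinGW. (Hypothetical example class.)
class Accumulator {
public:
    void add(std::int32_t v) noexcept { sum_ += v; }
    std::int64_t total() const noexcept { return sum_; }

private:
    std::int64_t sum_ = 0;
};
}  // namespace imaging

extern "C" {
// Opaque handle: callers never see the C++ object layout.
typedef struct acc_handle acc_handle;

// Returns nullptr if the allocation fails.
acc_handle* acc_create(void) {
    return reinterpret_cast<acc_handle*>(
        new (std::nothrow) imaging::Accumulator());
}

void acc_add(acc_handle* h, std::int32_t v) {
    assert(h != nullptr);
    reinterpret_cast<imaging::Accumulator*>(h)->add(v);
}

std::int64_t acc_total(const acc_handle* h) {
    assert(h != nullptr);
    return reinterpret_cast<const imaging::Accumulator*>(h)->total();
}

void acc_destroy(acc_handle* h) {
    delete reinterpret_cast<imaging::Accumulator*>(h);
}
}  // extern "C"
```

Because the four exported functions use C linkage, their symbol names are the plain identifiers on both toolchains. The trade-off is that templates, overloads, and exceptions cannot cross the boundary.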

Xilinx/AMD FPGA system-on-chip#

_images/ebaz4205.png

I customize Yocto build system directory structures to integrate user-provided Vivado hardware designs (e.g. as a black box). I also offer Meson build system integration to facilitate userspace application testing both off-target (i.e. without QEMU) and on-target.

Android support, native boot sector#

I no longer provide this service because of the cost-prohibitive EULAs and licenses that come with Snapdragon/Tensor hardware development kits (HDKs). Exceptions can be made if the customer has already obtained a license from the SoC vendor. Learn about my contributions to Build systems.

GPU acceleration and domain-specific language design for scientists#

_images/proximal-banner.png

The hardware-accelerated compute landscape is more fragmented than ever, resulting in vendor lock-in of user algorithms. We are now witnessing CPUs (equipped with 512-bit SIMD units) out-competing mainstream GPUs on specific algorithm pipelines. Even for GPUs from the same vendor, the micro-architecture changes drastically from one generation (e.g. Nvidia Maxwell) to another (e.g. Volta).

Therefore, instead of writing multiple versions of static, hand-optimized (CUDA or C++) code targeting specific system-on-chips (SoCs), it makes more sense to design high-throughput image processing algorithms in a portable language with zero-cost abstractions, similar to the decoupling between computer-aided design (CAD) and computer-aided manufacturing (CAM).
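The decoupling can be illustrated, very loosely, in plain C++ templates: the "algorithm" states what is computed per pixel, and a separate "schedule" decides the traversal order. This is only an analogy under stated assumptions, not the domain-specific language itself. The names Gain, RowMajor, Tiled, and run_pipeline are hypothetical and belong to no real library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// "Algorithm": what to compute for one pixel, with no loop order baked in.
struct Gain {
    float k;
    float operator()(float px) const noexcept { return k * px; }
};

// "Schedule" 1: plain row-major traversal.
struct RowMajor {
    template <class F>
    void run(std::vector<float>& img, std::size_t w, std::size_t h, F f) const {
        for (std::size_t i = 0; i < w * h; ++i) img[i] = f(img[i]);
    }
};

// "Schedule" 2: square tiles, e.g. to improve cache locality.
struct Tiled {
    std::size_t tile;

    template <class F>
    void run(std::vector<float>& img, std::size_t w, std::size_t h, F f) const {
        assert(tile > 0);
        for (std::size_t ty = 0; ty < h; ty += tile)
            for (std::size_t tx = 0; tx < w; tx += tile)
                for (std::size_t y = ty; y < std::min(ty + tile, h); ++y)
                    for (std::size_t x = tx; x < std::min(tx + tile, w); ++x)
                        img[y * w + x] = f(img[y * w + x]);
    }
};

// The same algorithm can be paired with any schedule; the result is identical.
template <class Schedule, class F>
std::vector<float> run_pipeline(std::vector<float> img, std::size_t w,
                                std::size_t h, const Schedule& s, F f) {
    assert(img.size() == w * h);
    s.run(img, w, h, f);
    return img;
}
```

Swapping RowMajor for Tiled changes only how the pixels are visited, never the output. That is the property that lets one algorithm description be retargeted to different hardware.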

Read more about my work on Imaging problem formulation language.

Low-latency computing, C++17/C++20#

_images/cpp20-demo.jpg

(Left) procedural programming style, versus (Right) declarative programming style enabled by C++17/C++20 features.#

I specialize in bare-metal embedded systems programming in C++17/C++20, with stack-space optimization, on microcontroller unit (MCU) and CPU architectures ranging from AVR to ARMv8/NEON. I also offer tutorials that equip founding scientists and engineers for static-type-driven development (i.e. parse, don’t validate) and behavior-driven development (BDD) of early algorithms with a low-latency mindset.
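The type-driven idea can be sketched in a few lines of C++17: raw input is parsed once, at the boundary, into a type that can only hold valid values, so downstream code never re-checks it. The example is a hypothetical illustration; the type ExposureUs, the limits, and the function exposure_to_timer_ticks are assumptions, not taken from any real instrument firmware. Everything lives on the stack, with no heap allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical limits, chosen for illustration only.
inline constexpr std::uint32_t kMinExposureUs = 10;
inline constexpr std::uint32_t kMaxExposureUs = 100'000;

// An exposure time in microseconds that is always within range.
class ExposureUs {
public:
    // The only way to obtain an ExposureUs from raw input.
    static std::optional<ExposureUs> parse(std::uint32_t raw) noexcept {
        if (raw < kMinExposureUs || raw > kMaxExposureUs) return std::nullopt;
        return ExposureUs{raw};
    }

    constexpr std::uint32_t value() const noexcept { return us_; }

private:
    constexpr explicit ExposureUs(std::uint32_t us) noexcept : us_(us) {}
    std::uint32_t us_;
};

// Downstream code accepts the parsed type, so an out-of-range exposure
// cannot reach it; no range check is repeated here.
inline std::uint32_t exposure_to_timer_ticks(ExposureUs e,
                                             std::uint32_t ticks_per_us) noexcept {
    assert(ticks_per_us > 0);  // hardware constant, not user input
    return e.value() * ticks_per_us;
}
```

A caller handles the failure case exactly once, where the std::optional is unwrapped; after that point the compiler enforces the invariant.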

Read about my contribution of a security-hardened smartphone camera to the Adobe Content Authenticity Initiative (CAI).

Illumination/imaging optics design#

_images/plastic-molded-lens.png

Optical system design, as a business, has become very costly to run over the past few years for freelance contract development and manufacturing (CDM) providers, especially without purchasing authority and a host lab to support the R&D. Nowadays, I mostly compose and evaluate optical schematics, and verify them against OEM-provided specifications. If you know anyone who wishes to ship an instrument on loan to my office, I am more than willing to take on the work. Please feel free to contact me on LinkedIn.

Read more about my projects here.

Instrument control PCB design#

https://github.com/antonysigma/piezo-stage-pid-board/raw/master/preview.jpg

I used to build mixed-signal control PCBs for a living. Recently, I stopped offering this service amid rising BOM and PCBA fabrication costs, but I am willing to take it up again if my clients offer a cost-plus contract.

My specialties:

  • Single-axis motion control with PID feedback, implemented via biquads tuned from Z-transformed designs.

  • Biomedical signal preconditioning/amplifier with second-order op-amp filters.

  • Analog PLL circuit design, GHz input bandwidth, ~10 MHz clock output.

  • Peltier thermal control.

  • Adapter boards for Zynq-7000 SoCs and/or Nvidia Jetson system-on-boards (SoBs).
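To make the PID-as-biquad item above concrete, here is a minimal sketch under one stated assumption: the continuous PID law is discretized with the backward-Euler rule at sample period T. This is a textbook choice, not necessarily the one used on the boards above. The controller C(z) = Kp + Ki·T/(1 − z⁻¹) + (Kd/T)(1 − z⁻¹) then collapses into a single biquad with b0 = Kp + Ki·T + Kd/T, b1 = −Kp − 2·Kd/T, b2 = Kd/T, a1 = −1, a2 = 0. The names Biquad and make_pid_biquad are hypothetical.

```cpp
#include <cassert>

// Direct-form I biquad:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
struct Biquad {
    double b0, b1, b2, a1, a2;
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;  // delay-line state

    // Feed one error sample, get one actuator command.
    double step(double x) noexcept {
        const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1;
        x1 = x;
        y2 = y1;
        y1 = y;
        return y;
    }
};

// Backward-Euler discretization of a parallel-form PID (an assumption).
// No anti-windup or derivative filtering is included in this sketch.
inline Biquad make_pid_biquad(double kp, double ki, double kd, double dt) {
    assert(dt > 0.0);
    const double b0 = kp + ki * dt + kd / dt;
    const double b1 = -kp - 2.0 * kd / dt;
    const double b2 = kd / dt;
    return Biquad{b0, b1, b2, -1.0, 0.0};
}
```

With Kp = 2, Ki = 1, Kd = 1, and T = 0.5, a constant unit error produces the commands 4.5, 3.0, 3.5: the derivative kick appears only on the first sample, while the integral term grows by 0.5 per step. A production controller would add output clamping and anti-windup on top of this structure.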

Read my portfolio here.

Technical illustration#

https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41598-019-47146-z/MediaObjects/41598_2019_47146_Fig1_HTML.png?as=webp

Nowadays, I rarely offer technical illustration services except to friends at work; it is no longer a profitable service sector given the low hourly wage, and the emergence of text-to-figure generative AI, e.g. DALL-E. I still keep my own palette and pre-fab icons, in case I need to create and present novel ideas to customers and/or investors.

Read my portfolio here.