Publications

ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neural Network Processor Enabling Sub-0.1 μJ/Sample On-Chip Learning for Edge-AI Applications (Open Access)

Published in JSSC, 2024

Reducing learning energy consumption is critical for edge-artificial intelligence (AI) processors with on-chip learning, since on-chip learning energy dominates the total energy consumption, especially for applications that require long-term learning. To achieve this goal, we optimize a neuromorphic learning algorithm and propose random target window (TW) selection, hierarchical update skip (HUS), and asynchronous time step acceleration (ATSA) to reduce on-chip learning power consumption. Our approach results in a 28-nm 1.25-mm² asynchronous neuromorphic processor (ANP-I) whose on-chip learning energy per sample is less than 15% of its inference energy per sample. With all weights randomly initialized, the processor enables on-chip learning for edge-AI tasks such as gesture recognition, keyword spotting, and image classification, consuming sub-0.1 μJ of learning energy per sample at 0.56 V and 40 MHz while maintaining >92% accuracy on all tasks.
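
The abstract does not detail the update-skip logic, so the following Scala sketch is a hypothetical software model in the spirit of hierarchical update skip (the toy learning rule, names, and two-level skip granularity are assumptions, not the paper's implementation). The point it illustrates is that any zero factor in the ΔW product lets the corresponding computation and weight-memory accesses be skipped entirely.

```scala
// Hypothetical model of hierarchical update skip: a zero post-synaptic error
// skips the whole neuron's update (coarse level); a zero pre-synaptic trace
// skips the individual synapse (fine level).
// Toy rule assumed here: dW = lr * preTrace * postError.
object UpdateSkipSketch {
  def updateWeights(weights: Array[Double],
                    preTrace: Array[Double],
                    postError: Double,
                    lr: Double): Int = {
    // Coarse-grained skip: every dW for this neuron is zero, so no
    // computation and no weight-memory access is needed at all.
    if (postError == 0.0) return 0
    var writes = 0
    var i = 0
    while (i < weights.length) {
      // Fine-grained skip: this synapse's dW would be zero, so skip its write.
      if (preTrace(i) != 0.0) {
        weights(i) += lr * preTrace(i) * postError
        writes += 1
      }
      i += 1
    }
    writes // number of weight-memory writes actually performed
  }
}
```

In hardware the same checks would gate memory accesses rather than loop iterations, but the energy argument is the same: updates whose result is provably zero never touch the weight SRAM.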

Recommended citation: J. Zhang et al., "ANP-I: A 28-nm 1.5-pJ/SOP Asynchronous Spiking Neural Network Processor Enabling Sub-0.1-μJ/Sample On-Chip Learning for Edge-AI Applications," in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2024.3357045. https://ieeexplore.ieee.org/document/10416736

Designing Self-timed Asynchronous Circuits with Chisel

Published in ASYNC, 2023

As an embedded library of the Scala programming language that leverages many features of object-oriented and functional programming, Chisel is a new-generation hardware construction language designed for agile development. This paper proposes a design flow for self-timed asynchronous circuits using Chisel, with two main features: 1) a reusable asynchronous library built on Chisel, which allows designers to develop flexible asynchronous modules for different applications; 2) automatic analysis of the asynchronous circuits described in Chisel, including automatic generation of timing constraints. In our flow, designers describe the asynchronous circuit with Chisel and the reusable asynchronous library. The Chisel compiler then generates an intermediate representation (IR), a circuit transformation module analyzes the IR to produce the final Verilog netlist and timing constraint files, and the backend design is completed with commercial EDA tools. To demonstrate the feasibility of the approach, we implement Fibonacci and greatest-common-divisor circuits on an FPGA and compare them with previous VHDL designs, finding that the Chisel-based designs require at least 70% fewer lines of code. We also design an asynchronous SNN processor (ANP-T) with Chisel and implement it in a 22-nm CMOS process to verify the approach in an ASIC flow.
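
As a rough illustration of what a reusable asynchronous library at the Chisel level could look like, the sketch below defines a bundled-data channel and a two-phase pipeline stage. It is a minimal, hypothetical example: the names (BundledData, AsyncLatchStage) are not taken from the paper's library, and the implicit clock stands in for the locally generated latch control that a real self-timed flow, together with its auto-generated timing constraints, would provide.

```scala
import chisel3._

// A bundled-data channel: a request/acknowledge handshake pair plus the data payload.
class BundledData[T <: Data](gen: T) extends Bundle {
  val req  = Output(Bool())
  val ack  = Input(Bool())
  val data = Output(gen)
}

// One two-phase pipeline stage: it captures a new input token once the previous
// output token has been acknowledged, acknowledges the input, and forwards the
// request downstream. The register here uses the implicit clock only to keep
// the sketch synthesizable; in a self-timed flow it would be driven by a local
// control pulse whose matched delay comes from the generated timing constraints.
class AsyncLatchStage[T <: Data](gen: T) extends Module {
  val in  = IO(Flipped(new BundledData(gen)))
  val out = IO(new BundledData(gen))

  val phase   = RegInit(false.B) // two-phase protocol state
  val dataReg = Reg(gen)

  when(in.req =/= phase && out.ack === phase) { // new token arrived, stage is free
    dataReg := in.data
    phase   := in.req
  }

  in.ack   := phase   // acknowledge the captured token
  out.req  := phase   // forward the request to the next stage
  out.data := dataReg
}
```

Describing the channel as a parameterized Bundle means the same handshake wrapper can be reused around any payload type, which is the kind of flexibility the abstract attributes to the library.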

Recommended citation: Jilin Zhang, Chunqi Qian, Dexuan Huo, Jian Zhang and Hong Chen, "Designing Self-timed Asynchronous Circuits with Chisel," 2023 28th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), Beijing, China, 2023, pp. 27-33, doi: 10.1109/ASYNC58294.2023.10239616. https://ieeexplore.ieee.org/document/10239616

ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neural Network Processor Enabling Sub-0.1 μJ/Sample On-Chip Learning for Edge-AI Applications

Published in ISSCC, 2023

With the development of on-chip learning processors for edge-AI applications, the energy efficiency of NN inference and training is increasingly critical. Since on-chip training energy dominates the energy consumption of edge-AI processors [1], [2], [4], [5], reducing it is of paramount importance. Spiking neural networks (SNNs) offer energy-efficient inference and learning compared with convolutional neural networks (CNNs) or deep neural networks (DNNs), but SNN-based processors face three challenges (Fig. 22.6.1). 1) During on-chip training, some factors involved in the ΔW computation are zero, giving ΔW = 0 and leading to redundant ΔW computations and memory accesses for weight updates. 2) After reaching a certain accuracy, additional data does not improve accuracy significantly, and 95% of the energy is wasted on unnecessary processing of input spike events thereafter. 3) With sparse input spike events, the number of spike events differs from time step to time step; if spike processing is synchronized by time step, the worst case must be assumed, wasting energy and time.

Recommended citation: Jilin Zhang et al., "22.6 ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neural Network Processor Enabling Sub-0.1 μJ/Sample On-Chip Learning for Edge-AI Applications," 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2023, pp. 21-23, doi: 10.1109/ISSCC42615.2023.10067650. https://ieeexplore.ieee.org/document/10067650

An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update

Published in A-SSCC, 2019

In this paper, we present an asynchronous spiking neural network (SNN) accelerator with 1024 neurons and 1 million synapses that is reconfigurable in terms of network connectivity and neuron parameters. Bundled-data asynchronous circuits are adopted for the neuromorphic computation core and the mesh network. Multicast communication is used to transmit packets among and within cores, reducing packet traffic and improving energy efficiency. A novel time step update mechanism, which updates neurons in an event-driven manner without considering the chip-wide activity of unrelated neurons, is proposed to improve processing speed. The SNN accelerator is verified by classifying MNIST handwritten digits on a Xilinx VC707 FPGA. The results show that the accelerator achieves 98% accuracy on the MNIST dataset and more than 1 GIPS/W energy efficiency, 32 times better than previous work.
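
A hypothetical software model of the event-driven time step update (not the accelerator's RTL; the leaky integrate-and-fire rule and the names are assumptions) might look like the Scala sketch below: only neurons addressed by spike events in the current time step are touched, so idle neurons contribute neither latency nor switching activity.

```scala
// Hypothetical event-driven time step: update only the neurons that receive
// spike events this step, instead of sweeping all 1024 neurons every step.
object EventDrivenStep {
  final case class Neuron(var potential: Double, threshold: Double)

  // spikes: (targetNeuronId, synapticWeight) events arriving in this time step.
  // Returns the ids of neurons that fired.
  def step(neurons: Array[Neuron],
           spikes: Seq[(Int, Double)],
           leak: Double): Seq[Int] = {
    // Group events by target neuron; neurons with no events are never visited.
    val touched = spikes.groupBy(_._1)
    val fired = scala.collection.mutable.ArrayBuffer.empty[Int]
    for ((id, evs) <- touched) {
      val n = neurons(id)
      n.potential = n.potential - leak + evs.map(_._2).sum
      if (n.potential >= n.threshold) { // fire and reset
        fired += id
        n.potential = 0.0
      }
    }
    fired.toSeq
  }
}
```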

Recommended citation: Jilin Zhang, H. Wu, J. Wei, S. Wei and H. Chen, "An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update," 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC), Macau, Macao, 2019, pp. 213-216, doi: 10.1109/A-SSCC47793.2019.9056903. https://ieeexplore.ieee.org/document/9056903