# 

Phuc Thien Phan NGUYEN<sup>1,2</sup>, KimAnh PHAN<sup>1,2</sup>, Linh TRAN<sup>1,2</sup>

<sup>1</sup> Dept. of Electronics, Ho Chi Minh University of Technology (HCMUT), Ho Chi Minh City, Vietnam <sup>2</sup> Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam

#### linhtran@hcmut.edu.vn

Submitted September 21, 2024 / Accepted February 5, 2025 / Online first March 26, 2025

Abstract. In-memory computing (IMC) is an emerging approach to mitigating the memory bottleneck, a critical issue affecting energy efficiency and latency in modern digital computing. IMC operating in the analog domain can achieve high data density and accelerate signal processing tasks such as neural network training by leveraging nonvolatile memory technologies, specifically resistive switching devices. Content-addressable memories (CAMs), in turn, are known for their inherent parallelism and fast digital lookup capabilities but are constrained by their large area and high energy consumption. To address these limitations, analog CAMs, which combine analog-domain operation with the tunability of memristors, have been proposed to enhance storage density and energy efficiency. In this work, we introduce a novel topology that reduces latency and area by employing the $g_m/I_D$ design methodology to optimize the sizing of MOS devices. In simulations using the VTEAM model, our circuit achieves roughly half the latency of the 10T2M design while occupying about 66% of its area. Additionally, our design exhibits the lowest latency among the compared multi-bit and analog CAM approaches, with a latency reduction of up to 96%.

### Keywords

In-memory computing, content-addressable memory, analog CAM, memristor, VTEAM,  $g_m/I_D$  design methodology

## 1. Introduction

The exponential increase in data generation places unprecedented demands on computing resources, exposing significant limitations of digital computers based on the traditional von Neumann architecture. These challenges are particularly pronounced in data-intensive computing tasks, where the inefficiencies associated with data movement between the processor and memory become a bottleneck. While incremental advancements in parallel processing offer some relief, a more transformative approach is emerging: in-memory computing (IMC). This paradigm shifts the locus of computation directly to the data storage sites, enabling in situ data processing within the memory itself [1]. IMC represents a fundamental departure from the conventional von Neumann architecture, offering a potential solution to the inefficiencies of modern computing systems.

In-memory computing (IMC) has emerged as a rapidly advancing research area, with significant progress in application development over the past decade, particularly in enhancing energy efficiency, performance, and scalability. A key primitive of IMC is content-addressable memory (CAM), which operates in contrast to traditional random access memory (RAM). In CAM, data is provided as input, and the corresponding location is extracted as output. CAMs compare input patterns with a stored set of data patterns within a CAM array, leveraging high parallelism to accelerate pattern matching [2] and lookup operations [3], [4]. Typically, CAMs or binary CAMs (BCAMs) operate with binary input patterns composed of "0" and "1" bits. However, ternary CAMs (TCAMs) offer greater flexibility by accommodating a third pattern, "X" - a wildcard that can match either "0" or "1". This capability enhances the array's functionality and density but comes at the cost of increased area and power consumption, as each memory cell requires a dedicated comparison circuit.

A novel approach to content-addressable memory is introduced in [5], where CAM operations are conducted in the analog domain, resulting in what is termed analog content-addressable memory (analog CAM, aCAM). Computation in the analog domain provides exponential efficiency gains over digital computation, particularly at lower precision requirements [6]. Unlike a traditional CAM, an aCAM processes analog values, allowing it to handle both discrete and continuous ranges of values. This architecture compares a stored range of values within the aCAM against incoming analog input patterns. The match line remains charged if all input values fall within the stored range. This capability of storing and comparing a range of values in each cell offers the potential for aCAM to surpass its digital counterpart in terms of efficiency. Despite its advantages, the initial implementation of aCAM, comprising a comparator, tristate buffer, and floating gate devices, faced significant practical challenges, particularly high power consumption and substantial area requirements, which made it difficult to deploy the technology in an array configuration.

The use of nonvolatile memory technologies, including memristive devices [7], [8] and ferroelectric technology [9], has been shown to significantly reduce power consumption and area in electronic circuits. Memristive devices, in particular, offer unique advantages due to their small footprint and because their conductance can be freely adjusted through the application of voltage pulses [10–12]. Leveraging these characteristics, a memristor-based aCAM was proposed in [6]. The tunability of memristive devices allows them to store narrow ranges as discrete values, enabling them to replace digital CAMs while providing higher data density and lower energy consumption per search operation. Currently, aCAMs are the subject of extensive research, with various topologies and methods being proposed to enhance the storage range [13], reduce the footprint [14], improve accuracy [15], and refine memristive device programming algorithms [16]. In this paper, we propose a new approach to aCAM that offers improved area efficiency, reduced latency, and a higher dynamic range. Additionally, the $g_m/I_D$ design methodology [17], [18] is employed to optimize the sizing of each device, ensuring that the design meets the required specifications.

The remainder of this paper is organized as follows: Section 2 reviews existing memristor models and specifies the model adopted in this study, including the relevant parameter settings. Section 3 discusses the motivation for using memristors in CAMs and reviews previously proposed analog CAM topologies. Section 4 introduces a novel approach to aCAM design, employing the $g_m/I_D$ methodology for transistor sizing. Section 5 presents simulation results, comparing the performance of the proposed design with other topologies and existing multi-bit and aCAM implementations. Section 6 provides the conclusions of this work.

## 2. Models of Memristor and Applications

Every two-terminal device that exhibits a pinched hysteresis loop in its I-V characteristic when driven by a DC or sinusoidal signal at any frequency is a memristor [19–21], and the term "memristor" is commonly used to refer to memristive systems. In this paper, "memristor" refers to all devices falling within this classification.





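Fig. 1. The Linear Ion Drift model, in which the doped and undoped regions contribute the resistances $R_{\text{ON}} = R_{\text{on}} \left( \frac{w}{D} \right)$ and $R_{\text{OFF}} = R_{\text{off}} \left( \frac{D - w}{D} \right)$, respectively.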

Titanium dioxide $(TiO_2)$ is a semiconductor that is electrically insulating, whereas its reduced form, $TiO_{2-x}$, is conductive. This conductivity arises from oxygen vacancies, which act as electron donors and carry a positive charge. These vacancies behave as stable, mobile entities within the TiO<sub>2</sub> lattice, similar to bubbles in a liquid. Their movement can be controlled by an applied electric field, enabling dynamic modulation of the material's conductivity [22]. As illustrated in Fig. 1, when a positive voltage is applied to the top electrode, the positively charged oxygen vacancies are repelled into the pure TiO<sub>2</sub> layer, transforming it into $TiO_{2-x}$ and turning the device ON. Conversely, applying a negative voltage attracts the vacancies upward, increasing the thickness of the insulating $TiO_2$ layer and switching the device OFF. In practical devices, ON switching occurs faster than OFF switching due to differences in drift and diffusion currents. The equations describing the Linear Ion Drift model are as follows:

$$\begin{cases} \frac{\mathrm{d}x(t)}{\mathrm{d}t} = \frac{\mu_{\nu}R_{\mathrm{on}}}{D^2} \times i(t) \\ v(t) = \left[R_{\mathrm{on}}\left(\frac{w}{D}\right) + R_{\mathrm{off}}\left(\frac{D-w}{D}\right)\right]i(t). \end{cases}$$
(1)

Following the introduction of the first physical model of the memristor in [23], memristors have garnered attention due to their low power consumption, a critical feature in addressing the rising energy demands of machine learning applications. Analog in-memory computing with memristors offers a practical solution to overcome the memory bottleneck and energy constraints, paving the way for efficient high-performance computing systems [24]. Memristors have been applied in various domains, including memory crossbars [12], logic circuits and logic synthesis [25–27], neural networks [28], [29], data converters [30], and pattern matching [31], [32]. These diverse applications underscore the importance of accurate and reliable memristor models in the design and analysis of memristor-based systems. Typically, memristor models are used to describe device behavior through differential equations, enabling the analysis of their response to specific inputs. An effective model must balance high accuracy, computational efficiency, simplicity, and universality [33]. Addressing these requirements, the VTEAM model [34] was proposed to simplify equations while preserving essential physical behaviors, as illustrated in Fig. 2. This model assumes that the state variable remains unchanged below a defined voltage threshold and incorporates polynomial dependencies to reduce computational complexity, making it a practical tool for simulating memristor dynamics. According to [19], if the input is a voltage, a voltage-controlled time-invariant memristor is created and defined as:

$$\begin{cases} \frac{\mathrm{d}x}{\mathrm{d}t} = f(x, v)\\ i = g(x, v)v. \end{cases}$$
(2)

In the VTEAM model, the state variable x(t) satisfies the state equation (3):

$$\frac{\mathrm{d}x}{\mathrm{d}t} = \begin{cases} k_{\mathrm{off}} \left(\frac{v(t)}{v_{\mathrm{off}}} - 1\right)^{\alpha_{\mathrm{off}}} f_{\mathrm{off}}(x), & 0 < v_{\mathrm{off}} < v\\ 0, & v_{\mathrm{on}} < v < v_{\mathrm{off}} \\ k_{\mathrm{on}} \left(\frac{v(t)}{v_{\mathrm{on}}} - 1\right)^{\alpha_{\mathrm{on}}} f_{\mathrm{on}}(x), & v < v_{\mathrm{on}} < 0. \end{cases}$$
(3)

Here, $\alpha_{on}$ and $\alpha_{off}$ are constants, $k_{on}$ is a negative constant, $k_{off}$ is a positive constant, $v_{on}$ and $v_{off}$ are the voltage thresholds, $x \in [a_{on}, a_{off}]$ is the internal state variable shown in Fig. 2, and $f_{on}(x)$ and $f_{off}(x)$ are window functions defined as:

$$\begin{cases} f_{\text{off}}(x) = \exp\left[-\exp\left(\frac{x - a_{\text{off}}}{w_{\text{c}}}\right)\right] \\ f_{\text{on}}(x) = \exp\left[-\exp\left(-\frac{x - a_{\text{on}}}{w_{\text{c}}}\right)\right]. \end{cases}$$
(4)
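The piecewise state equation (3) can be sketched as a small function. This is an illustration under stated assumptions, not the Verilog-A model used for the simulations: the parameter values in `PARAMS` are hypothetical, and the window functions are passed in as callables that default to a constant 1 (no windowing), so that any window, such as those in (4), can be substituted.

```python
# Hypothetical parameter set for illustration only (not the values from [34]).
PARAMS = {
    "k_off": 1e-3, "k_on": -1e-3,       # k_off > 0, k_on < 0
    "v_off": 0.5, "v_on": -0.5,         # threshold voltages, V
    "alpha_off": 3.0, "alpha_on": 3.0,  # fitting exponents
}

def vteam_dxdt(v, x, p=PARAMS, f_off=lambda x: 1.0, f_on=lambda x: 1.0):
    """State derivative of Eq. (3) for an applied voltage v and state x."""
    if v > p["v_off"]:    # 0 < v_off < v: the state moves toward a_off
        return p["k_off"] * (v / p["v_off"] - 1.0) ** p["alpha_off"] * f_off(x)
    if v < p["v_on"]:     # v < v_on < 0: the state moves toward a_on
        return p["k_on"] * (v / p["v_on"] - 1.0) ** p["alpha_on"] * f_on(x)
    return 0.0            # between the thresholds the state is frozen
```

For example, `vteam_dxdt(0.3, 0.0)` returns `0.0` because 0.3 V lies between the two thresholds, which is the property that lets a programmed state survive sub-threshold search voltages.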

Next, the current-voltage equation can be expressed similarly to the voltage-current relationship in (1), with the memristance varying linearly with the state variable x. Equation (5) follows from Ohm's law:

$$i(t) = \left[ R_{\rm on} + \frac{R_{\rm off} - R_{\rm on}}{a_{\rm off} - a_{\rm on}} (x - a_{\rm on}) \right]^{-1} v(t).$$
(5)

The memristance is obtained in the same way as for the Linear Ion Drift model, by interpolating linearly between $R_{\rm on}$ and $R_{\rm off}$ over the state range shown in Fig. 2:

$$M(x) = R_{\rm on} \left( \frac{x - a_{\rm off}}{a_{\rm on} - a_{\rm off}} \right) + R_{\rm off} \left( \frac{a_{\rm on} - x}{a_{\rm on} - a_{\rm off}} \right) > 0$$

$$\implies M(x) = R_{\rm on} + \left[ (R_{\rm off} - R_{\rm on}) \left( \frac{x - a_{\rm on}}{a_{\rm off} - a_{\rm on}} \right) \right].$$
(6)

This equation fixes the positions of $a_{\text{off}}$ and $a_{\text{on}}$ in the device. The Linear Ion Drift device and the Simmons Tunnel Barrier device operate in the same way. However, the memristance of a practical device is highly nonlinear because of the tunneling effect, so the memristance is assumed to depend exponentially on the tunnel barrier width *x* [22], [35]. Equation (5) is then rewritten in exponential form as (7), where $e^{\lambda} = \frac{R_{\text{off}}}{R_{\text{on}}}$:

$$i(t) = \left[ R_{\rm on} \exp\left(\frac{\lambda}{a_{\rm off} - a_{\rm on}} (x - a_{\rm on})\right) \right]^{-1} v(t).$$
(7)
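The difference between the linear memristance of (6) and the exponential form can be checked numerically. The sketch below assumes $\lambda = \ln(R_{\rm off}/R_{\rm on})$, so that both forms give $R_{\rm on}$ at $a_{\rm on}$ and $R_{\rm off}$ at $a_{\rm off}$; the resistance and state-bound values are illustrative assumptions, not parameters from this paper.

```python
import math

R_ON, R_OFF = 1e3, 100e3   # limiting resistances, ohms (assumed values)
A_ON, A_OFF = 0.0, 3e-9    # state bounds, m (assumed values)

def m_linear(x):
    """Memristance interpolated linearly between R_on and R_off, as in Eq. (6)."""
    return R_ON + (R_OFF - R_ON) * (x - A_ON) / (A_OFF - A_ON)

def m_exponential(x):
    """Exponential memristance, assuming lambda = ln(R_off / R_on)."""
    lam = math.log(R_OFF / R_ON)
    return R_ON * math.exp(lam * (x - A_ON) / (A_OFF - A_ON))

def current(v, x, memristance=m_exponential):
    """Ohm's law with the chosen state-dependent memristance."""
    return v / memristance(x)
```

Both forms agree at the two ends of the state range, but halfway between them the linear form gives the arithmetic mean of the two resistances while the exponential form gives their geometric mean, which is why the choice matters for multilevel programming.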

This study employs the VTEAM model to implement a novel approach to analog CAM, as introduced in [36]. The model is implemented in Verilog-A, enabling efficient computation, while SPICE simulation programs are developed to support a wide range of SPICE users. These simulations aim to mitigate the stick effect and enhance numerical performance, ensuring faster computation [37]. The parameters used in the simulations are detailed in [34], and the corresponding IV curves are presented in Fig. 3.

## 3. Analog Content-Addressable Memory with Memristors

The first aCAM, introduced in [5], comprised a comparator, tristate buffer, and floating gate devices. However, its practical implementation was hindered by excessive power consumption and large area requirements, particularly in array configurations. To address these limitations, a memristor-based aCAM was proposed in [6], incorporating two key subcircuits: the lower bound subcircuit (LBS) and the upper bound subcircuit (UBS), as shown in Fig. 4. These subcircuits define the search range through voltage dividers controlled by the memristance of the memristors. The matchline determines whether a search operation results in a match or mismatch via pull-down transistors, executing a NOR-aCAM operation based on the gate voltages $V_{GS1}$ and $V_{GS2}$. As illustrated in Fig. 5(a), a match occurs when both $V_{GS1}$ and $V_{GS2}$ remain below the threshold voltages of the pull-down transistors (T2 and T6).
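At the behavioral level, the NOR-type search described above reduces to a range check per cell. The sketch below is an idealized model with hypothetical function names; it ignores leakage, finite gain, and timing, and it assumes that inputs exactly on a boundary count as matches.

```python
def cell_matches(v_dl, lower, upper):
    """One aCAM cell: the matchline is left alone when lower <= v_dl <= upper."""
    lbs_discharges = v_dl < lower   # lower bound subcircuit pulls the matchline down
    ubs_discharges = v_dl > upper   # upper bound subcircuit pulls the matchline down
    return not (lbs_discharges or ubs_discharges)

def row_matches(inputs, stored_ranges):
    """NOR-type matchline: it stays precharged only if every cell matches."""
    return all(cell_matches(v, lo, hi) for v, (lo, hi) in zip(inputs, stored_ranges))

# A two-cell word storing the ranges [0.2, 0.4] V and [0.45, 0.6] V.
word = [(0.2, 0.4), (0.45, 0.6)]
hit = row_matches([0.3, 0.5], word)    # both inputs fall inside their ranges
miss = row_matches([0.3, 0.7], word)   # the second input exceeds its upper bound
```

A cell whose stored range spans the whole input range never discharges the matchline, which reproduces the wildcard behavior of a ternary CAM.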

Despite these improvements, the aCAM cell faces challenges such as sensitivity to variations and word-length dependency, primarily due to leakage currents through the pull-down transistors. To mitigate these issues, threshold-switching memristors have been employed as matchline discharging devices, as depicted in Fig. 5(b) [38]. Furthermore, an alternative approach presented in [13] addresses leakage-induced search inaccuracies. The pull-down transistor's conductance varies continuously with respect to $V_{DL}$, with its sensitivity characterized by $\frac{\partial G_{\rm T}}{\partial V_{\rm DL}}$. This variation can distort the search result. To counteract this effect, the design increases the gain of both the LBS and the UBS, which makes the slope of the voltage transfer curve of $V_{\rm G}$ versus $V_{\rm DL}$ steeper [13]. In Fig. 5(c), a non-inverting buffer is introduced within the LBS to amplify the gain in a 10T2M aCAM cell. Alternatively, in an 8T2M aCAM configuration, the second inverter is replaced with a p-type transistor, as illustrated in Fig. 5(d).



Fig. 2. The architecture of the Simmons Tunnel Barrier model, where x is the width of the tunnel barrier, $v_g$ is the voltage across the undoped region (the barrier), v is the internal voltage, and V is the voltage applied to the model; (a) when the applied voltage opposes the device polarity, the memristance reaches the LRS; (b) when the applied voltage follows the device polarity, the memristance reaches the HRS.



Fig. 3. The IV characteristics of the VTEAM.



Fig. 4. Analog CAM structure including lower-bound subcircuit (LBS) and upper-bound subcircuit (UBS) storing lower bound and upper bound in the search range, respectively.

However, these design refinements come at the cost of increased circuit area. To address this, a more compact design, known as the 1T1M+2T aCAM pixel, was proposed in [14], as illustrated in Fig. 5(e). This architecture offers several advantages, including reduced footprint, enhanced tunability, and high-density integration. Nonetheless, mismatches between the discharging characteristics of PMOS and NMOS pull-down transistors (T2 and T3) introduce an additional challenge, as PMOS devices typically exhibit a soft-edge matching window, whereas NMOS devices have a sharp-edge response. This mismatch reduces the evaluation time window, limiting the matchline discharge opportunity [14].

Beyond precision enhancements, aCAM designs require proper memristor programming before search operations. As shown in Fig. 5(a), each subcircuit's data lines select the memristor to be programmed, and the programming voltages $V_{\text{SLhi}}$ and $V_{\text{SLlo}}$ adjust the memristance in the same way as in a 1T1M-based comparator. An alternative method introduced in [16] leverages a lookup table (LUT)-based algorithm to define boundary voltages instead of directly programming memristor conductance, effectively mapping the relationship between conductance and programming data line voltages.

The advancements in memristor-based aCAM designs enable a range of practical applications across various domains. One significant use case, depicted in Fig. 6(a), involves their integration into network routers for efficient range classification within a 16-bit Class B IP address space (0–65535). By encoding search ranges at different bit resolutions, aCAMs substantially reduce lookup table sizes compared to traditional TCAMs. For instance, a conventional TCAM implementation requires a $21 \times 16$ table to represent the range 385–58630, as shown in Fig. 6(a.1). Higher-bit aCAMs, such as the 8-bit design in Fig. 6(a.4), provide improved precision and efficiency but introduce additional complexity, including digital-to-analog conversion overhead and increased susceptibility to analog processing variability. Moreover, memristor-based aCAMs offer an efficient platform for implementing decision trees, where each decision node is mapped to the aCAM, enabling parallel processing of search queries. Intermediate results are stored in memristor RAM, facilitating rapid data retrieval and optimized tree traversal. This hybrid architecture is particularly well suited for real-time applications such as pattern recognition and machine learning, where speed and energy efficiency are crucial, as demonstrated in prior studies [31], [32].
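To illustrate why a single range costs a TCAM many rows, the sketch below splits an integer range into aligned power-of-two blocks, each of which maps to one ternary entry with trailing wildcard bits. It is a textbook greedy prefix expansion with a hypothetical function name, not necessarily the encoding used to build the table in Fig. 6(a.1), so its row count need not coincide with that table.

```python
def range_to_prefixes(lo, hi, bits=16):
    """Split [lo, hi] into aligned power-of-two blocks, one ternary entry each."""
    entries = []
    while lo <= hi:
        # Largest block size that is aligned at lo ...
        size = lo & -lo if lo else 1 << bits
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1   # number of trailing don't-care bits
        fixed = format(lo >> wild, "b").zfill(bits - wild) if wild < bits else ""
        entries.append(fixed + "X" * wild)
        lo += size
    return entries

# Ternary rows needed for the range discussed above.
rows = range_to_prefixes(385, 58630)
for row in rows:
    print(row)
```

An aCAM cell stores the two bounds of such a range directly, which is where the reduction in table size comes from; the remaining cost is the precision with which those bounds can be programmed and sensed.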

## 4. Analog Content-Addressable Memory with Memristors: A New Approach

In this proposal, the search line architecture departs from previous designs by introducing two distinct components: search line high 1 ($V_{\text{SLhi1}}$) and search line high 2 ($V_{\text{SLhi2}}$), while maintaining a unified search line low ($V_{\text{SLlo}}$). Additionally, a $V_{DD}$ rail is integrated into the design to enhance gain in subsequent stages. The proposed circuit in Fig. 7 comprises seven transistors and five memristors (7T5M). During operation, the input search data are applied along the data lines (DL), with match/mismatch outcomes observed on the matchline. This operation is referred to as the search mode, which includes both the precharge and evaluate phases. In contrast, the analog storage range is configured by adjusting the memristance of two memristors in the first stage of each subcircuit. These memristors require programming before the commencement of the search mode, following a process analogous to that used in one-transistor one-memristor (1T1M) arrays [10], [11]. This procedure is termed the programming mode, which encompasses both the set and reset processes. This section discusses the performance of the programming and search modes, the rationale for employing the multi-threshold voltage technique in this design, and the design flow utilizing the $g_m/I_D$ design methodology to optimize the size of MOS devices.

During the programming mode, Data Line 1 (DL1) and Data Line 2 (DL2) are used to select the array column, determining which memristor is programmed. As shown in Tab. 1, programming voltages are applied through the row wires (specifically, V<sub>SLhi1</sub>, V<sub>SLhi2</sub>, V<sub>SLlo1</sub>, and V<sub>SLlo2</sub>) to set or reset the memristance. Compliance current adjustments are incorporated to enhance multilevel tunability. For example, the set process for M1, which reduces resistance to achieve the Low Resistance State (LRS), is initiated when DL1 is set to $V_{G,SET}$, with $V_{SLhi1}$ at $V_{SET,hi1}$ and $V_{SLlo1}$ grounded. Conversely, the reset process for M1, which increases resistance to reach the High Resistance State (HRS), begins when DL1 is set to $V_{DD}$, with $V_{SLlo1}$ at $V_{RESET,lo1}$ and $V_{\text{SLhi1}}$ at 0. This separation of $V_{\text{SLhi}}$ into $V_{\text{SLhi1}}$ and $V_{\text{SLhi2}}$ is a distinctive feature of the proposed design, offering greater control over the programming process.

At the beginning of the search mode, similar to existing CAM circuit implementations, the matchline for each row is precharged to a high logic level, known as the precharge phase. Following this phase, the data line is applied to transistors T1 and T5 to initiate the evaluation phase. In the lower bound subcircuit, for example, T1 is responsible for dividing the voltage between $V_{\text{SLhi1}}$ and $V_{\text{SLlo1}}$ (i.e., $V_{\text{SL1}} = V_{\text{SLhi1}} - V_{\text{SLlo1}}$) across the memristor M1. When the resistance of T1 is high relative to that of M1, the node voltage $V_{\rm A} = \frac{R_{\rm T_1}}{R_{\rm T_1} + R_{\rm M1}} V_{\rm SL1}$ in Fig. 7 will be high. Consequently, after passing through the non-inverting stage composed of two cascaded memristor-based inverters, $V_{\rm G1}$ exceeds the lower bound match threshold, activating T4 [39]. This match threshold is configured by adjusting the memristor conductance in the resistor-based inverter circuit during the programming mode [31]. Upon activation, T4 draws a significant drain current $I_{D,T4} = I_{ML,LBS}$ from the matchline, ultimately discharging it to indicate a mismatch. Conversely, if the resistance of T1 is low relative to M1, T2 will not



Fig. 5. (a) Schematic of 6T2M aCAM [6] in search mode where data lines of LBS and UBS are merged; (b) Schematic of 4T2M2S aCAM [38] replacing two pull-down transistors with threshold switching memristor; (c) Schematic of 10T2M aCAM [13] increasing gain at each subcircuit to a steeper slope; (d) Schematic of 8T2M aCAM [13] utilizing PMOS as pull-down transistor; (e) Schematic of 1T1R+2T aCAM [14] with a small footprint.

| Operation | $V_{\rm SLhi1}$    | $V_{\rm SLhi2}$    | $V_{\rm SLlo1}$      | $V_{\rm SLlo2}$      | Data line 1     | Data line 2     |
|-----------|--------------------|--------------------|----------------------|----------------------|-----------------|-----------------|
| Set M1    | $V_{\rm SET,hi1}$  | 0                  | 0                    | 0                    | $V_{\rm G,SET}$ | 0               |
| Set M2    | 0                  | $V_{\rm SET,hi2}$  | 0                    | 0                    | 0               | $V_{\rm G,SET}$ |
| Reset M1  | 0                  | 0                  | $V_{\rm RESET,lo1}$  | 0                    | $V_{\rm DD}$    | 0               |
| Reset M2  | 0                  | 0                  | 0                    | $V_{\rm RESET,lo2}$  | 0               | $V_{\rm DD}$    |

Tab. 1. The set and reset operation in 7T5M design's programming mode.



Fig. 6. aCAM application examples: (a) Comparison of CAM tables for range searches between 385 and 58630 using (a.1) TCAM, (a.2) 3-bit aCAM, (a.3) 4-bit aCAM, and (a.4) 8-bit aCAM. (b) Mapping a decision tree to a memristor-based architecture for fast tree traversal and efficient data retrieval. Decision trees, which represent hierarchical structures used for decision-making processes, such as classification tasks in machine learning, are directly mapped onto the memristor-based system. aCAM is utilized for rapid searching and decision-making based on memristor-based entries, while RAM provides random access storage for additional data retrieval. This integrated approach of combining CAM and RAM accelerates decision tree traversal by efficiently storing and retrieving relevant data points.



Fig. 7. The schematic of the proposed aCAM 7T5M cell in search mode.

be activated, resulting in a match. Thus, the lower bound subcircuit determines whether the input data is greater than the stored value of the left boundary encoded by M1. The operation of the upper bound subcircuit is analogous, with the key difference being the insertion of a memristor-based inverter between the voltage divider (T5-M4) and the pull-down transistor T7. In this case, a high input data value activates T7, corresponding to a mismatch operation. Therefore, the upper bound subcircuit verifies whether the input data is smaller than the stored value of the right boundary encoded by M4. As a result, the aCAM effectively determines whether the input data falls within the range stored in the cell [31]. The lower and upper match thresholds mentioned above can be read from the LBS and UBS voltage transfer curves of $V_{\rm G}$ versus $V_{DL}$: they are the values of $V_{DL}$ at which $V_{G1} = V_{Thn}$ and $V_{G2} = V_{Thn}$, respectively.
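The role of M1 in setting the lower bound can be seen from the divider expression for $V_{\rm A}$ alone. The sketch below is an idealized illustration: it treats T1 as a plain resistor and models the following stages as an ideal comparator with a hypothetical trip voltage `v_trip`, neither of which is claimed to match the transistor-level behavior.

```python
def node_voltage(r_t1, r_m1, v_sl1):
    """V_A of the T1-M1 divider: V_A = R_T1 / (R_T1 + R_M1) * V_SL1."""
    return r_t1 / (r_t1 + r_m1) * v_sl1

def trip_resistance(r_m1, v_sl1, v_trip):
    """Channel resistance of T1 at which V_A equals the assumed trip voltage."""
    return r_m1 * v_trip / (v_sl1 - v_trip)

# Assumed values: a 0.8 V search-line difference and a 0.4 V trip point.
V_SL1, V_TRIP = 0.8, 0.4
for r_m1 in (50e3, 100e3, 200e3):
    r_star = trip_resistance(r_m1, V_SL1, V_TRIP)
    print(f"R_M1 = {r_m1:.0f} ohm -> T1 trips the stage at R_T1 = {r_star:.0f} ohm")
```

Because the trip resistance scales linearly with $R_{\rm M1}$, and the channel resistance of T1 is set by the data-line voltage, programming M1 shifts the data-line voltage at which the subcircuit switches between match and mismatch.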

The proposed design employs multi-threshold voltage (multi- $V_t$ ) MOSFETs to improve gain, reduce power consumption, and minimize circuit area. By incorporating both high and low threshold voltage transistors, the design enhances power efficiency by limiting leakage and reducing delay while maintaining high-speed operation through controlled power dissipation. Additionally, noise and parasitic effects introduce variations in memristance, which can impact the accuracy of the search range [LBS, UBS], particularly at boundary values. To ensure proper input adaptation, low-threshold devices are used in the initial stage of both subcircuits. The circuit operates in two primary phases: precharge and evaluation. Total energy consumption is determined by both phases, with the evaluation phase being significantly influenced by the voltage-divider circuits in the memristor-based inverters. To optimize performance, transistors T2, T3, T4, T6, and T7 are designed with high threshold voltages. Furthermore, the scalability of aCAMs in large-scale integration must be considered, particularly the effects of word length [6] and potential sneak path issues [40] when integrating memristors.

In particular, the $g_m/I_D$ design methodology [17], [18] is used to meet these requirements; it represents a significant advancement over the traditional square-law-based approach. This methodology provides analog designers with enhanced control and flexibility, allowing for more effective management of design trade-offs. For instance, the specifications for each stage in this design are a gain $A_v \in [5, 10]$, a bandwidth $B \ge 10$ MHz, and minimum power dissipation. As memristance decreases, energy consumption increases due to higher leakage in these circuits, so the memristor's polarity is chosen such that the device settles in the HRS. Consequently, the gain in stages T2–M2, T3–M3, and T6–M5 depends on the transconductance of the MOS devices: $A_v = g_m (r_{ds} \parallel R_M)$, with $g_m \propto \frac{W}{L}$.

At low frequencies, the poles at input and output are defined as:

$$|p_{\rm in}| = \frac{1}{R_{\rm G}C_{\rm GS}} \ge 2\pi B,$$
 (8)

| Device's name | Device's type | W/L      |
|---------------|---------------|----------|
| T1            | NMOS-LVT      | 12n/50n  |
| T2            | NMOS-HVT      | 260n/50n |
| T3            | NMOS-HVT      | 340n/50n |
| T4            | NMOS-HVT      | 90n/50n  |
| T5            | NMOS-LVT      | 12n/50n  |
| T6            | NMOS-HVT      | 260n/50n |
| T7            | NMOS-HVT      | 90n/50n  |

Tab. 2. Design parameters.

$$|p_{\text{out}}| = \frac{1}{R_{\text{M}}C_{\text{L}}} \ge 2\pi B.$$
 (9)

The voltage dividers in this design achieve analog search functionality by applying a gate voltage to the transistor, which forms a variable resistor divider. By Thevenin's theorem, $R_{\rm G}$ is equivalent to $R_{\rm T_1} \parallel R_{\rm M_1}$. In addition, the load capacitance is assumed to be 5 fF, and $C_{\rm GS}$ is assumed to be the largest parasitic capacitance of the MOS devices in the next stage. From these equations and the bandwidth condition, $g_{\rm m}$ and $C_{\rm GS}$ are bounded as:

$$g_{\rm m} \ge 2\pi B C_{\rm L},\tag{10}$$

$$C_{\rm GS} \le \frac{1}{2\pi B R_{\rm G}}.\tag{11}$$

In this methodology, the design process begins by defining the minimum transit frequency $f_{\text{Tmin}} = \frac{g_{\text{mmin}}}{2\pi C_{\text{GSmax}}}$; the corresponding $g_{\text{m}}/I_{\text{D}}$ is then read from the transit frequency versus $g_{\text{m}}/I_{\text{D}}$ chart. The drain current $I_{\text{D}}$ is derived using $I_{\text{D}} = \frac{g_{\text{mmin}}}{g_{\text{m}}/I_{\text{D}}}$, and the current density $\frac{I_{\text{D}}}{W}$ is obtained from the current density versus $g_{\text{m}}/I_{\text{D}}$ chart, which fixes the device width $W$. This process must consider two key factors. First, minimizing power dissipation for a given speed or noise specification necessitates a low drain current. Second, devices whose $g_{\text{m}}$ contributes to gain, such as those in an input stage, require a large $g_{\text{m}}$. Consequently, this design emphasizes a large $g_{\text{m}}/I_{\text{D}}$, which corresponds to biasing the devices in moderate or weak inversion. The optimal $g_{\text{m}}/I_{\text{D}}$ ratio typically lies within the range of 10 to 20, especially in the moderate inversion region. A $g_{\text{m}}/I_{\text{D}}$ ratio of 20 is selected at each stage of this design, following the outlined design flow. The design parameters are detailed in Tab. 2.
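The first steps of this flow can be followed numerically. The sketch below uses the bandwidth, load capacitance, and $g_{\text{m}}/I_{\text{D}}$ values stated in the text, but the driving resistance `R_G` is an assumed value chosen only for illustration, and the final lookup of current density (and hence width) is omitted because it requires the technology charts.

```python
import math

B = 10e6        # bandwidth specification, Hz (from the text)
C_L = 5e-15     # assumed load capacitance, F (from the text)
GM_ID = 20.0    # selected g_m/I_D, 1/V (from the text)
R_G = 100e3     # Thevenin resistance R_T1 || R_M1, ohms (assumed for illustration)

gm_min = 2 * math.pi * B * C_L               # Eq. (10): lower bound on g_m
cgs_max = 1 / (2 * math.pi * B * R_G)        # Eq. (11): upper bound on C_GS
ft_min = gm_min / (2 * math.pi * cgs_max)    # minimum transit frequency
id_min = gm_min / GM_ID                      # drain current for the chosen g_m/I_D

print(f"g_m,min  = {gm_min * 1e6:.3f} uS")
print(f"C_GS,max = {cgs_max * 1e15:.1f} fF")
print(f"f_T,min  = {ft_min / 1e3:.1f} kHz")
print(f"I_D      = {id_min * 1e9:.2f} nA")
```

With these inputs the bound on $g_{\text{m}}$ is about 0.31 µS and the resulting drain current is about 15.7 nA; the width then follows from dividing this current by the current density read from the chart at $g_{\text{m}}/I_{\text{D}} = 20$.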

## 5. Results and Discussion

Simulations were performed using the FreePDK45 technology [41] and the VTEAM model [34], [36] with parameters shown in Sec. 2 in Cadence Virtuoso at the typical corner. The 7T5M aCAM cell was configured in a  $1 \times 16$ array with a 0.8 V supply voltage for comparison with other cell designs from the 45 nm PTM [13]. Results are summarized in Tab. 3. Key metrics under fixed constraints include dynamic range (DR), latency, energy dissipation, and area. DR was measured after a 1 ns precharge phase, defined as the minimum separation between full-match and one-mismatch

| Design          | Memristor model  | DR <sup>a</sup> [mV] | Latency <sup>b</sup> [ns] | Energy <sup>c</sup> [fJ] | Area <sup>d</sup> $[\lambda^2]$ |
|-----------------|------------------|----------------------|---------------------------|--------------------------|---------------------------------|
| 10T2M [13]      | linear [6], [42] | 371.2                | 0.083                     | 73.3                     | 1624                            |
| 8T2M [13]       | linear [6], [42] | 37.5                 | 0.03                      | 30.4                     | 1232                            |
| 4T2M2S [13]     | linear [6], [42] | 209.6                | 0.43                      | 913.7                    | 758                             |
| This work: 7T5M | VTEAM [34], [36] | 250                  | 0.04                      | 75.2                     | 1078                            |

<sup>a</sup> Calculated at 1 ns, for three intervals, <sup>b</sup> Obtained for three intervals and DR = 100 mV, <sup>c</sup> Full-mismatch energy evaluated at 1 ns, for three intervals, <sup>d</sup> Active area.

Tab. 3. Comparison summary with the designs proposed in [13].

| Design           | Technology | Area [µm <sup>2</sup> ] | Search delay [ns] | Search energy [fJ] |
|------------------|------------|-------------------------|-------------------|--------------------|
| 2FE aCAM [43]    | 45 nm      | 0.05                    | 2                 | 0.55               |
| 3T-1FE MCAM [45] | 22 nm      | 0.15                    | 0.35              | 1.08               |
| 2Fe aCAM [44]    | 45 nm      | 0.15                    | 0.14              | 0.4                |
| 2Fe MCAM [46]    | 65 nm      | NA                      | 0.22              | 60                 |
| 8T2M aCAM [13]   | 45 nm      | 0.624                   | 0.1               | 34.3               |
| This work: 7T5M  | 45 nm      | 0.546                   | 0.08              | 77.6               |

Tab. 4. Comparison of the existing multi-bit and aCAM designs.

cases (level =  $V_{\text{DD}} \cdot [40\%, 60\%]$ , N = 16, T = 1 ns). Latency is the shortest time among three intervals for DR to reach 100 mV. Energy dissipation was measured over 1 ns, reporting average energy consumed during a full search for mismatch cases. Area estimation for the 7T5M design was based on a single NMOS transistor with a layout area of  $40\lambda^2$  plus  $2\lambda$  spacing on each side, totaling  $112\lambda^2$  [13]. For a 45 nm technology node, with a transistor length of 45 nm  $(2\lambda = 45 \text{ nm})$  and width of 90 nm, each cell design's area was estimated considering the number of NMOS and PMOS transistors. Memristors, fabricated using BEOL technology [6], were assumed to occupy an area equivalent to two transistors, so no additional area was allocated for memristor devices in the 6T2M, 10T2M, 8T2M, 7T5M, and 4T2M2S designs.
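The area bookkeeping above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the $4\lambda \times 10\lambda$ transistor outline is a hypothetical geometry chosen because it gives $40\lambda^2$ and, with $2\lambda$ of spacing on every side, $112\lambda^2$; the helper names are not from the original work. The conversion uses $2\lambda = 45$ nm as stated in the text.

```python
# Sketch of the lambda-rule area estimate (assumed geometry, hypothetical helper names).
LAMBDA_NM = 45.0 / 2.0  # 2*lambda = 45 nm, as stated in the text


def footprint_lambda2(width_l, height_l, spacing_l=2.0):
    """Area in lambda^2 of a rectangle padded by spacing_l on every side."""
    return (width_l + 2 * spacing_l) * (height_l + 2 * spacing_l)


def lambda2_to_um2(area_lambda2, lambda_nm=LAMBDA_NM):
    """Convert an area expressed in lambda^2 to square micrometres."""
    lambda_um = lambda_nm * 1e-3
    return area_lambda2 * lambda_um ** 2


# Assumption: a 4-lambda x 10-lambda NMOS outline (40 lambda^2 before spacing).
nmos = footprint_lambda2(4, 10)

# Active areas in lambda^2, copied from Tab. 3.
cell_area_lambda2 = {"10T2M": 1624, "8T2M": 1232, "4T2M2S": 758, "7T5M": 1078}
cell_area_um2 = {name: lambda2_to_um2(a) for name, a in cell_area_lambda2.items()}

print(f"padded NMOS footprint: {nmos:.0f} lambda^2")
for name, area in cell_area_um2.items():
    print(f"{name}: {area:.3f} um^2")
```

Under these assumptions, the converted values for the 7T5M and 8T2M cells agree with the areas listed for them in Tab. 4.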

The proposed 7T5M design exhibits a substantial enhancement in dynamic range relative to the 8T2M and 4T2M2S designs. Its DR of 250 mV is about 6.7 times that of the 8T2M design (37.5 mV) and 1.2 times that of the 4T2M2S design (209.6 mV), and it reaches approximately 67.3% of the DR of the 10T2M design (371.2 mV). The 7T5M also maintains a low latency of 0.04 ns, outperforming the 10T2M design (0.083 ns) by about twofold and the 4T2M2S design (0.43 ns) by approximately 91%; only the 8T2M design (0.03 ns) is faster. However, its energy consumption (75.2 fJ) is marginally higher than that of the 10T2M (73.3 fJ) and about 2.5 times that of the 8T2M (30.4 fJ), the most energy-efficient design. In the 10T2M, 8T2M, and 7T5M circuits, energy is consumed in the precharge and evaluation phases, and the evaluation energy is largely determined by the voltage-divider circuits of the memristor-based inverters. As memristance decreases, energy consumption increases due to higher leakage in these circuits. In terms of area, the 7T5M design is smaller than the 8T2M (by 12.5%) and the 10T2M (by about 34%, occupying roughly 66% of its area), though it requires 1.42 times the area of the 4T2M2S. Balancing performance with energy efficiency, the 7T5M design offers a robust solution for applications that require a high dynamic range, small area, and low latency, despite a relatively higher energy consumption profile.
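The relative figures in this comparison follow directly from the entries of Tab. 3. The following Python sketch recomputes them so that each ratio can be checked against the table; the dictionary layout and the function name are illustrative choices, not part of the original work, and the energy column is used as tabulated.

```python
# Values copied from Tab. 3: (DR [mV], latency [ns], energy, active area [lambda^2]).
TAB3 = {
    "10T2M": (371.2, 0.083, 73.3, 1624),
    "8T2M": (37.5, 0.03, 30.4, 1232),
    "4T2M2S": (209.6, 0.43, 913.7, 758),
    "7T5M": (250.0, 0.04, 75.2, 1078),
}
METRICS = ("dr", "latency", "energy", "area")


def ratio_to_proposed(metric, reference, proposed="7T5M"):
    """Return the proposed-to-reference ratio for one metric of Tab. 3."""
    i = METRICS.index(metric)
    return TAB3[proposed][i] / TAB3[reference][i]


for ref in ("10T2M", "8T2M", "4T2M2S"):
    row = ", ".join(f"{m}: {ratio_to_proposed(m, ref):.3f}" for m in METRICS)
    print(f"7T5M / {ref} -> {row}")
```

A ratio above 1 means the 7T5M value is larger than the reference value; whether that is favorable depends on the metric (larger is better for DR, smaller is better for latency, energy, and area).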

A comparative analysis of the proposed 7T5M aCAM cell against recent multi-bit and analog CAM designs (as detailed in Tab. 3 and Tab. 4) evaluates key performance metrics such as the number of states, area, search delay, and energy dissipation. Simulations were conducted with varying cell counts per row: 64 for [43] and [44], 11 for [45], and 16 for [13] and the 7T5M design. The 7T5M aCAM achieves the shortest search delay of 0.08 ns, though this rapid search comes with the highest energy consumption of 77.6 fJ. The area of the 7T5M aCAM is $0.546\,\mu\text{m}^2$, which is slightly less than that of the 8T2M aCAM but larger than those of the 2FE aCAM and 3T-1FE MCAM. The 2FE aCAM [44] offers a balanced performance, with a search delay of 0.14 ns, the lowest energy consumption of 0.4 fJ, and a smaller area ($0.15\,\mu\text{m}^2$) than the 7T5M aCAM. These results highlight the importance of aligning CAM design choices with specific application requirements, particularly in balancing speed, energy efficiency, and area constraints.
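To make the search-delay comparison concrete, the sketch below computes the percentage delay reduction of the 7T5M cell relative to each design in Tab. 4. The names are illustrative; the values are copied from the table. The largest reduction, relative to the 2 ns entry of [43], corresponds to the 96% figure quoted in the abstract.

```python
# Search delay [ns] copied from Tab. 4 (reference designs only).
SEARCH_DELAY_NS = {
    "2FE aCAM [43]": 2.0,
    "3T-1FE MCAM [45]": 0.35,
    "2FE aCAM [44]": 0.14,
    "2Fe MCAM [46]": 0.22,
    "8T2M aCAM [13]": 0.1,
}
PROPOSED_DELAY_NS = 0.08  # "This work: 7T5M" row of Tab. 4


def delay_reduction_pct(reference_ns, proposed_ns=PROPOSED_DELAY_NS):
    """Percentage by which the proposed delay is shorter than the reference."""
    return 100.0 * (reference_ns - proposed_ns) / reference_ns


reductions = {name: delay_reduction_pct(d) for name, d in SEARCH_DELAY_NS.items()}
for name, pct in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}% shorter search delay")
```

Note that the designs were simulated with different row lengths and technology nodes, so these percentages are indicative rather than a like-for-like benchmark.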

## 6. Conclusion

This paper proposes a new aCAM cell, the 7T5M design, which improves several design metrics: it offers a high dynamic range, a small area, and low latency. These improvements are obtained by applying the $g_m/I_D$ design methodology, in which choosing a large $g_m/I_D$ ratio yields higher gain and speed while minimizing power dissipation. The paper also compares the proposed cell not only with four different aCAM designs (6T2M, 10T2M, 8T2M, and 4T2M2S) but also with existing multi-bit FeFET-based CAMs and memristor-based aCAMs. These results underscore the need to consider application-specific requirements when selecting a CAM design, particularly when balancing speed, energy efficiency, and area constraints.

## 7. Future Work

The accuracy challenges in analog-domain search operations remain unresolved, as there is no established method for evaluating the accuracy of the proposed design. Additionally, the effects of process corner variations have not been thoroughly examined. Future research will focus on defining appropriate figures of merit (FOMs) to quantify accuracy and implementing Resistive RAM, as described in [47], with post-layout simulations for validation. Moreover, the  $g_m/I_D$ design methodology will be utilized to develop a lookup table for memristance programming, improving the overall precision and robustness of the design.

#### Acknowledgments

The authors acknowledge Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, for supporting this study.

### References

- [1] MANNOCCI, P., FARRONATO, M., LEPRI, N., et al. In-memory computing with emerging memory devices: status and outlook. *APL Machine Learning*, 2023, vol. 1, no. 1, p. 1–25. DOI: 10.1063/5.0136403
- [2] GARZÓN, E., GOLMAN, R., JAHSHAN, Z., et al. Hamming distance tolerant content-addressable memory (HD-CAM) for DNA classification. *IEEE Access*, 2022, vol. 10, p. 28080–28093. DOI: 10.1109/ACCESS.2022.3158305
- [3] YANG, B.-D. Low-power effective memory-size expanded TCAM using data-relocation scheme. *IEEE Journal of Solid-State Circuits*, 2015, vol. 50, no. 10, p. 2441–2450. DOI: 10.1109/JSSC.2015.2457908
- [4] WOO, K.-C., YANG, B.-D. Low-area TCAM using a don't care reduction scheme. *IEEE Journal of Solid-State Circuits*, 2018, vol. 53, no. 8, p. 2427–2433. DOI: 10.1109/JSSC.2018.2822696
- [5] BLYTH, T., ORLANDO, R. Analog Content Addressable Memory (CAM) Employing Analog Nonvolatile Storage. Patent no. US6985372B1. [Online]. Available: https://patents.google.com/patent/US6985372B1/en
- [6] LI, C., GRAVES, C. E., SHENG, X., et al. Analog content-addressable memories with memristors. *Nature Communications*, 2020, vol. 11, no. 1, p. 1–10. DOI: 10.1038/s41467-020-15254-4
- [7] GRAVES, C. E., LAM, S.-T., LI, X., et al. Memristor TCAMs accelerate regular expression matching for network intrusion detection. *IEEE Transactions on Nanotechnology*, 2019, vol. 18, p. 963–970. DOI: 10.1109/TNANO.2019.2936239
- [8] GRAVES, C. E., LI, C., SHENG, X., et al. In-memory computing with memristor content-addressable memories for pattern matching. *Advanced Materials*, 2020, vol. 32, no. 37, p. 1–10. DOI: 10.1002/adma.202003437
- [9] KIM, J., CHOI, M.-J., JANG, H. Ferroelectric field effect transistors: progress and perspective. *APL Materials*, 2021, vol. 9, no. 2, p. 1–18. DOI: 10.1063/5.0035515

- [10] MERCED-GRAFALS, E. J., DÁVILA, N., GE, N., et al. Repeatable, accurate, and high-speed multi-level programming of memristor 1T1R arrays for power-efficient analog computing applications. *Nanotechnology*, 2016, vol. 27, no. 36, p. 1–5. DOI: 10.1088/0957-4484/27/36/365202
- [11] IRMANOVA, A., MAAN, A., JAMES, A., et al. Analog selftimed programming circuits for aging memristors. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2021, vol. 68, no. 4, p. 1133–1137. DOI: 10.1109/TCSII.2020.3032282
- [12] LI, Y., ANG, K.-W. Hardware implementation of neuromorphic computing using large-scale memristor crossbar arrays. *Advanced Intelligent Systems*, 2020, vol. 3, no. 1, p. 1–10. DOI: 10.1002/aisy.202000137
- [13] BAZZI, J., SWEIDAN, J., FOUDA, M. E., et al. Variability-aware design of RRAM-based analog CAMs. *IEEE Access*, 2024, vol. 12, p. 55859–55873. DOI: 10.1109/ACCESS.2024.3388730
- [14] AGWA, S., PAPANDROULIDAKIS, G., PRODROMAKIS, T. A 1T1R+2T analog content-addressable memory pixel for online template matching. In *Proceedings of the 2023 IEEE International Symposium on Circuits and Systems (ISCAS)*. Monterey (USA), 2023, p. 1–5. DOI: 10.1109/ISCAS46773.2023.10181451
- [15] PEDRETTI, G., MOON, J., BRUEL, P., et al. X-time: An in-memory engine for accelerating machine learning on tabular data with CAMs. *arXiv*, 2023, p. 1–13. DOI: 10.48550/ARXIV.2304.01285
- [16] YU, J., MANEA, P.-P., AMELI, S., et al. Analog feedback-controlled memristor programming circuit for analog content-addressable memory. In *Proceedings of the 2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE)*. Milan (Italy), 2023, p. 983–988. DOI: 10.1109/MetroXRAINE58569.2023.10405732
- [17] FLANDRE, D., VIVIANI, A., EGGERMONT, J.-P., et al. Improved synthesis of gain-boosted regulated-cascode CMOS stages using symbolic analysis and gm/ID methodology. *IEEE Journal of Solid-State Circuits*, 1997, vol. 32, no. 7, p. 1006–1012. DOI: 10.1109/4.597291
- [18] HESHAM, B., HASANEEN, E.-S., HAMED, H. F. A. Design procedure for two-stage CMOS opamp using gm/ID design methodology in 16 nm FinFET technology. In *Proceedings of the 2019 31st International Conference on Microelectronics (ICM)*. Cairo (Egypt), 2019, p. 325–329. DOI: 10.1109/ICM48031.2019.9021511
- [19] CHUA, L., KANG, S. M. Memristive devices and systems. *Proceedings of the IEEE*, 1976, vol. 64, no. 2, p. 209–223. DOI: 10.1109/PROC.1976.10092
- [20] CHUA, L. Everything you wish to know about memristors but are afraid to ask. *Radioengineering*, 2015, vol. 24, no. 2, p. 319–368. DOI: 10.13164/re.2015.0319
- [21] CHUA, L. Resistance switching memories are memristors. *Applied Physics A*, 2011, vol. 102, no. 4, p. 765–783. DOI: 10.1007/s00339-011-6264-9
- [22] YANG, J. J., PICKETT, M. D., LI, X., et al. Memristive switching mechanism for metal/oxide/metal nanodevices. *Nature Nanotechnology*, 2008, vol. 3, no. 7, p. 429–433. DOI: 10.1038/nnano.2008.160
- [23] STRUKOV, D. B., SNIDER, G. S., STEWART, D. R., et al. The missing memristor found. *Nature*, 2008, vol. 453, no. 7191, p. 80–83. DOI: 10.1038/nature06932
- [24] SOUTO, J., BOTELLA, G., GARCÍA, D., et al. Neuromorphic circuit simulation with memristors: Design and evaluation using memtorch for MNIST and CIFAR. *arXiv*, 2024, p. 1–19. DOI: 10.48550/ARXIV.2407.13410
- [25] ZAHEDI, M., SHAHROODI, T., WONG, S., et al. Efficient signed arithmetic multiplication on memristor-based crossbar. *IEEE Access*, 2023, vol. 11, p. 33964–33978. DOI: 10.1109/ACCESS.2023.3263259

- [26] SERB, A., KHIAT, A., PRODROMAKIS, T. Seamlessly fused digital-analogue reconfigurable computing using memristors. *Nature Communications*, 2018, vol. 9, no. 1, p. 1–9. DOI: 10.1038/s41467-018-04624-8
- [27] RASHED, M. R. H., KUMAR, S. K., EWETZ, R. Logic synthesis for digital in-memory computing. In *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*. San Diego (USA), 2022, p. 1–9. DOI: 10.1145/3508352.3549348
- [28] BARRAJ, I., MESTIRI, H., MASMOUDI, M. Overview of memristor-based design for analog applications. *Micromachines*, 2024, vol. 15, no. 4, p. 1–12. DOI: 10.3390/mi15040505
- [29] MLADENOV, V., KIRILOV, S. A memristor neural network based on simple logarithmic-sigmoidal transfer function with MOS transistors. *Electronics*, 2024, vol. 13, no. 5, p. 1–26. DOI: 10.3390/electronics13050893
- [30] XIE, T., YU, S., LI, S. A high-parallelism RRAM-based compute-in-memory macro with intrinsic impedance boosting and in-ADC computing. *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, 2023, vol. 9, no. 1, p. 38–46. DOI: 10.1109/JXCDC.2023.3255788
- [31] BAZZI, J., SWEIDAN, J., FOUDA, M. E., et al. Hardware acceleration of DNA pattern matching using analog resistive CAMs. *Frontiers in Electronics*, 2024, vol. 4, p. 1–12. DOI: 10.3389/felec.2023.1343612
- [32] TAHA, M. M. A., TEUSCHER, C. Approximate memristive in-memory Hamming distance circuit. *ACM Journal on Emerging Technologies in Computing Systems*, 2020, vol. 16, no. 2, p. 1–14. DOI: 10.1145/3371391
- [33] GARCÍA-REDONDO, F., LÓPEZ-VALLEJO, M., ITUERO, P. Building memristor applications: From device model to circuit design. *IEEE Transactions on Nanotechnology*, 2014, vol. 13, no. 6, p. 1154–1162. DOI: 10.1109/TNANO.2014.2345093
- [34] KVATINSKY, S., RAMADAN, M., FRIEDMAN, E. G., et al. VTEAM: A general model for voltage-controlled memristors. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2015, vol. 62, no. 8, p. 786–790. DOI: 10.1109/TCSII.2015.2433536
- [35] KVATINSKY, S., FRIEDMAN, E. G., KOLODNY, A., et al. TEAM: Threshold adaptive memristor model. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 2013, vol. 60, no. 1, p. 211–221. DOI: 10.1109/TCSI.2012.2215714
- [36] KVATINSKY, S., TALISVEYBERG, K., FLITER, D., et al. Models of memristors for SPICE simulations. In *Proceedings of the IEEE 27th Convention of Electrical and Electronics Engineers in Israel*. Eilat (Israel), 2012, p. 1–5. DOI: 10.1109/EEEI.2012.6377081
- [37] BIOLEK, D., KOLKA, Z., BIOLKOVÁ, V., et al. (V)TEAM for SPICE simulation of memristive devices with improved numerical performance. *IEEE Access*, 2021, vol. 9, p. 30242–30255. DOI: 10.1109/ACCESS.2021.3059241
- [38] BAZZI, J., FOUDA, M. E., KANJ, R., et al. Threshold switch modeling for analog CAM design. In *Proceedings of the 32nd International Conference on Microelectronics (ICM)*. Cairo (Egypt), 2020, p. 1–4. DOI: 10.1109/ICM50269.2020.9331775
- [39] PEDRETTI, G., GRAVES, C. E., VAN VAERENBERGH, T., et al. Differentiable content addressable memory with memristors. *Advanced Electronic Materials*, 2022, vol. 8, no. 8, p. 1–9. DOI: 10.1002/aelm.202101198
- [40] SHI, L., ZHENG, G., TIAN, B., et al. Research progress on solutions to the sneak path issue in memristor crossbar arrays. *Nanoscale Advances*, 2020, vol. 2, no. 5, p. 1811–1827. DOI: 10.1039/d0na00100g

- [41] NC STATE EDA. FreePDK45. [Online] Cited 2024-04-31. Available at: https://eda.ncsu.edu/freepdk/freepdk45/
- [42] PEDRETTI, G., GRAVES, C. E., SEREBRYAKOV, S., et al. Tree-based machine learning performed in-memory with memristive analog CAM. *Nature Communications*, 2021, vol. 12, no. 1, p. 1–14. DOI: 10.1038/s41467-021-25873-0
- [43] YIN, X., LI, C., HUANG, Q., et al. FECAM: A universal compact digital and analog content addressable memory using ferroelectric. *IEEE Transactions on Electron Devices*, 2020, vol. 67, no. 7, p. 2785–2792. DOI: 10.1109/TED.2020.2994896
- [44] LIU, X., KATTI, K., HE, Y., et al. Analog content-addressable memory from complementary FeFETs. *Device*, 2024, vol. 2, no. 2, p. 1–12. DOI: 10.1016/j.device.2023.100218
- [45] RAJAEI, R., SHARIFI, M. M., KAZEMI, A., et al. Compact single-phase-search multistate content-addressable memory design using one FeFET/cell. *IEEE Transactions on Electron Devices*, 2021, vol. 68, no. 1, p. 109–117. DOI: 10.1109/TED.2020.3039477
- [46] BEKDACHE, O., EDDINE, H. N., AL TAWIL, M., et al. Scalable complementary FeFET CAM design. In *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)*. Monterey (USA), 2023, p. 1–5. DOI: 10.1109/ISCAS46773.2023.10181788
- [47] GIACOMIN, E., GAILLARDON, P.-E. A resistive random access memory add-on for the NCSU FreePDK 45 nm. *IEEE Transactions on Nanotechnology*, 2019, vol. 18, p. 68–72. DOI: 10.1109/TNANO.2018.2881109

#### About the Authors ...

**Phuc Thien Phan NGUYEN** is a master's student. He received his B.S. degree in Electronics and Telecommunications Engineering from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, in 2023. His research interests include CMOS digital circuit design, transistor and memristor modeling, and neuro-inspired engineering. He is the first author of this paper. He can be contacted at nptphuc.sdh232@hcmut.edu.vn.

**KimAnh PHAN** received the B.S. and M.S. degrees in Electronics and Telecommunications Engineering from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM (2002, 2012). She is currently a Lecturer at the Faculty of Electrical-Electronics Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM. Her research interests include memristor logic synthesis, memristor circuits, and analog IC design. She can be contacted at pvkanh@hcmut.edu.vn.

**Linh TRAN** (corresponding author) received the B.S. degree in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign (2005), and the M.S. and Ph.D. degrees in Computer Engineering from Portland State University (2006, 2015). He is currently a Lecturer at the Faculty of Electrical-Electronics Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM. His research interests include quantum/reversible logic synthesis, computer architecture, hardware-software co-design, efficient algorithms, and hardware design targeting FPGAs and deep learning. He can be contacted at linhtran@hcmut.edu.vn.