# A Multi-Processing 10 000 frames/s CMOS Image Sensor for Machine Vision

Jérôme Dubois, Student Member, IEEE, Dominique Ginhac, Member, IEEE, and Michel Paindavoine

Abstract—A high-speed analog VLSI image acquisition and pre-processing system has been designed and fabricated in a 0.35  $\mu m$  standard CMOS process. The chip features a massively parallel architecture enabling the computation of programmable low-level image processing in each pixel. Extraction of spatial gradients and convolutions such as the Sobel filter or the Laplacian are implemented on the circuit. For this purpose, each 35  $\mu m$  x 35  $\mu m$  pixel includes a photodiode, an amplifier, two storage capacitors and an analog arithmetic unit based on a four-quadrant multiplier architecture. The retina provides address-event coded output on three asynchronous buses: one output is dedicated to the gradient and the other two to the pixel values.

A proof-of-concept chip of 64 x 64 pixels was fabricated. A dedicated embedded platform including an FPGA and ADCs has also been designed to evaluate the vision chip. Measured results show that the proposed sensor successfully captures raw images at up to 10 000 frames per second and runs low-level image processing at a frame rate between 2 000 and 5 000 frames per second.

Index Terms—CMOS Image Sensor, Parallel architecture, High-speed image processing, Analog arithmetic unit.

#### I. INTRODUCTION

TODAY, improvements continue to be made in the growing digital imaging world with two main image sensor technologies: the charge coupled devices (CCD) and CMOS sensors. The continuous advances in CMOS technology for processors and DRAMs have made CMOS sensor arrays a viable alternative to the popular CCD sensors. New technologies provide the potential for integrating a significant amount of VLSI electronics onto a single chip, greatly reducing the cost, power consumption and size of the camera [1]–[4]. This advantage is especially important for implementing full image systems requiring significant processing such as digital cameras and computational sensors [5]–[7].

Most of the work on complex CMOS systems addresses the integration of sensors with a processing unit at chip level ("system-on-chip" approach) or at column level, by integrating an array of processing elements dedicated to one or more columns [8]–[11]. Indeed, pixel-level processing is generally dismissed because the resulting pixel sizes are often too large to be of practical use. However, as CMOS image sensors scale to 0.18  $\mu$ m processes and below, integrating a processing element at each pixel or group of neighboring pixels becomes feasible. More significantly, employing a processing element per pixel offers the opportunity to achieve massively parallel computations and thus to exploit the high-speed imaging capability of CMOS image sensors [12]–[15]. This also benefits the implementation of new complex applications at standard rates and improves the performance of existing video applications such as motion vector estimation [16]–[18], multiple capture with dynamic range [19]–[21], motion capture [22], and pattern recognition [23].

As integrated circuits keep scaling down following Moore's law, a growing number of recent papers address the design of digital pixels [24]–[27], taking advantage of the increasing number of transistors available at the pixel to perform analog-to-digital conversion. This trend is mainly motivated by the significant advantages of pixel-level A/D conversion, such as high SNR, lower power consumption and very low conversion speeds. Nevertheless, the resulting in-pixel ADC implementations are rather area consuming, strongly restricting the image processing capability of CMOS sensors.

In this paper, we discuss hardware implementation issues of a high-speed CMOS imaging system embedding low-level image processing. For this purpose, we designed, fabricated and tested a proof-of-concept  $64 \times 64$  pixel CMOS analog sensor with a per-pixel programmable processing element in a standard 0.35  $\mu$ m double-poly quadruple-metal CMOS technology. The main objectives of our design are: (1) to evaluate the speed of the sensor, and, in particular, to reach a 10 000 frames/s rate, (2) to demonstrate a versatile and programmable processing unit at pixel level, (3) to provide an original platform dedicated to embedded image processing.

The rest of the paper is organized as follows. Section II is dedicated to the description of the operational principle at pixel level in the sensor. The main characteristics of the sensor architecture are described in Section III. Section IV addresses the design of the circuit: the photodiode structure, the embedded analog memories, and the arithmetic unit are successively described. Finally, some experimental results of high-speed image acquisition with processing at pixel level are presented in Section V.

# II. EMBEDDED ALGORITHMS AT PIXEL LEVEL

From a traditional point of view, a CMOS sensor can be seen as an array of independent pixels, each including a photodetector (PD) and a processing element (PE) built upon a few transistors. Existing works on analog pixel-level image processing can be classified into two main categories. The first one is intrapixel, in which processing is performed on the individual pixels in order to improve image quality, such as the classical Active Pixel Sensor or APS [8], [28], as shown in Fig. 1(a).

The second category is interpixel, where the processing is dedicated to groups of pixels in order to perform some early vision processing and not merely to capture images. The transistors placed around the photodetector can be seen as a real on-chip analog signal processor which improves the functionality of the sensor. This typically allows local and/or global pixel calculations. Our work belongs to this second category because our main objective is the implementation of various in-situ image processing operations using a local neighborhood (such as spatial gradients, Sobel and Laplacian operators). This design concept requires rethinking the spatial distribution of the processing resources, so that each computational unit can easily use a programmable neighborhood of pixels. Consequently, in our design, each processing element is placed in the middle of four adjacent pixels, as shown in Fig. 1(b). The key benefit of this distribution of the pixel-level processors is that it achieves both compact metal interconnections with the pixels and generality of high-speed processing based on a neighborhood of pixels.



Fig. 1. Photosites with (a) intra-pixel and (b) inter-pixel processing

## A. Spatial Gradients

The structure of our processing unit is tailor-made for the computation of spatial gradients based on a 4-neighborhood pixel algorithm, as depicted in Fig. 2.



Fig. 2. Evaluation of Spatial Gradients

The main idea for evaluating the spatial gradients [29] is based on the definition of the first-order derivative of a 2-D function performed in the vector direction  $\vec{\xi}$ , which can be expressed as:

$$\frac{\partial V(x,y)}{\partial \vec{\xi}} = \frac{\partial V(x,y)}{\partial x'} \cos(\beta) + \frac{\partial V(x,y)}{\partial y'} \sin(\beta) \tag{1}$$

where  $\beta$  is the vector's angle.

Discretizing Eq. 1 at the pixel level, according to Fig. 2, gives:

$$\frac{\partial V}{\partial \xi} = (V_2 - V_4)\cos(\beta) + (V_1 - V_3)\sin(\beta) \tag{2}$$

where  $V_i$ ,  $i \in \{1, \dots, 4\}$ , is the luminance at pixel i, *i.e.*, the photodiode output. In this way, the local derivative in the direction of vector  $\vec{\xi}$  is continuously computed as a linear combination of two basis functions, the derivatives in the x' and y' directions. Using a four-quadrant multiplier [30], [31] (see Section IV-C for details of design and implementation), the product of the derivatives by a cosine function can be easily computed. The output product P, as shown in Fig. 3, is given by:

$$P = V_1 \sin(\beta) + V_2 \cos(\beta) - V_3 \sin(\beta) - V_4 \cos(\beta) \tag{3}$$



Fig. 3. Implementation of multipliers at pixel-level

Consequently, the processing element implemented at the pixel level carries out a linear combination of the four adjacent pixels with the four associated weights ( $coef_i$ ,  $i \in \{1, \dots, 4\}$ ). To evaluate Eq. 3, the following values have to be given to the coefficients:

$$\begin{pmatrix} coef_1 & coef_2 \\ coef_3 & coef_4 \end{pmatrix} = \begin{pmatrix} \sin(\beta) & \cos(\beta) \\ -\sin(\beta) & -\cos(\beta) \end{pmatrix} \tag{4}$$

From such a viewpoint, horizontal and vertical gradients can be straightforwardly evaluated by respectively fixing the value of  $\beta$  as  $0^{\circ}$  and  $90^{\circ}$ .
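As an illustration of Eq. 2 and Eq. 4, the following Python sketch evaluates the directional gradient as a weighted sum of the four neighbouring pixel values. The function names and the sample voltages are hypothetical and chosen only for the example; on the chip this sum is formed in the analog domain, not in software.

```python
import math

def gradient_coefficients(beta_deg):
    """Return (coef1, coef2, coef3, coef4) as given by Eq. 4, for beta in degrees."""
    beta = math.radians(beta_deg)
    s, c = math.sin(beta), math.cos(beta)
    return (s, c, -s, -c)

def directional_gradient(v, beta_deg):
    """Weighted sum of the four neighbouring pixel values v = (V1, V2, V3, V4)."""
    coefs = gradient_coefficients(beta_deg)
    return sum(k * x for k, x in zip(coefs, v))

# Hypothetical photodiode outputs (volts) for pixels 1 to 4
v = (1.2, 1.8, 1.0, 1.5)
g0 = directional_gradient(v, 0.0)    # reduces to V2 - V4
g90 = directional_gradient(v, 90.0)  # reduces to V1 - V3
```

With these coefficients the two special cases collapse to the simple differences of Eq. 2, which is why a single programmable unit can cover both gradient directions.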

## B. Sobel operator

The structure of our architecture is also well adapted to various algorithms based on convolutions using binary masks on a neighborhood of pixels. As an example, the evaluation of the Sobel algorithm with our chip leads to a result directly centered on the photosensor and directed along the natural axes of the image, as shown in Fig. 4(a). In order to compute the mathematical operation, a 3x3 neighborhood is applied on the whole image, as described in Fig. 4(b).



Fig. 4. (a) Array architecture, (b) 3x3 mask used by the four processing elements

To carry out the discretized derivatives in two dimensions, along the horizontal and vertical axes, it is necessary to build two 3x3 matrices called h1 and h2 (see Eq. 5).

$$h_1 = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} \quad h_2 = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix} \tag{5}$$

Within the four processing elements numbered from 1 to 4, as shown in Fig. 4(a), four 2x2 masks act locally on the image. According to Eq. 5, this allows the evaluation of the following series of operations:

$$\begin{array}{ll} h_1: & h_2: \\ I_{11} = -(I_{1} + I_{4}) & I_{21} = -(I_{1} + I_{2}) \\ I_{12} = +(I_{3} + I_{6}) & I_{22} = -(I_{2} + I_{3}) \\ I_{13} = +(I_{6} + I_{9}) & I_{23} = +(I_{8} + I_{9}) \\ I_{14} = -(I_{4} + I_{7}) & I_{24} = +(I_{7} + I_{8}) \end{array} \tag{6}$$

with the values  $I_{1k}$  and  $I_{2k}$  provided by the processing element (k). Then, from these trivial operations, the discrete amplitudes of the derivatives along the vertical axis ( $I_{h1} = I_{11} + I_{12} + I_{13} + I_{14}$ ) and the horizontal axis ( $I_{h2} = I_{21} + I_{22} + I_{23} + I_{24}$ ) can be computed. The evaluation of the horizontal and vertical gradients takes two retina cycles, one for each gradient<sup>1</sup>.
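The decomposition of Eq. 6 can be checked numerically. The sketch below assumes that the pixels I1 to I9 of the 3x3 neighbourhood are numbered in row-major order (an assumption, since the numbering is defined in Fig. 4(b)); the function names and pixel values are hypothetical. It compares the sum of the four partial results with a direct application of the masks of Eq. 5.

```python
H1 = (-1, 0, 1, -2, 0, 2, -1, 0, 1)   # mask h1 of Eq. 5, row-major
H2 = (-1, -2, -1, 0, 0, 0, 1, 2, 1)   # mask h2 of Eq. 5, row-major

def apply_mask(mask, n):
    """Direct 3x3 weighted sum, used here only as a reference."""
    return sum(m * p for m, p in zip(mask, n))

def sobel_from_processing_elements(n):
    """n = (I1, ..., I9), row-major. Returns (I_h1, I_h2) built from Eq. 6."""
    i1, i2, i3, i4, i5, i6, i7, i8, i9 = n
    # Partial results of processing elements 1 to 4 for each mask
    h1_parts = (-(i1 + i4), +(i3 + i6), +(i6 + i9), -(i4 + i7))
    h2_parts = (-(i1 + i2), -(i2 + i3), +(i8 + i9), +(i7 + i8))
    return sum(h1_parts), sum(h2_parts)

neighbourhood = (3, 7, 2, 5, 9, 4, 1, 8, 6)   # hypothetical pixel values
i_h1, i_h2 = sobel_from_processing_elements(neighbourhood)
```

Each partial result only involves two pixels adjacent to one processing element, which is what lets the 3x3 operator be mapped onto 2x2 units.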

## C. Second-order detector: Laplacian

Edge detection based on some second-order derivatives such as the Laplacian can also be implemented on our architecture. Unlike spatial gradients previously described, the Laplacian is a scalar (see Eq. 7) and does not provide any indication about the edge direction.

$$\triangle = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \tag{7}$$

From this 3x3 mask, the following operations can be extracted according to the principles used previously for the evaluation of the Sobel operator:

$$\triangle : \begin{array}{l} I_{11} = I_4 - I_5 \\ I_{12} = I_2 - I_5 \\ I_{13} = I_6 - I_5 \\ I_{14} = I_8 - I_5 \end{array} \tag{8}$$

The discrete amplitude of the second-order derivative is given by:  $I_{\triangle} = I_{11} + I_{12} + I_{13} + I_{14}$ . These operations can be carried out in only one retina cycle.
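The same kind of check applies to the Laplacian. The following sketch, again assuming a row-major numbering of I1 to I9 and using hypothetical function names and test patterns, sums the four differences of Eq. 8 and compares the result with the mask of Eq. 7.

```python
LAPLACIAN = (0, 1, 0, 1, -4, 1, 0, 1, 0)   # mask of Eq. 7, row-major

def laplacian_mask(n):
    """Direct application of the 3x3 mask, used as a reference."""
    return sum(m * p for m, p in zip(LAPLACIAN, n))

def laplacian_from_processing_elements(n):
    """n = (I1, ..., I9), row-major. Sums the four differences of Eq. 8."""
    i1, i2, i3, i4, i5, i6, i7, i8, i9 = n
    parts = (i4 - i5, i2 - i5, i6 - i5, i8 - i5)
    return sum(parts)

flat = (5,) * 9                        # uniform region: no edge
spot = (0, 0, 0, 0, 1, 0, 0, 0, 0)     # isolated bright pixel
mixed = (3, 7, 2, 5, 9, 4, 1, 8, 6)    # arbitrary values
```

A uniform region gives zero and an isolated bright pixel gives minus four times its amplitude, as expected from a second-order detector.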

## III. OVERVIEW OF THE CHIP ARCHITECTURE

As in a traditional image sensor, the core of the chip presented in this paper is a two-dimensional (2-D) pixel array, here of 64 columns and 64 rows with random pixel access, together with some peripheral circuits. It contains about  $160\,000$  transistors on a 3.675 mm  $\times$  3.775 mm die. The full layout of the retina is depicted in Fig. 5 and the main chip characteristics are listed in Table I.



Fig. 5. Layout of the full retina

TABLE I
CHIP CHARACTERISTICS

| Parameter                     | Value                                          |
|-------------------------------|------------------------------------------------|
| Technology                    | 0.35 μm 2-poly 4-metal CMOS                    |
| Array size                    | 64 × 64                                        |
| Chip size                     | 11 mm<sup>2</sup>                              |
| Number of transistors         | 160 000                                        |
| Number of transistors / pixel | 38                                             |
| Pixel size                    | $35 \ \mu\mathrm{m} \times 35 \ \mu\mathrm{m}$ |
| Sensor fill factor            | 25 %                                           |
| Dynamic power consumption     | 110 mW                                         |
| Supply voltage                | 3.3 V                                          |
| Frame rate                    | 10 000 fps                                     |

Each individual pixel contains a photodiode for the light-to-voltage transduction and 38 transistors integrating all the analog circuitry dedicated to the image processing algorithms.

<sup>1</sup>A retina cycle is defined as the time spent between two acquisition frames, thus including acquisition and preprocessing of the image.

This amount of electronics includes a preloading circuit, two "Analog Memory, Amplifier and Multiplexer" structures ([AM]<sup>2</sup>) and an "Analog Arithmetic Unit" (A<sup>2</sup>U) based on a four-quadrant multiplier architecture. The full pixel size is 35  $\mu$ m  $\times$  35  $\mu$ m with a 25 % fill factor.

Fig. 6 shows a block diagram of the proposed chip. The architecture of the chip is divided into three main blocks, as in many circuits widely described in the literature. First, the array of pixels (including photodiodes with their associated circuitry for performing the analog computation) is located at the center. Second, below the chip core are the readout circuits with the three asynchronous output buses: the first one is dedicated to the image processing results, whereas the other two provide parallel outputs for full high-rate acquisition of raw images. Finally, the left part of the sensor is dedicated to a row decoder for addressing the successive rows of pixels. The pixel values are selected one row at a time and read out to vertical column buses connected to an output multiplexer.

The chip also contains test structures used for detailed characterization of the photodiodes and processing units. These test structures can be seen on the bottom left of the chip.



Fig. 6. Block diagram of the chip

The operation of the imaging system can be divided into four phases: reset, integration, image processing and readout. The reset, integration and pixel-level processing phases all occur in parallel over the full array of pixels ("snap-shot" mode) in order to avoid any distortion due to a row-by-row reset. The integration time can be controlled using the global output signal called Out\_int, which provides the average incident illumination of the whole matrix of pixels. If the average level of the image is too low, the exposure time may be increased. Conversely, if the scene is too bright, the integration period may be reduced.
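The exposure adjustment described above can be sketched as a simple proportional rule. This is only an illustration under stated assumptions: the paper does not specify a control law, so the function name, the normalisation of the Out\_int reading to the range 0 to 1, the target level, the tolerance and the time limits are all hypothetical.

```python
def next_integration_time(t_int_us, avg_level, target=0.5,
                          tolerance=0.05, t_min_us=50.0, t_max_us=10000.0):
    """Return an updated integration time in microseconds.

    avg_level: hypothetical normalised reading of Out_int
               (0.0 = dark, 1.0 = saturated); this scaling is an assumption.
    """
    if avg_level <= 0.0:
        return t_max_us                        # nothing measured: longest allowed exposure
    if abs(avg_level - target) <= tolerance:
        return t_int_us                        # close enough: keep the current exposure
    proposed = t_int_us * target / avg_level   # scale the exposure towards the target
    return min(max(proposed, t_min_us), t_max_us)

# Example: a dark scene doubles the exposure, a bright scene halves it
darker = next_integration_time(200.0, 0.25)
brighter = next_integration_time(200.0, 1.0)
```

In such a scheme the rule would run once per retina cycle on the platform side, using the single global value instead of reading the full image.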

#### IV. DESIGN OF THE CIRCUIT

## A. Photodiode Structure

As previously described in Section II, each pixel of our chip includes a photodiode and a processing unit dedicated to low-level image processing based on neighborhoods. One of our main objectives is to optimize the mapping of the pixel-level processors in order to facilitate access to the values of adjacent pixels. So, an original structure (depicted in Fig. 1(b)) was chosen. The major advantage of this structure is the minimization of the length of metal interconnection between adjacent pixels and the processing units, contributing 1) to a better fill factor and 2) to a higher framerate.

To achieve high-speed performance, one of the key elements is the photodiode, which should be designed and optimized carefully. Critical parameters in the design of photodiodes are the dark current and the spectral response [32]. The shape of the photodiode layout and the structure of the photodiode also have a significant influence on the performance of the whole imager [33], [34].

In our chip, the photodiodes are N-type photodiodes based on an n<sup>+</sup>-type diffusion in a p-type silicon substrate. The depletion region is formed in the neighborhood of the photodiode cathode. Optically generated photocarriers diffuse to neighboring junctions [35]. We have analysed and tested three photodiode shapes: the square photodiode classically used in the literature, the cross shape, which is perfectly adapted to the optimized pixel-level processor mapping, and finally the octagonal shape based on 45° structures.



Fig. 7. (a) Square shape, (b) Cross shape, (c) Octagonal shape

Fig. 7 illustrates these different photodiode structures. For each of these shapes, the active area (displayed in grey dots) and the inter-element isolation area with external connections (filled in grey) are represented. In the remainder of this paper, we use the term "Active layer surfaces" (Als) for the active area of the photodiode and the term "Connection layers surfaces" (Cls) for the connections of the photodiodes. Note that for each photodiode shape, the Cls has the same width, called  $\beta$ , whereas each shape has its own dimension: the side a for the square photodiode, the side b for the cross photodiode and the side c of the internal square of the octagonal photodiode.

Based on these parameters, we can easily define the Cls and Als mathematical expressions by the following equations:

$$\text{square shape}: \left\{ \begin{array}{l} Als = a^2 \\ Cls = 4\beta(\beta + a) \end{array} \right. \tag{9}$$

$$\text{cross shape}: \left\{ \begin{array}{l} Als = 5b^2,\ b = \frac{a}{\sqrt{5}} \\ Cls = 12\beta(\beta + \frac{a}{\sqrt{5}}) \end{array} \right. \tag{10}$$

$$\text{octagonal shape}: \left\{ \begin{array}{l} Als = 7c^2,\ c = \frac{a}{\sqrt{7}} \\ Cls = 4\beta(1 + \sqrt{2})(\beta + \frac{a}{\sqrt{7}}) \end{array} \right. \tag{11}$$



Fig. 8. Photodiode Layout Rules

According to Fig. 8, the design rules of the AMS-CMOS 0.35  $\mu$ m process lead to a minimal value of  $\beta$ =2.45 $\mu$ m. Starting from this result, we can plot comparative graphs of Cls for the three photodiode shapes, as shown in Fig. 9.



Fig. 9. Cls for the three different shapes expressed as a function of the side a of the square photodiode

In our design, we have fixed the fill factor (i.e. the ratio between the active area and the total pixel area) to 25% with a total pixel size of  $35\mu m \times 35\mu m$ . So, the values of Als and a can be easily inferred:  $Als = 306.25\mu m^2$  and  $a = 17.5\mu m$ . From Fig. 9, we can see (1) that the cross shape is not realistic because of its large Cls value ( $Cls = 295\mu m^2$ ) and (2) that the square and the octagonal shapes have similar values (respectively  $191\mu m^2$  and  $173\mu m^2$ ). Finally, the octagonal shape was chosen because of three main properties:

- 1) The surface dedicated to the interconnections is about 12% lower compared to a square shape,
- 2) The depletion region is more efficient at the edges of the photodiode,
- 3) This shape, based on  $45^{\circ}$  structures, is technologically realizable by the foundry.
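The dimensions quoted above follow from the 25% fill factor and the 35 μm pixel pitch. The short sketch below reproduces that arithmetic and derives the sides b and c from the relations given with Eqs. 10 and 11; the variable names are hypothetical.

```python
import math

PIXEL_PITCH_UM = 35.0   # pixel side, from Table I
FILL_FACTOR = 0.25      # ratio of active area to total pixel area

als = FILL_FACTOR * PIXEL_PITCH_UM ** 2   # active area Als, in square micrometres
a = math.sqrt(als)                        # side of the equivalent square photodiode
b = a / math.sqrt(5)                      # side b of the cross shape (Eq. 10)
c = a / math.sqrt(7)                      # side c of the octagonal shape (Eq. 11)
```

All three shapes are therefore compared at the same active area, so that only the connection surface Cls differs between them.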

Experimental data and detailed characterization of the different photodiodes strengthen our choice. The spectral responses of the square shape and the octagonal shape are shown in Fig. 10. The spectral responses were measured using a light source with a wavelength tunable from 400 nm to 1100 nm. The octagonal structure performs better than the square shape at all wavelengths.



Fig. 10. Spectral responses of the square and octagonal photodiode structures

Based on the above measurement results, the octagonal photodiode structure was chosen as the photodetector of our chip. Fig. 11 illustrates the arrangement of pixels and the computation of spatial gradients in this configuration, as described previously in this paper.



Fig. 11. (a) Array of pixel based on octagonal photodiodes, (b) Evaluation of spatial gradients

## B. Pixel-level Analog Memory, Amplifier and Multiplexer

To increase the algorithmic possibilities of the architecture, the key point is the separation of the acquisition of the light inside the photodiode and the readout of the stored value at pixel level [36]. The storage element should therefore keep the output voltage of the previous frame while the sensor integrates the photocurrent of a new frame. So, for each pixel of our chip, we have designed and implemented two specific circuits, each including an analog memory, an amplifier and a multiplexer, as shown in Fig. 13.

With these circuits, called [AM]<sup>2</sup> (Analog Memory, Amplifier and Multiplexer), the capture sequence can be performed in the first memory in parallel with a readout and/or processing sequence of the previous image stored in the second memory, as shown in Fig. 12.



Fig. 12. Parallelism between capture sequence and readout sequence

Such a strategy has several advantages:

- 1) The framerate can be increased (up to 2x) without reducing the exposure time,
- 2) The image acquisition is decorrelated from image processing, so that the architecture always operates at its highest performance and the processing framerate is maximal,
- 3) A new image is always available without spending any integration time.
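The alternation between the two memories can be pictured with a small scheduling sketch. It is a software illustration only: the function name is hypothetical, and the memory indices 1 and 2 simply mirror the two [AM]<sup>2</sup> structures of each pixel.

```python
def ping_pong_schedule(n_frames):
    """Return, per retina cycle, (memory written by capture, memory read out).

    The read entry is None for the first cycle, since no previous frame exists yet.
    """
    schedule = []
    for frame in range(n_frames):
        store = 1 + frame % 2                               # memory receiving the new frame
        read = None if frame == 0 else 1 + (frame - 1) % 2  # memory holding the previous frame
        schedule.append((store, read))
    return schedule

plan = ping_pong_schedule(4)
```

In every cycle after the first, the memory being written differs from the memory being read, which is what allows capture and readout or processing to overlap.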



Fig. 13. Schematic of the [AM]<sup>2</sup> structure

The chip operates from a single 3.3 V power supply. In each pixel, as shown in Fig. 13, the photosensor is an NMOS photodiode associated with a PMOS reset transistor, which represents the first stage of the acquisition circuit. The pixel array is held in a reset state until the "init" signal goes high. Then, the photodiode discharges according to the incident light flux. This signal is biased around  $V_{\rm DD}/2$  (*i.e.* half the power supply voltage). Behind this first acquisition stage, two identical subcircuits are placed. One of these subcircuits is selected when either the "store1" signal or the "store2" signal is turned on. The associated analog switch then conducts, allowing the capacitor to store the pixel value. Consequently, the  $C_{AM}$  capacitors are able to store, during the frame capture, the pixel values, either from switch 1 or from switch 2. Each of the capacitors is followed by an inverter, biased at  $V_{DD}/2$ . This inverter serves as an amplifier of the stored value. It provides a value which is proportional to the incident illumination on the pixel. Finally, the readout of the stored values is activated by a last switch controlled by the "read1" or "read2" signals.

## C. Pixel-level Analog Arithmetic Unit: A<sup>2</sup>U

The analog arithmetic unit (A<sup>2</sup>U) represents the central part of the pixel and includes four multipliers (called M1, M2, M3 and M4), as illustrated in Fig. 14. The four multipliers are all interconnected with a diode-connected load (*i.e.*, an NMOS transistor with its gate connected to its drain). The operation result at the "node" point is a linear combination of the four adjacent pixels.



Fig. 14. The A<sup>2</sup>U structure

Fig. 15 shows the experimental results of this multiplier structure with cosine signals as inputs:

$$coef_i = A\cos(2\pi f_1 t) \text{ with } f_1 = 2.5\,\text{kHz} \tag{12}$$

$$V_i = B\cos(2\pi f_2 t) \text{ with } f_2 = 20\,\text{kHz} \tag{13}$$

In this case, the output Node value can be written as follows:

$$Node = \frac{AB}{2} \left[ \cos(2\pi (f_2 - f_1) t) + \cos(2\pi (f_2 + f_1) t) \right] \tag{14}$$

The signal's spectrum, shown in Fig. 15(b), contains two main frequencies (17.5 kHz and 22.5 kHz) around the carrier frequency. The residues which appear in the spectrum are known as intermodulation products. They are mainly due to the nonlinearity of the structure (around 10 kHz and 30 kHz) and to imperfect insulation of the input pads (at 40 kHz). However, the amplitudes of these intermodulation products are significantly lower than those of the two main frequencies.
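The two spectral lines at 17.5 kHz and 22.5 kHz follow from the product-to-sum identity for an ideal multiplier. The sketch below checks this numerically, reading the cosine arguments of Eqs. 12 to 14 as functions of time t. The unit amplitudes, the sampling rate and the window length are hypothetical choices made for the example, and the model is ideal, so it shows none of the intermodulation residues observed on the real circuit.

```python
import cmath
import math

F1, F2 = 2.5e3, 20e3   # frequencies in Hz, from Eqs. 12 and 13
A, B = 1.0, 1.0        # hypothetical amplitudes
FS = 200e3             # hypothetical sampling rate in Hz
N = 800                # 4 ms window: whole number of periods of every tone involved

def product_samples():
    """Samples of the ideal product coef_i(t) * V_i(t)."""
    return [A * math.cos(2 * math.pi * F1 * n / FS) *
            B * math.cos(2 * math.pi * F2 * n / FS) for n in range(N)]

def tone_amplitude(samples, freq):
    """Amplitude of the component at freq, computed with a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)

samples = product_samples()
amp_diff = tone_amplitude(samples, F2 - F1)   # line at 17.5 kHz
amp_sum = tone_amplitude(samples, F2 + F1)    # line at 22.5 kHz
amp_carrier = tone_amplitude(samples, F2)     # nothing expected at 20 kHz
```

Both side lines come out with amplitude A·B/2 and the carrier itself vanishes, which matches the form of Eq. 14.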



Fig. 15. Benchmark of the four-quadrant multiplier; panel (b) shows the frequency spectrum of the result of the multiplication

Furthermore, in order to obtain the best linearity of the multiplier, the amplitude of the signal  $V_i$  has been limited to a range of 0.6-2.6 V in our benchmarks. In the real chip, the signal  $V_i$  corresponds to the voltage coming from the pixel and can be easily included in this range.

# V. EXPERIMENTAL RESULTS

The layout of a 2x2 pixel block is depicted in Fig. 16. This layout is built symmetrically in order to reduce the fixed pattern noise among the four pixels and to ensure uniform spatial sampling.

An experimental  $64 \times 64$  pixel image sensor has been developed in a  $0.35 \mu m$ , 3.3 V, standard CMOS process with poly-poly capacitors. This prototype was sent to the foundry at the beginning of 2006 and was available at the end of the third quarter of the year.



Fig. 16. Layout of four pixels

Fig. 17 shows the experimental results of successive acquisitions and signal processing in an individual pixel. Each acquisition occurs when one of the two signals "read 1" or "read 2" goes high. For each of these acquisitions, various levels of illumination are applied. The two outputs ("out 1" and "out 2") give a voltage corresponding to the incident illumination on the pixels. The calibration of the structure is ensured by the biasing (Vbias = 1.35 V). Moreover, in this characterization, the output called "node" computes the difference between "out 1" and "out 2". For this purpose, the coefficients are fixed at the following values:  $coef1 = -coef2 = V_{DD}$  and  $coef3 = coef4 = V_{DD}/2$ .



Fig. 17. High speed sequence capture with basic image processing

MOS transistors operate in the sub-threshold region. No energy is spent for transferring information from one level of processing to another. According to the experimental results, the voltage gain of the amplifier stage of the two  $[AM]^2$  is Av = 12 and the disparities on the output levels are about 4.3 %.

Fig. 18. (a) Block diagram of the hardware platform, (b) Prototyping embedded platform including FPGA board, interface ADC and CMOS sensor

The hardware part of the imaging system contains a one-million-gate Spartan-3 FPGA board with 32 MB of embedded SDRAM. This FPGA board is the XSA-3S1000 from XESS Corporation. An interface acquisition circuit includes three AD9048 ADCs, LM6171 high-speed amplifiers and other components. Fig. 18 shows the schematic and some pictures of the experimental platform.

Table II summarizes the imaging system characterization results.

TABLE II
CHIP MEASUREMENTS

| Parameter                              | Value                              |
|----------------------------------------|------------------------------------|
| Conversion gain                        | $14 \ \mu\text{V/e}^- \text{ rms}$ |
| Sensitivity                            | 0.15 V/lux.s                       |
| Fixed Pattern Noise retina (FPN), dark | 225 μV rms                         |
| Thermal reset noise                    | 68 μV rms                          |
| Output levels disparities              | 4.3%                               |
| Voltage gain of the amplifier stage    | 12                                 |
| Linear flux                            | 98.5%                              |

Fig. 19 shows experimental image results. First, Fig. 19(a) shows an image acquired at 10 000 frames/s (integration time of 1 ms). Except for the amplification of the photodiode signals, no other processing is performed on this raw image. Fig. 19(b) to Fig. 19(d) show different images with pixel-level image processing at a frame rate of about 5 000 frames/s. From left to right, horizontal Sobel, vertical Sobel and Laplacian operator images are displayed.

## VI. CONCLUSION

An experimental pixel sensor implemented in a standard digital CMOS  $0.35\mu m$  process was described. Each  $35\mu m \times 35\mu m$  pixel contains 38 transistors implementing a circuit with photo-current integration, two [AM]<sup>2</sup> (Analog Memory, Amplifier and Multiplexer), and an A<sup>2</sup>U (Analog Arithmetic Unit).

The experimental chip shows that raw image acquisition at  $10\,000$  frames per second can be easily achieved using the parallel  $A^2U$  implemented at pixel level. With basic image processing, the maximal frame rate drops to about  $5\,000$  fps.

The next step in our research will be the design of a similar circuit in a modern 130 nm CMOS technology. The main objective will be to design a pixel of less than 10  $\mu$m x 10  $\mu$m. At the same time, we will focus on the development of a fast analog-to-digital converter (ADC). The integration of this ADC on future chips will allow us to provide new and sophisticated vision systems on chip dedicated to digital embedded image processing at thousands of frames per second.

#### ACKNOWLEDGMENT

The authors would like to thank...

### REFERENCES

- [1] E. Fossum, "Active pixel sensors: Are CCDs dinosaurs?" *International Society for Optical Engineering (SPIE)*, vol. 1900, pp. 2–14, 1993.
- [2] ——, "CMOS Image Sensor: Electronic Camera On A CHIP," *IEEE Transactions on Electron Devices*, vol. 44, no. 10, pp. 1689–1698, October 1997.
- [3] P. Seitz, "Solid-State Image Sensing," Handbook of computer Vision and Applications, vol. 1, pp. 165–222, 2000.
- [4] D. Litwiller, "CCD vs. CMOS: Facts and Fiction," *Photonics Spectra*, pp. 154–158, January 2001.
- [5] M. Loinaz, K. Singh, A. Blanksby, D. Inglis, K. Azadet, and B. Ackland, "A 200-mW 3.3-V CMOS Color Camera IC Producing 352 × 288 24-b Video at 30 Frames/s," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 12, pp. 2092–2103, 1998.
- [6] S. Smith, J. Hurwitz, M. Torrie, D. Baxter, A. Holmes, M. Panaghiston, R. Henderson, A. Murrayn, S. Anderson, and P. Denyer, "A single-chip 306x244-pixel CMOS NTSC video camera," in *ISSCC Digest of Technical Papers*, San Francisco, CA, 1998, pp. 170–171.
- [7] A. El Gamal, D. Yang, and B. Fowler, "Pixel level processing Why, What and How?" in *Proceedings of the SPIE Electronic Imaging '99 conference*, vol. 3650, January 1999, pp. 2–13.
- [8] O. Yadid-Pecht and A. Belenky, "In-Pixel Autoexposure CMOS APS," IEEE Journal of Solid-State Circuits, vol. 38, no. 8, pp. 1425–1428, August 2003.









Fig. 19. (a) Raw image at 10 000 fps (b) Output Sobel horizontal image, (c) Output Sobel vertical image, (d) Output Laplacian image

- [9] P. Acosta-Serafini, M. Ichiro, and C. Sodini, "A 1/3" VGA Linear Wide Dynamic Range CMOS Image Sensor Implementing a Predictive Multiple Sampling Algorithm With Overlapping Integration Intervals," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1487–1496, September 2004.
- [10] L. Kozlowski, G. Rossi, L. Blanquart, R. Marchesini, Y. Huang, G. Chow, J. Richardson, and D. Standley, "Pixel Noise Suppression via SoC Management of Target Reset in a 1920 × 1080 CMOS Image Sensor," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 12, pp. 2766– 2776, December 2005.
- [11] M. Sakakibara, S. Kawahito, D. Handoko, N. Nakamura, M. Higashi, K. Mabuchi, and H. Sumi, "A High-Sensitivity CMOS Image Sensor With Gain-Adaptative Column Amplifiers," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 5, pp. 1147–1156, May 2005.
- [12] A. Krymsky and T. Niarong, "A 9-V/Lux 5000-Frames/s 512 x 512 CMOS Sensor," *IEEE Transactions on Electron Devices*, vol. 50, no. 1, pp. 136–143, January 2003.
- [13] G. Cembrano, A. Rodriguez-Vazquez, R. Galan, F. Jimenez-Garrido, S. Espejo, and R. Dominguez-Castro, "A 1000 FPS at 128 × 128 Vision Processor With 8-Bit Digitized I/O," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 7, pp. 1044–1055, July 2004.
- [14] L. Lindgren, J. Melander, R. Johansson, and B. Möller, "A Multiresolution 100-GOPS 4-Gpixels/s Programmable Smart Vision Sensor for Multisense Imaging," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 6, pp. 1350–1359, June 2005.
- [15] Y. Sugiyama, M. Takumi, H. Toyoda, N. Mukozaka, A. Ihori, T. Kurashina, Y. Nakamura, T. Tonbe, and S. Mizuno, "A High-Speed CMOS Image Sensor With Profile Data Acquiring Function," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 2816–2823, 2005.
- [16] D. Handoko, K. S, Y. Takokoro, M. Kumahara, and A. Matsuzawa, "A CMOS image sensor for local-plane motion vector estimation," in *Symposium of VLSI Circuits*, vol. 3650, June 2000, pp. 28–29.
- [17] S. Lim and A. El Gamal, "Integrating Image Capture and Processing – Beyond Single Chip Digital Camera," in *Proceedings of the SPIE Electronic Imaging '2001 conference*, vol. 4306, San Jose, CA, January 2001.
- [18] X. Liu and A. El Gamal, "Photocurrent estimation from multiple nondestructive samples in a CMOS image sensor," in *Proceedings of the SPIE Electronic Imaging '2001 conference*, vol. 4306, San Jose, CA, January 2001.
- [19] D. Yang, A. El Gamal, B. Fowler, and H. Tian, "A 640 x 512 CMOS Image Sensor with Ultra Wide Dynamic Range Floating-Point Pixel-Level ADC," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1821–1834, December 1999.
- [20] O. Yadid-Pecht and E. Fossum, "CMOS APS with autoscaling and customized wide dynamic range," in *IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors*, vol. 3650, June 1999, pp. 48–51.
- [21] D. Stoppa, A. Simoni, L. Gonzo, M. Gottardi, and G.-F. Dalla Betta, "Novel CMOS Image Sensor With a 132-dB Dynamic Range," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 12, pp. 1846–1852, December 2002.
- [22] X. Liu and A. El Gamal, "Simultaneous image formation and motion blur restoration via multiple capture," in *IEEE International Conference on Acoustics, Speech and Signal Processing*, vol. 3, 2001, pp. 1841–1844.
- [23] C.-Y. Wu and C.-T. Chiang, "A Low-Photocurrent CMOS Retinal Focal-Plane Sensor With a Pseudo-BJT Smoothing Network and an Adaptive Current Schmitt Trigger for Scanner Applications," *IEEE Sensors Journal*, vol. 4, no. 4, pp. 510–518, August 2004.
- [24] D. Yang, B. Fowler, and A. El Gamal, "A Nyquist-Rate Pixel-Level ADC for CMOS Image Sensors," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 3, pp. 348–356, March 1999.
- [25] S. Kleinfelder, S. Lim, X. Liu, and A. El Gamal, "A 10 000 Frames/s CMOS Digital Pixel Sensor," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 12, pp. 2049–2059, December 2001.
- [26] A. Harton, M. Ahmed, A. Beuhler, F. Castro, L. Dawson, B. Herold, G. Kujawa, K. Lee, R. Mareachen, and T. Scaminaci, "High dynamic range CMOS image sensor with pixel level ADC and in-situ image enhancement," in *Sensors and Camera Systems for Scientific and Industrial Applications VI, Proceedings of the SPIE*, vol. 5677, March 2005, pp. 67–77.
- [27] Y. Chi, U. Mallik, E. Choi, M. Clapp, G. Cauwenberghs, and R. Etienne-Cummings, "CMOS pixel-level ADC with change detection," in *Proceedings of the International Symposium on Circuits and Systems (ISCAS)*, May 2006, pp. 1647–1650.
- [28] O. Yadid-Pecht, B. Pain, C. Staller, C. Clark, and E. Fossum, "CMOS Active Pixel Sensor Star Tracker with Regional Electronic Shutter," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 2, pp. 285–288, February 1997.
- [29] M. Barbaro, P. Burgi, A. Mortara, P. Nussbaum, and F. Heitger, "A 100x100 Pixel silicon retina for gradient extraction with steering filter capabilities and temporal output coding," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 160–172, February 2002.
- [30] C. Ryan, "Applications of a four-quadrant multiplier," *IEEE Journal of Solid-State Circuits*, vol. 5, no. 1, pp. 45–48, February 1970.
- [31] S. Liu and Y. Hwang, "CMOS Squarer and Four-Quadrant Multiplier," *IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications*, vol. 42, no. 2, pp. 119–122, February 1995.
- [32] C. Wu, Y. Shih, J. Lan, C. Hsieh, C. Huang, and J. Lu, "Design, optimization, and performance analysis of new photodiode structures for CMOS active-pixel-sensor (APS) imager applications," *IEEE Sensors Journal*, vol. 4, no. 1, pp. 135–144, February 2004.
- [33] I. Shcherback, A. Belenky, and O. Yadid-Pecht, "Empirical dark current modeling for complementary metal oxide semiconductor active pixel sensor," *Optical Engineering*, vol. 41, no. 6, pp. 1216–1219, June 2002.
- [34] I. Shcherback and O. Yadid-Pecht, "Photoresponse analysis and pixel shape optimization for CMOS active pixel sensors," *IEEE Transactions on Electron Devices*, vol. 50, no. 1, pp. 12–18, January 2003.
- [35] J. Lee and R. Hornsey, "CMOS Photodiodes with Substrate Openings for Higher Conversion Gain in Active Pixel Sensor," in *IEEE Workshop on CCDs and Advanced Image Sensors*, Crystal Bay, Nevada, June 2001.
- [36] G. Chapinal, S. Bota, M. Moreno, J. Palacin, and A. Herms, "A 128 × 128 CMOS Image Sensor With Analog Memory for Synchronous Image Capture," *IEEE Sensors Journal*, vol. 2, no. 2, pp. 120–127, April 2002.



Jérôme Dubois is a Normalien of the 2001 promotion. In July 2004, he passed a competitive examination in Electrical Engineering for a post on the teaching staff of first cycle universities. He received a Master's degree in Image Processing in June 2005. He is currently a Ph.D. student and instructor at the LE2I Laboratory, University of Burgundy. His research interests include the design, development, implementation, and testing of silicon retinas for multi-processing and high-speed image sensors.

Dominique Ginhac Biography text here.

Michel Paindavoine Biography text here.