Develop a lensless optoelectronic neural network architecture to improve the privacy protection capability of end-to-end face recognition

  “Lensless sensors output images that cannot be visualized directly, which gives them high privacy and security. Combined with a matching optoelectronic neural network, they will play an important supporting role in large-scale and even ultra-large-scale intelligent perception applications, and can then be used in smart security, smart home, autonomous driving, and other next-generation terminals. The reviewer said that this work will bring great progress to optical neural networks,” said Professor Chen Hongwei of the Department of Electronic Engineering at Tsinghua University.

Chen Hongwei

Lensless sensor module

  Recently, the research group proposed a lensless optoelectronic neural network architecture. The project has targeted industrial and edge visual perception from the outset, so it is highly applicable in practice. The architecture is designed for machine vision tasks: a passive optical mask (reticle) inserted in the imaging optical path performs convolution operations in the optical domain, which addresses the challenges posed by incoherent light sources and broadband optical signals in natural scenes.

  Handwritten digit recognition was used to verify the performance of optical convolution in this architecture. With a single-kernel mask, recognition accuracy reaches 93.47%. Arranging multiple kernels in parallel on the mask implements single-layer, multi-channel convolution and raises accuracy to 97.21%. Compared with a traditional machine vision pipeline, the approach saves about 50% of energy consumption.
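  The multi-kernel arrangement can be illustrated with a minimal sketch. It assumes an idealized model in which the sensor records the 2D convolution of the scene with the mask pattern, and in which the kernels sit side by side on the mask, far enough apart that their outputs do not overlap. The function names, the toy scene, and the toy kernels are hypothetical and are not taken from the paper.

```python
def conv2d_full(image, kernel):
    """Full 2D convolution: every image point adds a scaled copy of the kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for y in range(ih):
        for x in range(iw):
            for u in range(kh):
                for v in range(kw):
                    out[y + u][x + v] += image[y][x] * kernel[u][v]
    return out


def tile_kernels(kernels, pitch):
    """Place equally sized kernels on one mask, `pitch` columns apart."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    width = pitch * (len(kernels) - 1) + kw
    mask = [[0.0] * width for _ in range(kh)]
    for i, kernel in enumerate(kernels):
        for u in range(kh):
            for v in range(kw):
                mask[u][i * pitch + v] = kernel[u][v]
    return mask


def split_channels(sensor, n_channels, pitch):
    """Cut the sensor image into one feature map per kernel."""
    return [[row[i * pitch:(i + 1) * pitch] for row in sensor]
            for i in range(n_channels)]


# Hypothetical toy data: a 3x3 scene and two 2x2 non-negative kernels.
scene = [[1, 2, 0],
         [0, 1, 3],
         [2, 0, 1]]
kernels = [[[1, 0], [0, 1]],
           [[0, 1], [1, 0]]]

# One full feature map is (scene width + kernel width - 1) columns wide,
# so spacing the kernels by that amount keeps the outputs from overlapping.
pitch = len(scene[0]) + len(kernels[0][0]) - 1

mask = tile_kernels(kernels, pitch)
sensor = conv2d_full(scene, mask)          # a single exposure
channels = split_channels(sensor, len(kernels), pitch)

for i, kernel in enumerate(kernels):
    print(channels[i] == conv2d_full(scene, kernel))
```

  In this toy model a single exposure yields every channel at once, which is the sense in which the kernels operate in parallel. The price is sensor area, since each extra kernel needs its own region.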
  This result enables optoelectronic hybrid neural network computation in natural light scenes. For a specific task, the full optoelectronic link can be jointly optimized, and volume and power consumption can be reduced drastically, which makes deployment on edge devices easy. In addition, the passive optical mask not only convolves the incident optical scene but also encrypts the image naturally: it forms an aliased image of the scene that the human eye cannot recognize, which can be used in privacy-preserving applications across various visual task scenarios.
  Compared with the traditional encryption and decryption method, this method omits the steps of decrypting the visual image on the sensor side and the server side, realizes the privacy protection of the whole process from optical acquisition to the completion of the visual task, and fundamentally guarantees the privacy of sensor imaging. When in use, a sensor chip module with a size of less than one centimeter is used to directly photograph the face, and the device can return the identity information of the person. Similarly, the recognition of various features such as faces, movements, expressions, postures, and QR codes can also be completed.
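  The idea of recognizing a face without ever decoding the image can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' method: the mask is modeled as a handful of openings that overlay shifted copies of the scene, and a nearest-neighbor matcher stands in for the electronic neural network. All names (`encode`, `identify`, `MASK`, the two "faces") are hypothetical.

```python
def encode(scene, mask):
    """Idealized mask encoding: overlay one shifted scene copy per mask opening."""
    sh, sw = len(scene), len(scene[0])
    mh, mw = len(mask), len(mask[0])
    out = [[0.0] * (sw + mw - 1) for _ in range(sh + mh - 1)]
    for y in range(sh):
        for x in range(sw):
            for u in range(mh):
                for v in range(mw):
                    out[y + u][x + v] += scene[y][x] * mask[u][v]
    return out


def squared_distance(a, b):
    """Sum of squared differences between two equally sized measurements."""
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def identify(measurement, gallery):
    """Return the ID whose enrolled (encoded) measurement is closest."""
    return min(gallery,
               key=lambda person_id: squared_distance(measurement, gallery[person_id]))


# Hypothetical mask with four openings, so four scene copies overlap on the sensor.
MASK = [[1, 0, 1],
        [0, 0, 0],
        [1, 0, 1]]

# Hypothetical 3x3 "faces"; only their encoded form is ever stored.
faces = {
    "alice": [[1, 0, 1], [0, 1, 0], [1, 1, 1]],
    "bob":   [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}
gallery = {person_id: encode(face, MASK) for person_id, face in faces.items()}

# A slightly dimmer capture of the first face, matched without any decoding step.
probe = [[1, 0, 1], [0, 0.9, 0], [1, 1, 0.9]]
print(identify(encode(probe, MASK), gallery))  # prints "alice"
```

  Both enrollment and matching operate on the overlapped measurement, so the clear image is never reconstructed anywhere in this sketch.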

Making optical lensless coding truly serve the task

  Vision is an important way for humans to perceive the world, and many devices have also been developed to collect visual information, such as cameras and video cameras. Generally speaking, in the camera architecture, a precise lens system, coupled with the back-end photosensitive chip and data unit, completes data processing and finally obtains an image.
  In recent years, people’s understanding of imaging has deepened, and a variety of imaging modalities have been developed to obtain more image information, such as polarization, depth, and spectrum. The acquisition of this information can be accomplished by the computational imaging method of “optical encoding + computational decoding”.
  Compared with the “what you see is what you get” mode of traditional lens imaging, computational imaging follows an “encode first, then image, and finally demodulate” approach: the image is encoded in the optical domain and then decoded by a back-end algorithm to obtain a higher-quality image. It can be said that computational imaging has fundamentally changed the imaging method, liberating imaging from the heavy lens system and taking an important step toward the miniaturization of imaging systems.
  In recent years, computational imaging has gained more applications, such as face recognition, pedestrian detection, etc. The rise of various intelligent edge devices has put forward higher requirements for the miniaturization of imaging systems. Thus, lensless imaging was born. Its appearance greatly reduces the volume and weight of the entire system and provides more convenience for equipment deployment.
  An intelligent visual perception system includes an optical imaging system, a photoelectric image detection chip, an image processing chip, and a computing unit that performs visual tasks. On the optical side, a lensless imaging system uses a better optical encoding device to obtain better imaging results. On the chip side, devices with lower power consumption and higher sensitivity enable more accurate detection. On the algorithm side, larger data sets and higher computing power yield more accurate classification and identification results.
  Because each link is optimized independently, the integrated architecture of sensing, storage, and computing contains a lot of redundancy. For example, the lensless system is used only for imaging within the whole architecture, and the metrics for its imaging quality are defined for the human eye. Machine vision differs from the human eye, and images that the human eye understands well may not be the best ones for machines.
  Based on this, the team came up with the following idea: why not target the machine vision task directly, skip the imaging step, and optimize the optical path and the other parts in cascade, so that optical lensless coding truly serves the task instead of pursuing what looks good to the human eye? This is the purpose of this research.

For machine vision tasks, skip imaging

  Chen Hongwei said that the above ideas are similar to optical computing. If computing is done on light, and the computing itself is directly driven by task performance, then the entire system is the full-link cascade optimization he conceived. Taking the most common neural network as an example, there are currently many all-optical neural networks that can complete visual tasks, such as diffractive neural networks.
  Because optical operations are parallel, the computing speed is greatly improved and there is no additional power consumption. However, the operating principle of diffractive neural networks means that only a coherent monochromatic laser can serve as the light source. The calculation can be done in the optical domain, but it cannot be applied in natural light scenes.
  Other technologies based on the optoelectronic hybrid neural network also need to be equipped with a lens system to complete the computing function, which is also contrary to the miniaturization requirements of edge devices. So, the team began to think: Can a lensless optoelectronic hybrid neural network architecture be built so that it can not only complete the neural network function under natural light, but also meet the requirements of miniaturization of edge devices?

  Based on the above concept and using geometric optics theory, the research group explored a method to complete the optical domain convolution operation through a lensless mask under incoherent light, and realized low-cost optical feature extraction and optical encryption. The verification is completed under the two system architectures of image classification and face encryption recognition. What’s more, the system can work in natural light environment and can be perfectly integrated with existing machine vision systems.
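  The geometric-optics picture can be illustrated with a minimal sketch, assuming an idealized model that the article does not spell out: every scene point casts a shifted shadow of the mask pattern onto the sensor, weighted by that point's brightness, and because incoherent intensities add, the sensor records the sum of all shadows. That sum is a 2D convolution. The function name `sensor_image` and the toy scene and mask are hypothetical; magnification, image flip, diffraction, and noise are ignored.

```python
def sensor_image(scene, mask):
    """Sum the mask shadows cast by every scene point (a full 2D convolution).

    scene: 2D list of non-negative intensities.
    mask:  2D list of transmittances in [0, 1]; a passive mask can only
           attenuate light, so its weights are non-negative.
    """
    sh, sw = len(scene), len(scene[0])
    mh, mw = len(mask), len(mask[0])
    out = [[0.0] * (sw + mw - 1) for _ in range(sh + mh - 1)]
    for y in range(sh):
        for x in range(sw):
            brightness = scene[y][x]
            for u in range(mh):
                for v in range(mw):
                    # Incoherent light: intensities from different points add.
                    out[y + u][x + v] += brightness * mask[u][v]
    return out


# Hypothetical toy scene with two bright points, and a 2x2 mask pattern.
scene = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 2]]
mask = [[1.0, 0.5],
        [0.0, 1.0]]

for row in sensor_image(scene, mask):
    print(row)
# [0.0, 0.0, 0.0, 0.0]
# [0.0, 1.0, 0.5, 0.0]
# [0.0, 0.0, 3.0, 1.0]
# [0.0, 0.0, 0.0, 2.0]
```

  The two shadows overlap at one sensor pixel (value 3.0), which is the kind of mixing that makes the raw measurement hard for a person to read while leaving it a well-defined linear function of the scene.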

When the face information passes through the lensless system, the corresponding ID can be directly obtained

  In today’s data explosion, people pay more and more attention to private information such as faces, and various information encryption methods have emerged. Compared with encryption computed electronically, the low power consumption of parallel encryption in the optical domain is a very important advantage.
  Therefore, the scenario realized by the team is: when the face information passes through a lensless system, the ID corresponding to the face can be directly obtained. The entire link achieves end-to-end privacy protection, and what the chip detects is only an image that the human eye cannot understand at all. But for the machine, it can easily identify the identity of the person. In addition, the front-end optical encoding and chip-side are only centimeter-sized and can be easily integrated into various edge systems.
  On May 4, 2022, a related paper titled “LOEN: Lensless Optoelectronic Neural Networks for Enhanced Machine Vision” appeared in a top optics journal published by Nature.
  For this achievement, the reviewers commented: “The authors propose a lensless optoelectronic neural network for machine vision tasks. The method uses a lensless mask to convolve the optical image and uses subsequent electrical hardware for recognition. The method relaxes the optical coherence requirement of traditional optical neural networks and can be used in natural scenes. Optical encryption can also be used for facial recognition with privacy protection. The idea is innovative and has been verified by comprehensive theoretical analysis and reliable experiments.”

Machine vision tasks with lensless optoelectronic neural network architectures

  According to reports, the research group led by Chen Hongwei has conducted in-depth research on key devices and technologies in the fields of optoelectronic signal processing, integrated microwave photonics, photonic intelligence, and optoelectronic systems.

  In fact, the privacy-protecting aspect of the visual perception system took shape gradually after the research began. Chen Hongwei and team members found in experiments that convolution in the optical domain naturally blurs the image, so they asked on a whim: is a blurred image a good thing or a bad thing? After investigating the industry’s current demand for face data privacy protection and the existing means, he concluded that the system’s privacy protection function would have great application prospects once the project was implemented.
  In recent years, the parallelism, high speed, and low loss of light have gradually attracted attention. Optical computing is considered able to break through the bottleneck of electronic computing, and optical neural networks have drawn growing attention from academia and industry. Research teams in China and abroad have taken up the topic, and within China the laboratory was among the early teams to work on optical computing.
  Chen Hongwei is concerned that optical computing is still far from practical and that many key issues, such as integration, miniaturization, low power consumption, and ease of use, have not been well resolved. Most optical neural networks require coherent lasers as light sources, and optoelectronic hybrid neural networks rely on large lens groups, so they are difficult to use for machine vision tasks under natural lighting and cannot be deployed in autonomous driving, robots, and other IoT peripherals.
  Combined with the development needs of an intelligent society, the team proposed this project, which is to use a lensless optoelectronic neural network architecture to complete machine vision tasks. At present, the team has realized the convolution operation in the optical domain, and the follow-up plan is to make more in-depth explorations in optical computing and cascade optimization of the entire link.
  First, they will try to put more calculations in the optical domain to further reduce the power consumption and calculation amount of the entire system. Specifically, the research group intends to try to integrate nonlinear materials into existing systems to achieve stable nonlinear operations under natural light. After completing the addition of nonlinear operations, the single-layer convolutional layer can be expanded into multiple layers, and more calculations can be completed by using the parallelism on the light, which can not only make the calculation results more accurate, but also further reduce the power consumption and calculation amount.
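  Why a nonlinear step is a prerequisite for stacking layers can be shown with a small worked example. Two convolutions applied back to back are equivalent to one convolution with a combined kernel, so without a nonlinearity in between, a second optical layer adds no expressive power. The sketch below checks this numerically. The `saturate` function is a hypothetical stand-in for whatever nonlinear material the team might use, and the toy input and kernels are likewise assumptions.

```python
def conv2d_full(image, kernel):
    """Full 2D convolution of two small 2D lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for y in range(ih):
        for x in range(iw):
            for u in range(kh):
                for v in range(kw):
                    out[y + u][x + v] += image[y][x] * kernel[u][v]
    return out


def saturate(image, ceiling):
    """Hypothetical nonlinearity: clip every intensity at `ceiling`."""
    return [[min(value, ceiling) for value in row] for row in image]


# Hypothetical toy input and two non-negative 2x2 kernels.
x = [[1, 2],
     [3, 4]]
k1 = [[1, 1],
      [0, 1]]
k2 = [[1, 0],
      [1, 1]]

# Two purely linear layers collapse into one layer with kernel k1 * k2.
two_linear_layers = conv2d_full(conv2d_full(x, k1), k2)
one_combined_layer = conv2d_full(x, conv2d_full(k1, k2))
print(two_linear_layers == one_combined_layer)   # prints True

# With a nonlinearity between the layers, the collapse no longer holds.
nonlinear_stack = conv2d_full(saturate(conv2d_full(x, k1), 3.0), k2)
print(nonlinear_stack == one_combined_layer)     # prints False
```

  The first comparison holds because convolution is associative. The second fails because clipping changes the intermediate feature map in a way no single kernel can reproduce.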
  At the same time, the team found in previous research that the image signal processing pipeline of today’s mature image sensors is a relatively black-box process with respect to the final vision task.
  Therefore, the team plans to carry out task-oriented modeling of the image signal processing process to remove unnecessary processing links, and to redefine the evaluation indicators for the parameters of each link according to the task, so that the optical side, the chip side, and the algorithm side are more tightly cascaded together. The complete link can then be optimized, ultimately maximizing power savings and performance gains.
  In the future, the team will continue to explore on the road of industrialization, complete the packaging of modules, and further improve the performance of visual tasks on the basis of existing systems. At the same time, it will also actively carry out school-enterprise cooperation to overcome various difficulties in the industrialization of engineering, implement the system and actually use it in intelligent equipment, so as to complete the transformation and upgrade of the existing machine vision system.