ICLR 2022 Spotlight | MSU and MIT-IBM propose the first black-box defense framework

Author: Data School THU  Time: 2022.09.23

Source: Heart of the machine

This article is about 2,600 words; the suggested reading time is 6 minutes.

This article introduces a research paper on black-box defense. The code and models are open source, and the paper was accepted by ICLR 2022 as a spotlight paper.

This article introduces work on black-box defense from Michigan State University and the MIT-IBM Watson AI Lab. The paper has been accepted by ICLR 2022 as a spotlight paper, and the code and models are open source.


Paper address: https://openreview.net/forum?id=w9g_imphlqd

Project address: https://github.com/damon-demon/black-box-defense

Trustworthy ML Initiative: https://www.trustworthyml.org/home

Zoom online address: https://us02web.zoom.us/83664690773?pwd=wljoqzjdy0lhvm0rvjnsazhddz0909


1. Background

Machine learning models, especially deep neural networks, perform outstandingly on a wide range of prediction tasks, but they usually lack robustness. For example, adding a perturbation to the input that is imperceptible to the human eye (an adversarial perturbation) can cause a neural network to make a wrong prediction. Adversarial attacks have been studied extensively and applied successfully in different scenarios, such as image classification, object recognition, and image reconstruction. Victim models can be divided into white-box models (all model information is available to the attacker) and black-box models (model information is unknown).

Given the prevalence of adversarial attacks, improving model robustness so that models are not affected by attacks has become a research focus. Adversarial training is one of the most effective approaches, and many empirical defense methods have been built on it. Another type of defense is certified defense. Unlike empirical defense, it provides a theoretical guarantee of successful defense within a certain perturbation strength: within that strength, an empirical defense may still fail against a new attack, but a certified defense will not. Although the field of adversarial defense has developed considerably, almost all defenses target white-box models, and in practice the white-box assumption limits their applicability. For example, disclosing model parameters can leak training data and thereby compromise user privacy. A white-box defense can indeed use several surrogate models with different architectures in place of the black-box model for adversarial training. However, in some fields (such as medicine), pre-trained models that could serve as surrogates for a given task may not be available. Therefore, this article raises a question:

Is it possible to design a defense method for black-box models? (During training, only the model's inputs and outputs can be used as training data.)

2. Problem statement

Randomized smoothing (RS) trains the target model on images corrupted with random Gaussian noise, whereas denoised smoothing (DS) leaves the target model's parameters unchanged and only prepends a denoiser to it. During training only the denoiser's parameters are updated, and the denoiser and the target model are finally treated as a whole. Both randomized smoothing and denoised smoothing are certified defenses, but denoised smoothing is better suited to the black-box defense setting. This article therefore builds its black-box defense framework on denoised smoothing. Because the target model is a black box, it interrupts backpropagation (BP), so the gradient cannot be obtained through backpropagation. The problem to be solved is therefore how to estimate the training gradient of the denoiser in order to update its parameters.
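As a rough illustration of the denoised smoothing pipeline described above, the following minimal sketch predicts by majority vote over Gaussian-corrupted copies of the input, each passed through a denoiser and then a fixed classifier. The names `toy_denoiser` and `toy_classifier`, the noise level, and the sample count are hypothetical stand-ins chosen for illustration, not the paper's models or settings.

```python
import numpy as np

def smoothed_predict(x, denoiser, classifier, sigma, n_samples, num_classes, rng):
    """Majority-vote prediction of classifier(denoiser(x + Gaussian noise))."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # Gaussian-corrupted input
        label = classifier(denoiser(noisy))               # classifier parameters stay fixed
        counts[label] += 1
    return int(np.argmax(counts)), counts

# Hypothetical stand-ins: a clipping "denoiser" and a threshold "classifier".
def toy_denoiser(z):
    return np.clip(z, 0.0, 1.0)

def toy_classifier(z):
    return int(z.mean() > 0.5)

rng = np.random.default_rng(0)
x = np.full(16, 0.8)  # a toy 16-pixel "image"
pred, counts = smoothed_predict(
    x, toy_denoiser, toy_classifier,
    sigma=0.25, n_samples=200, num_classes=2, rng=rng,
)
```

In this sketch the classifier is only ever called, never modified, which is the property that makes the denoiser the only trainable part.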

3. Method

First-order (FO) optimization requires gradients, whereas zeroth-order (ZO) optimization does not: it estimates the gradient from differences between function outputs.

Randomized gradient estimation (RGE) perturbs the variable with random vectors of the same shape and estimates the gradient from the difference between the perturbed output and the original output:

$$\hat{\nabla}_{\mathbf{w}} f(\mathbf{w}) = \frac{1}{q} \sum_{i=1}^{q} \frac{d}{\mu} \left[ f(\mathbf{w} + \mu \mathbf{u}_i) - f(\mathbf{w}) \right] \mathbf{u}_i$$

Here $d$ is the dimension of the variable $\mathbf{w}$, $\mu > 0$ is the smoothing parameter, and $\mathbf{u}_1, \dots, \mathbf{u}_q$ are $q$ random direction vectors. Randomized gradient estimation is unstable; the value of $q$ must be increased to improve its stability, and the computation cost grows in proportion to $q$. The alternative is coordinate-wise gradient estimation (CGE), which perturbs one element at a time, computes the corresponding partial derivative, and repeats this $d$ times:

$$\hat{\nabla}_{\mathbf{w}} f(\mathbf{w}) = \sum_{i=1}^{d} \frac{f(\mathbf{w} + \mu \mathbf{e}_i) - f(\mathbf{w})}{\mu} \mathbf{e}_i$$

where $\mathbf{e}_i$ is the $i$-th standard basis vector. Although coordinate-wise gradient estimation is more stable, its cost becomes unacceptable when the dimension $d$ of the variable is large. This is why zeroth-order optimization has so far been used mainly for adversarial attacks: the dimension of an adversarial perturbation matches the dimension of the image, whereas the dimension of the model parameters is far larger. Directly using zeroth-order optimization to update model parameters is therefore impractical for black-box defense.
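The two estimators described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the function names `rge` and `cge`, the default values of `mu` and `q`, and the choice of unit-sphere directions are illustrative, not taken from the authors' code.

```python
import numpy as np

def rge(f, w, mu=1e-3, q=20, rng=None):
    """Randomized gradient estimate: q random directions, q + 1 queries of f."""
    if rng is None:
        rng = np.random.default_rng()
    d = w.size
    f0 = f(w)
    grad = np.zeros_like(w, dtype=float)
    for _ in range(q):
        u = rng.standard_normal(w.shape)
        u /= np.linalg.norm(u)                 # random direction on the unit sphere
        grad += (d / mu) * (f(w + mu * u) - f0) * u
    return grad / q

def cge(f, w, mu=1e-3):
    """Coordinate-wise gradient estimate: one extra query per coordinate (d + 1 total)."""
    f0 = f(w)
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e.flat[i] = 1.0                        # i-th standard basis vector
        grad.flat[i] = (f(w + mu * e) - f0) / mu
    return grad

def quadratic(w):
    # Toy objective with known gradient 2 * w, used only to check the estimators.
    return float(np.sum(w ** 2))

w = np.array([1.0, -2.0, 3.0])
g_cge = cge(quadratic, w)
g_rge = rge(quadratic, w, q=20, rng=np.random.default_rng(0))
```

On this toy objective the CGE result is close to the true gradient after only d + 1 queries, while the RGE result is noticeably noisier at q = 20 and needs a much larger q to reach similar accuracy.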

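Using the chain rule, the gradient with respect to the denoiser parameters can be decomposed into two parts:

$$\nabla_{\boldsymbol{\theta}} f(D_{\boldsymbol{\theta}}(\mathbf{x})) = \frac{d D_{\boldsymbol{\theta}}(\mathbf{x})}{d \boldsymbol{\theta}} \, \nabla_{\mathbf{z}} f(\mathbf{z}) \Big|_{\mathbf{z} = D_{\boldsymbol{\theta}}(\mathbf{x})}$$

Here $D_{\boldsymbol{\theta}}$ is the denoiser with parameters $\boldsymbol{\theta}$, $f$ is the black-box model, and $\mathbf{z}$ is the denoiser output. The first factor belongs to the white-box denoiser and can be computed by backpropagation, so only the gradient with respect to the denoiser output needs to be estimated. However, the dimension of the denoiser output equals the dimension of the image, so coordinate-wise gradient estimation is still too expensive.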
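The chain-rule decomposition can be illustrated with a toy example in which the denoiser is a linear map, so its Jacobian is known exactly, and the black box can only be queried. Everything here (`black_box_loss`, `TARGET`, the linear denoiser, the step size) is a hypothetical setup chosen for illustration and is not the paper's architecture.

```python
import numpy as np

TARGET = np.array([1.0, 0.0, -1.0])  # hypothetical target for the toy loss

def black_box_loss(z):
    # Hypothetical black box: callers can read its output value only.
    return float(np.sum((np.tanh(z) - TARGET) ** 2))

def zo_output_grad(f, z, mu=1e-4):
    """Estimate df/dz from function queries (coordinate-wise differences)."""
    f0 = f(z)
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = 1.0
        g[i] = (f(z + mu * e) - f0) / mu
    return g

def denoiser_grad(theta, x, f):
    """Chain rule: white-box Jacobian applied to the ZO estimate of df/dz."""
    z = theta @ x                  # white-box forward pass, D_theta(x) = theta x
    g_z = zo_output_grad(f, z)     # black-box factor, estimated by queries
    return np.outer(g_z, x)        # white-box factor, exact for a linear map

x = np.array([0.5, -0.3, 0.2])
theta = np.eye(3)
loss_before = black_box_loss(theta @ x)
for _ in range(100):
    theta -= 0.1 * denoiser_grad(theta, x, black_box_loss)  # plain gradient descent
loss_after = black_box_loss(theta @ x)
```

Only `zo_output_grad` touches the black box, and the number of queries it needs depends on the size of the denoiser output, not on the number of denoiser parameters.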


FO-DS and ZO-DS denote the first-order and zeroth-order optimization versions of denoised smoothing. As shown in the table below, ZO-DS with randomized gradient estimation does not achieve satisfactory results and lags significantly behind the first-order results.

To use the more stable and accurate coordinate-wise gradient estimation, the dimension of the target variable must be reduced further. As shown in the figure below, this article inserts a pre-trained autoencoder (AE) between the denoiser and the black-box model. An autoencoder consists of an encoder and a decoder. The encoder and the denoiser form the white-box module, whose parameters are updated during training; the decoder and the black-box model are treated together as a single black box, whose parameters are not updated during training. This framework is called ZO autoencoder-based DS (ZO-AE-DS). Under this black-box defense framework, the output dimension of the white-box module is greatly compressed, which makes coordinate-wise gradient estimation feasible.
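To make the query-cost argument concrete, the sketch below counts black-box queries for coordinate-wise gradient estimation in image space versus in a compressed latent space. The random linear `decoder_w` and `clf_w`, the `QueryCounter` helper, and the latent size of 192 are all assumptions made for illustration; only the 32×32×3 = 3072 image dimension corresponds to CIFAR-10.

```python
import numpy as np

class QueryCounter:
    """Wraps a black-box function and counts how often it is queried."""
    def __init__(self, fn):
        self.fn = fn
        self.count = 0

    def __call__(self, z):
        self.count += 1
        return self.fn(z)

def cge(f, z, mu=1e-3):
    """Coordinate-wise gradient estimate (d + 1 queries for a d-dimensional z)."""
    f0 = f(z)
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = 1.0
        g[i] = (f(z + mu * e) - f0) / mu
    return g

rng = np.random.default_rng(0)
d_image, d_latent = 3072, 192  # 3072 = 32*32*3; 192 is an assumed latent size

# Hypothetical stand-ins for the pre-trained decoder and the black-box model.
decoder_w = rng.standard_normal((d_image, d_latent)) / np.sqrt(d_latent)
clf_w = rng.standard_normal(d_image) / np.sqrt(d_image)

def black_box_model(img):
    return float(np.tanh(img @ clf_w))

def decoder_plus_model(z):
    # In ZO-AE-DS the decoder is merged with the model into one black box.
    return black_box_model(decoder_w @ z)

f_image = QueryCounter(black_box_model)
f_latent = QueryCounter(decoder_plus_model)
cge(f_image, rng.standard_normal(d_image))     # estimate at the denoiser output
cge(f_latent, rng.standard_normal(d_latent))   # estimate at the encoder output
print(f_image.count, f_latent.count)           # 3073 193
```

With these assumed sizes, one gradient estimate costs 3073 queries in image space but 193 in latent space, which is the reduction that makes CGE practical.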

4. Experiments

In the experiments, this article evaluates image classification on the CIFAR-10, STL-10, and Restricted ImageNet (R-ImageNet) datasets. The evaluation metrics are standard accuracy (SA) and certified accuracy (CA) at different ℓ2 radii r. Note that when r = 0, the certified accuracy equals the standard accuracy. In addition, this article extends the ZO-AE-DS black-box defense framework to the image reconstruction task and also achieves good results. The denoiser used here is DnCNN, the same as in denoised smoothing. The abbreviations used in the experiment tables are shown below.

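The relationship between standard accuracy and certified accuracy mentioned above can be sketched as follows. The per-sample radii and correctness flags are made-up numbers for illustration, and `certified_accuracy` is a hypothetical helper, not part of the released code.

```python
def certified_accuracy(radii, correct, r):
    """Fraction of test points that are classified correctly AND certified at radius >= r."""
    hits = sum(1 for rad, ok in zip(radii, correct) if ok and rad >= r)
    return hits / len(radii)

# Hypothetical per-sample certified radii and correctness flags.
radii = [0.05, 0.31, 0.55, 0.12, 0.80]
correct = [True, True, True, False, True]

standard_acc = sum(correct) / len(correct)         # 0.8
ca_r0 = certified_accuracy(radii, correct, 0.0)    # 0.8, same as standard accuracy
ca_r025 = certified_accuracy(radii, correct, 0.25) # 0.6
ca_r05 = certified_accuracy(radii, correct, 0.5)   # 0.4
```

Certified accuracy can only stay the same or decrease as the radius grows, since a larger radius is a stricter requirement on each sample.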

The table and figure below show the results on the CIFAR-10 dataset.

First, for every value of q, ZO-AE-DS far outperforms ZO-DS.

Second, ZO-AE-DS with CGE achieves the best results among the zeroth-order methods and even outperforms FO-DS, which is attributable to the introduction of the autoencoder. The ZO-AE-DS black-box defense framework thus solves the problem that zeroth-order optimization is hard to use with high-dimensional variables.

Third, directly updating the target network's parameters with first-order optimization achieves the best results overall, which is to be expected because that setting has white-box access to the model.


Below are the results of extending ZO-AE-DS to the image reconstruction task on the MNIST dataset. When the ZO-AE-DS black-box defense framework is applied to image reconstruction, it still achieves results similar to FO-DS, which demonstrates the effectiveness and scalability of the framework.


5. Summary and discussion

This article studies how to perform black-box defense when only the inputs and outputs of the target model are available. To solve this problem, it combines denoised smoothing with zeroth-order optimization and proposes an effective and scalable black-box defense framework, ZO-AE-DS, which narrows the performance gap between zeroth-order and first-order optimization.

Author: Zhang Yimeng is a PhD student in computer science at the OPTML Lab, Michigan State University, with research interests including AI security, 3D/2D computer vision, multimodal learning, and model compression.

Edit: Wang Jing

- END -