[Paper Review] SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain

Abstract

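Adversarial defenses fall into two categories: one changes the network architecture or the training procedure, and the other preprocesses the input image.

On the network side, adversarial training is the most common approach, but it requires training the network on a large number of adversarial samples.

Image preprocessing includes, for example, JPEG compression, adding noise to the image, or simply detecting adversarial examples.

Adversarial example detection uses PCA-based or statistical methods, but these are often effective only on simple datasets such as MNIST.

Some studies use an additional network to separate adversarial from non-adversarial inputs, but these frequently fail under adaptive attacks (Carlini and Wagner [5], Tramer et al. [10]).

This work uses the Fourier domain representation of an image, or of its feature maps, to classify images as adversarial or not.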

Method 1

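The first method depends only on the input images (no network involved) and uses the Magnitude Fourier Spectrum (MFS).

The paper analyzes what impact adversarial attacks (AA) have in the Fourier domain, and also looks at the differences in the phase and magnitude spectra.

This method was successful against FGSM, PGD, and BIM, but did not do well against Deepfool and C&W.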

Method 2

So the second method uses the response of the network (its feature maps).

On the Fourier spectrum of the feature maps, both the magnitude Fourier spectrum (MFS) and the phase Fourier spectrum (PFS) were tried, and PFS performs very well on all five attack methods.

The Fourier transformation used is the two-dimensional discrete Fourier transformation (DFT).
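For reference, the 2D DFT and the two spectra can be written as below. The notation (X for an N x N image channel or feature map, l and k for the frequency indices) is my own shorthand and may differ from the paper's.

```latex
\[
\mathcal{F}(X)(l, k) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} X(m, n)\, e^{-2\pi i \frac{lm + kn}{N}},
\qquad l, k = 0, \dots, N-1
\]
\[
\text{MFS:}\quad |\mathcal{F}(X)(l, k)| =
\sqrt{\operatorname{Re}\big(\mathcal{F}(X)(l, k)\big)^2 + \operatorname{Im}\big(\mathcal{F}(X)(l, k)\big)^2}
\]
\[
\text{PFS:}\quad \phi_{\mathcal{F}(X)(l, k)} =
\operatorname{atan2}\Big(\operatorname{Im}\big(\mathcal{F}(X)(l, k)\big),\ \operatorname{Re}\big(\mathcal{F}(X)(l, k)\big)\Big)
\]
```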

For both methods, a standard Logistic Regression is trained to separate adversarial from non-adversarial inputs.
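To make the detector step concrete, here is a minimal sketch of a logistic regression trained by plain gradient descent in NumPy. The data is synthetic (random vectors standing in for Fourier features), and the function names, learning rate and epoch count are my own assumptions, not the paper's setup. In practice a library implementation such as scikit-learn's LogisticRegression would be the usual choice.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(adversarial)
        grad = p - y                            # gradient of log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Label 1 (adversarial) when the logit is positive, else 0 (clean)."""
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for spectral features: 200 clean and 200 adversarial vectors.
clean = rng.normal(0.0, 1.0, size=(200, 16))
adv = rng.normal(1.0, 1.0, size=(200, 16))
X = np.vstack([clean, adv])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Standardize each feature so that one learning rate suits all dimensions.
X = (X - X.mean(axis=0)) / X.std(axis=0)

w, b = train_logistic_regression(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

The accuracy here only says that the toy classes are separable. It says nothing about the detection rates reported in the paper.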

The results are compared against two state-of-the-art detectors: Local Intrinsic Dimensionality (LID) [19] and the Mahalanobis Distance (M-D) [20].

RELATED WORK

Adversarial Detection

Grosse et al. [24] used the statistical test of maximum mean discrepancy to detect adversarial samples. Another approach uses the correlation between helpful images, based on influence functions, and the k-nearest neighbors in the embedding space of the DNN.

Two strong and popular detectors are the Local Intrinsic Dimensionality (LID) [19] and the Mahalanobis Distance (M-D) [20].

DATA GENERATION

Attack Methods

1) Fast Gradient Sign Method (FGSM)

2) Basic Iterative Method (BIM, I-FGSM)

3) Projected Gradient Descent (PGD)

4) Deepfool

5) Carlini&Wagner Method
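As a reminder of how the gradient-based attacks in this list work, the update rules can be written as below. These are the standard textbook forms, not copied from the paper, and the notation is my own: J is the training loss, y the true label, epsilon the perturbation budget, alpha the step size, and Pi the projection back into the epsilon-ball around x.

```latex
% FGSM: one step of size \epsilon along the sign of the loss gradient
\[
x^{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(x, y)\big)
\]
% BIM and PGD: repeated steps of size \alpha, projected back into the \epsilon-ball
\[
x_{t+1} = \Pi_{x, \epsilon}\Big(x_t + \alpha \cdot \operatorname{sign}\big(\nabla_x J(x_t, y)\big)\Big)
\]
```

BIM starts from x_0 = x, while PGD usually starts from a random point inside the epsilon-ball. Deepfool and C&W work differently: instead of spending a fixed budget, they search for a small perturbation that just crosses the decision boundary.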

Data Pipeline

As stated by Carlini & Wagner [5], detection methods should not only be evaluated on MNIST but also on more complex datasets, therefore we evaluate on CIFAR-10 and CIFAR-100.

Fig. 2 above: the fraction of samples that were successfully attacked.

Detection success rate (y-axis) as the epsilon of the adversarial examples varies (x-axis).

FOURIER BASED DETECTION

Fourier Analysis

Fourier Features of Input Images

applied DFT independently on each of the three color channels

flatten this array into a 3072-dimensional vector (32 x 32 x 3)
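The two steps above (a DFT per color channel, then flattening) can be sketched in NumPy as follows. The function name and the random test image are my own placeholders, and the paper's actual preprocessing (normalization and so on) may differ.

```python
import numpy as np

def fourier_features(image):
    """Return flattened (MFS, PFS) features for an (H, W, C) image.

    The 2D DFT runs over the two spatial axes, so each color channel
    is transformed independently.
    """
    spectrum = np.fft.fft2(image, axes=(0, 1))
    mfs = np.abs(spectrum).ravel()    # magnitude Fourier spectrum
    pfs = np.angle(spectrum).ravel()  # phase Fourier spectrum, in [-pi, pi]
    return mfs, pfs

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # placeholder for a CIFAR-sized image
mfs, pfs = fourier_features(image)
print(mfs.shape, pfs.shape)
```

Each vector has 32 x 32 x 3 = 3072 entries, matching the dimensionality quoted from the paper.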

Table 2 evaluates performance with the following binary classifiers: Logistic Regression (LR), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM).

For LR, which performed best, the success rate is then evaluated for each attack method.

In Table V and Table VI we observe for both methods that by only looking at the lowest or highest 25% of frequencies the performance is low. On the other hand, considering only one of the mid-frequency bands we already achieve a very good result. This confirms the observation that adversarial attacks are not a high-frequency issue [27], but rather a mid-frequency issue.

In other words, detection performance was checked band by band (0 to 8, 0 to 16, and so on), and performance is high whenever the mid frequencies are included.
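A band-limited experiment of this kind could be sketched as below. How the paper actually defines a band is not spelled out in these notes, so the rule used here is purely my assumption: a coefficient belongs to the band [lo, hi) when the larger of its two absolute frequency indices falls in that range. The function names are hypothetical too.

```python
import numpy as np

def band_mask(size, lo, hi):
    """Boolean (size, size) mask of DFT coefficients with lo <= max(|u|, |v|) < hi.

    Assumption: a band is defined on the absolute integer frequency index
    along each axis (0 is the DC term, size // 2 the highest frequency).
    """
    freqs = np.abs(np.rint(np.fft.fftfreq(size) * size))
    u, v = np.meshgrid(freqs, freqs, indexing="ij")
    level = np.maximum(u, v)
    return (level >= lo) & (level < hi)

def band_features(image, lo, hi):
    """Flattened magnitude spectrum of an (H, H, C) image, restricted to one band."""
    spectrum = np.abs(np.fft.fft2(image, axes=(0, 1)))
    mask = band_mask(image.shape[0], lo, hi)
    return spectrum[mask].ravel()  # keeps all channels of each selected coefficient

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))         # placeholder image
low_band = band_features(image, 0, 8)   # hypothetical band "0 to 8"
wide_band = band_features(image, 0, 16) # hypothetical band "0 to 16"
print(low_band.shape, wide_band.shape)
```

A detector would then be trained on each of these reduced feature vectors in turn and the scores compared across bands.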

Fourier Features of Layers

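Next, the DFT that was applied to the images is applied at the feature-map level instead, and different feature maps turned out to work well in different cases!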
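The same transform on feature maps might look like this. The activations are random placeholders in a channels-first (C, H, W) layout, an assumption that matches what a PyTorch CNN layer would output for a single image. Which layers the paper actually taps is not covered by this sketch.

```python
import numpy as np

def feature_map_spectra(feature_maps):
    """Return flattened (MFS, PFS) for one layer's activations of shape (C, H, W).

    The 2D DFT runs over the last two (spatial) axes, one feature map at a time.
    """
    spectrum = np.fft.fft2(feature_maps, axes=(-2, -1))
    return np.abs(spectrum).ravel(), np.angle(spectrum).ravel()

rng = np.random.default_rng(0)
activations = rng.random((64, 16, 16))  # hypothetical layer output: 64 maps of 16 x 16
layer_mfs, layer_pfs = feature_map_spectra(activations)
print(layer_mfs.shape, layer_pfs.shape)
```

Either vector can be fed to the same logistic regression detector as the input-image features.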

MFS and PFS perform similarly.

