Welcome to Perceptron Robustness Benchmark’s page!
Perceptron is a benchmark to test safety and security properties of neural networks for perceptual tasks.
It supports models built with many frameworks, including:
- TensorFlow
- PyTorch
- Keras
- Cloud API
- PaddlePaddle (In progress)
See Summary for the currently supported evaluation metrics, models, adversarial criteria, and verification methods.
See the current Leaderboard.
Overview
The perceptron benchmark improves upon existing adversarial toolboxes such as cleverhans, foolbox, IBM ART, and advbox in three important aspects:
- Consistent API design that enables easy evaluation of models across different deep learning frameworks, computer vision tasks, and adversarial criteria.
- Standardized metric design that enables DNN models’ robustness to be compared on a large collection of security and safety properties.
- Verifiable robustness bounds for security and safety properties.
More specifically, we compare perceptron with existing DNN benchmarks in the following table:
Properties | Perceptron | Cleverhans | Foolbox | IBM ART |
---|---|---|---|---|
Multi-platform support | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
Consistent API design | \(\checkmark\) | \(\cdot\) | \(\checkmark\) | \(\cdot\) |
Custom adversarial criteria | \(\checkmark\) | \(\cdot\) | \(\checkmark\) | \(\cdot\) |
Multiple perceptual tasks | \(\checkmark\) | \(\cdot\) | \(\cdot\) | \(\cdot\) |
Standardized metrics | \(\checkmark\) | \(\cdot\) | \(\checkmark\) | \(\cdot\) |
Verifiable robustness bounds | \(\checkmark\) | \(\cdot\) | \(\cdot\) | \(\cdot\) |
Explanation of compared properties:
- Multi-platform support: supports at least three deep learning frameworks: TensorFlow, PyTorch, and Keras.
- Consistent API design: implementations of evaluation methods are platform-agnostic. More specifically, the same piece of code for an evaluation method (e.g., a C&W attack) can run against models across all platforms (e.g., TensorFlow, PyTorch, and cloud API).
- Custom adversarial criteria: a criterion defines under what circumstances an (input, label) pair is considered an adversary. Custom adversarial criteria other than misclassification should be supported.
- Multiple perceptual tasks: supports computer vision tasks other than classification, e.g., object detection and face recognition.
- Standardized metrics: enables DNN models’ robustness to be compared across all security and safety properties.
- Verifiable robustness bounds: supports verification of certain safety properties. Returns either a verifiable bound, indicating that the model is robust against perturbations within that bound, or counter-examples.
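To make the notion of an adversarial criterion more concrete, here is a minimal Python sketch. The names `Criterion`, `Misclassification`, `TargetClass`, and `is_adversarial` are hypothetical and chosen only for illustration; they are not taken from Perceptron's API, which may be organized differently. The sketch assumes a classifier that returns one score per class.

```python
from abc import ABC, abstractmethod
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


class Criterion(ABC):
    """Hypothetical interface: decides whether a model output is adversarial."""

    @abstractmethod
    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        ...


class Misclassification(Criterion):
    """Adversarial if the predicted class differs from the original label."""

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        return argmax(scores) != label


class TargetClass(Criterion):
    """A custom criterion: adversarial only if the prediction hits a chosen target."""

    def __init__(self, target: int) -> None:
        self.target = target

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        return argmax(scores) == self.target


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.2]  # the model predicts class 1
    print(Misclassification().is_adversarial(scores, label=0))  # True
    print(TargetClass(target=2).is_adversarial(scores, label=0))  # False
```

Keeping the criterion separate from the attack is what would let the same evaluation method search for different kinds of adversaries: the attack only asks the criterion whether the current candidate qualifies.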
Running benchmarks
You can run evaluations against DNN models with chosen parameters using the launcher.
For example:
python perceptron/launcher.py \
    --framework keras \
    --model resnet50 \
    --criteria misclassification \
    --metric carlini_wagner_l2 \
    --image example.png
In the above command, the user sets the framework to keras, the model to resnet50, the criterion to misclassification (i.e., we want to generate an adversary that is similar to the original image but has a different predicted label), the metric to carlini_wagner_l2, and the input image to example.png.
You can try different combinations of frameworks, models, criteria, and metrics. To see more options, use -h to print the help message:
python perceptron/launcher.py -h
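If you want to try several combinations in one go, the command lines can be assembled programmatically. The following Python sketch only builds and prints argument lists from the five flags shown above; the helper names `build_command` and `sweep` are hypothetical, and `another.png` is a placeholder file name. Valid values for each flag should be taken from the `-h` output rather than from this sketch.

```python
import itertools
import shlex
from typing import Iterable, Iterator, List


def build_command(framework: str, model: str, criteria: str,
                  metric: str, image: str) -> List[str]:
    """Assemble one launcher invocation from the flags used in the example."""
    return [
        "python", "perceptron/launcher.py",
        "--framework", framework,
        "--model", model,
        "--criteria", criteria,
        "--metric", metric,
        "--image", image,
    ]


def sweep(frameworks: Iterable[str], models: Iterable[str],
          criteria: Iterable[str], metrics: Iterable[str],
          images: Iterable[str]) -> Iterator[List[str]]:
    """Yield one command per combination of the given option values."""
    for combo in itertools.product(frameworks, models, criteria, metrics, images):
        yield build_command(*combo)


if __name__ == "__main__":
    commands = sweep(
        frameworks=["keras"],
        models=["resnet50"],
        criteria=["misclassification"],
        metrics=["carlini_wagner_l2"],
        images=["example.png", "another.png"],  # second name is a placeholder
    )
    for cmd in commands:
        # Printed rather than executed; to actually launch an evaluation, pass
        # cmd to subprocess.run(cmd, check=True) from the repository root.
        print(shlex.join(cmd))
```

Building each command as a list instead of a single string avoids shell quoting problems when an image path contains spaces.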
We also provide a coding example that serves the same purpose as the above command line. Please refer to Examples for more details.