Welcome to Perceptron Robustness Benchmark’s page!

Perceptron is a benchmark to test safety and security properties of neural networks for perceptual tasks.

It supports models built with many frameworks, including:

  • TensorFlow
  • PyTorch
  • Keras
  • Cloud API
  • PaddlePaddle (In progress)

See currently supported evaluation metrics, models, adversarial criteria, and verification methods in Summary.

See the current Leaderboard.


Perceptron Benchmark improves upon existing adversarial toolboxes such as cleverhans, foolbox, IBM ART, and advbox in three important aspects:

  • Consistent API design that enables easy evaluation of models across different deep learning frameworks, computer vision tasks, and adversarial criteria.
  • Standardized metric design that enables DNN models’ robustness to be compared on a large collection of security and safety properties.
  • Verifiable robustness bounds for security and safety properties.

More specifically, we compare Perceptron with existing DNN benchmarks in the following table:

DNN Benchmarks Comparison

Properties                    Perceptron      Cleverhans      Foolbox         IBM ART
Multi-platform support        \(\checkmark\)  \(\checkmark\)  \(\checkmark\)  \(\checkmark\)
Consistent API design         \(\checkmark\)  \(\cdot\)       \(\checkmark\)  \(\cdot\)
Custom adversarial criteria   \(\checkmark\)  \(\cdot\)       \(\checkmark\)  \(\cdot\)
Multiple perceptual tasks     \(\checkmark\)  \(\cdot\)       \(\cdot\)       \(\cdot\)
Standardized metrics          \(\checkmark\)  \(\cdot\)       \(\checkmark\)  \(\cdot\)
Verifiable robustness bounds  \(\checkmark\)  \(\cdot\)       \(\cdot\)       \(\cdot\)

Explanation of compared properties:

  • Multi-platform support: supports at least three deep learning frameworks: TensorFlow, PyTorch, and Keras.
  • Consistent API design: implementations of evaluation methods are platform-agnostic. More specifically, the same piece of code for an evaluation method (e.g., a C&W attack) can run against models across all platforms (e.g., TensorFlow, PyTorch, and cloud API).
  • Custom adversarial criteria: a criterion defines the circumstances under which an (input, label) pair is considered an adversary. Criteria other than misclassification should be supported.
  • Multiple perceptual tasks: supports computer vision tasks other than classification, e.g., object detection and face recognition.
  • Standardized metrics: enables DNN models’ robustness to be comparable on all security and safety properties.
  • Verifiable robustness bounds: supports verification of certain safety properties. Returns either a verifiable bound, indicating that the model is robust against perturbations within that bound, or counter-examples.
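
To make the idea of a custom adversarial criterion concrete, here is a minimal Python sketch. It is not Perceptron's actual API: the class names `Misclassification` and `TargetClass`, the method `is_adversarial`, and the `argmax` helper are hypothetical and chosen only for illustration. The point it tries to show is that a criterion is a predicate over the model's output scores and the original label, so it can be swapped without changing the evaluation method.

```python
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


class Misclassification:
    """Hypothetical criterion: adversarial if the predicted class differs from the label."""

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        return argmax(scores) != label


class TargetClass:
    """Hypothetical criterion: adversarial only if a specific target class is predicted."""

    def __init__(self, target: int) -> None:
        self.target = target

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        # The original label is unused here; only the target class matters.
        return argmax(scores) == self.target


# Made-up class scores for a 3-class model; the true label is class 0.
scores = [0.1, 0.7, 0.2]
true_label = 0

print(Misclassification().is_adversarial(scores, true_label))  # predicted class 1 != label 0
print(TargetClass(2).is_adversarial(scores, true_label))       # predicted class 1 != target 2
```

Under these assumptions, the same perturbed input counts as an adversary under the misclassification criterion but not under the targeted one, which is why the choice of criterion matters when comparing results.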

Running benchmarks

You can evaluate DNN models with chosen parameters using the launcher. For example:

python perceptron/launcher.py \
    --framework keras \
    --model resnet50 \
    --criteria misclassification \
    --metric carlini_wagner_l2 \
    --image example.png

In the command above, the framework is set to keras, the model to resnet50, the criterion to misclassification (i.e., we want to generate an adversary that is similar to the original image but has a different predicted label), the metric to carlini_wagner_l2, and the input image to example.png.

You can try different combinations of frameworks, models, criteria, and metrics. To see more options, use -h to print the help message:

python perceptron/launcher.py -h
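
Trying several combinations can be scripted. The sketch below only assembles launcher command lines from the flags shown in the earlier example (`--framework`, `--model`, `--criteria`, `--metric`, `--image`) and prints them without running anything. The helper name `build_command` is hypothetical, and each value list contains only the value used in that example; check the `-h` output before adding others. `shlex.join` requires Python 3.8 or later.

```python
import itertools
import shlex


def build_command(framework, model, criteria, metric, image):
    """Return one launcher invocation as an argv list (nothing is executed)."""
    return [
        "python", "perceptron/launcher.py",
        "--framework", framework,
        "--model", model,
        "--criteria", criteria,
        "--metric", metric,
        "--image", image,
    ]


# Values from the documented example; extend each list only with
# options that `python perceptron/launcher.py -h` actually reports.
frameworks = ["keras"]
models = ["resnet50"]
criteria = ["misclassification"]
metrics = ["carlini_wagner_l2"]

# One command per combination of framework, model, criterion, and metric.
commands = [
    build_command(f, m, c, k, "example.png")
    for f, m, c, k in itertools.product(frameworks, models, criteria, metrics)
]

for argv in commands:
    # To execute a command instead of printing it, one option is
    # subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Building each command as a list rather than a single string avoids shell quoting problems if an image path contains spaces.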

We also provide a coding example that serves the same purpose as the command line above. Please refer to Examples for more details.
