1. Policy Prediction Evaluation

This page describes the metrics and the data format adopted by the DBNet 2018 challenge. To make models generalize to real-world scenarios, several metrics are designed to test different aspects of model performance (average performance, worst cases, etc.). Participants need to follow the guidelines and generate results on the test sets, for which the ground-truth annotations are hidden. The generated results must follow the format described below and be uploaded to our server.

2. Data Format

2.1 Prepared Data

Download DBNet-2018 challenge data here and organize the folders as follows (in dbnet-2018/):

├── train
│   └── i [56 folders]
│       ├── dvr_66x200    [<= 120 images]
│       ├── dvr_1920x1080 [<= 120 images]
│       ├── points_16384  [<= 120 clouds]
│       └── behavior.csv  [labels]
├── val
│   └── j [20 folders]
│       ├── dvr_66x200    [<= 120 images]
│       ├── dvr_1920x1080 [<= 120 images]
│       ├── points_16384  [<= 120 clouds]
│       └── behavior.csv  [labels]
└── test
    └── k [20 folders]
        ├── dvr_66x200    [<= 120 images]
        ├── dvr_1920x1080 [<= 120 images]
        └── points_16384  [<= 120 clouds]

In general, the train/val/test ratio is approximately 8:1:1, and all of the val/test data have already been released. Almost five eighths of the training data are still being pre-processed and will be uploaded soon.

Please note that the data in the subfolders of train/, val/ and test/ are continuous and time-ordered. The ith line of behavior.csv corresponds to i-1.jpg in dvr_66x200/ and i-1.las in points_16384/, as in the sketch below. Moreover, if you don't intend to use the prepared data directly, please download and pre-process the raw data with your preferred methods.
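A minimal sketch of this indexing, assuming the dbnet-2018/ layout above; the subfolder name "1" and the frame index are hypothetical examples, and a LAS reader such as laspy would be needed to actually parse the point clouds:

```python
import csv
import os

root = "dbnet-2018/train/1"  # hypothetical session subfolder

# behavior.csv: first column is vehicle speed, second is steering angle.
with open(os.path.join(root, "behavior.csv")) as f:
    labels = list(csv.reader(f))

i = 10  # the ith label line (1-indexed, as in the text above)
speed, angle = map(float, labels[i - 1])

# The matching sensor frames are numbered from 0, so line i pairs with i-1.
image_path = os.path.join(root, "dvr_66x200", f"{i - 1}.jpg")
cloud_path = os.path.join(root, "points_16384", f"{i - 1}.las")
```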

2.2 Raw Data

Download DBNet raw data here and organize the folders as follows:


    ├── train
    │   └── i [284 folders]
    │       ├── dvr_1920x1080 [<= 120 images]
    │       ├── raw_points    [<= 120 clouds]
    │       └── behavior.csv  [labels]
    ├── val
    │   └── j [20 folders]
    │       ├── dvr_1920x1080 [<= 120 images]
    │       ├── raw_points    [<= 120 clouds]
    │       └── behavior.csv  [labels]
    └── test
        └── k [20 folders]
            ├── dvr_1920x1080 [<= 120 images]
            └── raw_points    [<= 120 clouds]

2.3 Video Data

Download DBNet video data here (the videos are organized by date). We also provide some tools for the convenience of participants.

2.4 Prediction Format

The format of predictions for the test sets is consistent with the labels in the training and validation sets (behavior.csv). Specifically, each line contains only comma-delimited prediction values (no descriptive words or sentences). The first and second columns indicate vehicle speed and steering wheel angle, respectively.

Examples:

  • 23.5,-23
  • 27.3,-15
  • 27.2,-7
  • 26.5,12
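A minimal sketch of writing predictions in this format, assuming `predictions` is a list of (speed, angle) pairs produced by your model; the output filename is a hypothetical example:

```python
predictions = [(23.5, -23.0), (27.3, -15.0), (27.2, -7.0), (26.5, 12.0)]

with open("behavior_pred.csv", "w") as f:
    for speed, angle in predictions:
        # %g formatting drops trailing zeros, matching the examples above.
        f.write(f"{speed:g},{angle:g}\n")
```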

3. Metrics

The following five metrics are used to characterize the performance of driving policy learning on DBNet 2018. The implementations of these metrics can be found in the demo on GitHub.

Accuracy:  To compute accuracy, we count how many predictions are correct. When the bias between a prediction and the ground truth is smaller than a tolerance threshold, we count that prediction as correct. We let `acc (< x)` denote the accuracy within a bias of x.
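A minimal sketch of this computation, assuming `pred` and `gt` are NumPy arrays of predicted and ground-truth values (angles or speeds); the exact tolerance handling in the official demo may differ:

```python
import numpy as np

def acc(pred, gt, threshold):
    """Fraction of predictions whose absolute bias is below `threshold`."""
    return np.mean(np.abs(pred - gt) < threshold)
```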

Area Under Curve (AUC):  To evaluate the overall performance, the area under curve (AUC) metric is calculated from the accuracy vs. threshold curve and then normalized to 0~1. A larger AUC indicates better overall performance.
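A minimal sketch of this curve-based AUC, assuming the curve is traced over a grid of tolerance thresholds and the area is normalized by the largest threshold; the threshold range and step count here are hypothetical choices, not the official ones:

```python
import numpy as np

def auc(pred, gt, max_threshold=30.0, steps=300):
    thresholds = np.linspace(0.0, max_threshold, steps)
    accuracies = [np.mean(np.abs(pred - gt) < t) for t in thresholds]
    # Trapezoidal area under the accuracy vs. threshold curve, scaled to 0~1.
    return np.trapz(accuracies, thresholds) / max_threshold
```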

Maximum Error (ME):  The maximum error (ME) metric is motivated by safety: it must be considered carefully in practice even when the average errors look perfect. Formally, ME is the maximum error over all predictions of steering wheel angle or vehicle speed.

Average Error (AE):  In contrast to ME, the average error (AE) metric measures long-term overall performance without emphasizing sudden, unexpectedly large errors. Formally, AE is the average error over all predictions of steering wheel angle or vehicle speed.

Average Maximum Error (AME):  Because the data is split into two-minute continuous periods, the performance on each period should also be considered. The average maximum error (AME) is calculated by averaging the ME of each two-minute period. AME is similar to ME in purpose, but concentrates on per-sequence performance.
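A minimal sketch of ME, AE, and AME, assuming `pred` and `gt` are NumPy arrays and `period_ids` is an array marking which two-minute period each prediction belongs to; how periods are identified is an assumption here:

```python
import numpy as np

def me(pred, gt):
    # Largest absolute error over all predictions.
    return np.max(np.abs(pred - gt))

def ae(pred, gt):
    # Mean absolute error over all predictions.
    return np.mean(np.abs(pred - gt))

def ame(pred, gt, period_ids):
    # Average of the per-period maximum errors.
    errors = np.abs(pred - gt)
    return np.mean([errors[period_ids == p].max()
                    for p in np.unique(period_ids)])
```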

4. Submitting the Results

For now, results can only be submitted by email (fill in email_addr here). A submission system will be deployed later to collect results and measure performance automatically.

5. Leaderboard

Method     Setting             Target  Accuracy     AUC     ME     AE    AME
nvidia-pn  Video+Laser Points  angle   70.65% (<5)  0.7799  29.46  4.23  20.88
                               speed   82.21% (<3)  0.8701  18.56  1.80   9.68