Data content:
PETA: It contains a total of 19,000 pedestrian images
taken by real surveillance cameras [54]. These images are
randomly divided into 9,500 training images, 1,900 validation
images, and 7,600 test images. Each pedestrian image is annotated
with 61 binary attributes and 4 multi-category attributes. Because
the distribution of some attributes is highly imbalanced, existing
methods mainly focus on 35 of the 61 binary attributes.
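The PETA partition described above can be sketched as a seeded random split. This is only an illustration: the function name `split_indices` and the seed value are assumptions, not part of the dataset release, and a reproduction would normally reuse the partition published with the benchmark instead of redrawing it.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Randomly partition range(n_total) into train/val/test index lists.

    Hypothetical helper: the test set receives whatever remains after
    the training and validation indices are taken.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# PETA sizes as stated in the text: 19,000 = 9,500 + 1,900 + 7,600
train_idx, val_idx, test_idx = split_indices(19000, 9500, 1900)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists are disjoint by construction, since each is a separate slice of one shuffled permutation.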
PA100K: It is the largest open-source pedestrian attribute
dataset, with 26 pedestrian attributes annotated [55]. It contains
100,000 pedestrian images collected by surveillance
cameras, with 80,000 images for training, 10,000 images for
validation, and 10,000 images for testing.
RAP: It has two versions, and the RAP-v1 dataset [51] is
used in our experiments. This dataset contains 41,585 pedestrian
images collected from 26 indoor surveillance cameras,
annotated with 69 binary attributes. Existing methods
mainly focus on 51 of these attributes, each of which has a
positive-sample proportion greater than 1%. The training set of this dataset
contains 33,268 images, and the remaining 8,317 are used for testing.
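The attribute selection rule for RAP (keep only attributes whose proportion exceeds 1%) can be illustrated with a short sketch. It assumes that "proportion" means the fraction of images in which the attribute is labeled positive. The function name `select_attributes` and the toy label matrix are hypothetical and are not taken from the dataset.

```python
def select_attributes(labels, min_ratio=0.01):
    """Return indices of attributes whose positive proportion exceeds min_ratio.

    labels: list of rows, one per image; each row is a list of 0/1 values,
    one per attribute. (Assumed layout, for illustration only.)
    """
    n_samples = len(labels)
    n_attrs = len(labels[0])
    keep = []
    for j in range(n_attrs):
        positives = sum(row[j] for row in labels)
        if positives / n_samples > min_ratio:
            keep.append(j)
    return keep


# Toy data: 200 images, 3 attributes with positive ratios 50%, 0.5%, and 5%
toy_labels = [
    [1 if i % 2 == 0 else 0, 1 if i == 0 else 0, 1 if i < 10 else 0]
    for i in range(200)
]
selected = select_attributes(toy_labels)
print(selected)  # attribute 1 (0.5% positive) is dropped
```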
Following existing methods, we adopt five metrics
for evaluation: mean accuracy (mA), accuracy (Accu),
precision (Prec), recall (Recall), and F1 score (F1).
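The text does not spell out how the five metrics are computed, so the following is a minimal sketch under the definitions commonly used in pedestrian attribute recognition. It assumes mA is label-based (the mean of positive and negative accuracy per attribute, averaged over attributes) and that Accu, Prec, Recall, and F1 are instance-based (computed per image, then averaged). The function name `par_metrics` and the `eps` guard against empty denominators are assumptions.

```python
def par_metrics(y_true, y_pred, eps=1e-12):
    """Compute mA, Accu, Prec, Recall, F1 for binary multi-label predictions.

    y_true, y_pred: lists of rows (one per image) of 0/1 values per attribute.
    """
    n = len(y_true)
    m = len(y_true[0])

    # Label-based mA: balanced accuracy per attribute, averaged over attributes
    ma = 0.0
    for j in range(m):
        pos = sum(t[j] for t in y_true)
        neg = n - pos
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        ma += 0.5 * (tp / max(pos, eps) + tn / max(neg, eps))
    ma /= m

    # Instance-based metrics: computed per image, then averaged over images
    accu = prec = rec = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(a & b for a, b in zip(t, p))
        union = sum(a | b for a, b in zip(t, p))
        accu += inter / max(union, eps)
        prec += inter / max(sum(p), eps)
        rec += inter / max(sum(t), eps)
    accu, prec, rec = accu / n, prec / n, rec / n
    f1 = 2 * prec * rec / max(prec + rec, eps)

    return {"mA": ma, "Accu": accu, "Prec": prec, "Recall": rec, "F1": f1}


# Toy example: 2 images, 3 attributes
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
scores = par_metrics(y_true, y_pred)
print(scores)
```

In this toy case two attributes are predicted perfectly and one is always wrong, so mA is 2/3, while each image has one correct attribute out of a union of two, giving Accu = 0.5 and Prec = Recall = F1 = 0.75.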