Abstract

Object detectors have improved in recent years, achieving better results and faster inference times. However, small object detection remains a problem without a definitive solution. The automatic detection of weapons in closed-circuit television (CCTV) footage has been studied recently and is extremely useful in security, counter-terrorism, and risk mitigation. This article presents a new dataset obtained from a real CCTV system installed in a university, together with a set of generated synthetic images. Faster R-CNN with a Feature Pyramid Network and ResNet-50 was applied to these data, resulting in a weapon detection model that can be used on quasi real-time CCTV (90 ms of inference time with an NVIDIA GeForce GTX-1080Ti card) and that improves the state of the art in weapon detection through a two-stage training. An exhaustive experimental study of the detector with these datasets was performed, showing the impact of synthetic datasets on the training of weapon detection systems, as well as the main limitations these systems currently present. The generated synthetic dataset and the real CCTV dataset are available to the whole research community.

Datasets

This study presents two new datasets: the “Mock attack dataset” and the “Unity synthetic dataset”.

Mock attack dataset

This dataset was collected during a mock attack and manually annotated, after obtaining all the required permissions from our University and its security personnel. The details below describe each camera used during the mock attack and the scenario it covers.

The data acquisition infrastructure consists of three surveillance cameras placed at different points in the same area, covering two corridors and one entrance, each forming a different scenario. Each camera is described as follows:

Cam1: located in one of the two corridors, it presents some conflicting objects, such as doors or bins, and the lighting is uniform. The duration of the video sequence for this camera is 40 min and 25 s. Selecting the five segments with movement, the total duration is 5 min and 4 s. These segments were manually annotated at 2 frames per second (FPS), resulting in a total of 607 frames.
Cam7: located in the other corridor, this camera is similar to Cam1 in both scenery and lighting. However, Cam7 presents more conflicting objects, such as a fire extinguisher or objects on the walls. The duration of the video sequence for this camera is 1 h, 3 min and 34 s. Selecting the segments with movement, the total duration is 29 min and 16 s. These segments were annotated at 2 FPS, resulting in a total of 3511 frames.
Cam5: located at the entrance of a university module, it presents some conflicting objects, such as a black carpet on the floor, as well as irregular lighting, with rays of light covering part of the scene. The duration of the video sequence for this camera is 39 min and 7 s. Selecting the segments with movement, the total duration is 8 min and 36 s. These segments were annotated at 2 FPS, resulting in a total of 1031 frames.
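
As a quick sanity check on the figures above, the annotated duration of each camera multiplied by the 2 FPS annotation rate should land close to the reported frame count. The sketch below is only illustrative: the dictionary layout and the function names are assumptions made for this example and are not part of the dataset tooling.

```python
# Durations of the segments with movement and the frame counts reported
# in the camera descriptions. The layout of this dictionary is an
# assumption made for illustration only.
CAMERAS = {
    "Cam1": {"minutes": 5, "seconds": 4, "reported_frames": 607},
    "Cam7": {"minutes": 29, "seconds": 16, "reported_frames": 3511},
    "Cam5": {"minutes": 8, "seconds": 36, "reported_frames": 1031},
}
ANNOTATION_FPS = 2  # annotation rate stated in the camera descriptions


def estimated_frames(minutes, seconds, fps=ANNOTATION_FPS):
    """Rough estimate: duration in seconds times the annotation rate."""
    return (minutes * 60 + seconds) * fps


def summarize(cameras=CAMERAS):
    """Return {camera: (estimate, reported, estimate - reported)}."""
    rows = {}
    for name, info in cameras.items():
        estimate = estimated_frames(info["minutes"], info["seconds"])
        reported = info["reported_frames"]
        rows[name] = (estimate, reported, estimate - reported)
    return rows


if __name__ == "__main__":
    for name, (estimate, reported, diff) in summarize().items():
        print(f"{name}: estimated {estimate}, reported {reported}, difference {diff}")
    total = sum(info["reported_frames"] for info in CAMERAS.values())
    print(f"Total reported frames: {total}")
```

For all three cameras the estimate is one frame above the reported count, so the durations and frame counts are mutually consistent at 2 FPS.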

Unity synthetic dataset

This dataset was generated by modeling, in the Unity game engine, a scenario that emulates part of a city and an educational center within it. Several cameras capture the movements of multiple characters, built from 11 different models and 7 animations. The images feature 11 different objects: 4 types of handguns, 5 types of rifles, a knife, and a smartphone. The dataset consists of three splits with 500 (U0.5), 1000 (U1), and 2500 (U2.5) images.
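
The split names appear to encode the number of images in thousands (U0.5 for 500, U1 for 1000, U2.5 for 2500). The sketch below checks that reading against the stated sizes. The naming interpretation and the helper name are assumptions for illustration, not something the dataset documentation states.

```python
# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"U0.5": 500, "U1": 1000, "U2.5": 2500}


def images_from_split_name(name):
    """Hypothetical helper: read the number after 'U' as thousands of images.

    This interpretation of the split names is an assumption that happens
    to match the three sizes listed above.
    """
    if not name.startswith("U"):
        raise ValueError(f"unexpected split name: {name!r}")
    return round(float(name[1:]) * 1000)


if __name__ == "__main__":
    for name, size in SPLIT_SIZES.items():
        inferred = images_from_split_name(name)
        status = "matches" if inferred == size else "differs from"
        print(f"{name}: inferred {inferred} {status} stated {size}")
```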

Download Datasets

Full dataset: Hugging Face link.
Mock attack dataset: US - Mock Attack Cam 1, 5, 7.
Unity synthetic dataset: US - Unity Synthetic Dataset U0.5, U1 and U2.5.

Terms of use

This dataset can be used free of charge for academic research, provided the paper is cited as explained below. If you wish to use the data for commercial purposes, please contact us.

Citation

If you use our dataset, please kindly cite the following paper: Real-time gun detection in CCTV: An open problem. Neural Networks (2020), doi: https://doi.org/10.1016/j.neunet.2020.09.013.

@article{SalazarGonzalez2020,
title = "Real-time gun detection in CCTV: An open problem",
journal = "Neural Networks",
year = "2020",
issn = "0893-6080",
doi = "https://doi.org/10.1016/j.neunet.2020.09.013",
url = "http://www.sciencedirect.com/science/article/pii/S0893608020303361",
author = "Salazar Gonz{\'{a}}lez, Jose L. and Zaccaro, Carlos and {\'{A}}lvarez-Garc{\'{i}}a, Juan A. and Soria-Morillo, Luis M. and Sancho Caparrini, Fernando",
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Contact the authors of this work for commercial use.