| dc.description.abstract | Violence is one of the most serious concerns in the world, with severe social and economic consequences; both the World Health Organization and the Institute for Economics and Peace identify it as an important issue. Video surveillance has become a central part of keeping people safe. However, traditional systems are effective only if someone is actively watching the footage.
Human operators have limited attention and often miss violent events as they happen, and monitoring becomes increasingly difficult as the number of cameras grows. This underscores the need for automated, real-time violence detection. The aim of this thesis is to develop a deep learning method that detects these events more accurately, quickly, and efficiently.
It proposes a hybrid approach that combines two powerful tools: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). A pre-trained ResNet50 model extracts spatial features, and a Long Short-Term Memory (LSTM) network then extracts temporal features. This combination captures both spatial and temporal information while requiring less processing power than more complex models such as 3D CNNs.
The thesis explores two ways of preprocessing the video data for the model. The first uses evenly spaced RGB frames. The second uses frame differences, computed as the absolute difference between frames. Frame differences highlight motion, which is important for identifying violence, and suppress noise from stationary backgrounds. This makes it easier to localize relevant activity reliably while using less computing power, which makes the approach practical for real-time applications.
The UCF Crime dataset, which contains more than 1,900 real-world videos, is used for evaluation. These videos pose challenges such as varying lighting levels, diverse camera angles, and an imbalance between violent and non-violent videos. Both methods were tested under identical conditions.
Accuracy and the Area Under the Receiver Operating Characteristic Curve (AUROC) were used to evaluate the models. Both the RGB frame approach and the frame difference approach achieved 85% accuracy. However, the frame difference approach achieved a higher AUROC (0.9091 versus 0.8892), indicating that it separates violent from non-violent videos more effectively. It also reduced training time by 2.9% and inference time by 4.2%. These results indicate that frame differencing makes CNN-LSTM models more effective and efficient at detecting violence.
The model performs well in practical scenarios and can be applied across a range of surveillance settings. By focusing on motion, it can separate violent from non-violent scenes even in crowded settings. Its efficiency also makes it suitable for resource-constrained devices, and it can be integrated into existing surveillance systems, including those in public centers and on public transport. This work advances intelligent surveillance systems and offers a practical approach to public safety in complex urban settings. It can be deployed in schools, airports, and other busy places where violence must be detected immediately. By addressing current limitations and providing a basis for further development, the method helps keep people safe. | en_US |