dc.contributor.author  Sayem, Md
dc.contributor.author  Rakibul Hasan, Md.
dc.contributor.author  Islam Anika, Morium
dc.date.accessioned  2025-12-21T07:53:35Z
dc.date.available  2025-12-21T07:53:35Z
dc.date.issued  2025-12-13
dc.identifier.uri  http://ar.iub.edu.bd/handle/11348/1037
dc.description.abstract  Modern Vision-Language Models (VLMs) have shown tremendous performance in multimodal reasoning, captioning, retrieval, and generative tasks, yet they remain critically susceptible to two classes of attack: adversarial perturbations and jailbreak prompts. To address these complementary threats, this thesis presents two lightweight, model-agnostic detection systems that improve the security and resilience of modern multimodal systems. The first contribution is LMDF, a semantic-consistency-based adversarial detection framework that identifies perturbation-based attacks by evaluating the cross-modal correspondence between image and text embeddings. Grounded in the principles of contrastive learning, LMDF exploits the fact that adversarial perturbations, though imperceptible to the human eye, introduce measurable distortions in the shared embedding space. By computing the cosine similarity between the vision and language embeddings of frozen pretrained encoders such as CLIP and BLIP-2, LMDF detects adversarial manipulations with high accuracy. Extensive experiments across multiple datasets and attack algorithms (FGSM, PGD, and Adversarial Patch) demonstrate strong effectiveness, with accuracy reaching up to 91.2% and AUC up to 0.950 at minimal computational cost: only two forward passes and a similarity computation are required. The second contribution is a confidence-based multimodal jailbreak detection framework that extends the ideas of Free Jailbreak Detection to the vision-language setting. This method analyzes the temperature-scaled token probability distributions produced by decoder-based VLMs and derives five key statistical features: minimum token confidence, first-token confidence, mean token confidence, entropy, and confidence standard deviation. Jailbreak prompts induce characteristic instability in these confidence profiles, enabling effective classification with a simple threshold-based detector. Empirical validation shows strong discriminative ability, with AUC = 0.979, 90% accuracy, and an F1-score of 0.907 at an optimal temperature setting, while requiring no model modification, no gradient access, and negligible computational overhead. Together, these two detection modules address distinct yet increasingly common multimodal attack vectors. By combining semantic-alignment analysis for adversarial perturbations with confidence-based behavioral analysis for jailbreaks, the thesis offers a coherent, practical, and efficient defense for protecting modern VLMs. The proposed frameworks advance the goal of building reliable multimodal AI systems that can operate safely in real-world, high-stakes, and adversarial settings.  en_US
dc.publisher  IUB  en_US
dc.subject  Multimodal Attack Detection  en_US
dc.subject  Vision-Language Models  en_US
dc.subject  Cross-Modal Consistency  en_US
dc.title  A Unified Lightweight Framework for Detecting Multimodal Attacks  en_US
dc.type  Thesis  en_US
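
The abstract describes both detectors at the level of mechanism, so two brief sketches may help make them concrete. The first is a minimal sketch of the LMDF-style semantic-consistency check, assuming a frozen CLIP encoder loaded through Hugging Face `transformers`; the checkpoint name and the decision threshold are illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal LMDF-style check: score cross-modal agreement via cosine
# similarity between frozen CLIP image and text embeddings, then flag
# inputs whose similarity falls below a calibrated threshold.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def cross_modal_similarity(image: Image.Image, caption: str) -> float:
    """Cosine similarity between the image and caption embeddings."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)  # one vision + one text forward pass
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum(dim=-1))

def is_adversarial(image: Image.Image, caption: str,
                   threshold: float = 0.25) -> bool:
    # Adversarial perturbations depress cross-modal agreement, so a
    # similarity below the (validation-calibrated) threshold is flagged.
    return cross_modal_similarity(image, caption) < threshold
```

The second sketch covers the confidence-profile features of the jailbreak detector, assuming access to the per-step token probability distributions (shape `[num_steps, vocab_size]`) emitted by a decoder-based VLM; the temperature value and the single-feature decision rule are again illustrative assumptions rather than the thesis's exact detector.

```python
# Derive the five confidence statistics from temperature-scaled token
# distributions and apply a simple threshold-based decision rule.
import numpy as np

def confidence_features(step_probs: np.ndarray,
                        temperature: float = 1.0) -> dict:
    """Five statistics over a response's temperature-scaled distributions."""
    # Re-scale each step's distribution by the temperature (softmax of
    # log-probabilities divided by T), guarding against log(0).
    logits = np.log(step_probs + 1e-12) / temperature
    scaled = np.exp(logits - logits.max(axis=-1, keepdims=True))
    scaled /= scaled.sum(axis=-1, keepdims=True)

    top = scaled.max(axis=-1)  # top-token probability at each decode step
    entropy = -(scaled * np.log(scaled + 1e-12)).sum(axis=-1)
    return {
        "min_confidence": float(top.min()),
        "first_token_confidence": float(top[0]),
        "mean_confidence": float(top.mean()),
        "mean_entropy": float(entropy.mean()),
        "confidence_std": float(top.std()),
    }

def is_jailbreak(step_probs: np.ndarray,
                 min_conf_threshold: float = 0.35) -> bool:
    # Jailbreak prompts destabilize the confidence profile; thresholding
    # the weakest step is the simplest workable decision rule.
    return confidence_features(step_probs)["min_confidence"] < min_conf_threshold
```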

