Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection (eBook)

Shilpa Mahajan, Mehak Khurana, Vania Vieira Estrela (Editors)

eBook Download: EPUB

2024 | 1st edition
368 pages
Wiley (publisher)
978-1-394-19646-3 (ISBN)

Comprehensive resource providing strategic defense mechanisms for malware, handling cybercrime, and identifying loopholes using artificial intelligence (AI) and machine learning (ML)

Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection is a comprehensive look at state-of-the-art theory and practical guidelines pertaining to the subject, showcasing recent innovations, emerging trends, and concerns as well as applied challenges encountered, and solutions adopted in the fields of cybersecurity using analytics and machine learning. The text clearly explains theoretical aspects, framework, system architecture, analysis and design, implementation, validation, and tools and techniques of data science and machine learning to detect and prevent cyber threats.

Using AI and ML approaches, the book offers strategic defense mechanisms for addressing malware, cybercrime, and system vulnerabilities. It also provides tools and techniques that can be applied by professional analysts to safely analyze, debug, and disassemble any malicious software they encounter.

With contributions from qualified authors with significant experience in the field, Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection explores topics such as:

Cybersecurity tools originating from computational statistics literature and pure mathematics, such as nonparametric probability density estimation, graph-based manifold learning, and topological data analysis
Applications of AI to penetration testing, malware, data privacy, intrusion detection system (IDS), and social engineering
How AI automation addresses various security challenges in daily workflows and how to perform automated analyses to proactively mitigate threats
Offensive technologies grouped together and analyzed at a higher level from both an offensive and defensive standpoint

Providing detailed coverage of a rapidly expanding field, Applying Artificial Intelligence in Cybersecurity Analytics and Cyber Threat Detection is an essential resource for a wide variety of researchers, scientists, and professionals involved in fields that intersect with cybersecurity, artificial intelligence, and machine learning.

Shilpa Mahajan, PhD, is an Associate Professor in the School of Engineering and Technology at The NorthCap University, India.

Mehak Khurana, PhD, is an Associate Professor in the School of Engineering and Technology at The NorthCap University, India.

Vania Vieira Estrela, PhD, is a Professor with the Telecommunications Department of the Fluminense Federal University, Brazil.

1
Analysis of Malicious Executables and Detection Techniques

Geetika Munjal and Tushar Puri

Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India

1.1 Introduction

Malware, short for malicious software, is a set of instructions created to harm a system [1]. Malware production is increasing, which makes it harder for security firms to identify. Traditionally, security firms and antivirus vendors have used antivirus software to distinguish malicious files from clean ones. Most of these tools use a signature‐based method, comparing a suspect program against a database of known malware signatures [2, 3]. The signature of an executable file serves as its distinctive identifier, and signatures can be generated using static, dynamic, and hybrid methodologies. The drawback of this technique is that it is ineffective at detecting new malware samples: because the number of new samples keeps growing, the signatures must be continually updated [3].
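A minimal sketch of signature matching may make the limitation concrete. It assumes that a signature is simply a byte pattern searched for in the file; the signature names and patterns below are invented for illustration and do not come from any real antivirus database.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern (made up for this sketch).
SIGNATURES: Dict[str, bytes] = {
    "Example.Dropper.A": bytes.fromhex("deadbeef4d5a9000"),
    "Example.Worm.B": b"connect_back_payload_v1",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# A sample containing a known pattern is flagged.
known = b"\x00\x01" + bytes.fromhex("deadbeef4d5a9000") + b"\x02"
# A variant with a single changed byte no longer matches any signature.
variant = b"\x00\x01" + bytes.fromhex("deadbeef4d5a9001") + b"\x02"

print(scan(known))    # ['Example.Dropper.A']
print(scan(variant))  # []
```

The one-byte variant goes undetected until a new signature is added, which is why the database has to be updated continually.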

Static analysis extracts features from a program’s binary code by examining it without running it and builds models that describe those features. It was developed to address these limitations. The resulting models are used to distinguish malicious files from benign ones. However, static analysis is easily evaded because malware authors use numerous code obfuscation techniques, such as metamorphic and polymorphic approaches. Although it provides valuable insight into the behavior of programs, functions, and parameters, static analysis can still be unreliable [1].
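As a rough illustration of static feature extraction, the sketch below pulls two simple features out of raw bytes without executing anything: printable strings and a normalized byte histogram. The choice of these two features, the function names, and the sample bytes are assumptions made for this example, not the feature set of any cited work.

```python
import re
from collections import Counter
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Extract runs of printable ASCII, similar in spirit to the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def byte_histogram(data: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


# Made-up bytes standing in for part of an executable.
sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xff\xfeCreateFileA\x00ab\x00"

print(printable_strings(sample))  # ['kernel32.dll', 'CreateFileA']
print(len(byte_histogram(sample)))  # 256
```

Features like these are cheap to compute, but an obfuscated or packed sample can hide its strings and flatten its byte distribution, which is the evasion problem described above.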

Dynamic analysis, on the other hand, executes the software inside a secure environment to observe its behavior. This method exposes the code obfuscation strategies used by malware authors and works well with compressed files. However, it must be carried out within a secure environment to prevent system damage, and it can be time‐consuming. Additionally, malware may behave differently in a virtual (secure) environment than in a real one, producing an inaccurate behavior log [4].

Combining static and dynamic analysis techniques can result in a more effective and reliable malware detection strategy. The main categories of executable malicious code (MC) are (i) injected MC, such as worms that use buffer overflow exploits to inject their code into active software processes; (ii) dynamically generated MC; and (iii) obfuscated MC, which includes viruses, Trojan horses, and worms that cloak their code via data manipulations and obscure computations to avoid detection and analysis. Polymorphic viruses and Trojans are examples of obfuscated MC [1]. Static feature‐based analysis seems to be effective and efficient, as it enables network detection when the algorithm is loaded into memory [5, 6]. However, when the malicious file or code is compressed or encrypted, it becomes more challenging to detect. In that case, dynamic feature analysis is needed, because the CPU instructions must first be unpacked or decrypted before they are executed. Dynamic analysis may nevertheless be impractical for detecting network malware because of the speed of network traffic [1].

Malicious executables are classified into three types based on how they are transmitted: viruses, Trojan horses, and worms [7]. Viruses infect existing programs, which become “infected” and spread the virus to other programs when they are run. Worms, on the other hand, are standalone programs that propagate throughout a network, usually by exploiting bugs in the software running on networked machines. Trojan horses disguise themselves as legitimate applications while carrying out harmful tasks. Malicious executables are not always easily categorized and can behave in a variety of ways. Virus detection tools, such as McAfee Virus Scan, are extensively used, and Dell recommends Norton Antivirus for all new computers [7]. Although the names of these programs include the term “virus,” some also detect worms and Trojan horses. Their approach of looking for recognized patterns of MC, called signature‐based detection, is effective in detecting previously known threats [8]. However, it is not always effective against new and unknown threats [9]. In response to these limitations, a new approach called behavior‐based detection has emerged. This strategy employs artificial intelligence (AI) and deep learning (DL) algorithms to discover and categorize new and unknown threats based on their behavior [10].

Behavior‐based detection relies on monitoring the actions of a piece of software, looking for signs of malicious behavior [8]. If the software behaves in a way that is deemed suspicious, it can be classified as a potential threat and analyzed further. This approach is more proactive and more effective against new and unknown threats than traditional signature‐based detection [11]. In recent years, AI and machine learning (ML) algorithms have become more sophisticated, making it possible to detect malware automatically, in real time, and without human intervention [12].
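One possible way to turn monitored actions into input for an ML model is to count short sequences (n-grams) of observed API calls. The sketch below assumes that the monitoring step already yields an ordered list of API call names; the trace shown is a made-up example, and n-gram counting is a common choice rather than the method of any specific work cited here.

```python
from collections import Counter
from typing import List


def api_ngrams(trace: List[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of API calls in an observed trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


# Hypothetical trace recorded while a sample ran in a sandbox.
trace = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

features = api_ngrams(trace)
for gram, count in features.items():
    print(gram, count)
```

The resulting counts can be fed to a classifier as a feature vector, so that the decision depends on what the program did rather than on what its bytes look like.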

1.2 Malicious Code Classification System

A static analysis approach is proposed that uses an MC classification model to automate the discovery and categorization of a file’s type without executing it. The classification system takes all files, including MC, normal files, and source files, as input data. During the pre‐processing step, the portable executable (PE) information extraction module and the image creation module produce the input data used in the classification stage. In the subsequent classification step, a variety of algorithms, including convolutional neural network (CNN), random forest, gradient boosting, and decision tree algorithms, are used to decide whether the input is malicious. The final classification is obtained by integrating the results from each model. The classification outcomes are stored in a database that includes information about the data along with a single value indicating whether or not the data is harmful. As a preparation step, the system uses a learning model that has been trained with these algorithms. The input file is converted into input data for the model by extracting hash values, extracting PE data, and performing image conversion.
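The text does not say how the per-model results are integrated. The sketch below assumes one simple option, averaging each model's malicious probability and applying a threshold, to produce the single value stored in the database. The model names, scores, and threshold are illustrative only.

```python
from statistics import mean
from typing import Dict


def integrate(scores: Dict[str, float], threshold: float = 0.5) -> int:
    """Average per-model malicious probabilities; return 1 (malicious) or 0 (benign)."""
    if not scores:
        raise ValueError("at least one model score is required")
    return int(mean(scores.values()) >= threshold)


# Hypothetical outputs of the four classifiers for one input file.
scores = {
    "cnn": 0.91,
    "random_forest": 0.64,
    "gradient_boosting": 0.72,
    "decision_tree": 0.0,
}

verdict = integrate(scores)
print(verdict)  # 1
```

Averaging lets three confident models outvote one dissenting model; majority voting or a weighted average would be equally plausible readings of "integrating the results".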

Hash Extraction: A hash value is first computed from the input data and used as a unique identifier to determine whether the input is a duplicate. In the database update step, the classification outcome of newly entered data is added to the database, and duplicate data is updated using the extracted hash value as the primary key.
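The hash-keyed update step could look roughly like the sketch below. It assumes SHA-256 as the hash function and an SQLite table whose primary key is the hex digest; the table name, column names, and file names are hypothetical, since the text does not specify the hash algorithm or the database schema.

```python
import hashlib
import sqlite3


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the unique identifier of a file's contents."""
    return hashlib.sha256(data).hexdigest()


# In-memory database with the hash as primary key (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (sha256 TEXT PRIMARY KEY, file_name TEXT, is_malicious INTEGER)"
)


def store_result(data: bytes, file_name: str, is_malicious: int) -> bool:
    """Insert or update a classification result; return True if the hash was already stored."""
    digest = sha256_hex(data)
    seen = conn.execute(
        "SELECT 1 FROM results WHERE sha256 = ?", (digest,)
    ).fetchone() is not None
    conn.execute(
        "INSERT OR REPLACE INTO results (sha256, file_name, is_malicious) VALUES (?, ?, ?)",
        (digest, file_name, is_malicious),
    )
    conn.commit()
    return seen


first = store_result(b"sample bytes", "a.exe", 0)           # new entry
second = store_result(b"sample bytes", "copy_of_a.exe", 1)  # same content, updated in place
print(first, second)  # False True
```

Because the key is derived from the file contents, a renamed copy of the same file maps to the same row instead of creating a duplicate.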

Data extraction from PE: The header and sections of the PE structure contain the data that PE files need to function correctly in Windows. The import address table (IAT) in the PE header identifies the imported dynamic link libraries (DLLs) and the functions they provide, which enables the extraction of maliciousness‐related data from PE structures without executing the MC. If the file contains a PE structure, 55 characteristics, including entropy and packers, can be extracted from the header and section portions. The binary file’s packing information is identified using a YARA rule configuration, which uses signatures to recognize and categorize MC types. The image creation module converts the input file into a form the CNN can consume by transforming the input data into a one‐dimensional vector [13].
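The sketch below illustrates three of the ideas in this step using only the Python standard library: reading one field from the PE/COFF header, computing byte entropy, and flattening a file into a fixed-length vector. It is not the 55-feature extractor described above. The helper names, the vector length of 1024, and the synthetic header stub are assumptions made for the example.

```python
import math
import struct
from collections import Counter
from typing import List, Optional


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 to 8.0); packed or encrypted data tends toward 8."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(data).values())


def pe_section_count(data: bytes) -> Optional[int]:
    """Return NumberOfSections from the COFF header, or None if data is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field of the DOS header
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(data) < pe_offset + 8:
        return None
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    return num_sections


def to_image_vector(data: bytes, length: int = 1024) -> List[float]:
    """Truncate or zero-pad to a fixed length and scale each byte to [0, 1]."""
    padded = data[:length].ljust(length, b"\x00")
    return [b / 255 for b in padded]


# Synthetic stub with only the fields this sketch reads (not a runnable executable).
stub = bytearray(0x80)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)         # PE header starts at offset 0x40
stub[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", stub, 0x44, 0x014C, 3)   # Machine = i386, NumberOfSections = 3

print(pe_section_count(bytes(stub)))        # 3
print(shannon_entropy(bytes(range(256))))   # 8.0
print(len(to_image_vector(bytes(stub))))    # 1024
```

In practice a dedicated PE parsing library would be used to read imports and per-section data; the manual offsets here only show that such information is available without running the file.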

1.3 Literature Review

In the field of malware detection, two major techniques have been employed: static analysis and dynamic analysis. ML methods have been proposed to improve detection performance. Schultz et al. [1] introduced a method of using ML to detect new malicious executables using three distinct static features: byte sequences, readable strings, and PE information. The method was tested on 4266 different files and achieved an accuracy of 97.11% using the Bayes algorithm for classification. Usukhbayar et al. [2] presented a framework that utilized three static features: data from the PE header, the DLLs, and the application programming interface (API) function calls made by the DLLs. They selected a subset of features using data mining techniques such as information gain and tested three classification methods: support vector machines (SVMs), Naive Bayes (NB), and J48, with J48 achieving the highest accuracy at 98%. Tzu‐Yen Wang et al. [3] used data contained in PE headers to detect malware. Their dataset consisted of 9771 different programs, including backdoors, email worms, Trojan horses, and viruses. The accuracy rates for viruses, email worms, Trojan horses, and backdoors were 97.19%, 93.96%, 84.11%, and 89.54%, respectively, demonstrating high detection rates for email worms and viruses. With the advancement of dynamic malware analysis, researchers have shifted from static feature extraction to dynamic analysis. Tian et al. extracted dynamic characteristics (API call sequences) from executable files running in a virtual environment and used Weka classifiers to separate malware from trustworthy software and to identify the malware family. The dataset included 1824 executables, and the accuracy was 97%. Wang et al. [5] also proposed the use...
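Information gain, mentioned above as a feature selection technique, measures how much knowing a feature reduces uncertainty about the class label. The sketch below computes it for binary features on a toy dataset; the feature names and values are invented for illustration and are not taken from any of the cited studies.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


# Toy data: 1 = malware, 0 = benign (hypothetical).
labels = [1, 1, 1, 0, 0, 0]
imports_create_remote_thread = [1, 1, 1, 0, 0, 0]  # perfectly separates the classes
imports_get_tick_count = [1, 0, 1, 0, 1, 0]        # only weakly related to the label

ig_strong = information_gain(imports_create_remote_thread, labels)
ig_weak = information_gain(imports_get_tick_count, labels)
print(round(ig_strong, 3), round(ig_weak, 3))
```

Ranking features by this score and keeping the top ones is one straightforward way to select a subset before training classifiers such as Naive Bayes or a decision tree.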

Publication date (per publisher)	22 March 2024
Language	English
Subject area	Mathematics / Computer Science ► Computer Science ► Networks
Subject area	Engineering ► Electrical Engineering / Energy Technology
ISBN-10	1-394-19646-6 / 1394196466
ISBN-13	978-1-394-19646-3 / 9781394196463

EPUB (Adobe DRM)
Size: 15.8 MB
