Atomic Search

Atomic Search is a Python package I developed for my thesis research on malicious JavaScript detection. It identifies suspicious syntax inside obfuscated scripts using an atomic–molecule search approach. I built it to support my machine learning pipeline and to counter the syntax-hiding techniques commonly used in JavaScript obfuscation.

Category

Machine Learning

Client

Thesis research

Start Date

September 2024

End Date

October 2024

Description

The main purpose of this project is to extract meaningful syntax patterns from heavily obfuscated JavaScript, where malicious logic is often hidden through concatenation, splitting, or restructuring. Atomic Search works by breaking code into smaller fragments and recombining them to detect target syntax that may indicate harmful behavior. This tool became an essential part of my research workflow because it allowed me to generate clearer datasets, improve feature extraction, and enhance the accuracy of the machine learning models used in my study.
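The fragment-and-recombine idea can be illustrated with a minimal sketch. It is not the package's actual code: the names `extract_atoms`, `form_molecules`, and `find_targets` are hypothetical, and it assumes that atoms are simply the contents of string literals and that a molecule is any run of adjacent atoms joined together. It also ignores whether the fragments are really concatenated in the script, so it will over-report compared with a real parser.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches a single- or double-quoted JavaScript string literal and
# captures its contents (escaped characters are kept as written).
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")


class Hit(NamedTuple):
    target: str  # the suspicious syntax that was reassembled
    start: int   # index of the first atom in the match
    count: int   # number of adjacent atoms that were joined


def extract_atoms(source: str) -> list[str]:
    """Return the contents of every string literal, in source order."""
    return [m.group(2) for m in STRING_LITERAL.finditer(source)]


def form_molecules(atoms: list[str], max_atoms: int = 4) -> Iterator[tuple[int, int, str]]:
    """Yield (start, count, text) for every run of up to max_atoms adjacent atoms."""
    for start in range(len(atoms)):
        for count in range(1, max_atoms + 1):
            if start + count > len(atoms):
                break
            yield start, count, "".join(atoms[start:start + count])


def find_targets(source: str, targets: Iterable[str], max_atoms: int = 4) -> list[Hit]:
    """Report every molecule that is exactly equal to one of the targets."""
    wanted = set(targets)
    atoms = extract_atoms(source)
    return [
        Hit(text, start, count)
        for start, count, text in form_molecules(atoms, max_atoms)
        if text in wanted
    ]


if __name__ == "__main__":
    sample = 'var a = "ev" + "al"; var b = "docu" + "ment" + ".wri" + "te";'
    print("eval" in sample)  # a plain substring search misses the split name
    for hit in find_targets(sample, {"eval", "document.write"}):
        print(hit)
```

In this sample, a plain substring search for `eval` finds nothing because the name is split across two literals, while the molecule search reassembles both `eval` (two atoms) and `document.write` (four atoms). The `max_atoms` limit is an assumed tuning knob that keeps the number of candidate molecules manageable on large scripts.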

THE STORY

The idea for Atomic Search emerged when I found that most existing JavaScript detection methods struggled with obfuscated samples. During my thesis work, I encountered cases where malicious functions were hidden across multiple fragments, making them difficult to detect with traditional parsing. I then designed an approach that treats code as atomic units, combines them into meaningful structures, and analyzes them systematically. Building this package required continuous iteration, debugging, and validation through real samples, ultimately improving the overall data quality used in my machine learning experiments.

MY APPROACH

My approach was to design a lightweight yet effective algorithm capable of handling messy and unpredictable JavaScript structures. I implemented custom extraction, molecule formation, logging, and testing modules to ensure the tool worked reliably for research purposes. The package includes automated tasks, clear directory organization, and a structured workflow to support experimentation and reproducibility. By combining algorithm design with practical engineering, Atomic Search became a solid foundation for my malicious JavaScript detection research.