Highlights:
  • HTML files are being used in phishing attacks to mimic trusted websites.
  • HT-ML Guard is designed to analyze HTML files and accurately determine whether they are malicious, enhancing cyber security defenses.
  • Check Point Harmony Email and Collaboration customers remain protected against this type of attack.

HTML files are commonly used in the initial stages of cyber attacks. Cyber criminals craft these files to look legitimate, lulling unsuspecting users into a false sense of security. The main goal of these malicious HTML files is to deceive users into either disclosing sensitive information, such as login credentials and personal details, or downloading and executing malicious payloads that can compromise their systems. Check Point research found that over half of malicious email attachments are HTML files.

To combat this popular tactic, ThreatCloud AI has developed a new engine: HT-ML Guard. This machine learning-based engine is designed to analyze HTML files and accurately determine whether they are malicious, enhancing cyber security defenses.

HTML Phishing Attacks are a Growing Threat

Email-based attacks are on the rise, notably those involving HTML attachments (see Figures 1 and 2). Between 2018 and 2023, email's share of attacks, compared with web-based attack vectors, surged from 33% to 88%. Cyber attackers are exploiting HTML files more frequently because these files can embed malicious code and links that appear legitimate to unsuspecting users.

Figure 1 – Email vs. Web Attack vectors in 2018-2023

Figure 2 – Email top malicious file types in 2023

In phishing attacks, these HTML files often mimic trusted websites, presenting login forms or other input fields to capture user credentials. Users believe they are interacting with a legitimate site, but their information is instead sent to the attacker.
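The credential-capture trick described above usually leaves a trace in the markup: a form with a password field whose `action` points to a remote server. The sketch below is a minimal illustration, built on Python's standard `html.parser` module, that lists the hostnames such forms would submit to. The names `CredentialFormFinder` and `remote_credential_targets` are hypothetical and not part of any Check Point product, and the sketch ignores credentials exfiltrated by JavaScript (for example via `fetch`), which a real detector would also need to cover.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CredentialFormFinder(HTMLParser):
    """Hypothetical helper: records the action URL of every <form> holding a password input."""

    def __init__(self) -> None:
        super().__init__()
        self._in_form = False
        self._current_action = ""
        self._has_password = False
        self.credential_forms: list[str] = []  # action URLs of password-bearing forms

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute values may be None, hence the "or" fallbacks
        if tag == "form":
            self._in_form = True
            self._current_action = attributes.get("action") or ""
            self._has_password = False
        elif tag == "input" and self._in_form:
            if (attributes.get("type") or "").lower() == "password":
                self._has_password = True

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if self._has_password:
                self.credential_forms.append(self._current_action)
            self._in_form = False


def remote_credential_targets(html: str) -> list[str]:
    """Return the remote hostnames that password forms in `html` would post to."""
    finder = CredentialFormFinder()
    finder.feed(html)
    finder.close()
    hosts = []
    for action in finder.credential_forms:
        host = urlparse(action).hostname  # None for relative actions such as "/login"
        if host:
            hosts.append(host)
    return hosts
```

Because an HTML attachment opened from disk has no login server of its own, a password form posting to a remote host deserves a closer look. On its own it is only a weak signal, since saved copies of legitimate login pages contain the same pattern.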

Figure 3 shows two examples of HTML scams: one mimics a Word document and asks the user to download it, while the other mimics a login page to steal credentials.

Figure 3 – HTML files mimicking trusted websites

Other malicious HTML files might include embedded scripts that automatically download malware to the user’s device upon opening the file. These scripts can exploit vulnerabilities in the user’s browser or operating system, leading to the installation of ransomware, spyware, or other harmful software.

How HT-ML Guard Works

In developing our machine learning model to detect malicious HTML files, we followed a structured approach while protecting our intellectual property. Here are the key stages of our model development process:

Figure 4 – HT-ML Guard prediction pipeline

  1. Data Collection

Check Point’s research team is dedicated to tracking the latest malicious campaigns, ensuring our model is always trained on the most relevant and current threats.

  2. Parse HTML

By breaking HTML files down into their constituent parts, we can systematically examine each component, ensuring a thorough inspection of the entire file.

  3. Extract Features

From the parsed HTML, we extract a diverse set of features crucial for distinguishing between benign and malicious files. These features are derived from various aspects of HTML, including JavaScript, metadata, links, and more.

  4. Model Prediction

Using the extracted features, the model evaluates the likelihood of the HTML file being malicious or benign.
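The four stages above map onto a small pipeline. The sketch below is a toy illustration under loudly stated assumptions: Python's standard `html.parser` stands in for step 2, a handful of hand-picked features for step 3, and a logistic function with made-up weights for step 4. HT-ML Guard's actual features, model, and weights are proprietary and are not shown here; every name, indicator, and number in the block (`HTMLPartsParser`, `SUSPICIOUS_JS`, `WEIGHTS`, `BIAS`, `score_html`) is hypothetical. Step 1, data collection, corresponds to the labeled files a real model is trained on and has no counterpart in this sketch.

```python
import math
import re
from html.parser import HTMLParser

# Hypothetical indicators and weights, for illustration only (not HT-ML Guard's).
SUSPICIOUS_JS = ("eval(", "atob(", "unescape(", "document.write(", "createObjectURL(")
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{40,}")
WEIGHTS = {"longest_base64_run": 0.002, "suspicious_js_hits": 0.8, "password_fields": 1.5}
BIAS = -3.0


class HTMLPartsParser(HTMLParser):
    """Step 2: split a file into inline scripts, external scripts, links and input fields."""

    def __init__(self) -> None:
        super().__init__()
        self.inline_scripts: list[str] = []
        self.external_scripts: list[str] = []
        self.links: list[str] = []
        self.input_types: list[str] = []
        self._script_chunks = None  # a list while inside an inline <script>, else None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            if attributes.get("src"):
                self.external_scripts.append(attributes["src"])
            else:
                self._script_chunks = []
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "input":
            self.input_types.append((attributes.get("type") or "text").lower())

    def handle_data(self, data):
        if self._script_chunks is not None:
            self._script_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script_chunks is not None:
            self.inline_scripts.append("".join(self._script_chunks))
            self._script_chunks = None


def extract_features(parts: HTMLPartsParser) -> dict[str, float]:
    """Step 3: turn the parsed parts into a flat numeric feature dictionary."""
    script_text = "\n".join(parts.inline_scripts)
    longest = max((len(m.group()) for m in BASE64_RUN.finditer(script_text)), default=0)
    return {
        "inline_scripts": float(len(parts.inline_scripts)),
        "external_scripts": float(len(parts.external_scripts)),
        "links": float(len(parts.links)),
        "longest_base64_run": float(longest),
        "suspicious_js_hits": float(sum(script_text.count(t) for t in SUSPICIOUS_JS)),
        "password_fields": float(parts.input_types.count("password")),
    }


def predict(features: dict[str, float]) -> float:
    """Step 4: logistic score in [0, 1], a stand-in for a trained model."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def score_html(html: str) -> float:
    """Run steps 2 to 4 on one HTML document and return its malicious score."""
    parser = HTMLPartsParser()
    parser.feed(html)
    parser.close()
    return predict(extract_features(parser))
```

In a trained system, `WEIGHTS` and `BIAS` would be learned from the labeled data gathered in step 1, and the threshold applied to the score would be tuned against an acceptable false positive rate.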

This approach ensures a comprehensive analysis of HTML files while maintaining the confidentiality of our proprietary methods and algorithms.

Model Detections and Results

The implementation of our machine learning model has significantly enhanced ThreatCloud AI’s ability to detect malicious and phishing HTML files. Here are some key highlights of its performance:

  1. Increased Detections

The model has more than doubled the number of malicious and phishing HTML files detected, significantly bolstering customers’ cyber security defenses, and reducing the risk of successful attacks.

  2. High Accuracy

Accuracy is a critical factor for any detection system, and our model excels in this regard. It has a low false positive rate of approximately 1 in 1,000 benign HTML files. This low rate lets our security team focus on genuine threats without being bogged down by erroneous alerts.

  3. Superior Performance Compared to Other Vendors

In comparative evaluations, our model consistently outperformed other vendors' detection systems by 40% to 250%. It identified threats that many other vendors ignored or missed entirely. This is particularly notable in cases where new malicious campaigns had zero detections on VirusTotal, highlighting the model's advanced threat recognition and its ability to stay ahead of evolving cyber threats.
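The false positive rate of roughly 1 in 1,000 quoted under High Accuracy is easiest to judge in absolute numbers. The sketch below, plain Python with hypothetical function names, computes the rate from confusion-matrix counts, FPR = FP / (FP + TN), and the expected number of false alerts for a given volume of benign files. The 50,000-file volume used afterwards is an arbitrary example, not a Check Point figure.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): the share of benign files that were wrongly flagged."""
    benign_total = false_positives + true_negatives
    if benign_total == 0:
        raise ValueError("need at least one benign sample")
    return false_positives / benign_total


def expected_false_alerts(benign_files: int, one_in: int = 1000) -> float:
    """Expected false alerts when roughly one in `one_in` benign files is misflagged."""
    return benign_files / one_in
```

For example, `expected_false_alerts(50_000)` returns `50.0`: at that rate, 50,000 benign HTML attachments would be expected to produce about 50 erroneous alerts.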

Case Study: Detecting a Sophisticated Phishing Campaign

To illustrate the effectiveness of HT-ML Guard, let's look at a case study in which the model identified a sophisticated phishing campaign that had gone unnoticed by other security solutions.

The Threat: Attackers crafted an HTML file that mimicked a widely used corporate login page, complete with company logos and legitimate-looking forms.

Here is a step-by-step breakdown of the attack:

Figure 5 – Case Study Email Attack Chain

  1. Phishing Email:

The user receives an email with a fake order notification. The email appears legitimate, often containing company branding and professional language to gain the user’s trust.

Figure 6 – Phishing mail content

  2. Malicious HTML Attachment:

The email includes an attachment: an HTML file that supposedly contains the order list. This attachment is designed to look like an innocent document.

  3. Deceptive Message:

Upon opening the HTML file, the user is presented with a message stating that the document cannot be opened directly. The message instructs the user to download the document to view the order list.

Figure 7 – HTML file opened in a browser automatically downloads a .tar file

  4. Content Smuggling and Auto-download:

The HTML file uses a technique known as content smuggling (also called HTML smuggling) to automatically download a TAR file named “order list” without the user's knowledge. The download is triggered by hidden scripts within the HTML file.

The HTML file contained JavaScript code, with well-documented comments, that was originally written to download an LNK file. The attacker modified it to download a TAR file containing a VBE script instead, a deceptive change aimed at bypassing security filters and delivering the malicious payload.

Figure 8 – HTML smuggling script

  5. TAR File:

The TAR file contains a VBE (Visual Basic Script Encoded) file named “order list.”

  6. Execution of VBE File:

When the user clicks on this file, believing it to be the order list, the VBE file is executed.

Figure 9 – VBE file content

The VBE file contains an encrypted EXE file. Upon execution, the VBE file decrypts the EXE file, saves it to the user’s system, and then runs it.

Figure 10 – VBE creates an EXE file and then executes it with WScript.Shell

  7. Malware Deployment:

The EXE file is a variant of LokiBot, a notorious piece of malware designed to steal sensitive information such as login credentials, banking information, and other personal data.

Figure 11 – EXE file created  

Figure 12 – EXE analysis with ANY.RUN
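The TAR archive in steps 5 and 6 can be triaged before anything inside it is opened. The sketch below uses Python's standard `tarfile` module to flag members that either carry a script or executable extension or contain the `#@~^` marker that opens a block produced by Microsoft's Script Encoder, the format .vbe files use. The content check matters because a name such as “order list” does not reveal the file type by itself. The extension watch list and the function name `risky_tar_members` are assumptions for illustration, not the logic ThreatCloud AI uses.

```python
import io
import tarfile
from pathlib import PurePosixPath

# Hypothetical watch list: script, shortcut and executable types unusual in an "order list".
RISKY_EXTENSIONS = {".vbe", ".vbs", ".js", ".jse", ".wsf", ".lnk", ".exe", ".scr"}
# Script Encoder output (used by .vbe files) wraps the encoded script in a block starting with this.
ENCODED_SCRIPT_MARKER = b"#@~^"


def risky_tar_members(data: bytes) -> list[str]:
    """Return names of archive members whose extension or content suggests a script payload."""
    flagged = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            suffix = PurePosixPath(member.name).suffix.lower()
            with archive.extractfile(member) as fh:
                head = fh.read(4096)  # only the start of the file is inspected
            if suffix in RISKY_EXTENSIONS or ENCODED_SCRIPT_MARKER in head:
                flagged.append(member.name)
    return flagged
```

The archive is read entirely in memory and nothing is written to disk or executed, which keeps this kind of triage safe to run on untrusted attachments.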

Key Indicators:

  • Phishing Message: The form elements in the HTML file were designed to capture user credentials, a common indicator of phishing attempts.
  • Content Smuggling: Malicious payloads were embedded within benign elements to bypass traditional security filters. The model identified the large data blob and, using contextual indicators, determined that the file was malicious HTML.
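The two signals named in the Content Smuggling bullet, a large data blob plus context showing that the blob is turned into a file, can be expressed as a simple rule. The sketch below is a hand-written heuristic, not the HT-ML Guard model, which learns such combinations from data. The 1,000-character threshold, the list of context clues, and the function name `looks_like_html_smuggling` are assumptions chosen for illustration; `Blob`, `URL.createObjectURL`, the `download` attribute, `atob` and the legacy `msSaveOrOpenBlob` are real browser APIs commonly used to assemble a download inside a page.

```python
import re

# A long run of base64 characters inside a script, e.g. an embedded archive.
LONG_BASE64 = re.compile(r"[A-Za-z0-9+/]{1000,}={0,2}")
# Hypothetical clue list: script patterns that turn in-page data into a downloadable file.
SMUGGLING_CONTEXT = ("new Blob(", "createObjectURL(", ".download", "msSaveOrOpenBlob(", "atob(")


def looks_like_html_smuggling(script_text: str, min_context_hits: int = 2) -> bool:
    """Flag a script that carries a large encoded blob AND builds a file download from it."""
    if not LONG_BASE64.search(script_text):
        return False  # no large blob: nothing to smuggle
    hits = sum(1 for clue in SMUGGLING_CONTEXT if clue in script_text)
    return hits >= min_context_hits
```

Requiring both conditions is what keeps the rule from firing on ordinary pages: inline images and fonts produce large base64 blobs, and many web apps build Blob downloads, but the combination inside an email attachment is much rarer.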

Results: Our engines detected the malicious file, which our security software then blocked. This was a zero-day attack, and ThreatCloud AI made the first detection of it in the wild. A few days later, our research team confirmed that victims of the attack had begun publicly reporting the malicious files on community boards.

Impact: Thanks to the timely detection by our model, the zero-day phishing campaign was prevented before it could cause any harm.

This case exemplifies how our machine learning model not only enhances our detection capabilities but also provides a critical layer of defense against emerging and sophisticated cyber threats.

Check Point’s HT-ML Guard, part of ThreatCloud AI, revolutionizes Threat Prevention, providing industry-leading security as part of Check Point’s Harmony Email and Collaboration.

To learn about Check Point threat prevention, schedule a demo or a free security checkup to assess your security posture. 
