feat: OLE CF and VBA modules implemented #285

davidmagnotti · 2025-01-14T11:40:51Z

OLE CF (Object Linking and Embedding Compound File) is the container format used by legacy Microsoft Office files such as documents, workbooks, and presentations. These files can also carry Visual Basic for Applications (VBA) code, more commonly known as Office macros.

I've implemented two modules for parsing OLE CF files and VBA. I've also expanded the dump command to allow dumping of file metadata such as stream metadata (OLE CF) and macros (VBA).

An example of how you could use this to identify the use of auto-execute macro method names like "Document_New":

import "vba"

rule detect_document_new
{
    condition:
        for any module in vba.module_code : (
            module matches /document_new/i
        )
}

- Added support for parsing OLE CF and VBA (macro-enabled Office) files.

- Addressed infinite loop issue in OLE CF parser.

plusvic · 2025-01-14T12:11:41Z

lib/src/modules/olecf/parser.rs

+        while current < MAX_REGULAR_SECTOR {
+            chain.push(current);
+            let next = match self.get_fat_entry(current) {
+                Ok(n) => n,
+                Err(_) => break,
+            };
+            if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN {
+                break;
+            }
+            current = next;
+        }


The fuzzer found an out-of-memory (OOM) issue caused by this loop. For some files the sector chain contains a cycle, so the loop never terminates and the chain vector grows without bound.

I'm attaching a file that reproduces the issue (it must be unzipped).

oom-5ee7fdcab613b7416687f03d1e103519e27dd46e.zip

davidmagnotti · 2025-01-18

Great catch. Fixed in the change below, which is now committed (I confirmed it correctly processes the attached file, which it previously didn't):

while current < MAX_REGULAR_SECTOR {
    // Prevent cycles by keeping track of visited sectors
    if chain.contains(&current) {
        // We've seen this sector before - it's a cycle
        break;
    }
    chain.push(current);
    let next = match self.get_fat_entry(current) {
        Ok(n) => n,
        Err(_) => break,
    };
    // Check validity of next sector
    if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN {
        break;
    }
    current = next;
}

Thank you for addressing the other changes, too.
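
For reference, the cycle check discussed above can also be written with a hash set, which keeps each lookup constant-time where `chain.contains` rescans the whole vector. This is only a minimal sketch, not the code in the PR: the free function `follow_chain`, the FAT passed as a plain `&[u32]` slice, and the constant value (taken from the compound file specification) are assumptions made for illustration.

```rust
use std::collections::HashSet;

// Assumed value: sector numbers at or above this are treated as special
// markers (ENDOFCHAIN, FREESECT, ...) rather than regular sectors.
const MAX_REGULAR_SECTOR: u32 = 0xFFFF_FFFA;

/// Follows a sector chain through `fat`, stopping at the first special
/// marker, out-of-range index, or repeated sector.
fn follow_chain(fat: &[u32], start: u32) -> Vec<u32> {
    let mut chain = Vec::new();
    let mut visited = HashSet::new();
    let mut current = start;

    while current < MAX_REGULAR_SECTOR {
        // `insert` returns false when the sector was already visited,
        // which means the chain loops back on itself.
        if !visited.insert(current) {
            break;
        }
        chain.push(current);
        // An index outside the FAT ends the chain instead of panicking.
        let Some(&next) = fat.get(current as usize) else {
            break;
        };
        current = next;
    }
    chain
}
```

With a cyclic FAT such as `[1, 2, 0]`, the walk visits sectors 0, 1 and 2 once and then stops, so the chain length is bounded by the number of distinct sectors.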

plusvic · 2025-01-20T11:50:44Z

lib/src/modules/olecf/parser.rs

+    }
+
+    fn get_regular_stream_data(&self, start_sector: u32, size: u64) -> Result<Vec<u8>, &'static str> {
+        let mut data = Vec::with_capacity(size as usize);


The fuzzer found another case in which the parser consumes too much memory due to unsanitized input. In particular, the size passed to this call of Vec::with_capacity comes directly from the file, without any sanitization. It can be as large as 4 GB.

This is a file that can reproduce the issue.
oom-7c7ec1f9736aed84501f8152250a8431e55917c7.zip
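
One way to avoid this kind of allocation is to clamp the declared size to what the file could actually supply before reserving memory. The sketch below is an assumption-laden illustration, not the fix that was committed: `read_stream`, its parameters, and the convention that sector `n` starts at byte `(n + 1) * sector_size` are all hypothetical stand-ins for the parser's real internals.

```rust
/// Reads up to `size` bytes of stream data from the sectors in `chain`,
/// never reserving more memory than the file itself contains.
fn read_stream(file: &[u8], sector_size: usize, chain: &[u32], size: u64) -> Vec<u8> {
    // Clamp the declared size: a stream cannot be larger than the file.
    // The result is at most `file.len()`, so the cast back is lossless.
    let wanted = size.min(file.len() as u64) as usize;
    let mut data = Vec::with_capacity(wanted);

    for &sector in chain {
        if data.len() >= wanted {
            break;
        }
        // Assumed layout: sector n starts right after the header sector.
        let Some(start) = (sector as usize)
            .checked_add(1)
            .and_then(|n| n.checked_mul(sector_size))
        else {
            break;
        };
        // A sector that starts past the end of the file ends the read.
        let Some(bytes) = file.get(start..) else {
            break;
        };
        let take = bytes.len().min(sector_size).min(wanted - data.len());
        data.extend_from_slice(&bytes[..take]);
    }
    data
}
```

Even when the header claims a size close to `u32::MAX`, the reserved capacity here never exceeds the length of the input buffer.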

I highly recommend running the fuzzer on your side for a while to find issues in the parser. The fuzzer usually does a great job at finding bugs. You can follow these steps:

rustup default nightly
cd lib/fuzz
cargo fuzz run vba_parser

plusvic · 2025-02-04T09:31:47Z

According to the specification, the first 109 DIFAT entries are located after the header, in the first sector, but the DIFAT can contain more than 109 entries, and I can't see where the remaining entries, if any, are parsed. It looks like only the first 109 entries are parsed. This could mean that a FAT larger than 109 sectors won't be parsed correctly.
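
To illustrate what parsing the remaining entries might involve, here is a rough sketch based on the compound file specification, where each additional DIFAT sector holds FAT sector numbers followed by one trailing entry that points to the next DIFAT sector. The names `read_u32` and `read_extra_difat`, the parameters (which would come from the header's first-DIFAT-sector and DIFAT-sector-count fields), and the sector offset convention are assumptions for this example and do not reflect the PR's actual code.

```rust
// Assumed value: entries at or above this are special markers
// (FREESECT padding, ENDOFCHAIN), not regular sector numbers.
const MAX_REGULAR_SECTOR: u32 = 0xFFFF_FFFA;

/// Reads a little-endian u32 at `offset`, or None if it is out of bounds.
fn read_u32(file: &[u8], offset: usize) -> Option<u32> {
    let b = file.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Collects FAT sector numbers from the DIFAT sectors chained after the
/// 109 entries stored in the header.
fn read_extra_difat(file: &[u8], sector_size: usize, first_difat: u32, num_difat: u32) -> Vec<u32> {
    let mut fat_sectors = Vec::new();
    if sector_size < 8 {
        return fat_sectors;
    }
    // The last 4 bytes of each DIFAT sector point to the next one.
    let entries_per_sector = sector_size / 4 - 1;
    // Never walk more DIFAT sectors than the file has sectors, so a cyclic
    // chain or a huge declared count cannot loop or allocate without bound.
    let max_steps = (num_difat as usize).min(file.len() / sector_size);
    let mut current = first_difat;

    for _ in 0..max_steps {
        if current >= MAX_REGULAR_SECTOR {
            break;
        }
        let Some(base) = (current as usize)
            .checked_add(1)
            .and_then(|n| n.checked_mul(sector_size))
        else {
            break;
        };
        for i in 0..entries_per_sector {
            match read_u32(file, base.saturating_add(i * 4)) {
                Some(entry) if entry < MAX_REGULAR_SECTOR => fat_sectors.push(entry),
                Some(_) => {} // unused slot in the last DIFAT sector
                None => return fat_sectors, // truncated file
            }
        }
        match read_u32(file, base.saturating_add(entries_per_sector * 4)) {
            Some(next) => current = next,
            None => break,
        }
    }
    fat_sectors
}
```

The sector numbers returned here would be appended to the 109 header entries before the FAT itself is loaded.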

davidmagnotti · 2025-07-12T20:44:58Z

Apologies for delay, fixed the OOM and updated from upstream/main. Though, I couldn't seem to get the fuzzer to work (last command you provided said command not found? tried building and other variations with fuzzer, no dice).

plusvic · 2025-07-14T07:44:14Z

> Apologies for delay, fixed the OOM and updated from upstream/main. Though, I couldn't seem to get the fuzzer to work (last command you provided said command not found? tried building and other variations with fuzzer, no dice).

Sorry, you need to install cargo-fuzz with cargo install cargo-fuzz. See: https://github.com/rust-fuzz/cargo-fuzz

1ndahous3 · 2025-09-04T11:24:20Z

@davidmagnotti Hi, are you planning to finalize the modules in the near future? They seem very useful and I intend to contribute.

davidmagnotti · 2025-09-05T12:28:11Z

> @davidmagnotti Hi, are you planning to finalize the modules in the near future? They seem very useful and I intend to contribute.

Apologies Roman, I do, life has just been busy. Need to run the fuzzer per Victor’s feedback, resolve any issues it identifies, then we should be good to merge.

davidmagnotti added 2 commits January 12, 2025 19:34

feat: OLE CF and VBA Modules Added

f46f723

- Added support for parsing OLE CF and VBA (macro-enabled Office) files.

Fix for OOM from Infinite Loop

00bca34

- Addressed infinite loop issue in OLE CF parser.

davidmagnotti mentioned this pull request Jan 14, 2025

feat: OLE CF and VBA modules implemented #274

Merged

fuzz: implement fuzzer for vba module.

105269c

plusvic requested changes Jan 14, 2025

plusvic and others added 5 commits January 14, 2025 13:23

style: fix clippy warnings.

b263827

style: fix clippy warning

ace9f2d

Merge branch 'main' of https://github.com/davidmagnotti/yara-x

16c3ad6

chore: remove println

0c0f4ec

style: apply rustfmt

1246649

plusvic reviewed Jan 20, 2025

plusvic added 4 commits January 20, 2025 13:35

style: apply rustfmt

7f53480

refactor: some changes to make parse_header easier to follow.

123bf31

Merge branch 'main' into vba

11e5dd5

refactor: simplify the parsing of the DIFAT entries after the header.

bd7cff2

plusvic and others added 4 commits February 4, 2025 11:37

refactor: simplify follow_chain function.

cc22412

refactor: put stream names and sizes under a structure that represent…

ba1f2dd

…s a stream.

Fixed OOM found from fuzzer

0345f10

Merge remote-tracking branch 'upstream/main'

c1b4b8d

davidmagnotti mentioned this pull request Jul 30, 2025

fix: decode_cpusubtype handling and misc others pstirparo/machofile#11

Closed
