Skip to content

Conversation

@davidmagnotti
Copy link
Contributor

OLE CF (Object Linking and Embedding Compound File) format is a file format used for legacy Microsoft Office files, such as documents, workbooks, presentations, and others. It's also used with Visual Basic for Applications (VBA) which is known more commonly as Office macros.

I've implemented two modules for parsing OLE CF files and VBA. I've also expanded the dump command to allow dumping of file metadata such as stream metadata (OLE CF) and macros (VBA).

An example of how you could use this to identify the use of auto-execute macro method names like "Document_New":

import "vba"

rule detect_document_new
{
    condition:
        for any module in vba.module_code : (
            module matches /document_new/i
        )
}

- Added support for parsing OLE CF and VBA (macro-enabled Office) files.
- Addressed infinite loop issue in OLE CF parser.
Comment on lines 249 to 259
while current < MAX_REGULAR_SECTOR {
chain.push(current);
let next = match self.get_fat_entry(current) {
Ok(n) => n,
Err(_) => break,
};
if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN {
break;
}
current = next;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fuzzer found an out-of-memory (OOM) issue caused by this loop. The loop can be infinite while parsing some files, making the chain vector to grow forever.

I'm attaching a file that reproduces the issue (it must be unzipped).

oom-5ee7fdcab613b7416687f03d1e103519e27dd46e.zip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, fixed with this change that's committed (confirmed it correctly processes the attached file, whereas previously it didn't):

        while current < MAX_REGULAR_SECTOR {
            // Prevent cycles by keeping track of visited sectors
            if chain.contains(&current) {
                // We've seen this sector before - it's a cycle
                break;
            }
            
            chain.push(current);
            
            let next = match self.get_fat_entry(current) {
                Ok(n) => n,
                Err(_) => break,
            };
            
            // Check validity of next sector
            if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN {
                break;
            }
            
            current = next;
        }

Thank you for addressing the other changes, too.

}

fn get_regular_stream_data(&self, start_sector: u32, size: u64) -> Result<Vec<u8>, &'static str> {
let mut data = Vec::with_capacity(size as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fuzzer found another case in which the parser consumes too much memory due to unsanitized input. Particularly, the size passed to this call of Vec::with_capacity comes directly from the file, without any sanitaztion. It can be as large as 4GB.

This is a file that can reproduce the issue.
oom-7c7ec1f9736aed84501f8152250a8431e55917c7.zip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly recommend you to run the fuzzer in your side for a while to find issues in the parser. The fuzzer usually does a great a job at finding bugs. You can follow these steps:

rustup default nightly
cd lib/fuzz
cargo fuzz run vba_parser

@plusvic
Copy link
Member

plusvic commented Feb 4, 2025

According to the specification the first 109 DIFAT entries are located after the header, in the first sector, but the DIFAT can contain more than 109 entries, and I can't see where the possibly remaining entries are parsed. It looks like only the first 109 entries are parsed. This could be that a FAT larger than 109 sectors won't be parsed correctly.

@davidmagnotti
Copy link
Contributor Author

Apologies for delay, fixed the OOM and updated from upstream/main. Though, I couldn't seem to get the fuzzer to work (last command you provided said command not found? tried building and other variations with fuzzer, no dice).

@plusvic
Copy link
Member

plusvic commented Jul 14, 2025

Apologies for delay, fixed the OOM and updated from upstream/main. Though, I couldn't seem to get the fuzzer to work (last command you provided said command not found? tried building and other variations with fuzzer, no dice).

Sorry, you need to install cargo-fuzz with cargo install cargo-fuzz. See: https://github.com/rust-fuzz/cargo-fuzz

@1ndahous3
Copy link
Contributor

@davidmagnotti Hi, are you planning to finalize the modules in the near future? They seem very useful and I intend to contribute.

@davidmagnotti
Copy link
Contributor Author

@davidmagnotti Hi, are you planning to finalize the modules in the near future? They seem very useful and I intend to contribute.

Apologies Roman, I do, life has just been busy. Need to run the fuzzer per Victor’s feedback, resolve any issues it identifies, then we should be good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants