
Conversation


@madhub madhub commented Sep 20, 2025

This pull request introduces a new otelcol.exporter.file component, which allows metrics, logs, and traces to be written to disk with options for rotation, compression, and grouping by resource attribute. The implementation includes documentation, code, dependency updates, and thorough validation and testing. This addition enables users to persist telemetry data locally in either JSON or Protocol Buffers format and provides flexible configuration for file management.

New Feature: File Exporter Component

  • Added the otelcol.exporter.file component, enabling telemetry data export to disk with support for file rotation, compression (zstd), and grouping output files by resource attribute. The component wraps the upstream OpenTelemetry Collector Contrib fileexporter. [1] [2]

Documentation and Usage

  • Created comprehensive documentation for otelcol.exporter.file, detailing arguments, blocks (rotation, group_by, debug_metrics), configuration options, usage examples, and technical details.

Codebase Integration

  • Registered the new component in the Alloy system and ensured it is available for use by importing it in the component registry (all.go).

Dependency Management

  • Added github.com/open-telemetry/opentelemetry-collector-contrib/exporter/fileexporter as a dependency in go.mod to support the new exporter.

Validation and Testing

  • Implemented robust validation logic for configuration options, including edge cases for path, format, compression, rotation, and group_by settings. Added unit tests to verify defaulting, validation, conversion, and unmarshalling of configuration.
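The summary above doesn't show the validation rules themselves. As a rough illustration only (the actual implementation in this PR is Go code inside the Alloy component, and the field names below are assumptions based on the documented options), the edge cases described might look like this:

```python
# Hypothetical sketch of the validation rules described above -- NOT the
# actual Go code from this PR. Field names follow the documented options.
VALID_FORMATS = {"json", "proto"}
VALID_COMPRESSION = {"", "zstd"}

def validate(cfg: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if not cfg.get("path"):
        errors.append("path must not be empty")
    if cfg.get("format", "json") not in VALID_FORMATS:
        errors.append("format must be one of: json, proto")
    if cfg.get("compression", "") not in VALID_COMPRESSION:
        errors.append("compression must be empty or zstd")
    rotation = cfg.get("rotation") or {}
    if any(rotation.get(k, 0) < 0 for k in ("max_megabytes", "max_days", "max_backups")):
        errors.append("rotation values must be non-negative")
    group_by = cfg.get("group_by") or {}
    if group_by.get("enabled") and not group_by.get("resource_attribute"):
        errors.append("group_by requires a resource_attribute")
    return errors

print(validate({}))                              # missing path -> one error
print(validate({"path": "/tmp/out.json"}))       # valid -> []
```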

This change provides a flexible and robust mechanism for exporting telemetry data to files, suitable for local debugging, archival, or integration with other file-based workflows.

PR Description

Which issue(s) this PR fixes

Fixes #4398

Notes to the Reviewer

The following tests were done:

  1. Formatting test with the file exporter
    ./alloy fmt example-file-exporter.alloy

  2. Run test with OTLP input and file export
    ./alloy run test-file-exporter.alloy

  3. Conversion test: converting an OpenTelemetry Collector configuration to Alloy
    ./alloy run test-file-exporter.alloy

  4. End-to-end test: OTLP input, file export, and sending a sample OTel log record using curl
    ./alloy run test-converted.alloy

Alloy configuration used for testing

// Example configuration showing otelcol.exporter.file working with other components

// A test receiver that accepts OTLP data over gRPC and HTTP
otelcol.receiver.otlp "test" {
  grpc {
    endpoint = "127.0.0.1:4317"
  }
  
  http {
    endpoint = "127.0.0.1:4318"
  }
  
  output {
    metrics = [otelcol.exporter.file.json_output.input]
    logs    = [otelcol.exporter.file.json_output.input] 
    traces  = [otelcol.exporter.file.json_output.input]
  }
}

// File exporter with JSON format
otelcol.exporter.file "json_output" {
  path   = "/tmp/alloy-telemetry.json"
  format = "json"
  flush_interval = "2s"
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// File exporter with rotation
otelcol.exporter.file "rotated_output" {
  path   = "/tmp/alloy-rotated.json"
  format = "json"
  
  rotation {
    max_megabytes = 10
    max_days      = 7
    max_backups   = 5
    localtime     = true
  }
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// File exporter with compression
otelcol.exporter.file "compressed_output" {
  path        = "/tmp/alloy-compressed.json"
  format      = "proto"
  compression = "zstd"
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// Note: Debug exporter removed as it requires --stability.level=experimental
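For reference, the upstream fileexporter writes newline-delimited JSON (one export request per line) when compression is disabled, so the output of the json_output exporter above could be inspected with a short script. This is a sketch with helper names of my own choosing, not part of the PR:

```python
import json

def parse_telemetry_lines(text: str) -> list[dict]:
    """Parse newline-delimited JSON as written by the exporter when
    compression is disabled (one OTLP export request per line)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def read_telemetry_file(path: str) -> list[dict]:
    """Read and parse an exporter output file, e.g. /tmp/alloy-telemetry.json."""
    with open(path, encoding="utf-8") as f:
        return parse_telemetry_lines(f.read())

# Synthetic sample of two export requests on two lines.
sample = '{"resourceLogs": []}\n{"resourceMetrics": []}\n'
print(len(parse_telemetry_lines(sample)))  # -> 2
```

Each parsed record is a full OTLP export request, for example one with a resourceLogs key for logs.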

Example test message sent via curl

curl -X POST http://localhost:4318/v1/logs   -H "Content-Type: application/json"   -d '{
    "resourceLogs": [
      {
        "resource": {
          "attributes": [
            { "key": "service.name", "value": { "stringValue": "demo-service" } },
            { "key": "service.version", "value": { "stringValue": "1.0.0" } },
            { "key": "host.name", "value": { "stringValue": "demo-host" } }
          ]
        },
        "scopeLogs": [
          {
            "scope": {
              "name": "demo-logger",
              "version": "1.0.0"
            },
            "logRecords": [
              {
                "timeUnixNano": "1694700000000000000",
                "severityText": "INFO",
                "body": { "stringValue": "Hello from OpenTelemetry logs!" },
                "attributes": [
                  { "key": "env", "value": { "stringValue": "dev" } },
                  { "key": "region", "value": { "stringValue": "ap-south-1" } }
                ]
              }
            ]
          }
        ]
      }
    ]
  }'

PR Checklist

  • [x] CHANGELOG.md updated
  • [x] Documentation added
  • [x] Tests updated
  • [x] Config converters updated

@madhub madhub requested review from clayton-cornell and a team as code owners September 20, 2025 17:56

CLAassistant commented Sep 20, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ clayton-cornell
❌ madhub


madhub seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


madhub commented Sep 21, 2025

CLA assistant check Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
madhub seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Done

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Sep 24, 2025
@clayton-cornell
Contributor

There are some changes we need to make to the docs, but they are a lot to put into the GitHub web UI. Would you mind if I edited the doc input in your fork and pushed back to GitHub?


madhub commented Sep 25, 2025

There are some changes we need to make to the docs, but they are a lot to put into the GitHub web UI. Would you mind if I edited the doc input in your fork and pushed back to GitHub?

Please go ahead.

@clayton-cornell
Contributor

@madhub I've made some changes to the doc and pushed to your branch. The changes are primarily style and linting. I reorganized a few bits to conform a bit better to the general style currently in use in the component docs. I also added the metadata and other info for Community written/supported components.


@madhub madhub left a comment


ok


madhub commented Sep 26, 2025

@clayton-cornell Anything pending from my side?

@clayton-cornell
Contributor

Next is a code review from @grafana/grafana-alloy-maintainers

@drconopoima

Is the exporter not adding a filename extension when the "compression: zstd" feature is enabled?

I see the testing code includes cases with compression and a ".json" filename:

otelcol.exporter.file "compressed" {
  path        = "/tmp/traces.jsonl"
  format      = "proto"
  compression = "zstd"
}
alloyCfg := `
path = "/tmp/*/test.json"
format = "json"
append = false
compression = "zstd"
flush_interval = "5s"
rotation {
  max_megabytes = 50
  max_days = 7
  max_backups = 10
  localtime = true
}
group_by {
  enabled = true
  resource_attribute = "service.name"
  max_open_files = 50
}
`

I searched for ".zst" or ".gz" and found no references in the code

@madhub
Copy link
Author

madhub commented Oct 1, 2025

Is this extension with the "compression: zstd" feature enabled not adding a filename extension?

I see the testing code includes cases with compression and filename ".json"

otelcol.exporter.file "compressed" {
  path        = "/tmp/traces.jsonl"
  format      = "proto"
  compression = "zstd"
}
alloyCfg := `
path = "/tmp/*/test.json"
format = "json"
append = false
compression = "zstd"
flush_interval = "5s"
rotation {
  max_megabytes = 50
  max_days = 7
  max_backups = 10
  localtime = true
}
group_by {
  enabled = true
  resource_attribute = "service.name"
  max_open_files = 50
}
`

I searched for ".zst" or ".gz" and found no references in the code
Compression support is included in the fileexporter; see the code here: fileexporter,
and also here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/ce1bb622659c2a7b1782a154f2d7d9dc0a9b81e2/exporter/fileexporter/factory.go#L38

@drconopoima
Copy link

drconopoima commented Oct 1, 2025

I see in the repo you linked that, for another contrib exporter and another compression algorithm, it appended ".gz" to the files without it being specified: "as logs_103016847.json.gz, I set compression: gzip, but did not specify the file extension".
I'm guessing the same behaviour applies to zstd, but I was unable to find examples of whether it appends ".zst" or ".gz", as I have learned that zstd is gzip compatible.

@madhub
Copy link
Author

madhub commented Oct 1, 2025

I see in the repo you linked that for other contrib exporter and other compression algorithm it appended ".gz" to the files without specifying: as logs_103016847.json.gz, I set compression: gzip, but did not specify the file extension I'm guessing the same behaviour applies to Zstd, but I was unable to find examples if it appended ".zst" or ".gz", as I have learned that Zstd is gzip compatible

I tested it with zstd, and it seems the file is zstd-compressed with a length prefix. This behavior is also documented here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/fileexporter/README.md#file-format

Here is the sample output after removing the first 4 bytes. You can clearly see the zstd magic bytes 28 b5 2f fd.
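The framing observed above (a 4-byte big-endian byte count followed by a zstd frame, per the fileexporter README) can be sketched like this; the helper name is mine, not from the exporter:

```python
import struct

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # first four bytes of every zstd frame

def split_frames(data: bytes) -> list[bytes]:
    """Split a file of [4-byte big-endian length][payload] records."""
    frames, offset = [], 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        frames.append(data[offset + 4:offset + 4 + length])
        offset += 4 + length
    return frames

# Synthetic example: one record whose payload starts with the zstd magic.
blob = struct.pack(">I", 8) + ZSTD_MAGIC + b"\x00\x00\x00\x00"
for payload in split_frames(blob):
    print(payload[:4].hex())  # -> 28b52ffd
```

After splitting, each payload can be fed to a zstd decompressor.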

For Reference

// Example configuration showing otelcol.exporter.file working with other components

// A test receiver that accepts OTLP data over gRPC and HTTP
otelcol.receiver.otlp "test" {
  grpc {
    endpoint = "127.0.0.1:4317"
  }
  
  http {
    endpoint = "127.0.0.1:4318"
  }
  
  output {
    metrics = [otelcol.exporter.file.compressed_output.input]
    logs    = [otelcol.exporter.file.compressed_output.input] 
    traces  = [otelcol.exporter.file.compressed_output.input]
  }
}

// File exporter with JSON format
otelcol.exporter.file "json_output" {
  path   = "/tmp/alloy-telemetry.json"
  format = "json"
  flush_interval = "2s"
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// File exporter with rotation
otelcol.exporter.file "rotated_output" {
  path   = "/tmp/alloy-rotated.json"
  format = "json"
  
  rotation {
    max_megabytes = 10
    max_days      = 7
    max_backups   = 5
    localtime     = true
  }
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// File exporter with compression
otelcol.exporter.file "compressed_output" {
  path        = "/tmp/alloy-compressed.json"
  format      = "json"
  compression = "zstd"
  
  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}

// Note: Debug exporter removed as it requires --stability.level=experimental

@drconopoima
Copy link

Thanks for the demonstration. It seems to have produced output in a format that could be characterized by the file extension labels .json.zst.bytecount, but we'll have to instruct users in configuration to add the extension labels.

Unrelated to this PR, just as an aside: it seems to have added a byte count even when using the JSON format, while the documented behaviour appears to indicate that it's only added for Protocol Buffers.


madhub commented Oct 2, 2025

Thanks for the demonstration. It seems to have produced output in some format that could be characterized by file extension labels .json.zst.bytecount, but we'll have to instruct in configuration to add the extension labels.

Unrelated to this MR, just as an aside, it seems to have added a byte count even when using json format, while documented behaviour appears to indicate that it's only added for protocol buffers

I also tried setting the format to proto instead of json. The file is again zstd-compressed with a length prefix.


// File exporter with compression
otelcol.exporter.file "compressed_output" {
  path        = "/tmp/alloy-compressed.proto"
  format      = "proto"
  compression = "zstd"

  debug_metrics {
    disable_high_cardinality_metrics = true
  }
}
@drconopoima Anything pending from my side?


drconopoima commented Oct 7, 2025

@drconopoima anything pending from my side ?

My doubts are resolved; I'm not an approver/reviewer. In my view, the compression output is as coded in the opentelemetry-collector-contrib dependency. The problem I see is that, in general, the length-prefixed file format is not adequately labeled by a file extension.

From this point onwards I'm not talking about the merge, but about the behaviour of the added dependencies, as per your helpful demonstrations:

If and when this is used, users who aren't aware of the behaviour may see many backed-up outputs discarded as unprocessable or corrupted if they don't strip out the length prefix (even in JSON, where the README says it shouldn't appear) or don't configure Alloy to append the '.zst' compression file extension.

There is also no hashing to ensure that the length prefix was calculated for the right ensuing payload contents; it would be rather easy to zero out and rewrite the prefix with any other value so that the allocated memory will not suffice.

I think the general approach could be to store a hash of the length alongside the payload, prefix both with a hash over them, and calculate the hash before trusting the length, to ensure it hasn't been tampered with; otherwise, treat the length prefix as not present. The only issue is that this nullifies the desired effect of not needing to load the full file before allocating.
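A minimal sketch of that idea (an assumed scheme for discussion, not anything in this PR or in the upstream exporter): prefix each record with a SHA-256 digest over the length field plus the payload, and verify the digest before trusting the length.

```python
import hashlib
import struct

def frame(payload: bytes) -> bytes:
    """Build [32-byte SHA-256 over length+payload][4-byte length][payload]."""
    length = struct.pack(">I", len(payload))
    return hashlib.sha256(length + payload).digest() + length + payload

def unframe(blob: bytes) -> bytes:
    """Verify the digest before trusting the length prefix."""
    digest, length_bytes = blob[:32], blob[32:36]
    (length,) = struct.unpack(">I", length_bytes)
    payload = blob[36:36 + length]
    if hashlib.sha256(length_bytes + payload).digest() != digest:
        raise ValueError("length prefix or payload was tampered with")
    return payload

record = frame(b"telemetry bytes")
assert unframe(record) == b"telemetry bytes"
```

As noted above, verifying the digest still requires reading the whole payload, which defeats the point of knowing a trustworthy length up front.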

Successfully merging this pull request may close these issues.

Include Community-Supported fileexporter in Grafana Alloy