Document simple_aggregation for kinesis/firehose #2299
base: master
WalkthroughAdds documentation for a new Changes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pipeline/outputs/firehose.md (1)
24-25: Fix minor typos in adjacent rows (`compression`, `role_arn`).
Line 24: “`arrow` is only an available…” → “only available…”. Line 25: remove the stray backtick in “(for cross account access`)”.
🧹 Nitpick comments (1)
pipeline/outputs/firehose.md (1)
30-30: Clarify that aggregated records concatenated with newlines require downstream consumers to split on newlines to recover individual events.
The documented byte limit (1,024,000 bytes) is correct per AWS Firehose limits. However, the documentation should note that when `simple_aggregation` concatenates multiple log records with newlines into a single Firehose record, downstream consumers must split the received record on `\n` to recover individual events. Also mention that if log payloads themselves contain embedded newlines, this framing approach may require additional handling depending on serialization.
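To illustrate the consumer-side concern raised above, here is a minimal Python sketch of splitting one aggregated record back into individual events. The function name `split_aggregated_record` and the sample payloads are hypothetical, and the sketch assumes each event was serialized as single-line JSON, so that `\n` only ever appears as the delimiter.

```python
import json
from typing import List


def split_aggregated_record(payload: bytes) -> List[bytes]:
    """Recover individual events from one newline-delimited aggregated record."""
    # Skip empty segments, such as the one produced by a trailing newline.
    return [line for line in payload.split(b"\n") if line]


# Hypothetical aggregated payload: two JSON events joined with newlines.
aggregated = b'{"log":"first"}\n{"log":"second"}\n'
events = split_aggregated_record(aggregated)

# JSON serialization escapes embedded newlines, so a multi-line log message
# does not introduce a raw "\n" byte that would break the framing.
encoded = json.dumps({"log": "line one\nline two"}).encode("utf-8")
```

If the payloads are not JSON (for example, raw text that may contain literal newlines), splitting on `\n` alone cannot tell where one event ends, which is the "additional handling" case mentioned above.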
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pipeline/outputs/firehose.md (1 hunks)
pipeline/outputs/kinesis.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-12T14:30:10.698Z
Learnt from: kalavt
Repo: fluent/fluent-bit-docs PR: 2294
File: pipeline/inputs/kafka.md:147-168
Timestamp: 2025-12-12T14:30:10.698Z
Learning: In Fluent Bit v4.0.4+, when using AWS MSK IAM authentication (rdkafka.sasl.mechanism: aws_msk_iam), the rdkafka.security.protocol is automatically set to SASL_SSL and the AWS region is auto-detected from the broker hostname for standard MSK endpoints. The aws_msk_iam_cluster_arn parameter was removed - users only need to set rdkafka.sasl.mechanism: aws_msk_iam (and optionally aws_region for custom DNS/PrivateLink).
Applied to files:
pipeline/outputs/kinesis.md
| `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
| `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which might help improve throughput when there are transient/random networking issues. | `true` |
| `external_id` | Specify an external ID for the STS API, can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
| `simple_aggregation` | Enable simple aggregation to combine multiple records into single API calls. This reduces the number of requests and can improve throughput. When enabled, multiple log records are concatenated with newlines and sent as a single record to Kinesis, up to the maximum record size limit (1,048,556 bytes). | `false` |
🧩 Analysis chain
🌐 Web query:
AWS Kinesis Data Streams maximum record size limit bytes PutRecords
💡 Result:
Maximum per-record size for PutRecords: 10 MiB (10,485,760 bytes). [1][2]
Sources:
[1] AWS Kinesis Data Streams quotas and limits.
[2] PutRecords API reference (each record can be as large as 10 MiB; whole request limit 10 MiB).
🌐 Web query:
Fluent Bit out_kinesis_streams simple_aggregation maximum record size 1048556 bytes
💡 Result:
Fluent Bit's Firehose/Kinesis output aggregation limit is 1 MiB (1,048,576 bytes) — the plugin's simple_aggregation will join records only while the aggregate doesn't exceed that 1 MiB maximum. [1][2]
If you see 1,048,556 bytes (1048556) that's almost certainly a typo/off-by-20 error; the documented limit is 1,048,576 bytes (1 MiB). [1][2]
Sources:
- amazon-kinesis-firehose-for-fluent-bit plugin docs. [1]
- AWS Kinesis Data Streams limits (data payload up to 10 MiB; per-record/aggregation guidance). [2]
Correct the Kinesis max record size to 1,048,576 bytes (1 MiB), not 1,048,556.
The value 1,048,556 appears to be a typo. AWS Kinesis Data Streams supports up to 10 MiB per record; Fluent Bit's simple_aggregation enforces a 1 MiB (1,048,576 bytes) limit for aggregated payloads. Also note that when simple_aggregation concatenates records with newlines, downstream consumers must split and parse them individually.
🤖 Prompt for AI Agents
In pipeline/outputs/kinesis.md around line 30, the documented maximum record
size is incorrect (1,048,556 bytes); update the value to 1,048,576 bytes (1 MiB)
to match Fluent Bit's simple_aggregation enforced limit, and keep the note that
concatenated records are newline-delimited so downstream consumers must split
and parse each record individually.
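For readers who want to see the behavior being documented, the following Python sketch shows one way newline-delimited aggregation under a byte limit could work. It is an illustration only, not the plugin's actual implementation: the `aggregate` function and `MAX_RECORD_BYTES` constant are hypothetical names, and the 1,048,576-byte default is taken from this review comment as an assumption (the thread cites other values elsewhere).

```python
from typing import Iterable, List

# Assumed limit for this sketch (1 MiB), per the review comment above.
MAX_RECORD_BYTES = 1_048_576


def aggregate(records: Iterable[bytes], max_bytes: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Join records with newlines, starting a new batch before max_bytes is exceeded."""
    batches: List[bytes] = []
    current = bytearray()
    for record in records:
        framed = record + b"\n"
        # Flush the current batch if appending this record would exceed the limit.
        if current and len(current) + len(framed) > max_bytes:
            batches.append(bytes(current))
            current = bytearray()
        # A single record larger than max_bytes is passed through alone;
        # this sketch does not truncate or reject it.
        current += framed
    if current:
        batches.append(bytes(current))
    return batches


# Small limit so the batching is easy to follow by hand.
batches = aggregate([b"aaaa", b"bbbb", b"cc"], max_bytes=10)
```

With `max_bytes=10`, the first two records fit exactly (5 + 5 bytes including delimiters) and the third starts a new batch, so each aggregated payload a consumer receives is a sequence of newline-terminated events.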
Signed-off-by: Shelby Hagman <[email protected]>
f352e35 to ce08c15
Documentation for simple_aggregation added to out_kinesis_firehose and out_kinesis_streams - fluent/fluent-bit#11284
Summary by CodeRabbit
`simple_aggregation` configuration option for Firehose and Kinesis outputs. When enabled, multiple records are concatenated with newlines and sent as a single API call (up to 1,024,000 bytes per record). Defaults to `false`.