KolayXlsxStream - High-Performance XLSX Streaming for Go


A high-performance Go library for streaming XLSX files with constant memory usage and direct S3 support. Export millions of rows without worrying about memory constraints.

🚀 Features

  • Constant Memory Usage (O(1)): Memory usage remains constant regardless of file size (<1MB for local, ~33MB for S3)
  • Direct S3 Streaming: Stream directly to AWS S3 using multipart uploads (no temporary files)
  • Zero Disk I/O: No temporary files created during export
  • Blazing Fast: 600,000+ rows/second (local file), 50,000-110,000 rows/second (S3)
  • Automatic Multi-Sheet: Automatically creates new sheets when Excel's 1,048,576 row limit is reached
  • Configurable Compression: Adjust ZIP compression level (0-9) for speed vs. size tradeoffs
  • Type Safety: Native Go types (string, int, float64, bool, etc.)
  • Production Tested: Successfully exported 2 million rows (60MB files) with AWS S3
  • Clean API: Simple, intuitive interface inspired by the PHP version

📦 Installation

go get github.com/turgutahmet/kolayxlsxstream

🎯 Quick Start

Basic Example - Local File
package main

import (
    "log"
    "github.com/turgutahmet/kolayxlsxstream"
)

func main() {
    // Create a file sink
    sink, err := kolayxlsxstream.NewFileSink("output.xlsx")
    if err != nil {
        log.Fatal(err)
    }

    // Create a writer
    writer := kolayxlsxstream.NewWriter(sink)

    // Start file with headers
    headers := []interface{}{"Name", "Email", "Age"}
    writer.StartFile(headers)

    // Write rows
    writer.WriteRow([]interface{}{"John Doe", "[email protected]", 30})
    writer.WriteRow([]interface{}{"Jane Smith", "[email protected]", 25})

    // Finish and get statistics
    stats, err := writer.FinishFile()
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Wrote %d rows in %.2f seconds", stats.TotalRows, stats.Duration)
}
S3 Streaming Example
package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/turgutahmet/kolayxlsxstream"
)

func main() {
    // Load the AWS config and fail fast if it cannot be resolved
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    client := s3.NewFromConfig(cfg)

    // Create S3 sink
    ctx := context.Background()
    sink, err := kolayxlsxstream.NewS3Sink(ctx, client, "my-bucket", "exports/report.xlsx")
    if err != nil {
        log.Fatal(err)
    }

    // Create writer and export
    writer := kolayxlsxstream.NewWriter(sink)
    writer.StartFile([]interface{}{"ID", "Product", "Quantity", "Price"})

    // Stream 1 million rows to S3
    for i := 1; i <= 1000000; i++ {
        writer.WriteRow([]interface{}{i, "Product", 10, 99.99})
    }

    stats, err := writer.FinishFile()
    if err != nil {
        log.Fatal(err) // FinishFile completes the multipart upload, so check its error
    }
    log.Printf("Uploaded to S3: %d rows, %.0f rows/sec", stats.TotalRows, stats.RowsPerSecond)
}

📖 Documentation

Core Components
Writer

The main XLSX writer that handles streaming data to a sink.

// Create writer with default config
writer := kolayxlsxstream.NewWriter(sink)

// Create writer with custom config
config := kolayxlsxstream.DefaultConfig()
config.CompressionLevel = 9  // Maximum compression
config.MaxRowsPerSheet = 500000  // Custom row limit
writer := kolayxlsxstream.NewWriter(sink, config)
Methods
  • StartFile(headers ...[]interface{}) error: Initialize the file, optionally with headers
  • WriteRow(values []interface{}) error: Write a single row
  • WriteRows(rows [][]interface{}) error: Write multiple rows
  • FinishFile() (*Stats, error): Finalize the file and return statistics
  • SetCompressionLevel(level int) error: Set compression level (0-9)
  • SetBufferSize(size int) error: Set buffer size
  • SetMaxRowsPerSheet(rows int) error: Set maximum rows per sheet
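The setters mirror the Config fields for writers built with the defaults. A minimal sketch, assuming the setters are called before StartFile so they apply to the whole file:

writer := kolayxlsxstream.NewWriter(sink)

// Assumption: configure before StartFile
if err := writer.SetCompressionLevel(3); err != nil {
    log.Fatal(err)
}
if err := writer.SetBufferSize(128 * 1024); err != nil {
    log.Fatal(err)
}
if err := writer.SetMaxRowsPerSheet(200000); err != nil {
    log.Fatal(err)
}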
Sinks
FileSink

Writes to a local file.

sink, err := kolayxlsxstream.NewFileSink("/path/to/output.xlsx")
S3Sink

Streams directly to AWS S3 using multipart uploads.

// Requires "github.com/aws/aws-sdk-go-v2/service/s3/types" for the ACL and storage class constants
ctx := context.Background()
options := kolayxlsxstream.DefaultS3Options()
options.PartSize = 10 * 1024 * 1024  // 10MB parts
options.ACL = types.ObjectCannedACLPrivate
options.StorageClass = types.StorageClassIntelligentTiering

sink, err := kolayxlsxstream.NewS3Sink(ctx, s3Client, "bucket", "key", options)
Configuration
type Config struct {
    CompressionLevel int    // ZIP compression (0-9, default: 6)
    BufferSize       int    // Buffer size in bytes (default: 64KB)
    MaxRowsPerSheet  int    // Max rows per sheet (default: 1,048,576)
    SheetNamePrefix  string // Sheet name prefix (default: "Sheet")
}
Statistics
type Stats struct {
    TotalRows      int64   // Total data rows written
    TotalSheets    int     // Total sheets created
    FileSize       int64   // Total file size (if available)
    Duration       float64 // Duration in seconds
    RowsPerSecond  float64 // Average rows/second
    BytesPerSecond float64 // Average bytes/second
}

🎨 Complete Examples

The examples/ directory contains real-world usage scenarios:

Basic Usage
cd examples/basic && go run main.go
Large Files (250k rows, multi-sheet)
cd examples/large && go run main.go
S3 Streaming (with dry-run mode)
cd examples/s3 && go run main.go -dry-run -rows 50000
# Or with actual S3:
go run main.go -bucket my-bucket -rows 100000
CSV to XLSX Conversion
cd examples/csv-to-xlsx && go run main.go -input sample.csv -output output.xlsx
Database Export
cd examples/database && go run main.go -sample 100000 -output export.xlsx
Memory Profiling
cd examples/memory && go run main.go

🎨 Advanced Code Examples

Large Dataset with Progress Tracking
sink, _ := kolayxlsxstream.NewFileSink("large.xlsx")
writer := kolayxlsxstream.NewWriter(sink)

headers := []interface{}{"ID", "Name", "Email", "Score"}
writer.StartFile(headers)

totalRows := 5000000
batchSize := 1000

for i := 0; i < totalRows; i += batchSize {
    batch := make([][]interface{}, batchSize)
    for j := 0; j < batchSize; j++ {
        rowNum := i + j + 1
        batch[j] = []interface{}{
            rowNum,
            fmt.Sprintf("User %d", rowNum),
            fmt.Sprintf("user%[email protected]", rowNum),
            float64(rowNum % 100),
        }
    }
    writer.WriteRows(batch)

    if (i+batchSize)%100000 == 0 {
        fmt.Printf("Progress: %d rows\n", i+batchSize)
    }
}

stats, _ := writer.FinishFile()
fmt.Printf("Done! %d rows in %.2f seconds\n", stats.TotalRows, stats.Duration)
Custom Compression for Speed
// Ultra-fast mode (minimal compression)
config := kolayxlsxstream.DefaultConfig()
config.CompressionLevel = 1  // Fastest compression
config.BufferSize = 128 * 1024  // 128KB buffer

sink, _ := kolayxlsxstream.NewFileSink("fast.xlsx")
writer := kolayxlsxstream.NewWriter(sink, config)
Multi-Sheet Export
// Automatically creates new sheets every 100k rows
config := kolayxlsxstream.DefaultConfig()
config.MaxRowsPerSheet = 100000
config.SheetNamePrefix = "Data"

writer := kolayxlsxstream.NewWriter(sink, config)
writer.StartFile([]interface{}{"Column1", "Column2"})

// Write 250k rows -> creates 3 sheets: Data1, Data2, Data3
for i := 0; i < 250000; i++ {
    writer.WriteRow([]interface{}{i, fmt.Sprintf("Value %d", i)})
}

stats, _ := writer.FinishFile()
fmt.Printf("Created %d sheets\n", stats.TotalSheets)  // Output: 3

🔧 Performance Tips

  1. Batch Writes: Use WriteRows() instead of WriteRow() when possible
  2. Compression: Lower compression (1-3) for speed, higher (6-9) for file size
  3. Buffer Size: Increase buffer size for better throughput (64KB-256KB)
  4. S3 Part Size: Use larger parts (32MB-100MB) for better S3 performance
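Combining these tips, a hedged sketch of a throughput-oriented S3 export (ctx, client, and the records data source are assumed; the exact values are illustrative, not benchmarked recommendations):

// Speed-oriented settings per the tips above (values illustrative)
config := kolayxlsxstream.DefaultConfig()
config.CompressionLevel = 1     // tip 2: favor speed over file size
config.BufferSize = 256 * 1024  // tip 3: larger buffer for throughput

s3opts := kolayxlsxstream.DefaultS3Options()
s3opts.PartSize = 64 * 1024 * 1024  // tip 4: larger parts, fewer round trips

sink, _ := kolayxlsxstream.NewS3Sink(ctx, client, "my-bucket", "exports/fast.xlsx", s3opts)
writer := kolayxlsxstream.NewWriter(sink, config)
writer.StartFile([]interface{}{"ID", "Name"})

// Tip 1: accumulate rows and flush in batches via WriteRows
batch := make([][]interface{}, 0, 1000)
for _, rec := range records { // records: assumed data source
    batch = append(batch, []interface{}{rec.ID, rec.Name})
    if len(batch) == cap(batch) {
        writer.WriteRows(batch)
        batch = batch[:0]
    }
}
if len(batch) > 0 {
    writer.WriteRows(batch)
}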

📊 Performance Benchmarks

Comprehensive Benchmark Results

Tested on Apple M4, Go 1.23, Compression Level 1 (fastest)

| Rows | Local Speed    | Local Memory | Local Time | S3 Speed       | S3 Memory   | S3 Time | File Size |
|------|----------------|--------------|------------|----------------|-------------|---------|-----------|
| 100  | 48,631 rows/s  | 0 MB         | 0.00s      | 201 rows/s     | 1 MB (±0)   | 1.02s   | 0.00 MB   |
| 500  | 136,626 rows/s | 0 MB         | 0.00s      | 822 rows/s     | 1 MB (±0)   | 1.12s   | 0.02 MB   |
| 1K   | 198,354 rows/s | 0 MB         | 0.01s      | 1,570 rows/s   | 1 MB (±1)   | 1.14s   | 0.03 MB   |
| 5K   | 313,072 rows/s | 0 MB         | 0.02s      | 5,289 rows/s   | 1 MB (±2)   | 1.45s   | 0.15 MB   |
| 10K  | 479,633 rows/s | 0 MB         | 0.02s      | 3,041 rows/s   | 1 MB (±4)   | 3.80s   | 0.30 MB   |
| 25K  | 559,883 rows/s | 0 MB         | 0.04s      | 4,782 rows/s   | 2 MB (±5)   | 5.74s   | 0.75 MB   |
| 50K  | 588,989 rows/s | 0 MB         | 0.08s      | 32,805 rows/s  | 3 MB (±9)   | 2.04s   | 1.50 MB   |
| 100K | 607,758 rows/s | 0 MB         | 0.16s      | 12,355 rows/s  | 5 MB (±15)  | 8.58s   | 3.00 MB   |
| 250K | 598,598 rows/s | 0 MB         | 0.42s      | 112,810 rows/s | 9 MB (±26)  | 2.77s   | 7.54 MB   |
| 500K | 598,574 rows/s | 0 MB         | 0.84s      | 107,857 rows/s | 17 MB (±49) | 5.14s   | 15.10 MB  |
| 750K | 599,608 rows/s | 0 MB         | 1.25s      | 64,397 rows/s  | 33 MB (±96) | 12.13s  | 22.67 MB  |
| 1.0M | 595,634 rows/s | 0 MB         | 1.68s      | 54,525 rows/s  | 33 MB (±96) | 18.83s  | 30.21 MB  |
| 1.5M | 577,251 rows/s | 0 MB         | 2.60s      | 72,855 rows/s  | 33 MB (±96) | 21.10s  | 45.38 MB  |
| 2.0M | 576,384 rows/s | 0 MB         | 3.47s      | 91,518 rows/s  | 33 MB (±96) | 22.37s  | 60.58 MB  |

Note: Tests with 1M+ rows automatically create multiple sheets (Excel limit: 1,048,576 rows per sheet)

Note: ± values in S3 Memory column indicate memory fluctuation during streaming due to periodic part uploads (32MB buffer)

Understanding Memory Behavior
Local File System: True O(1) Memory
  • Constant Memory: 0-0.5MB regardless of file size
  • No Growth: Memory stays flat even for millions of rows
  • Speed: 580,000-600,000 rows/second consistently
  • Scalable: Successfully tested up to 2 million rows
S3 Streaming: Controlled Memory Growth

The ± values in S3 memory represent normal memory fluctuation during streaming:

Buffer Accumulation Phase (↑ Memory Growth)

  • Data is compressed and buffered until reaching 32MB part size
  • Memory grows gradually as buffer fills

Part Upload Phase (↓ Memory Drop)

  • When buffer reaches 32MB, it's uploaded to S3
  • After upload, memory drops back to baseline
  • This creates the characteristic sawtooth pattern
Memory
  ▲
78MB │     ╱╲      ╱╲      ╱╲
     │    ╱  ╲    ╱  ╲    ╱  ╲
33MB │   ╱    ╲  ╱    ╲  ╱    ╲
     │  ╱      ╲╱      ╲╱      ╲
2MB  │─╯
     └────────────────────────────▶ Time
       ↑Upload  ↑Upload  ↑Upload

Example: 1M Rows Test

  • Average memory: 33MB
  • Fluctuation: ±96MB
  • Pattern: Memory oscillates between ~2MB (after upload) and ~129MB (before upload)
  • This is completely normal and expected behavior for streaming
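Peak memory therefore tracks the configured part size. If the default 32MB parts peak too high for your environment, a smaller PartSize should flatten the sawtooth at the cost of more UploadPart calls — a sketch, assuming ctx and client are already set up (S3 requires at least 5MB per part):

options := kolayxlsxstream.DefaultS3Options()
options.PartSize = 8 * 1024 * 1024  // 8MB parts: lower peak memory, more S3 calls

sink, err := kolayxlsxstream.NewS3Sink(ctx, client, "my-bucket", "exports/low-mem.xlsx", options)
if err != nil {
    log.Fatal(err)
}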
Performance Highlights
  • Local File System: ~600,000 rows/second with true O(1) memory
  • S3 Streaming: 50,000-110,000 rows/second with controlled memory
  • Memory Efficiency: Local uses <1MB, S3 averages 33MB per million rows
  • Multi-sheet Support: Automatic sheet creation at Excel's 1,048,576 row limit
  • Production Ready: Successfully tested with 2 million rows (60MB files)
Comparison with Other Go Libraries
| Library                 | 1M Rows Time | Memory Usage | Disk Usage | S3 Support |
|-------------------------|--------------|--------------|------------|------------|
| excelize                | ~45 sec      | ~500MB+      | Full file  | Indirect   |
| xlsx                    | ~60 sec      | ~800MB+      | Full file  | Indirect   |
| KolayXlsxStream (Local) | ✅ 1.68s     | ✅ 0 MB      | ✅ Zero    | N/A        |
| KolayXlsxStream (S3)    | ✅ 18.83s    | ✅ 33MB avg  | ✅ Zero    | ✅ Direct  |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

🙏 Credits

Inspired by the PHP version: kolay-xlsx-stream

📞 Support


Made with ❤️ for the Go community

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func S3SinkFromReader

func S3SinkFromReader(ctx context.Context, client S3ClientAPI, bucket, key string, reader io.Reader, options ...*S3Options) error

S3SinkFromReader creates an S3Sink and copies data from a reader. This is a convenience function for uploading existing data to S3.
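For example, pushing an already-generated file through the same multipart machinery — a sketch (the paths, bucket, and client are illustrative):

f, err := os.Open("existing.xlsx") // illustrative local file
if err != nil {
	log.Fatal(err)
}
defer f.Close()

if err := kolayxlsxstream.S3SinkFromReader(context.Background(), client, "my-bucket", "uploads/existing.xlsx", f); err != nil {
	log.Fatal(err)
}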

Types

type Config

type Config struct {
	// CompressionLevel sets the ZIP compression level (0-9, default: 6)
	// 0 = no compression, 9 = maximum compression
	CompressionLevel int

	// BufferSize sets the buffer size in bytes (default: 64KB)
	BufferSize int

	// MaxRowsPerSheet sets the maximum rows per sheet (default: 1048576)
	// When this limit is reached, a new sheet is automatically created
	MaxRowsPerSheet int

	// SheetNamePrefix is the prefix for auto-generated sheet names (default: "Sheet")
	SheetNamePrefix string
}

Config holds configuration for the XLSX writer

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns the default configuration

type FileSink

type FileSink struct {
	// contains filtered or unexported fields
}

FileSink writes data to a local file

func NewFileSink

func NewFileSink(path string) (*FileSink, error)

NewFileSink creates a new FileSink that writes to the specified file path

func (*FileSink) Close

func (fs *FileSink) Close() error

Close implements io.Closer interface

func (*FileSink) Path

func (fs *FileSink) Path() string

Path returns the file path

func (*FileSink) Write

func (fs *FileSink) Write(p []byte) (n int, err error)

Write implements io.Writer interface

type S3ClientAPI

type S3ClientAPI interface {
	CreateMultipartUpload(ctx context.Context, params *s3.CreateMultipartUploadInput, optFns ...func(*s3.Options)) (*s3.CreateMultipartUploadOutput, error)
	UploadPart(ctx context.Context, params *s3.UploadPartInput, optFns ...func(*s3.Options)) (*s3.UploadPartOutput, error)
	CompleteMultipartUpload(ctx context.Context, params *s3.CompleteMultipartUploadInput, optFns ...func(*s3.Options)) (*s3.CompleteMultipartUploadOutput, error)
	AbortMultipartUpload(ctx context.Context, params *s3.AbortMultipartUploadInput, optFns ...func(*s3.Options)) (*s3.AbortMultipartUploadOutput, error)
}

S3ClientAPI defines the interface for S3 operations needed by S3Sink
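Because S3Sink depends on this narrow interface rather than the concrete *s3.Client, tests can substitute a stub. A minimal hypothetical sketch (aws.String comes from github.com/aws/aws-sdk-go-v2/aws):

// stubS3 is a hypothetical test double satisfying S3ClientAPI.
type stubS3 struct{}

func (stubS3) CreateMultipartUpload(ctx context.Context, in *s3.CreateMultipartUploadInput, _ ...func(*s3.Options)) (*s3.CreateMultipartUploadOutput, error) {
	return &s3.CreateMultipartUploadOutput{UploadId: aws.String("test-upload")}, nil
}

func (stubS3) UploadPart(ctx context.Context, in *s3.UploadPartInput, _ ...func(*s3.Options)) (*s3.UploadPartOutput, error) {
	return &s3.UploadPartOutput{ETag: aws.String("etag")}, nil
}

func (stubS3) CompleteMultipartUpload(ctx context.Context, in *s3.CompleteMultipartUploadInput, _ ...func(*s3.Options)) (*s3.CompleteMultipartUploadOutput, error) {
	return &s3.CompleteMultipartUploadOutput{}, nil
}

func (stubS3) AbortMultipartUpload(ctx context.Context, in *s3.AbortMultipartUploadInput, _ ...func(*s3.Options)) (*s3.AbortMultipartUploadOutput, error) {
	return &s3.AbortMultipartUploadOutput{}, nil
}

// sink, err := kolayxlsxstream.NewS3Sink(ctx, stubS3{}, "bucket", "key")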

type S3Options

type S3Options struct {
	// PartSize is the size of each multipart upload part in bytes (default: 32MB)
	// Must be at least 5MB (except for the last part)
	PartSize int64

	// ACL sets the canned ACL for the object (e.g., "private", "public-read")
	ACL types.ObjectCannedACL

	// ContentType sets the MIME type of the object (default: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
	ContentType string

	// Metadata sets custom metadata for the object
	Metadata map[string]string

	// StorageClass sets the storage class (e.g., STANDARD, INTELLIGENT_TIERING, GLACIER)
	StorageClass types.StorageClass

	// ServerSideEncryption sets the server-side encryption method (e.g., AES256, aws:kms)
	ServerSideEncryption types.ServerSideEncryption

	// SSEKMSKeyId sets the KMS key ID for server-side encryption with KMS
	SSEKMSKeyId *string
}

S3Options contains optional configuration for S3 uploads
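For example, a sketch enabling KMS-backed server-side encryption (the key alias and metadata values are placeholders; aws.String comes from github.com/aws/aws-sdk-go-v2/aws):

options := kolayxlsxstream.DefaultS3Options()
options.ServerSideEncryption = types.ServerSideEncryptionAwsKms
options.SSEKMSKeyId = aws.String("alias/my-export-key") // placeholder KMS key
options.Metadata = map[string]string{"generated-by": "kolayxlsxstream"}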

func DefaultS3Options

func DefaultS3Options() *S3Options

DefaultS3Options returns the default S3 options

type S3Sink

type S3Sink struct {
	// contains filtered or unexported fields
}

S3Sink writes data to AWS S3 using multipart upload

func NewS3Sink

func NewS3Sink(ctx context.Context, client S3ClientAPI, bucket, key string, options ...*S3Options) (*S3Sink, error)

NewS3Sink creates a new S3 sink that writes to the specified bucket and key

func (*S3Sink) Abort

func (s *S3Sink) Abort() error

Abort cancels the multipart upload (useful for error handling)
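A typical pattern is to abort when writing fails partway, so incomplete parts don't linger in the bucket. A sketch, assuming ctx, client, headers, and rows are already defined:

sink, err := kolayxlsxstream.NewS3Sink(ctx, client, "my-bucket", "exports/report.xlsx")
if err != nil {
	log.Fatal(err)
}
writer := kolayxlsxstream.NewWriter(sink)

if err := writer.StartFile(headers); err != nil {
	sink.Abort() // discard already-uploaded parts
	log.Fatal(err)
}
if err := writer.WriteRows(rows); err != nil {
	sink.Abort()
	log.Fatal(err)
}
if _, err := writer.FinishFile(); err != nil {
	sink.Abort()
	log.Fatal(err)
}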

func (*S3Sink) Close

func (s *S3Sink) Close() error

Close implements io.Closer interface and completes the multipart upload

func (*S3Sink) PartCount

func (s *S3Sink) PartCount() int

PartCount returns the number of parts uploaded

func (*S3Sink) TotalBytes

func (s *S3Sink) TotalBytes() int64

TotalBytes returns the total bytes written so far

func (*S3Sink) Write

func (s *S3Sink) Write(p []byte) (n int, err error)

Write implements io.Writer interface

type Sink

type Sink interface {
	io.Writer
	io.Closer
}

Sink is the interface that wraps basic Write and Close methods for streaming data. Implementations can write to local files, S3, or any other destination.
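Since any io.Writer/io.Closer pair qualifies, an export can stream straight to an HTTP client without touching disk. A hypothetical sketch (error handling elided for brevity):

// httpSink is a hypothetical Sink that streams to an HTTP response.
type httpSink struct{ w http.ResponseWriter }

func (h httpSink) Write(p []byte) (int, error) { return h.w.Write(p) }
func (h httpSink) Close() error                { return nil } // nothing to release

func exportHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
	w.Header().Set("Content-Disposition", `attachment; filename="report.xlsx"`)

	writer := kolayxlsxstream.NewWriter(httpSink{w})
	writer.StartFile([]interface{}{"ID", "Name"})
	writer.WriteRow([]interface{}{1, "Alice"})
	writer.FinishFile()
}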

type Stats

type Stats struct {
	TotalRows      int64   // Total number of data rows written (excluding headers)
	TotalSheets    int     // Total number of sheets created
	FileSize       int64   // Total file size in bytes
	Duration       float64 // Total duration in seconds
	RowsPerSecond  float64 // Average rows per second
	BytesPerSecond float64 // Average bytes per second
}

Stats contains statistics about the written XLSX file

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer is the main XLSX streaming writer

func NewWriter

func NewWriter(sink Sink, config ...*Config) *Writer

NewWriter creates a new XLSX writer with the given sink and optional config

func (*Writer) FinishFile

func (w *Writer) FinishFile() (*Stats, error)

FinishFile finalizes the XLSX file and returns statistics

func (*Writer) SetBufferSize

func (w *Writer) SetBufferSize(size int) error

SetBufferSize sets the buffer size

func (*Writer) SetCompressionLevel

func (w *Writer) SetCompressionLevel(level int) error

SetCompressionLevel sets the ZIP compression level (0-9)

func (*Writer) SetMaxRowsPerSheet

func (w *Writer) SetMaxRowsPerSheet(rows int) error

SetMaxRowsPerSheet sets the maximum rows per sheet

func (*Writer) StartFile

func (w *Writer) StartFile(headers ...[]interface{}) error

StartFile initializes the XLSX file and optionally writes headers

func (*Writer) WriteRow

func (w *Writer) WriteRow(values []interface{}) error

WriteRow writes a single row to the current sheet

func (*Writer) WriteRows

func (w *Writer) WriteRows(rows [][]interface{}) error

WriteRows writes multiple rows to the current sheet

Directories

Path Synopsis
examples
basic command
csv-to-xlsx command
database command
large command
memory command
s3 command
