document

package
v1.6.7 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 16, 2025 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package document contains Document structs and Parsers that prepare content for RAG (retrieval-augmented generation).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EscapeMarkdown added in v1.1.8

func EscapeMarkdown(s string) string

EscapeMarkdown escapes special characters in a string for Markdown
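The reference does not list which characters EscapeMarkdown treats as special. The following is a minimal sketch of the general technique only; the character set in markdownSpecials and the helper name escapeMarkdownSketch are assumptions for illustration, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// markdownSpecials is an ASSUMED set of Markdown special characters.
// The set the package actually escapes is not documented here.
const markdownSpecials = "\\`*_{}[]()#+-.!|>"

// escapeMarkdownSketch (hypothetical helper) prefixes every assumed
// special character with a backslash and leaves other runes untouched.
func escapeMarkdownSketch(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		if strings.ContainsRune(markdownSpecials, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeMarkdownSketch("*bold* [link](url)"))
	// prints: \*bold\* \[link\]\(url\)
}
```

Escaping before text is embedded in a Markdown prompt or chunk keeps user content from being interpreted as formatting.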

func StripUnprintable added in v1.1.8

func StripUnprintable(s string) string

Types

type Content added in v1.1.9

type Content struct {
	// contains filtered or unexported fields
}

Content is a document container with metadata

func (*Content) Bytes added in v1.1.9

func (d *Content) Bytes() []byte

func (*Content) Meta added in v1.1.9

func (d *Content) Meta() map[string]string

func (*Content) ReadFrom added in v1.1.9

func (d *Content) ReadFrom(r io.Reader) (int64, error)

func (*Content) String added in v1.1.9

func (d *Content) String() string

func (*Content) Write added in v1.1.9

func (d *Content) Write(p []byte) (n int, err error)

Write implements the io.Writer interface. It appends the given data to the content of the document.

func (*Content) WriteAt added in v1.1.9

func (d *Content) WriteAt(p []byte, off int64) (n int, err error)

WriteAt implements the io.WriterAt interface. It writes the given data at the specified offset in the document's content. If the offset is beyond the current content length, it pads with zeros.
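The append and zero-padding behaviour described for Write and WriteAt can be sketched with a plain byte slice. The buffer type below is a hypothetical stand-in, since Content's fields are unexported; it only illustrates the documented semantics.

```go
package main

import (
	"errors"
	"fmt"
)

// buffer is a hypothetical stand-in for Content's unexported storage.
type buffer struct {
	data []byte
}

// Write appends p to the end of the buffer.
func (b *buffer) Write(p []byte) (int, error) {
	b.data = append(b.data, p...)
	return len(p), nil
}

// WriteAt writes p at off. If off lies beyond the current length,
// the gap is filled with zero bytes.
func (b *buffer) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	end := int(off) + len(p)
	if end > len(b.data) {
		// make returns zeroed memory, which supplies the padding.
		grown := make([]byte, end)
		copy(grown, b.data)
		b.data = grown
	}
	copy(b.data[off:], p)
	return len(p), nil
}

func main() {
	var b buffer
	b.Write([]byte("ab"))
	b.WriteAt([]byte("z"), 4)
	fmt.Printf("%q\n", b.data) // prints: "ab\x00\x00z"
}
```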

func (*Content) WriteTo added in v1.1.9

func (d *Content) WriteTo(w io.Writer) (n int64, err error)

WriteTo implements the io.WriterTo interface. It writes the content of the document to the provided io.Writer.

type Document

type Document interface {
	io.ReaderFrom
	io.Writer
	io.WriterAt
	io.WriterTo
	String() string
	Meta() map[string]string
}

type File

type File struct {
	Content
	// contains filtered or unexported fields
}

func NewFile

func NewFile(fname string) (*File, error)

func (*File) Close

func (d *File) Close() error

func (*File) Read

func (d *File) Read(p []byte) (int, error)

func (*File) ReadAt added in v1.1.9

func (d *File) ReadAt(p []byte, off int64) (int, error)

func (*File) Size added in v1.1.9

func (d *File) Size() int64

func (*File) Stat added in v1.1.9

func (d *File) Stat() (os.FileInfo, error)

type FileInfo added in v1.1.9

type FileInfo struct {
	// contains filtered or unexported fields
}

FileInfo represents the file information for an S3 object.

func (*FileInfo) IsDir added in v1.1.9

func (s *FileInfo) IsDir() bool

IsDir returns true if the file is a directory.

func (*FileInfo) ModTime added in v1.1.9

func (s *FileInfo) ModTime() time.Time

ModTime returns the modification time.

func (*FileInfo) Mode added in v1.1.9

func (s *FileInfo) Mode() fs.FileMode

Mode returns the file mode bits.

func (*FileInfo) Name added in v1.1.9

func (s *FileInfo) Name() string

Name returns the base name of the file.

func (*FileInfo) Size added in v1.1.9

func (s *FileInfo) Size() int64

Size returns the length in bytes for regular files; system-dependent for others.

func (*FileInfo) Sys added in v1.1.9

func (s *FileInfo) Sys() interface{}

Sys returns the underlying data source (can return nil).
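The method set above matches the standard fs.FileInfo interface. A minimal sketch of such a type for a remote object follows; objectInfo, its field names, and the fixed read-only mode are assumptions, since FileInfo's fields are unexported.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"time"
)

// objectInfo is a hypothetical type describing a stored object.
type objectInfo struct {
	key     string // full object key, e.g. "reports/2025/q3.pdf"
	size    int64
	modTime time.Time
}

func (o *objectInfo) Name() string       { return path.Base(o.key) }
func (o *objectInfo) Size() int64        { return o.size }
func (o *objectInfo) Mode() fs.FileMode  { return 0o444 } // assumed: read-only regular file
func (o *objectInfo) ModTime() time.Time { return o.modTime }
func (o *objectInfo) IsDir() bool        { return false }
func (o *objectInfo) Sys() interface{}   { return nil }

// Compile-time check that *objectInfo satisfies fs.FileInfo.
var _ fs.FileInfo = (*objectInfo)(nil)

func main() {
	info := &objectInfo{key: "reports/2025/q3.pdf", size: 1024, modTime: time.Unix(0, 0)}
	fmt.Println(info.Name(), info.Size(), info.IsDir()) // prints: q3.pdf 1024 false
}
```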

type Http

type Http struct {
	Content
	// contains filtered or unexported fields
}

func NewHttp

func NewHttp(opts ...HttpOption) (*Http, error)

func (*Http) Close added in v1.1.9

func (h *Http) Close() error

func (*Http) Read

func (h *Http) Read(p []byte) (n int, err error)

Read reads up to len(p) bytes into p.

func (*Http) ReadAt added in v1.1.9

func (h *Http) ReadAt(p []byte, off int64) (n int, err error)

ReadAt implements the io.ReaderAt interface.

func (*Http) Seek added in v1.1.9

func (h *Http) Seek(offset int64, whence int) (int64, error)

Seek sets the offset for the next Read or Write on file to offset, interpreted according to whence: 0 means relative to the origin of the file, 1 means relative to the current offset, and 2 means relative to the end.

func (*Http) Size added in v1.1.9

func (h *Http) Size() int64

func (*Http) Stat added in v1.1.9

func (h *Http) Stat() (os.FileInfo, error)

Stat returns the FileInfo structure describing file.

type HttpConfig

type HttpConfig struct {
	// contains filtered or unexported fields
}

type HttpOption

type HttpOption func(*HttpConfig)

func WithHttpClient

func WithHttpClient(client *http.Client) HttpOption

func WithHttpMethod

func WithHttpMethod(method string) HttpOption

func WithHttpURL

func WithHttpURL(url string) HttpOption

func WithPayload

func WithPayload(payload io.Reader) HttpOption

type Parser

type Parser interface {
	Parse(context.Context, ParserReader, io.Writer) error
}

type ParserReader added in v1.1.9

type ParserReader interface {
	io.Reader
	io.ReaderAt
	Size() int64
}

type S3 added in v1.1.9

type S3 struct {
	Content Content
	// contains filtered or unexported fields
}

func NewS3 added in v1.1.9

func NewS3(opts ...S3Option) (*S3, error)

NewS3 creates a new S3 instance.

func (*S3) Close added in v1.1.9

func (s *S3) Close() error

Close implements the fs.File interface.

func (*S3) Read added in v1.1.9

func (s *S3) Read(p []byte) (n int, err error)

Read implements the io.Reader interface.

func (*S3) ReadAt added in v1.1.9

func (s *S3) ReadAt(p []byte, off int64) (n int, err error)

ReadAt implements the io.ReaderAt interface.

func (*S3) Size added in v1.1.9

func (s *S3) Size() int64

func (*S3) Stat added in v1.1.9

func (s *S3) Stat() (os.FileInfo, error)

Stat implements the fs.File interface.

type S3Option added in v1.1.9

type S3Option func(*S3)

func WithS3Bucket added in v1.1.9

func WithS3Bucket(bucket string) S3Option

func WithS3Client added in v1.1.9

func WithS3Client(clt *s3.Client) S3Option

func WithS3Key added in v1.1.9

func WithS3Key(key string) S3Option

Directories

Path	Synopsis
parsers	Package parsers include different parsers implementation
docx	Package docx a parser for docx
html	Package html a parser for html
pdf	Package pdf a parser for PDF
pptx	Package pptx a Parser for pptx
xlsx	Package xlsx a xlsx parser
