contentanalysis

package module

v0.0.0-...-9da26a4 (latest) | Published: Nov 6, 2025 | License: MIT | Imports: 12 | Imported by: 5

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/fbaube/contentanalysis

README

contentanalysis

Examine a content item to determine the (MIME) type of its content.

Documentation

Overview

Package contentanalysis examines a content item to determine the (MIME) type of its content. Further documentation is TBS.

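Index

func CollectKeysOfNonNilMapValues(M map[string]*CT.FilePosition) []string
type ContentAnalysis
- func NewContentAnalysis(pFSI *FU.FSItem) (*ContentAnalysis, error)
- func (pCA *ContentAnalysis) DoAnalysis_bin() error
- func (pCA *ContentAnalysis) DoAnalysis_sch() error
- func (pCA *ContentAnalysis) DoAnalysis_txt(sCont string) error
- func (pCA *ContentAnalysis) DoAnalysis_xml(pXP *XU.XmlPeek, sCont string) error
- func (pCA ContentAnalysis) IsXML() bool
- func (pCA ContentAnalysis) RawType() SU.Raw_type
- func (pCA *ContentAnalysis) String() string
type Doctype
type MimeType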

Constants

This section is empty.

Variables

This section is empty.

Functions

func CollectKeysOfNonNilMapValues

func CollectKeysOfNonNilMapValues(M map[string]*CT.FilePosition) []string
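CollectKeysOfNonNilMapValues has no doc comment. Judging only by its name and signature, it presumably returns the keys of M whose values are non-nil pointers. The sketch below assumes exactly that. FilePosition is a hypothetical stand-in for CT.FilePosition (whose real fields are not shown on this page), and the sort step is an addition for deterministic output, since Go map iteration order is unspecified; the real function may not sort.

```go
package main

import (
	"fmt"
	"sort"
)

// FilePosition is a hypothetical stand-in for CT.FilePosition;
// these fields are assumptions for illustration only.
type FilePosition struct {
	Pos, Lnr, Col int
}

// collectKeysOfNonNilMapValues sketches what CollectKeysOfNonNilMapValues
// appears (from its name) to do: return every key whose value is non-nil.
// Sorting is an addition here, because Go map iteration order is unspecified.
func collectKeysOfNonNilMapValues(m map[string]*FilePosition) []string {
	keys := make([]string, 0, len(m))
	for k, v := range m {
		if v != nil {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]*FilePosition{
		"root": {Pos: 0},
		"meta": nil, // skipped: nil value
		"text": {Pos: 120},
	}
	fmt.Println(collectKeysOfNonNilMapValues(m)) // [root text]
}
```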

Types

type ContentAnalysis

type ContentAnalysis struct {
	// ContypingInfo holds simple fields:
	// FileExt, MType, MimeType(s)
	XU.ContypingInfo
	// ContentityBasics does NOT include Raw
	// (the entire input content)
	XU.ContentityBasics
	// KeyElms is: (Root,Meta,Text)ElmExtent
	// KeyElmsWithRanges
	// ContentitySections is: Text_raw, Meta_raw, MetaFormat; MetaProps SU.PropSet
	// ContentityRawSections
	// XmlInfo is: XmlPreambleFields, XmlDoctype, XmlDoctypeFields, ENTITY stuff
	// ** XmlInfo **
	// XmlContype is an enum: "Unknown", "DTD", "DTDmod", "DTDent",
	// "RootTagData", "RootTagMixedContent", "MultipleRootTags", "INVALID"
	XmlContype string
	// XmlPreambleFields is nil if there is no preamble; it can always
	// default to xmlutils.STD_PreambleFields (from stdlib)
	*XU.ParsedPreamble
	// XmlDoctypeFields is a pointer: nil if ContypingInfo.Doctype
	// is "", i.e. if there is no DOCTYPE declaration
	*XU.ParsedDoctype
	// DitaInfo
	DitaFlavor  string
	DitaContype string
}

ContentAnalysis holds the results of content analysis on the contents of a non-embedded [FSItem].

func NewContentAnalysis

func NewContentAnalysis(pFSI *FU.FSItem) (*ContentAnalysis, error)

NewContentAnalysis is called only by NewContentityRecord(..). It handles XML content very differently from non-XML content, and most of the function consists of checks for the presence of XML. Once a file is identified as XML, much more information is available, which makes processing simpler in some ways and more complicated in others.

Binary content is tagged as such and no further examination is done. So the basic top-level classification of content is:

- Binary
- XML (when a DOCTYPE is detected)
- Everything else (incl. plain text, Markdown, and XML/HTML that lacks a DOCTYPE)
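The three-way classification above can be sketched as follows. This is a minimal sketch under stated assumptions, not the package's code: the binary test (a NUL byte or invalid UTF-8) and the DOCTYPE test (a case-insensitive search for "<!DOCTYPE") are hypothetical heuristics, and the contentClass names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// contentClass is a hypothetical enum for the three top-level
// classes; the package's own names may differ.
type contentClass string

const (
	classBinary contentClass = "Binary"
	classXML    contentClass = "XML"
	classOther  contentClass = "Other"
)

// classify sketches the top-level split. Assumptions:
//   - content with a NUL byte or invalid UTF-8 counts as binary;
//   - content counts as XML only if it contains a DOCTYPE declaration,
//     matched case-insensitively so that "<!doctype html>" qualifies;
//   - everything else (plain text, Markdown, XML/HTML without a
//     DOCTYPE) falls through to Other.
func classify(sCont string) contentClass {
	if strings.IndexByte(sCont, 0) >= 0 || !utf8.ValidString(sCont) {
		return classBinary
	}
	if strings.Contains(strings.ToUpper(sCont), "<!DOCTYPE") {
		return classXML
	}
	return classOther
}

func main() {
	fmt.Println(classify("\x00\x01\x02binary"))                  // Binary
	fmt.Println(classify(`<!DOCTYPE topic SYSTEM "topic.dtd">`)) // XML
	fmt.Println(classify("# A Markdown heading"))                // Other
	fmt.Println(classify("<p>HTML with no DOCTYPE</p>"))         // Other
}
```

Note that in this sketch the binary check runs first, mirroring the description above: once content is tagged as binary, no further examination is done.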

If the argument is "dirlike" (dir, symlink, etc.), then NewContentAnalysis returns (nil, nil).

If the content ("sCont") is less than six bytes long, NewContentAnalysis returns (nil, nil) to indicate that there is not enough content to do anything productive or informative with.

func (*ContentAnalysis) DoAnalysis_bin

func (pCA *ContentAnalysis) DoAnalysis_bin() error

DoAnalysis_bin does no further processing of binary content, because we basically trust that the sniffed MIME type is sufficient; it simply returns.

func (*ContentAnalysis) DoAnalysis_sch

func (pCA *ContentAnalysis) DoAnalysis_sch() error

DoAnalysis_sch will handle DTDs and related files. That code is mostly written but not yet integrated, so for now this func does not really deal with them.

func (*ContentAnalysis) DoAnalysis_txt

func (pCA *ContentAnalysis) DoAnalysis_txt(sCont string) error

DoAnalysis_txt is called when the content is identified as non-XML. It does not expect to see binary content.

func (*ContentAnalysis) DoAnalysis_xml

func (pCA *ContentAnalysis) DoAnalysis_xml(pXP *XU.XmlPeek, sCont string) error

func (ContentAnalysis) IsXML

func (pCA ContentAnalysis) IsXML() bool

IsXML is true for all XML, including all HTML.
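Whether IsXML consults the MIME type, the DOCTYPE, or something else is not documented. As a hedged illustration of the rule "all XML, including all HTML", the sketch below applies it to a MIME type string using the standard mime.ParseMediaType; isXMLMimeType is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isXMLMimeType is a hypothetical helper applying the rule
// "true for all XML, including all HTML" to a MIME type.
// mime.ParseMediaType lowercases the type and drops parameters
// such as "; charset=utf-8".
func isXMLMimeType(mt string) bool {
	base, _, err := mime.ParseMediaType(mt)
	if err != nil {
		return false
	}
	switch {
	case base == "text/xml", base == "application/xml":
		return true
	case strings.HasSuffix(base, "+xml"): // e.g. application/xhtml+xml, image/svg+xml
		return true
	case base == "text/html": // HTML counts as XML under this rule
		return true
	}
	return false
}

func main() {
	for _, mt := range []string{
		"application/xml",
		"text/html; charset=utf-8",
		"image/svg+xml",
		"text/markdown",
	} {
		fmt.Println(mt, isXMLMimeType(mt))
	}
}
```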

func (ContentAnalysis) RawType

func (pCA ContentAnalysis) RawType() SU.Raw_type

RawType returns an enum with values of SU.Raw_type_*.

func (*ContentAnalysis) String

func (pCA *ContentAnalysis) String() string

type Doctype

type Doctype string

type MimeType

type MimeType string
