Documentation ¶
Overview ¶
Package analysis is TBS.
Index ¶
- func CollectKeysOfNonNilMapValues(M map[string]*CT.FilePosition) []string
- type ContentAnalysis
- func NewContentAnalysis(pFSI *FU.FSItem) (*ContentAnalysis, error)
- func (pCA *ContentAnalysis) DoAnalysis_bin() error
- func (pCA *ContentAnalysis) DoAnalysis_sch() error
- func (pCA *ContentAnalysis) DoAnalysis_txt(sCont string) error
- func (pCA *ContentAnalysis) DoAnalysis_xml(pXP *XU.XmlPeek, sCont string) error
- func (pCA ContentAnalysis) IsXML() bool
- func (pCA ContentAnalysis) RawType() SU.Raw_type
- func (pCA *ContentAnalysis) String() string
- type Doctype
- type MimeType
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CollectKeysOfNonNilMapValues ¶
func CollectKeysOfNonNilMapValues(M map[string]*CT.FilePosition) []string
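The function has no doc comment, so the following is only a minimal sketch of what its name suggests: gathering the keys whose values are non-nil. FilePosition is a hypothetical stand-in for CT.FilePosition (its real fields are not shown here), and sorting the result is an added assumption to make the output deterministic; the package's own function may return keys in any order.

```go
package main

import (
	"fmt"
	"sort"
)

// FilePosition is a hypothetical stand-in for CT.FilePosition,
// whose real definition is not shown in this documentation.
type FilePosition struct {
	Pos int
}

// collectKeysOfNonNilMapValues is an illustrative re-implementation,
// NOT the package's code: it returns the keys whose values are non-nil.
// Sorting is an assumption added so that the output is deterministic.
func collectKeysOfNonNilMapValues(m map[string]*FilePosition) []string {
	keys := make([]string, 0, len(m))
	for k, v := range m {
		if v != nil {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]*FilePosition{
		"root": {Pos: 0},
		"meta": nil, // nil value: this key is skipped
		"text": {Pos: 42},
	}
	fmt.Println(collectKeysOfNonNilMapValues(m)) // [root text]
}
```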
Types ¶
type ContentAnalysis ¶
type ContentAnalysis struct {
	// ContypingInfo is simple fields:
	// FileExt MType MimeType's
	XU.ContypingInfo
	// ContentityBasics does NOT include Raw
	// (the entire input content)
	XU.ContentityBasics

	// KeyElms is: (Root,Meta,Text)ElmExtent
	// KeyElmsWithRanges
	// ContentitySections is: Text_raw, Meta_raw, MetaFormat; MetaProps SU.PropSet
	// ContentityRawSections

	// XmlInfo is: XmlPreambleFields, XmlDoctype, XmlDoctypeFields, ENTITY stuff
	// ** XmlInfo **
	// XmlContype is an enum: "Unknown", "DTD", "DTDmod", "DTDent",
	// "RootTagData", "RootTagMixedContent", "MultipleRootTags", "INVALID"
	XmlContype string
	// XmlPreambleFields is nil if no preamble - it can always
	// default to xmlutils.STD_PreambleFields (from stdlib)
	*XU.ParsedPreamble
	// XmlDoctypeFields is a ptr - nil if ContypingInfo.Doctype
	// is "", i.e. if there is no DOCTYPE declaration
	*XU.ParsedDoctype

	// DitaInfo
	DitaFlavor  string
	DitaContype string
}
ContentAnalysis holds the results of content analysis on the contents of a non-embedded [FSItem].
func NewContentAnalysis ¶
func NewContentAnalysis(pFSI *FU.FSItem) (*ContentAnalysis, error)
NewContentAnalysis is called only by NewContentityRecord(..). It handles XML content very differently from non-XML content, and most of the function consists of checks for the presence of XML. When a file is identified as XML, much more information is available, so processing becomes both simpler and more complicated.
Binary content is tagged as such and no further examination is done. So, the basic top-level classification of content is:
- Binary
- XML (when a DOCTYPE is detected)
- Everything else (incl. plain text, Markdown, and XML/HTML that lacks DOCTYPE)
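The three-way split above can be sketched roughly as follows. This is not the package's implementation: classify is a hypothetical helper, and both detection rules are assumptions made for illustration (content counts as binary if it contains a NUL byte or is not valid UTF-8, and as XML if it contains a DOCTYPE declaration). The real code relies on a sniffed MIME type and an XML peek whose details are not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// classify is a hypothetical helper, not part of the package.
// Assumed rules: "binary" if the content holds a NUL byte or is not
// valid UTF-8; "xml" if a DOCTYPE declaration is present; else "other".
func classify(content string) string {
	switch {
	case strings.Contains(content, "\x00") || !utf8.ValidString(content):
		return "binary"
	case strings.Contains(content, "<!DOCTYPE"):
		return "xml"
	default:
		return "other"
	}
}

func main() {
	samples := []string{
		"\x89PNG\r\n\x1a\n",            // invalid UTF-8, treated as binary
		"<!DOCTYPE html><html></html>", // DOCTYPE present, treated as XML
		"<p>no doctype here</p>",       // markup without DOCTYPE
		"# A Markdown title\n",         // plain text
	}
	for _, s := range samples {
		fmt.Printf("%q -> %s\n", s, classify(s))
	}
}
```

Note that under these assumed rules, markup without a DOCTYPE lands in the third bucket, matching the last list item.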
If the argument is "dirlike" (dir, symlink, etc.), then NewContentAnalysis returns (nil, nil).
If the content is less than six bytes long, NewContentAnalysis returns (nil, nil) to indicate that there is not enough content to do anything productive or informative.
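Because (nil, nil) is a valid, non-error result, callers need to check the returned pointer as well as the error. A minimal sketch of that calling pattern, using hypothetical stand-ins (analysis and newAnalysis are not the package's types or functions, and the dirlike flag replaces the real *FU.FSItem argument):

```go
package main

import "fmt"

// analysis is a hypothetical stand-in for ContentAnalysis.
type analysis struct {
	Size int
}

// minContentLen mirrors the six-byte threshold described above.
const minContentLen = 6

// newAnalysis is an illustrative sketch, not the package's code.
// It returns (nil, nil) for dirlike items and for too-short content.
func newAnalysis(content string, dirlike bool) (*analysis, error) {
	if dirlike || len(content) < minContentLen {
		return nil, nil
	}
	return &analysis{Size: len(content)}, nil
}

func main() {
	for _, c := range []string{"<a/>", "<doc>hello</doc>"} {
		a, err := newAnalysis(c, false)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case a == nil:
			// Not an error: there was simply nothing to analyze.
			fmt.Println("skipped: nothing to analyze")
		default:
			fmt.Println("analyzed", a.Size, "bytes")
		}
	}
}
```

Checking the error first and the nil pointer second keeps "nothing to analyze" distinct from a real failure.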
func (*ContentAnalysis) DoAnalysis_bin ¶
func (pCA *ContentAnalysis) DoAnalysis_bin() error
DoAnalysis_bin does no further processing for binary content: it trusts that the sniffed MIME type is sufficient and returns.
func (*ContentAnalysis) DoAnalysis_sch ¶
func (pCA *ContentAnalysis) DoAnalysis_sch() error
DoAnalysis_sch will handle DTDs and related files. The code for this is mostly written but not yet integrated, so this func does not handle them yet.
func (*ContentAnalysis) DoAnalysis_txt ¶
func (pCA *ContentAnalysis) DoAnalysis_txt(sCont string) error
DoAnalysis_txt is called when the content is identified as non-XML. It does not expect to see binary content.
func (*ContentAnalysis) DoAnalysis_xml ¶
func (pCA *ContentAnalysis) DoAnalysis_xml(pXP *XU.XmlPeek, sCont string) error
func (ContentAnalysis) IsXML ¶
func (pCA ContentAnalysis) IsXML() bool
IsXML is true for all XML, including all HTML.
func (ContentAnalysis) RawType ¶
func (pCA ContentAnalysis) RawType() SU.Raw_type
RawType returns an enum with values of SU.Raw_type_*
func (*ContentAnalysis) String ¶
func (pCA *ContentAnalysis) String() string