Documentation
Types
type Article
type Article struct {
	// Title is the heading that precedes the article’s content, and the basis
	// for the article’s page name and URL. It indicates what the article is
	// about, and distinguishes it from other articles. The title may simply
	// be the name of the subject of the article, or it may be a description
	// of the topic.
	Title string

	// Byline is a printed line of text accompanying a news story, article, or
	// the like, giving the author’s name.
	Byline string

	// Dir is the direction of the text in the article.
	//
	// Either Left-to-Right (LTR) or Right-to-Left (RTL).
	Dir string

	// Content is the relevant text in the article with HTML tags.
	Content string

	// TextContent is the relevant text in the article without HTML tags.
	TextContent string

	// Excerpt is the summary for the relevant text in the article.
	Excerpt string

	// SiteName is the name of the original publisher website.
	SiteName string

	// Favicon (short for favorite icon) is a file containing one or more small
	// icons, associated with a particular website or web page. A web designer
	// can create such an icon and upload it to a website (or web page) by
	// several means, and graphical web browsers will then make use of it.
	Favicon string

	// Image is an image URL which represents the article’s content.
	Image string

	// Length is the number of characters in the article.
	Length int

	// Node is the first element in the HTML document.
	Node *html.Node
}
Article represents the metadata and content of the article.
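As a rough illustration of how the text fields might be consumed, the sketch below builds a one-line summary from a title, byline, and excerpt, falling back to a truncated TextContent when the excerpt is empty. The articleView type and the summary and truncateRunes helpers are hypothetical stand-ins written for this example; they mirror a few Article fields so the code compiles without the package and are not part of its API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// articleView is a hypothetical stand-in that mirrors a few string fields
// of Article so this example compiles on its own.
type articleView struct {
	Title       string
	Byline      string
	Excerpt     string
	TextContent string
}

// truncateRunes keeps at most n characters (runes, not bytes) of s and
// appends "..." when something was cut off.
func truncateRunes(s string, n int) string {
	if utf8.RuneCountInString(s) <= n {
		return s
	}
	return string([]rune(s)[:n]) + "..."
}

// summary formats "Title (Byline): text", where text is the excerpt or,
// if the excerpt is empty, the first maxRunes characters of TextContent.
func summary(a articleView, maxRunes int) string {
	text := a.Excerpt
	if text == "" {
		text = truncateRunes(a.TextContent, maxRunes)
	}
	head := a.Title
	if a.Byline != "" {
		head += " (" + a.Byline + ")"
	}
	return head + ": " + text
}

func main() {
	a := articleView{
		Title:       "Go",
		Byline:      "Ann",
		TextContent: "héllo world",
	}
	fmt.Println(summary(a, 5))
}
```

Truncation counts runes rather than bytes so that multi-byte characters are never split; whether the Length field itself counts runes or bytes is not stated in this documentation.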
type Readability
type Readability struct {
	// MaxElemsToParse is the optional maximum number of HTML nodes to parse
	// from the document. If the number of elements in the document is higher
	// than this number, the operation immediately errors.
	MaxElemsToParse int

	// NTopCandidates is the number of top candidates to consider when the
	// parser is analysing how tight the competition is among candidates.
	NTopCandidates int

	// CharThresholds is the minimum number of characters an article must
	// have in order to return a result.
	CharThresholds int

	// ClassesToPreserve are the classes that readability sets itself.
	ClassesToPreserve []string

	// TagsToScore are the element tags to score by default.
	TagsToScore []string
	// contains filtered or unexported fields
}
Readability is an HTML parser that reads and extracts relevant content.
func New
func New() *Readability
New returns a new Readability with sane defaults to parse simple documents.
func (*Readability) IsReadable
func (r *Readability) IsReadable(input io.Reader) bool
IsReadable decides whether the document is usable or not without parsing the whole thing. In the original `mozilla/readability` library, this method is located in `Readability-readable.js`.