Documentation ¶
Overview ¶
Package creep implements a web crawler. It reads web pages and follows their links to the rest of the web, recursively, until it reaches the limits provided. We use the term creep to avoid name clashes with other software called 'walk' and 'crawl'. I'm thinking of changing it to 'stroll'.
Index ¶
Constants ¶
const ExitCommandUrl string = "ExitExitExitExit" // Fake Url that tells goroutine to exit.
Variables ¶
This section is empty.
Functions ¶
func CreepWebSites ¶
func CreepWebSites(urls []string, maxPermittedUrls int, maxGoRo int, justOneDomain bool) <-chan *ResponseFromWeb
Main external entry point for package creep. Make only one call at a time; a single call accepts a slice of URLs to process.
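A rough sketch of how a caller might consume the returned channel. This section does not show the package's import path or the fields of ResponseFromWeb, so the sketch does not call the real function. Instead it uses a local stub, stubCreepWebSites, that copies only the signature. The Url field, the helper collectUrls, the closing of the channel when the crawl ends, and the reading of the parameters (a URL limit, a goroutine limit, a same-domain switch) are all assumptions made for illustration.

```go
package main

import "fmt"

// ResponseFromWeb is a hypothetical stand-in; the real type's fields are not
// documented in this section.
type ResponseFromWeb struct {
	Url string // assumed field, for illustration only
}

// stubCreepWebSites mimics only the signature of CreepWebSites. It fetches
// nothing: it echoes back up to maxPermittedUrls of the seed URLs and then
// closes the channel. Closing is an assumption about the real function.
func stubCreepWebSites(urls []string, maxPermittedUrls int, maxGoRo int, justOneDomain bool) <-chan *ResponseFromWeb {
	out := make(chan *ResponseFromWeb)
	go func() {
		defer close(out)
		for i, u := range urls {
			if i >= maxPermittedUrls {
				return
			}
			out <- &ResponseFromWeb{Url: u}
		}
	}()
	return out
}

// collectUrls (hypothetical helper) drains a response channel until it is closed.
func collectUrls(responses <-chan *ResponseFromWeb) []string {
	var got []string
	for resp := range responses {
		got = append(got, resp.Url)
	}
	return got
}

func main() {
	seeds := []string{"https://example.com/", "https://example.org/"}
	// Assumed meaning of the arguments: seed URLs, URL limit,
	// goroutine limit, stay on one domain.
	responses := stubCreepWebSites(seeds, 1, 4, true)
	fmt.Println(collectUrls(responses))
}
```

If the real channel is never closed, a range loop like the one in collectUrls would block forever, so a caller would need some other stopping condition.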
Types ¶
type JobDataArray ¶
type JobDataArray struct {
Tests []JobData
}
var JobDescription JobDataArray
func LoadJobData ¶
func LoadJobData(filename string) *JobDataArray
type RequestUrl ¶
type RequestUrl struct {
Url string
}
RequestUrl is the element type for a channel of URL requests. At one time I thought each request would carry more than a string.
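A minimal sketch of how a channel of RequestUrl values and the ExitCommandUrl sentinel could work together. The const and the struct are copied locally so the example compiles on its own. The worker drainRequests is hypothetical and is not the package's actual goroutine. It only shows the general pattern of stopping when the fake URL arrives.

```go
package main

import "fmt"

// Local copies of the package's declarations so the sketch is self-contained.
const ExitCommandUrl string = "ExitExitExitExit"

type RequestUrl struct {
	Url string
}

// drainRequests is a hypothetical worker: it collects URLs from requests
// until it receives the ExitCommandUrl sentinel (or the channel is closed),
// then returns what it collected.
func drainRequests(requests <-chan RequestUrl) []string {
	var seen []string
	for req := range requests {
		if req.Url == ExitCommandUrl {
			break // sentinel received: stop without processing it
		}
		seen = append(seen, req.Url)
	}
	return seen
}

func main() {
	requests := make(chan RequestUrl)
	done := make(chan []string)
	go func() { done <- drainRequests(requests) }()

	requests <- RequestUrl{Url: "https://example.com/"}
	requests <- RequestUrl{Url: "https://example.com/about"}
	requests <- RequestUrl{Url: ExitCommandUrl} // ask the worker to exit

	fmt.Println(<-done) // prints [https://example.com/ https://example.com/about]
}
```

A sentinel value lets the sender stop a worker without closing the channel, which can be useful when several goroutines share the channel and each needs its own exit message.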