Community First Yorkshire Jobs Scraper
Pricing
from $1.99 / 1,000 results
Scrape jobs and other portfolio content from communityfirstyorkshire.org.uk via WP-JSON portfolio CPT. Filter by taxonomy (default jobs ≈ 6 vacancies). Title, full HTML, location, apply email/URL, best-effort closing date + salary regex. JSON or CSV out.
Developer
Muhamed Didovic
Scrape jobs (and other portfolio content) from communityfirstyorkshire.org.uk. The actor uses the public WP-JSON portfolio custom post type, filtered by the portfolio_entries taxonomy (the default jobs term returns about 6 live vacancies). Each row carries the title, full description HTML, location term, apply email/URL (extracted from the body), and a best-effort closing date and salary. Output is JSON or CSV. Pricing is per result, with no compute charge per run.
✨ Why use this scraper?
Community First Yorkshire (CFY) is the rural voluntary-sector hub for North Yorkshire, York, and the Yorkshire Dales. This scraper suits anyone tracking who's hiring at rural Yorkshire charities, running cross-region CVS comparisons, or sourcing paid roles outside the metro areas.
- 🎯 Three starting points. The default Jobs taxonomy filter (set `entityTerms: ["jobs"]`), a direct `/portfolio-item/<slug>/` URL, or any `/wp-json/wp/v2/portfolio` URL.
- ⚡ WP-JSON `portfolio` CPT as the data source. Each item is a full WordPress portfolio entry with content, taxonomy, and `_embed`-able media.
- 🏷️ `portfolio_entries` taxonomy split. Term names are auto-split into `categories` (Jobs, Leadership, Networks, etc.) vs `locations` (North Yorkshire, York, Pateley Bridge, Homeworking).
- 📧 Apply email/URL from body. Regex-extracted from the content HTML (first `mailto:` → `applyEmail`; first outbound http href → `externalApplyUrl`).
- 📅 Closing date + salary (best-effort). Heuristic regex against body plain-text ("Closing date: …", "£X – £Y per annum"). Always falls back gracefully to null.
- 🌐 Beyond jobs. Filter by `volunteering`, `get-support`, `leadership`, `networks`, `advertise`, `podcast`, `membership` to pull other portfolio content.
- 📤 Clean exports. One row per item with full HTML description inline. JSON + CSV exported automatically.
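The `portfolio_entries` split described above can be pictured with a minimal sketch. It assumes the split is a simple lookup against a fixed set of location names. The set below contains only the examples named in this README, and both `KNOWN_LOCATIONS` and `split_terms` are hypothetical names, not the actor's actual code.

```python
# Hypothetical set of location term names, taken from the examples in this README.
# The actor's real list is not documented here.
KNOWN_LOCATIONS = {"North Yorkshire", "York", "Pateley Bridge", "Homeworking"}


def split_terms(term_names):
    """Split portfolio_entries term names into (categories, locations)."""
    categories, locations = [], []
    for name in term_names:
        if name in KNOWN_LOCATIONS:
            locations.append(name)
        else:
            # Anything that is not a known location is treated as a category.
            categories.append(name)
    return categories, locations


categories, locations = split_terms(["Jobs", "North Yorkshire"])
```

With the terms from the output sample, `categories` holds `"Jobs"` and `locations` holds `"North Yorkshire"`, which matches the `categories` and `locations` fields in that sample.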
🎯 Use cases
| Team | What they build |
|---|---|
| Rural CVS recruiters | Daily new-vacancy feeds for North Yorkshire / York charities |
| Sector publications | Auto-populate Yorkshire voluntary-sector jobs sections |
| Workforce strategy | Rural vs urban pay benchmarks across Yorkshire |
| Aggregators | Apply emails / URLs for redirect-and-track use cases |
| Podcast / content discovery | Pull podcast term for the CFY podcast catalogue |
📥 Supported inputs
| URL pattern | Behaviour |
|---|---|
| (empty + `entityTerms: ["jobs"]`) | Default: Jobs only (~6 vacancies) |
| `https://www.communityfirstyorkshire.org.uk/portfolio-item/<slug>/` | Single portfolio item |
| `https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio` | All portfolio entries (43 items) |
| `https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio?portfolio_entries=87` | Filter by term ID (pass-through) |
Not supported: browser listing pages (CFY has no public /jobs/ page — content is rendered into a masonry on the homepage); hosts outside communityfirstyorkshire.org.uk.
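One way to picture the supported and unsupported patterns is a small classifier. This is a sketch under assumptions: the function name `classify_url`, the return labels, and the choice to accept the host both with and without `www.` are illustrative and not taken from the actor.

```python
import re
from urllib.parse import urlparse

# Assumption: the host is accepted with or without the "www." prefix.
ALLOWED_HOSTS = {
    "communityfirstyorkshire.org.uk",
    "www.communityfirstyorkshire.org.uk",
}


def classify_url(url):
    """Return ("item", slug), ("api", None) or ("unsupported", None)."""
    parsed = urlparse(url)
    if parsed.hostname not in ALLOWED_HOSTS:
        return ("unsupported", None)
    # Single portfolio item: /portfolio-item/<slug>/
    item = re.fullmatch(r"/portfolio-item/([^/]+)/?", parsed.path)
    if item:
        return ("item", item.group(1))
    # WP-JSON collection endpoint; any query string is passed through untouched.
    if parsed.path.rstrip("/") == "/wp-json/wp/v2/portfolio":
        return ("api", None)
    return ("unsupported", None)
```

A homepage URL or a URL on another host falls through to `"unsupported"`, mirroring the "Not supported" note above.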
🔄 How it works
- Resolve start URLs, either from explicit `startUrls`, or built from `entityTerms` (slug → numeric term ID via a known map).
- Classify + translate each URL into the canonical `/wp-json/wp/v2/portfolio` shape, optionally with `?portfolio_entries=<id>&_embed=1`.
- Walk pagination via `X-WP-TotalPages` from the response header.
- Parse each portfolio item:
  - title, content HTML
  - `portfolio_entries` term names → split into `categories` vs `locations`
  - body regex → apply email, external URL, closing date, salary (best-effort)
- Push one normalised row per item to the dataset.
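The URL-building and pagination steps above can be sketched as follows. The sketch assumes the total page count has already been read from the `X-WP-TotalPages` header of the first response and relies on the standard WordPress REST `page` query parameter. The helper name `page_urls` is hypothetical, and the term ID `87` is just the example ID from the supported-inputs table, not a confirmed mapping for any slug.

```python
from urllib.parse import urlencode

BASE = "https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio"


def page_urls(total_pages, term_id=None, embed=True):
    """Build one canonical WP-JSON URL per page.

    total_pages is the integer parsed from the X-WP-TotalPages header.
    """
    urls = []
    for page in range(1, total_pages + 1):
        params = {}
        if term_id is not None:
            params["portfolio_entries"] = term_id  # taxonomy filter by term ID
        if embed:
            params["_embed"] = 1  # ask WP-JSON to inline terms and media
        params["page"] = page
        urls.append(f"{BASE}?{urlencode(params)}")
    return urls


urls = page_urls(2, term_id=87)
```

Building the full URL list up front keeps fetching and parsing separate, so the page requests can run in parallel within the concurrency limits.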
⚙️ Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `startUrls` | array | `[]` | Direct portfolio-item / WP-JSON URLs. Empty = use `entityTerms`. |
| `entityTerms` | array | `["jobs"]` | `portfolio_entries` taxonomy slugs to scrape. Allowed: `jobs`, `volunteering`, `get-support`, `get-involved`, `leadership`, `networks`, `advertise`, `podcast`, `membership`. |
| `enrichTaxonomies` | boolean | `true` | When true, embeds taxonomy term names + featured image via WP-JSON `_embed`. |
| `postedWithinHours` | integer | (none) | Only return rows posted in the last N hours (24 = last day, 72 = last 3 days). Empty/0 = all. Ideal for daily monitoring runs that only want fresh postings. |
| `maxItems` | integer | `1000` | Hard cap on rows pushed. |
| `maxConcurrency` / `minConcurrency` | integer | `5` / `1` | Parallel WP-JSON page-fetch limits. |
| `maxRequestRetries` | integer | `5` | Number of retries before a failed request is abandoned. |
| `proxy` | object | No proxy | The site has no anti-bot protection, so a proxy is not normally needed. |
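To illustrate what `postedWithinHours` means for the output, here is a minimal sketch of the same filter applied to already-scraped rows. It assumes `postedDate` always has the UTC shape shown in the output sample (`2026-05-15T09:26:35Z`). The function name `posted_within` and its `now` argument are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def posted_within(rows, hours, now=None):
    """Keep rows whose postedDate falls within the last `hours` hours."""
    if not hours:  # None or 0 means "return everything"
        return list(rows)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    kept = []
    for row in rows:
        # Assumed format: ISO timestamp with a trailing "Z" (UTC).
        posted = datetime.strptime(row["postedDate"], "%Y-%m-%dT%H:%M:%SZ")
        posted = posted.replace(tzinfo=timezone.utc)
        if posted >= cutoff:
            kept.append(row)
    return kept
```

For a daily monitoring run, `hours=24` keeps only postings from the last day, while an empty or zero value leaves the rows untouched.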
📊 Output overview
Each scraped item becomes one dataset row. The type field is "job" when the item is in the "Jobs" category, otherwise "post". The cpt field is always "portfolio".
📦 Output sample
{"type": "job","cpt": "portfolio","source": "communityfirstyorkshire.org.uk","jobId": "24490","slug": "north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire","jobUrl": "https://www.communityfirstyorkshire.org.uk/portfolio-item/north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire/","wpJsonUrl": "https://www.communityfirstyorkshire.org.uk/wp-json/wp/v2/portfolio/24490","title": "North Yorkshire: Adviser to Unpaid Carers (Veterans), Carers Plus Yorkshire","description": "<div>About the role…</div>","descriptionText": "About the role…","companyName": null,"companyWebsite": "https://www.carersplus.net/","companyDomain": "carersplus.net","location": "North Yorkshire","locations": ["North Yorkshire"],"remote": false,"salary": {"currency": "GBP","min": 24000,"max": 27000,"raw": "£24,000 - £27,000 per annum"},"salaryRaw": "£24,000 - £27,000 per annum","categories": ["Jobs"],"employmentTypes": [],"contractType": null,"portfolioTerms": ["Jobs", "North Yorkshire"],"status": "publish","postedDate": "2026-05-15T09:26:35Z","closingDate": "Friday 30 May 2026","modifiedDate": "2026-05-15T09:26:35Z","applyType": "email","applyUrl": "https://www.communityfirstyorkshire.org.uk/portfolio-item/north-yorkshire-adviser-to-unpaid-carers-veterans-carers-plus-yorkshire/","applyEmail": "recruitment@carersplus.net","externalApplyUrl": "https://www.carersplus.net/","featuredImageUrl": null,"authorId": 1,"authorName": null,"scrapedAt": "2026-05-20T00:13:00.000Z"}
🗂 Key output fields
| Group | Fields |
|---|---|
| Identifiers | type (job or post), cpt (always portfolio), source, jobId, slug, jobUrl, wpJsonUrl, scrapedAt |
| Content | title, description (HTML), descriptionText (plain) |
| Dates | postedDate (ISO), closingDate (raw text), modifiedDate (ISO) |
| Employer | companyName (null), companyWebsite (= externalApplyUrl), companyDomain |
| Location | location (primary, from portfolio_entries), locations[] (all), remote (true if 'Homeworking' tag present) |
| Compensation | salary.{currency, min, max, raw} (best-effort regex), salaryRaw |
| Taxonomies | categories[] (Jobs/Leadership/etc.), portfolioTerms[] (all term names) |
| Apply flow | applyType, applyUrl, applyEmail, externalApplyUrl |
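The apply-flow fields come from the body HTML, and a rough sketch of that extraction is shown below. It assumes "outbound" means any http(s) link that does not point back at communityfirstyorkshire.org.uk. The regexes and the name `extract_apply` are illustrative, not the actor's own.

```python
import re

# First mailto: link; stops at the closing quote or a "?subject=..." suffix.
MAILTO_RE = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
# Any absolute http(s) link.
HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)', re.IGNORECASE)
SITE_HOST = "communityfirstyorkshire.org.uk"


def extract_apply(html):
    """Return (apply_email, external_apply_url); either may be None."""
    mail = MAILTO_RE.search(html)
    apply_email = mail.group(1) if mail else None
    external = None
    for url in HREF_RE.findall(html):
        if SITE_HOST not in url:  # assumption: skip links back to the CFY site
            external = url
            break
    return apply_email, external
```

When neither pattern matches, both values stay `None`, in line with the null fallbacks described for the other best-effort fields.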
❓ FAQ
Why is closing date sometimes null even when the body mentions a deadline?
The regex looks for "Closing date:", "Deadline:", or "Apply by:" prefixes. If the body uses other phrasing (e.g. "Applications must arrive by…"), the field stays null. The full body HTML is always in description.
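A prefix-based regex of the kind described here might look like the sketch below. The exact pattern the actor uses is not published, so `CLOSING_RE` and `extract_closing_date` are assumptions that only illustrate why unusual phrasing yields null.

```python
import re

# Matches one of the three known prefixes, then captures text up to a full stop or newline.
CLOSING_RE = re.compile(
    r"(?:Closing date|Deadline|Apply by)\s*:\s*([^\n.]+)",
    re.IGNORECASE,
)


def extract_closing_date(text):
    """Return the raw text after a known prefix, or None if no prefix matches."""
    match = CLOSING_RE.search(text)
    return match.group(1).strip() if match else None
```

A body containing "Closing date: Friday 30 May 2026." yields the raw string after the prefix, while "Applications must arrive by 30 May" matches none of the prefixes and returns `None`.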
Why is salary parsing fragile?
CFY items don't have a structured salary field, so the regex hunts for "£" patterns in the body text. Look at salaryRaw to see what was matched; if the structured min/max look wrong, fall back to the raw string.
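To make the fragility concrete, here is a minimal sketch of a "£"-based salary parser that produces the same `salary` shape as the output sample. The pattern and the name `parse_salary` are assumptions. In particular, setting `max` equal to `min` for a single figure is a choice made for this sketch, not documented behaviour.

```python
import re

# "£24,000 - £27,000 per annum", "£24,000 – £27,000" or a single "£12" figure.
SALARY_RE = re.compile(
    r"£\s*(\d[\d,]*)(?:\s*[-–]\s*£?\s*(\d[\d,]*))?(?:\s+per\s+annum)?",
    re.IGNORECASE,
)


def parse_salary(text):
    """Return a salary dict for the first £ pattern found, or None."""
    match = SALARY_RE.search(text)
    if not match:
        return None
    low = int(match.group(1).replace(",", ""))
    # Assumption: a single figure is reported as both min and max.
    high = int(match.group(2).replace(",", "")) if match.group(2) else low
    return {"currency": "GBP", "min": low, "max": high, "raw": match.group(0).strip()}
```

Because the first "£" in the text wins, a body that mentions a grant amount or a pension contribution before the salary would be misread, which is why checking the raw string is worthwhile.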
Can I scrape volunteering or other portfolio content too?
Yes. Set entityTerms: ["volunteering"] (or any other allowed term slug). The same row shape applies, and type becomes "post" for non-job categories.
Can I scrape private pages or applicant data?
No. Only the public WP-JSON REST API.
How do I limit results?
Set maxItems. With only ~6 jobs live, maxItems: 100 covers everything.
💬 Support
- For issues or feature requests, please use the Issues tab on the actor's Apify Console page.
- Author's website: https://muhamed-didovic.github.io/
- Email: muhamed.didovic@gmail.com
🛠 Additional services
- Custom output shape, additional fields, or one-off datasets: muhamed.didovic@gmail.com
- Similar scrapers for other CVS / volunteer hubs (Doing Good Leeds, VA Rotherham, VAS Sheffield, Barnsley CVS, BCVS, York CVS): drop an email.
- For API access (no Apify fee, just usage): muhamed.didovic@gmail.com
🔎 Explore more scrapers
See other scrapers at memo23's Apify profile — covering job boards, real estate, social media, and more.
⚠️ Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Community First Yorkshire (CFY), communityfirstyorkshire.org.uk, or any of their subsidiaries or affiliates. All trademarks mentioned are the property of their respective owners.
The scraper accesses only the publicly available WP-JSON REST endpoint and public detail pages on communityfirstyorkshire.org.uk — no authenticated endpoints, recruiter-only features, or content behind a login. Users are responsible for ensuring their use complies with communityfirstyorkshire.org.uk's Terms of Service, applicable data-protection law (GDPR, CCPA, etc.), and any contractual obligations of their own organisation.