x-crawl v7 has been released


x-crawl

x-crawl is a flexible, multifunctional Node.js crawler library. Its flexible usage and numerous features help you crawl pages, interfaces, and files quickly, safely, and stably.

If you like x-crawl, you can give the x-crawl repository a star to support it. Thank you for your support!

Features

  • 🔥 Asynchronous/Synchronous – Toggle between asynchronous and synchronous crawling by changing a single mode property (see the sketch after this list).
  • ⚙️ Multiple purposes – Crawl pages, interfaces, and files, and run polling crawls, covering the needs of a wide range of scenarios.
  • 🖋️ Flexible writing style – The same crawling API accepts multiple configuration shapes, each suited to a different use case.
  • ⏱️ Interval Crawling – Choose no interval, a fixed interval, or a random interval to create or avoid bursts of concurrent crawling.
  • 🔄 Failed Retry – Avoid crawl failures caused by transient problems; the number of retries is customizable.
  • ➡️ Proxy Rotation – Rotate proxies automatically on failed retries, based on a custom error count and HTTP status codes.
  • 👀 Device Fingerprinting – Use zero-config or custom device fingerprints to avoid being identified and tracked across requests from different locations.
  • 🚀 Priority Queue – Give a single crawl target a priority so it is crawled ahead of other targets.
  • ☁️ Crawl SPA – Crawl single-page applications (SPA) to generate pre-rendered content (aka SSR, server-side rendering).
  • ⚒️ Control Page – Submit forms, send keyboard input, trigger events, take screenshots of the page, and more.
  • 🧾 Capture Record – Capture and record crawling activity, with colored output in the terminal as a reminder.
  • 🦾 TypeScript – Ships with its own type definitions; complete typing is implemented through generics.
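
For instance, the mode, interval, and retry behavior from the list above are all set when creating the crawler instance. A minimal sketch using only the mode, intervalTime, and maxRetry options that appear in this post (check the documentation for the full option list):

import xCrawl from 'x-crawl'

// Synchronous mode: each target waits for the previous one to finish;
// a random 2-3 s interval between targets, up to 3 retries on failure
const myXCrawl = xCrawl({
  mode: 'sync', // switch to 'async' for concurrent crawling
  intervalTime: { max: 3000, min: 2000 },
  maxRetry: 3
})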

Example

As an example, let's automatically collect some photos of experiences and homes from around the world every day:

// 1. Import the module (ES/CJS)
import xCrawl from 'x-crawl'

// 2. Create a crawler instance
const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })

// 3. Set the crawling task
/*
  Call the startPolling API to start polling;
  the callback will be executed once a day
*/
myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
  // Call the crawlPage API to crawl the pages
  const res = await myXCrawl.crawlPage({
    targets: [
      'https://www.airbnb.cn/s/experiences',
      'https://www.airbnb.cn/s/plus_homes'
    ],
    viewport: { width: 1920, height: 1080 }
  })

  // Collect the image URLs into targets
  const targets = []
  const elSelectorMap = ['._fig15y', '._aov0j6']
  for (const item of res) {
    const { id } = item
    const { page } = item.data

    // Wait briefly (300 ms) for the page's images to load
    await new Promise((r) => setTimeout(r, 300))

    // Get the URLs of the images on the page
    const urls = await page.$$eval(
      `${elSelectorMap[id - 1]} img`,
      (imgEls) => {
        return imgEls.map((item) => item.src)
      }
    )
    targets.push(...urls)

    // Close the page to free resources
    await page.close()
  }

  // Call the crawlFile API to crawl the pictures
  await myXCrawl.crawlFile({ targets, storeDir: './upload' })
})
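
The example above only exercises crawlPage and crawlFile; interfaces are crawled in the same style with the crawlData API. A minimal sketch, assuming the detailed-target form with the priority option from the feature list (the example.com URLs are placeholders, and the exact result shape may vary between versions):

// Crawl two interfaces; the higher-priority target is crawled first
const res = await myXCrawl.crawlData({
  targets: [
    { url: 'https://example.com/api/list', priority: 1 },
    { url: 'https://example.com/api/detail', priority: 5 }
  ]
})

// Each result item reports whether its target succeeded
res.forEach((item) => {
  if (item.isSuccess) console.log(item.data)
})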

Running result:

Note: Do not crawl sites indiscriminately; you can check a site's robots.txt protocol before crawling. This example is only meant to demonstrate how to use x-crawl.

More

For more detailed documentation, please check: https://github.com/coder-hxl/x-crawl
