Understanding starter links

Starter URLs are the URLs where the crawler starts its journey when crawling a website. Each one should contain the elements/data that you require from the site; the crawler will then look for similar elements across the website, starting from that URL. If you would like your scraping to start from multiple places, you can specify multiple starter URLs.

Starter URLs work best as search URLs or category listing pages. If the listing page or search URL is paginated, the crawler will follow the pagination to find similar pages containing the same information you specified.
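To make this behaviour concrete, here is a minimal Python sketch of a crawler that starts from several starter URLs and follows each page's "next page" link until the pagination runs out. It is not webautomation.io's implementation: the `SITE` dictionary is a hypothetical stand-in for a real website, `fetch` stands in for downloading and parsing a page, and the second starter URL is made up for illustration.

```python
from collections import deque

# Hypothetical in-memory "website": each URL maps to the items listed on that
# page and the URL of its "next page" link (None on the last page).
SITE = {
    "https://www.bookwebsite.com/fiction/best-selling-100": {
        "items": ["Book A", "Book B"],
        "next": "https://www.bookwebsite.com/fiction/best-selling-100?page=2",
    },
    "https://www.bookwebsite.com/fiction/best-selling-100?page=2": {
        "items": ["Book C"],
        "next": None,
    },
    "https://www.bookwebsite.com/crime/new-releases": {
        "items": ["Book D"],
        "next": None,
    },
}


def crawl(start_urls, fetch):
    """Visit every starter URL, follow its pagination and collect the items."""
    queue = deque(start_urls)  # pages waiting to be visited, in order
    seen = set()
    items = []
    while queue:
        url = queue.popleft()
        if url in seen:  # skip a page already reached from another starter
            continue
        seen.add(url)
        page = fetch(url)
        items.extend(page["items"])
        if page["next"]:  # follow pagination when a next page exists
            queue.append(page["next"])
    return items


results = crawl(
    [
        "https://www.bookwebsite.com/fiction/best-selling-100",
        "https://www.bookwebsite.com/crime/new-releases",  # hypothetical
    ],
    fetch=SITE.__getitem__,
)
print(results)
```

Keeping a `seen` set means a page reachable from two different starter URLs is only extracted once.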

For example, a search URL or category page for best-selling books on an online book retailer could look like www.bookwebsite.com/fiction/best-selling-100

Starter Links for Pre-Defined Extractors

To use webautomation.io pre-defined extractors, you will have to enter some input values into the extractors; these will most likely be starter links.

Types of input starter links supported

Depending on the pre-defined extractor you use, it will expect certain types of input starter links in order to run. Please read the instructions for each pre-defined extractor (PDE) to understand which type of links to enter.

Search URL(s)

A search URL is the URL a website generates after you enter some keywords or filters; those keywords and filters are usually carried in the URL's query string (the part after the ?). When you enter one as an input starter link, the webautomation extractor will follow all the search results, visit each result page and extract its content.
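In practice you can simply run your search on the website and copy the resulting URL from your browser's address bar. The Python sketch below only illustrates how keywords and filters end up in a search URL's query string. The `/search` path and the `q` and `sort` parameter names are hypothetical, since every website chooses its own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base, **params):
    """Append keywords/filters to a search page URL as a query string."""
    return f"{base}?{urlencode(params)}"


# Hypothetical search path and parameter names
url = build_search_url(
    "https://www.bookwebsite.com/search", q="crime fiction", sort="bestselling"
)
print(url)
# https://www.bookwebsite.com/search?q=crime+fiction&sort=bestselling

# Reading the keywords and filters back out of an existing search URL
print(parse_qs(urlsplit(url).query))
# {'q': ['crime fiction'], 'sort': ['bestselling']}
```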

See example below

Product page URL(s) / Direct URL(s)

These are individual pages on a website. They are usually templated and look similar across the website; e.g. on an e-commerce website, a product page shows one product and its full details. To use these as input starter URLs, you will need to copy each individual link and paste it into the starter links input box. Our extractor will then visit each link and extract the product details from it.
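If you have collected many product links (for example in a spreadsheet), a small helper like the one below can tidy them up before you paste them in. This is a hypothetical Python sketch that assumes one URL per line. It drops blank lines, invalid entries and duplicates, and it adds `https://` to bare addresses such as `www.bookwebsite.com/product/789`, which is an assumption about the site rather than a rule.

```python
from urllib.parse import urlsplit


def clean_starter_links(pasted_text):
    """Turn pasted text (one URL per line) into a de-duplicated list of URLs."""
    links = []
    for line in pasted_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if "://" not in url:
            url = "https://" + url  # assumption: bare addresses use HTTPS
        parts = urlsplit(url)
        # Keep only absolute web URLs without embedded spaces
        if parts.scheme not in ("http", "https") or not parts.netloc or " " in url:
            continue
        if url not in links:  # keep the first occurrence, preserve order
            links.append(url)
    return links


pasted = """
https://www.bookwebsite.com/product/123

https://www.bookwebsite.com/product/456
https://www.bookwebsite.com/product/123
www.bookwebsite.com/product/789
not a url
"""
for link in clean_starter_links(pasted):
    print(link)
```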

See example

Category page URL(s)

Category or department pages are sections of a website that group similar listings/products together. They typically share the same template and present information in the same design. If you enter one as a starter link, our extractor will visit all the pages in the category and extract the product details from each of them.
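The sketch below illustrates the two levels involved in crawling a category: walking through every listing page via its pagination, and visiting each product page linked from those listings. It is a hypothetical Python illustration rather than our extractor's code; the `LISTINGS` and `PRODUCTS` dictionaries stand in for real pages, and the URLs and field names are made up.

```python
# Hypothetical site: category listing pages link to product pages and to the
# next listing page; product pages hold the details we want to extract.
LISTINGS = {
    "https://www.bookwebsite.com/fiction": {
        "products": [
            "https://www.bookwebsite.com/p/1",
            "https://www.bookwebsite.com/p/2",
        ],
        "next": "https://www.bookwebsite.com/fiction?page=2",
    },
    "https://www.bookwebsite.com/fiction?page=2": {
        "products": ["https://www.bookwebsite.com/p/3"],
        "next": None,
    },
}
PRODUCTS = {
    "https://www.bookwebsite.com/p/1": {"title": "Book A", "price": 9.99},
    "https://www.bookwebsite.com/p/2": {"title": "Book B", "price": 12.50},
    "https://www.bookwebsite.com/p/3": {"title": "Book C", "price": 7.25},
}


def scrape_category(start_url, get_listing, get_product):
    """Walk every page of a category and extract details from each product."""
    details = []
    seen = set()  # guards against pagination that loops back on itself
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        listing = get_listing(url)
        for product_url in listing["products"]:
            details.append(get_product(product_url))
        url = listing["next"]  # None on the last listing page
    return details


rows = scrape_category(
    "https://www.bookwebsite.com/fiction",
    LISTINGS.__getitem__,
    PRODUCTS.__getitem__,
)
for row in rows:
    print(row["title"], row["price"])
```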

See example

