Glossary of terms
This article defines terms that appear throughout WebAutomation.
XPath: Short for XML Path Language. It is a query syntax used to locate elements in an XML or HTML document, such as a webpage.
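As an illustration, the sketch below uses Python's built-in xml.etree.ElementTree, which supports a limited subset of XPath. The page content and the class names "product" and "price" are hypothetical. It assumes a well-formed (XHTML-style) page; real-world HTML is often not well-formed and would need a full HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html>
  <body>
    <div class="product">
      <h1>Blue Mug</h1>
      <span class="price">9.99</span>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# ElementTree understands a subset of XPath: relative paths, "//" for
# any depth, and attribute predicates such as [@class='product'].
name = root.find(".//div[@class='product']/h1").text
price = root.find(".//span[@class='price']").text

print(name, price)  # Blue Mug 9.99
```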
Regular expression: A pattern used to match and extract text. It may be referred to as RegEx in some sections of our site.
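For example, a regular expression can pull every price out of a block of scraped text. This is a minimal sketch using Python's standard re module; the input text and the dollar-price format are assumptions chosen for illustration.

```python
import re

# Hypothetical scraped text.
text = "Blue Mug: $9.99, Red Mug: $12.50, Gift Box: $105.00"

# \$ matches a literal dollar sign; (\d+\.\d{2}) captures the amount,
# so findall returns only the captured numbers.
prices = re.findall(r"\$(\d+\.\d{2})", text)

print(prices)  # ['9.99', '12.50', '105.00']
```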
Extractor: Software that automatically extracts data from a website.
CSV: Short for Comma-Separated Values. It is a delimited text file format that uses commas to separate values.
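The sketch below, using Python's standard csv module, shows what a small CSV export might look like. The column names and rows are hypothetical. Note that a value which itself contains a comma is wrapped in quotes so it is not split into two columns.

```python
import csv
import io

# Hypothetical scraped rows; the second name contains a comma.
rows = [
    {"name": "Blue Mug", "price": "9.99"},
    {"name": "Mug, large", "price": "12.50"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()   # first line holds the column names
writer.writerows(rows)

output = buffer.getvalue()
print(output)
# name,price
# Blue Mug,9.99
# "Mug, large",12.50
```

Reading the text back with csv.DictReader restores the original rows, including the quoted value.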
JSON: Short for JavaScript Object Notation. It is a data and file format that is easy for both machines and humans to read.
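As a small illustration, the sketch below uses Python's standard json module to turn a hypothetical scraped record into JSON text and parse it back. Unlike a CSV row, a JSON record can hold nested values such as lists.

```python
import json

# Hypothetical scraped record; "tags" is a nested list.
record = {"name": "Blue Mug", "price": 9.99, "tags": ["kitchen", "ceramic"]}

# Serialize with indentation so the output is easy for humans to read.
text = json.dumps(record, indent=2)
print(text)

# Parsing the text again gives back an equal Python object.
restored = json.loads(text)
print(restored == record)  # True
```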
Requests: See the question "What is a Request?"
Starter URLs: The URLs from which scraping begins. A starter URL should contain the elements or data you need from the site; the spider then looks for similar elements across the website, starting from this URL. To start scraping from multiple places, specify multiple starter URLs.
Link/Follower rules: By default, your spider will visit every page on the website, starting from the starter URL. This can consume a lot of your requests. To prevent this, you can specify how your spider looks for data by defining a URL pattern that restricts which links it follows. For example, with the link rule "Link contains /p/product", the spider will only follow links that contain /p/product, such as /p/product/xxxx.
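To make the effect of a link rule concrete, here is a hypothetical sketch, not WebAutomation's actual implementation. It crawls a small made-up site (the SITE dictionary stands in for real pages and the links found on them) breadth-first from a list of starter URLs, counting one request per page visited. The link_contains parameter is an assumed stand-in for a "Link contains" rule.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "/": ["/about", "/p/product/1", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": [],
    "/blog/post-2": [],
    "/p/product/1": ["/p/product/2"],
    "/p/product/2": [],
}


def crawl(start_urls, link_contains=None):
    """Breadth-first crawl; returns visited URLs (one request each)."""
    queue = deque(start_urls)
    seen = set(start_urls)
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # fetching this page costs one request
        for link in SITE.get(url, []):
            # With no rule every link is followed; with a rule, only
            # links containing the given text are followed.
            if link_contains is not None and link_contains not in link:
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


all_pages = crawl(["/"])
product_pages = crawl(["/"], link_contains="/p/product")
print(len(all_pages))   # 7
print(product_pages)    # ['/', '/p/product/1', '/p/product/2']
```

In this toy example the rule cuts the crawl from 7 requests to 3, because the spider no longer wanders into the blog and about pages.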
Details Page: A page that contains the data for an individual object, such as a product or business page. On eCommerce websites this is called a product details page, and it typically contains information such as the product's name, price and description, along with a picture of the product. Scraping one details page returns a single row of data in an export, and each details page scraped counts as one request.
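The idea that one details page becomes one row can be sketched with Python's built-in html.parser. This is an illustrative assumption, not how WebAutomation extractors work internally: the class names "name", "price" and "description" are hypothetical, and the parser only handles fields whose text is not wrapped in further nested tags.

```python
from html.parser import HTMLParser


class DetailsPageParser(HTMLParser):
    """Collects text from elements whose class is one of FIELDS."""

    FIELDS = {"name", "price", "description"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.row = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._current = cls

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.row[self._current] = data.strip()


def scrape_details_page(html):
    """Return one row (a dict) for one details page."""
    parser = DetailsPageParser()
    parser.feed(html)
    parser.close()
    return parser.row


page = """<div><h1 class="name">Blue Mug</h1>
<span class="price">9.99</span>
<p class="description">A ceramic mug.</p>
<img src="mug.jpg"></div>"""

row = scrape_details_page(page)
print(row)
# {'name': 'Blue Mug', 'price': '9.99', 'description': 'A ceramic mug.'}
```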
Listings/category Page: A page on a website that displays a list of identically structured items. On eCommerce websites, a product listing page lists the products in a category or matching a search query. These pages are also referred to as "category pages".
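A listing page is often where a spider finds the links to individual details pages. The sketch below, using Python's html.parser and urllib.parse, collects the links from a hypothetical listing page and keeps only those that look like product details pages. The base URL https://shop.example.com and the /p/product/ pattern are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical listing page and its URL.
BASE_URL = "https://shop.example.com/category/mugs"
listing = """<ul>
<li><a href="/p/product/1">Blue Mug</a></li>
<li><a href="/p/product/2">Red Mug</a></li>
<li><a href="/category/mugs?page=2">Next page</a></li>
</ul>"""

collector = LinkCollector()
collector.feed(listing)

# Keep only details-page links and make them absolute URLs.
detail_urls = [urljoin(BASE_URL, h) for h in collector.links if "/p/product/" in h]
print(detail_urls)
# ['https://shop.example.com/p/product/1', 'https://shop.example.com/p/product/2']
```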