Glossary of terms

This article describes some terms that appear within WebAutomation.

XPath: Short for XML Path Language. It is the query syntax used to locate an element (node) in an XML or HTML document, such as a webpage. See here for more details.
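As a rough illustration, the Python sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, to locate every price on a small made-up product page with the expression `.//span[@class='price']`. The page markup, class names, and the `find_prices` helper are hypothetical. Real-world HTML is often not well-formed XML, so a dedicated HTML parser with full XPath support would usually be used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a product listing page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h1>Blue Mug</h1>
      <span class="price">9.99</span>
    </div>
    <div class="product">
      <h1>Red Mug</h1>
      <span class="price">11.50</span>
    </div>
  </body>
</html>
"""


def find_prices(markup: str) -> list[str]:
    """Return the text of every <span class="price"> element in the markup."""
    root = ET.fromstring(markup.strip())
    # ElementTree understands a subset of XPath: ".//" searches all descendants,
    # and "[@class='price']" keeps only elements whose class attribute matches.
    return [el.text for el in root.findall(".//span[@class='price']")]


print(find_prices(PAGE))  # ['9.99', '11.50']
```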

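Regular expression: A sequence of characters that defines a search pattern for matching text, for example to pick prices or email addresses out of a page. May be referred to as RegEx in some sections of our site.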
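A minimal sketch of a regular expression in use, assuming Python's built-in `re` module and a made-up line of scraped text. The pattern and the `extract_prices` helper are illustrative only and are not part of WebAutomation.

```python
import re

# Hypothetical text scraped from a listing page.
TEXT = "Blue Mug - $9.99, Red Mug - $11.50, Green Mug - out of stock"

# A literal "$", then a captured group: one or more digits, a dot, exactly two digits.
PRICE_PATTERN = re.compile(r"\$(\d+\.\d{2})")


def extract_prices(text: str) -> list[float]:
    """Return every dollar price found in the text, as floats."""
    # With one capture group, findall returns just the captured strings.
    return [float(match) for match in PRICE_PATTERN.findall(text)]


print(extract_prices(TEXT))  # [9.99, 11.5]
```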

Extractor: A piece of software built to automatically extract data from a website.

CSV: Short for Comma-Separated Values. It is a delimited text file format that uses commas to separate values.
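The sketch below, using Python's standard `csv` module, writes two hypothetical product records (think of each as the data from one details page) to CSV text. A value that itself contains a comma is wrapped in double quotes so it is not split into two columns. The field names and the `to_csv` helper are assumptions for illustration.

```python
import csv
import io

# Hypothetical records, one per scraped details page.
rows = [
    {"name": "Blue Mug", "price": "9.99", "description": "Ceramic, 350 ml"},
    {"name": "Red Mug", "price": "11.50", "description": "Enamel"},
]


def to_csv(records: list[dict[str, str]]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["name", "price", "description"],
        lineterminator="\n",  # plain newlines instead of the default "\r\n"
    )
    writer.writeheader()
    writer.writerows(records)  # default quoting only quotes fields that need it
    return buffer.getvalue()


print(to_csv(rows))
# name,price,description
# Blue Mug,9.99,"Ceramic, 350 ml"
# Red Mug,11.50,Enamel
```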

JSON: Short for "JavaScript Object Notation". It is a lightweight data and file format that is easy for both machines and humans to read. See here for more details.
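For comparison, here is a hypothetical product record serialized to JSON and parsed back with Python's standard `json` module. This is only a sketch; the record and its fields are made up.

```python
import json

# Hypothetical record for one product; note the mix of value types.
product = {
    "name": "Blue Mug",
    "price": 9.99,
    "in_stock": True,
    "tags": ["kitchen", "ceramic"],
}

# Serialize to an indented, human-readable string (Python True becomes JSON true).
text = json.dumps(product, indent=2)
print(text)

# Parse the string back; the round trip yields an equal dictionary.
restored = json.loads(text)
print(restored == product)  # True
```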

Requests: See the question "What is a Request".

Starter URLs: The URLs from which scraping begins. A starter URL should contain the elements/data that you require from the site; the spider then looks for similar elements across the website, starting from this URL. If you would like scraping to start from multiple places, you can specify multiple starter URLs.

Link/Follower rules: By default, your spider visits every page of the website, starting from the starter URL. This can consume a lot of your requests. If you don't need every page, you can restrict which links the spider follows by defining the URL structure that links must match. For example, with the link rule "Link contains /p/product", the spider will only follow links that contain /p/product, such as /p/product/xxxx.
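To make the effect of starter URLs and link rules concrete, the following Python sketch crawls a small made-up site held in memory (the hypothetical `SITE` dictionary), breadth-first from the given starter URLs. It assumes a simplified rule model in which a link is followed only if it contains at least one of the given substrings, and that every page visited costs one request. WebAutomation's actual rule engine may behave differently.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
# Pages without an entry are treated as having no outgoing links.
SITE = {
    "/": ["/about", "/c/mugs", "/blog"],
    "/about": ["/contact"],
    "/blog": ["/blog/post-1"],
    "/c/mugs": ["/p/product/blue-mug", "/p/product/red-mug", "/c/mugs?page=2"],
    "/c/mugs?page=2": ["/p/product/green-mug"],
}


def crawl(start_urls: list[str], link_rules: tuple[str, ...] = ()) -> list[str]:
    """Breadth-first crawl returning pages in visit order.

    With no rules, every link is followed. Otherwise a link is followed only
    if it contains at least one rule substring (an assumed, simplified model).
    """
    visited: list[str] = []
    seen = set(start_urls)
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        visited.append(url)  # assumed: one request per page fetched
        for link in SITE.get(url, []):
            if link in seen:
                continue
            if link_rules and not any(rule in link for rule in link_rules):
                continue
            seen.add(link)
            queue.append(link)
    return visited


print(len(crawl(["/"])))  # 10
print(crawl(["/c/mugs"], ("/p/product",)))
# ['/c/mugs', '/p/product/blue-mug', '/p/product/red-mug']
print(crawl(["/c/mugs"], ("/p/product", "/c/mugs")))
```

In this toy site, crawling from `/` with no rules visits all 10 pages. Starting from `/c/mugs` with only the `/p/product` rule visits 3 pages but misses `/p/product/green-mug`, because that product is only linked from `/c/mugs?page=2`, which the rule does not allow. Adding `/c/mugs` as a second rule picks up the pagination page and the missing product, so a rule that is too narrow can skip data as well as save requests.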

Details page: A page that contains the data for an individual object, such as a product or business page. On eCommerce websites this is called a product details page, and it contains information such as the product's name, price, and description, along with a picture of the product. Scraping one details page returns a single row of data in an export, and each details page scraped counts as one request.

Listings/category page: A page on a website that displays a list of items that all share the same structure. On eCommerce websites, a product listing page lists all products in a category or matching a search query. These pages are also referred to as “category pages”.
