Web scraping is a process used to extract content from a website. With this plugin, you can extract publicly available, numeric data from websites and store it as an Ubidots variable.
Note: Some websites have strict policies against automated scraping. Check the website's terms of use first to make sure you're not violating them.
Requirements
An active Ubidots account
1. Getting the device and variable labels
Creating this plugin requires six inputs. Two of them are the device label and the variable label, which identify where the data generated by the plugin will be stored.
To get the device label, you can either create a new device or use an existing one in your account.
Go into your device and copy the device label, which, in our case, is “scraper”.
Then, create a new variable within the device you’ll use and copy its label too. In our case, it’s “price-variable”.
2. Getting the URL and the Target XPath
The next two inputs describe the value we want to extract from a website: the URL of the page that hosts the value and the XPath of the numeric value itself.
To get these inputs, go to the page that contains the value you want to retrieve and:
Copy the URL of the page. In our case, we’re interested in following the price of Amazon’s stock, so we copied this link:
Now, locate the numeric value you want to track on that page. In our example, that value is the price of the stock.
To get the XPath, right-click on the value and inspect it. Once the dev tools open, find the element that corresponds to the numeric value, which should be highlighted, and right-click on it → Copy → Copy XPath. In our example, the XPath is “//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]”.
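To see what a copied XPath actually selects, it can help to resolve it against a small piece of markup. The sketch below is only an illustration and not the plugin's implementation: the sample page and the price in it are made up, the helper name extract_number is hypothetical, and it uses Python's built-in ElementTree, which supports only a subset of XPath and requires well-formed markup (real pages usually need a full HTML parser).

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup that mimics the nesting the example XPath expects.
SAMPLE_PAGE = """
<html><body>
  <div id="quote-header-info">
    <div>Company name</div>
    <div>Exchange info</div>
    <div>
      <div>
        <div>
          <fin-streamer>3,245.10</fin-streamer>
          <fin-streamer>+12.30</fin-streamer>
        </div>
      </div>
    </div>
  </div>
</body></html>
"""

# The XPath copied from the browser's dev tools in the step above.
XPATH = '//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]'


def extract_number(page: str, xpath: str) -> float:
    """Resolve the XPath against the page and parse the element text as a number."""
    root = ET.fromstring(page)
    # ElementTree only accepts relative paths on an element, so prefix with ".".
    relative_path = "." + xpath if xpath.startswith("/") else xpath
    node = root.find(relative_path)
    if node is None or node.text is None:
        raise ValueError("XPath did not match an element with text")
    # Drop thousands separators so "3,245.10" parses as a float.
    return float(node.text.strip().replace(",", ""))


price = extract_number(SAMPLE_PAGE, XPATH)
print(price)
```

If the page layout changes, the same XPath may stop matching, which is why the helper raises an error instead of returning a default value.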
3. Creating a Web Scraper plugin
With these four inputs in hand, go to your Ubidots account → Devices → Plugins.
There, create a new Web Scraper plugin and fill in the input fields with the information we gathered in the previous steps.
For the last two inputs, choose the token you want to use and how often the plugin should run.
After a final step where you can choose a name and a description for your plugin, your variable should start receiving data at the interval you selected. Create a widget to visualize the data and you’ll be all set.
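The plugin sends the data for you, but it may help to see how the token, device label, and variable label fit together. The sketch below builds (without sending) the kind of request that updates a variable through the Ubidots HTTP API. The endpoint URL and the X-Auth-Token header are assumptions based on the public v1.6 API and should be checked against the API documentation for your account; the helper name, the token placeholder, and the value are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the Ubidots HTTP API (v1.6); verify it for your account type.
UBIDOTS_DEVICES_URL = "https://industrial.api.ubidots.com/api/v1.6/devices"


def build_update_request(token, device_label, variable_label, value):
    """Build a POST request that would store `value` in the given variable."""
    body = json.dumps({variable_label: value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{UBIDOTS_DEVICES_URL}/{device_label}",
        data=body,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Labels from this guide; the token and the value are placeholders.
request = build_update_request("YOUR-UBIDOTS-TOKEN", "scraper", "price-variable", 3245.1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it, you would call: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example safe to run without a valid token.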