Download and Sign Up
Get a $5 Coupon For Free
Getting Started Main Features

【Flowchart Mode】How to set the fields | Web Scraping Tool | ScrapeStorm

2021-07-16 13:32:56
8816 views

Abstract:This tutorial will show you how to set fields in flowchart mode. No Programming Needed. Visual Operation. ScrapeStormFree Download

Extract” component is used to extract data from a web page. The component can be used alone or in conjunction with a “Loop” component or a “Judge” component. It is suitable for extracting data on a single page when used alone. When used together, it is suitable for extracting data on all pages.

The specific settings are as follows:

1. Rename

 

2. Merge

There are two ways to merge fields.
(1) Click on a field that needs to be merged, right click and select “Merge”, then select the fields you want to merge in the page, and select the appropriate separator, which is suitable for the combination of two fields.

(2) Press crtl or shift to select multiple fields, then right click on “Merge”. This method is suitable for the combination of multiple fields.

 

3. Select in page

If you want to modify the content extracted in the field, or add a new field to set the extraction content, you need to click “Select in page”, and then extract the required data in the web page.

 

4. Edit Xpath

Xpath is a path query language that uses a path expression to find the location of the data we need in the web page. Users with a programming foundation can use this feature to set up a new XPath.

 

5. Extract Type

Different data needs to set different value attributes. When setting a new field, the value of the field defaults to a text field.

In general, when you select new data, ScrapeStorm will automatically help you determine the field attributes, you don’t need to set it up. However, if there is a judgment error, you can set the value attribute of the field yourself.

Text: Suitable for ordinary text data.

InnerHTML: Suitable for extracting HTML that does not include the content itself.

OuterHTML: Suitable for extracting HTML that includes the content itself.

Link URL: Suitable for extracting links.

Image/Video/Audio URL: Suitable for extracting images and other media.

Input Value: Suitable for extracting the text of input box, mostly used for keyword collection.

Download Button: Used to extract download address.

 

6. Decode Type

The software will automatically identify the fields to be decoded. In case that some fields are decoded incorrectly or not, you can manually select the decoding function. This is the function of Business Plan. Users need to upgrade to use it.

 

7. Modify data
Sometimes we need to do some processing on the content of the extracted fields. For example, you only need the numbers and email in the fields, or replace the text in the fields with new text, or clear the blank characters at the beginning and the end, or create some new regular expressions. Alternatively, you can click on “Modify Data”.

 

8. Special value
In the data scraping process, some users need to scrape some special fields, such as scraping time, page title, page URL, etc. These fields cannot be scraped directly in the web page, then you can use “Special Value” to set the field.

Users can create a new field, change the field to a special field, or change the original field to a special field.

 

9. Delete column

You can right click on the field to select Delete, or press Ctrl or Shift to select multiple fields to delete.

 

10. Clear

If the user does not need the fields that the system automatically recognizes, you can click “Clear”  to clear the fields and you can reset the required fields.

 

11. Add field

If you want to add a new field, click on “Add Field” in the upper right corner, right click on the newly added field, click on “Select in Page”, and extract the required data from the page.

 

 

Keyword extraction from web content Download web page as word Data scraping with python Match emails with Regex Download videos in batches php crawler Download images in batches python crawler Generate URLs in batches python download file
关闭