【Flowchart Mode】How to collect web data in reverse order | Web Scraping Tool | ScrapeStorm
When collecting data, it is often necessary to collect in reverse order (collecting data from the last page to the first). This article will show you how to use ScrapeStorm’s flowchart mode to collect web page data in reverse order.
Case 1: The link changes when the list page is turned, and a link to the last page exists.
Processing method 1: Use the last page link of the list page as the collection link.
When the link to the last page of the list is directly available, copy it and use it to create the collection task.
1. Go to the last page in the browser and copy its link.
2. Create a flowchart mode task.
3. After flowchart mode detects the list, the software asks whether to detect the next page button. Follow the operation prompt and manually click the “Previous” button so that it is used for paging.
4. Start the task to begin collecting in reverse order.
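The traversal that these steps set up can be sketched in a few lines of Python. This is only an illustration of the idea (start on the last page, then keep following the “Previous” link), not ScrapeStorm's internal implementation. The `PAGES` dictionary, the example.com URLs, and the `collect_in_reverse` function are hypothetical stand-ins for a real site.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a paginated site: each URL maps to
# (rows on that page, link behind the "Previous" button, or None on page 1).
PAGES = {
    "https://example.com/list?page=1": (["a", "b"], None),
    "https://example.com/list?page=2": (["c", "d"], "https://example.com/list?page=1"),
    "https://example.com/list?page=3": (["e"], "https://example.com/list?page=2"),
}


def collect_in_reverse(last_page_url: str) -> list[str]:
    """Collect rows starting at the last page and following "Previous" links."""
    rows: list[str] = []
    seen: set[str] = set()
    url: Optional[str] = last_page_url
    # Stop when there is no "Previous" link, or if a link loops back.
    while url is not None and url not in seen:
        seen.add(url)
        page_rows, previous_url = PAGES[url]
        rows.extend(page_rows)  # rows keep their on-page order
        url = previous_url
    return rows


result = collect_in_reverse("https://example.com/list?page=3")
print(result)  # ['e', 'c', 'd', 'a', 'b']
```

Note that only the page order is reversed in this sketch; the rows within each page stay in the order in which they appear on that page.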
Processing method 2: Set reverse page numbers in batches
When the link changes as pages are turned but there is no “Previous Page” button for paging backward, reverse-order collection can be achieved by setting the page numbers.
1. Copy the link to the second page.
The link to the first page often differs in form from the links to the second and third pages, so the paging pattern cannot be worked out from the first-page link alone. It is therefore recommended to copy the link to the second page and use it to create the task.
2. Use the function of generating URLs in batches to generate links.
As shown in the figure below, “Start” is set to “Last Page”, “End” is set to “First Page”, and “Step” is set to “decrease”.
For more details, please refer to the tutorial: How to use URLs Generator
3. Once the URLs have been generated in batches, there is no need to set a page-turning button. Select “No, extract only the current webpage” in the operation prompt. If the page must be scrolled to display more data, it is recommended to set it to “Scroll to Load”.
4. Start the task to begin collecting in reverse order.
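To make the “Start”, “End”, and “Step” settings concrete, here is a minimal Python sketch of what a decreasing batch of page URLs looks like. It is not ScrapeStorm's URL generator. The function name `reverse_page_urls`, the `{page}` placeholder, and the example.com link are assumptions chosen for illustration.

```python
def reverse_page_urls(template: str, last_page: int,
                      first_page: int = 1, step: int = 1) -> list[str]:
    """Return page URLs from last_page down to first_page (inclusive)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    # range() counts downward because the step is negated.
    return [template.format(page=n)
            for n in range(last_page, first_page - 1, -step)]


# Hypothetical link copied from the second page, with the page number
# replaced by a placeholder.
urls = reverse_page_urls("https://example.com/list?page={page}", last_page=5)
for url in urls:
    print(url)
```

With `last_page=5`, the list starts at page 5 and ends at page 1, which is the order in which the pages would then be collected.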
Case 2: After the list page is turned, the link remains unchanged, and there is no link to the last page
Processing method 1: There is a button to jump to the last page on the web page.
When the link does not change as pages are turned and the link to the last page cannot be obtained directly, clicking the last-page button on the web page jumps to the last page, which makes reverse-order collection possible.
1. Create a flowchart mode task.
2. Add a “Click” component to turn the page to the last page.
3. After the list is detected, the software will prompt whether to detect the next page button. According to the operation prompt, manually click the “Previous” button.
4. Start the task to begin collecting in reverse order.
Processing method 2: There is a page number input box on the web page
When the link does not change as pages are turned and the link to the last page cannot be obtained directly, entering the number of the last page in the page number input box jumps to the last page, which makes reverse-order collection possible.
1. Create a flowchart mode task.
2. Add an “Input” component and a “Click” component to turn the page to the last page.
3. After the list is detected, the software will prompt whether to detect the next page button. According to the operation prompt, manually click the “Previous” button.
4. Start the task to begin collecting in reverse order.