{"id":505385,"date":"2024-05-17T12:42:35","date_gmt":"2024-05-17T12:42:35","guid":{"rendered":"https:\/\/oneproxy.pro\/?p=505385"},"modified":"2024-08-27T06:50:19","modified_gmt":"2024-08-27T06:50:19","slug":"puppeteer-vs-selenium","status":"publish","type":"post","link":"https:\/\/oneproxy.pro\/in\/info\/puppeteer-vs-selenium\/","title":{"rendered":"Puppeteer vs. Selenium: Which to Choose for Web Scraping?"},"content":{"rendered":"\n<p>Are you trying to decide between Puppeteer and Selenium for web scraping? Both are powerful browser automation frameworks, and making the right choice depends on your specific scraping needs and available resources.<\/p>\n\n\n\n<p>To help you make an informed decision, we&#8217;ve highlighted the key differences between Puppeteer and Selenium in the table below. 
Afterward, we walk through each framework in detail, with a scraping example for each.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Criteria<\/th><th>Puppeteer<\/th><th>Selenium<\/th><\/tr><\/thead><tbody><tr><td><strong>Compatible Languages<\/strong><\/td><td>Only JavaScript is officially supported, but there are unofficial PHP and Python ports<\/td><td>Java, Python, C#, Ruby, PHP, JavaScript, and Kotlin<\/td><\/tr><tr><td><strong>Browser Support<\/strong><\/td><td>Chromium and experimental Firefox support<\/td><td>Chrome, Safari, Firefox, Opera, Edge, and Internet Explorer<\/td><\/tr><tr><td><strong>Performance<\/strong><\/td><td>About 60% faster than Selenium in our test<\/td><td>Slower than Puppeteer in our test<\/td><\/tr><tr><td><strong>Operating System Support<\/strong><\/td><td>Windows, Linux, and macOS<\/td><td>Windows, Linux, macOS, and Solaris<\/td><\/tr><tr><td><strong>Architecture<\/strong><\/td><td>Event-driven architecture with headless browser instances<\/td><td>WebDriver protocol (W3C WebDriver in Selenium 4, JSON Wire Protocol in earlier versions) sent to a browser driver that controls the browser instance<\/td><\/tr><tr><td><strong>Prerequisites<\/strong><\/td><td>The puppeteer npm package is enough<\/td><td>Selenium bindings (for the selected programming language) and browser web drivers<\/td><\/tr><tr><td><strong>Community<\/strong><\/td><td>Smaller community compared to Selenium<\/td><td>Well-established documentation and a large community<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Let&#8217;s look at each library in turn.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"450\" src=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/puppeteer-logo.png\" alt=\"Puppeteer Logo\" class=\"wp-image-505386\" title=\"\" 
srcset=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/puppeteer-logo.png 800w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/puppeteer-logo-150x84.png 150w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/puppeteer-logo-768x432.png 768w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/puppeteer-logo-18x10.png 18w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Puppeteer<\/h2>\n\n\n\n<p><a href=\"https:\/\/pptr.dev\/\" target=\"_blank\" data-type=\"link\" data-id=\"https:\/\/pptr.dev\/\" rel=\"noreferrer noopener nofollow\">Puppeteer<\/a> is a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It is designed for automating tasks in Chrome or Chromium, such as taking screenshots, generating PDFs, and navigating pages.<\/p>\n\n\n\n<p>Puppeteer can also be used for testing web pages by simulating user interactions like clicking buttons, filling out forms, and verifying the results displayed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Advantages of Puppeteer<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ease of Use<\/strong>: Simple and straightforward to use.<\/li>\n\n\n\n<li><strong>Bundled with Chromium<\/strong>: No additional setup is required.<\/li>\n\n\n\n<li><strong>Headless Mode<\/strong>: Runs in headless mode by default but can be configured to run in full browser mode.<\/li>\n\n\n\n<li><strong>Event-Driven Architecture<\/strong>: Eliminates the need for manual sleep calls in your code.<\/li>\n\n\n\n<li><strong>Comprehensive Capabilities<\/strong>: Can take screenshots, generate PDFs, and automate all browser actions.<\/li>\n\n\n\n<li><strong>Performance Management<\/strong>: Offers tools for recording runtime and load performance to optimize and debug your scraper.<\/li>\n\n\n\n<li><strong>SPA Crawling<\/strong>: Capable of crawling Single Page Applications (SPAs) and generating pre-rendered content 
(server-side rendering).<\/li>\n\n\n\n<li><strong>Script Recording<\/strong>: Allows creating Puppeteer scripts by recording actions on the browser using the DevTools console.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Disadvantages of Puppeteer<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Limited Browser Support<\/strong>: Supports fewer browsers compared to Selenium.<\/li>\n\n\n\n<li><strong>JavaScript Focused<\/strong>: Primarily supports JavaScript, although unofficial ports for Python and PHP exist.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Web Scraping Example with Puppeteer<\/h3>\n\n\n\n<p>Let&#8217;s go through a Puppeteer web scraping tutorial to extract items from the Crime and Thriller category of the Danube website.<\/p>\n\n\n\n<p><strong>Danube Store: Crime and Thrillers<\/strong><\/p>\n\n\n\n<p>To get started, import the Puppeteer module and create an asynchronous function to run the Puppeteer code:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const puppeteer = require(&#39;puppeteer&#39;); \n\nasync function main() { \n    \/\/ Launch a headless browser instance \n    const browser = await puppeteer.launch({ headless: true });\n\n    \/\/ Create a new page object \n    const page = await browser.newPage();\n\n    \/\/ Navigate to the target URL and wait until the loading finishes\n    await page.goto(&#39;https:\/\/danube-webshop.herokuapp.com\/&#39;, { waitUntil: &#39;networkidle2&#39; });\n\n    \/\/ Wait for the left-side bar to load\n    await page.waitForSelector(&#39;ul.sidebar-list&#39;);\n\n    \/\/ Click on the first element and wait for the navigation to finish\n    await Promise.all([\n        page.waitForNavigation(),\n        page.click(&quot;ul[class=&#39;sidebar-list&#39;] &gt; li &gt; a&quot;),\n    ]);\n\n    \/\/ Wait for the book previews to load\n    await page.waitForSelector(&quot;li[class=&#39;preview&#39;]&quot;);\n\n    \/\/ Extract 
the book previews\n    const books = await page.evaluateHandle(\n        () =&gt; [...document.querySelectorAll(&quot;li[class=&#39;preview&#39;]&quot;)]\n    );\n\n    \/\/ Extract the relevant data using page.evaluate\n    const processed_data = await page.evaluate(elements =&gt; {\n        let data = [];\n        elements.forEach(element =&gt; {\n            let title = element.querySelector(&quot;div.preview-title&quot;).innerHTML;\n            let author = element.querySelector(&quot;div.preview-author&quot;).innerHTML;\n            let rating = element.querySelector(&quot;div.preview-details &gt; p.preview-rating&quot;).innerHTML;\n            let price = element.querySelector(&quot;div.preview-details &gt; p.preview-price&quot;).innerHTML;\n\n            let result = { title, author, rating, price };\n            data.push(result);\n        });\n        return data;\n    }, books);\n\n    \/\/ Print out the extracted data\n    console.log(processed_data);\n\n    \/\/ Close the page and browser respectively\n    await page.close();\n    await browser.close();\n}\n\n\/\/ Run the main function to scrape the data\nmain();<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Expected Output<\/h3>\n\n\n\n<p>When you run the code, the output should resemble the following:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>[\n    {\n        title: &#39;Does the Sun Also Rise?&#39;,\n        author: &#39;Ernst Doubtingway&#39;,\n        rating: &#39;\u2605\u2605\u2605\u2605\u2606&#39;,\n        price: &#39;$9.95&#39;\n    },\n    {\n        title: &#39;The Insiders&#39;,\n        author: &#39;E. S. 
Hilton&#39;,\n        rating: &#39;\u2605\u2605\u2605\u2605\u2606&#39;,\n        price: &#39;$9.95&#39;\n    },\n    {\n        title: &#39;A Citrussy Clock&#39;,\n        author: &#39;Bethany Urges&#39;,\n        rating: &#39;\u2605\u2605\u2605\u2605\u2605&#39;,\n        price: &#39;$9.95&#39;\n    }\n]<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Another Example of Using Puppeteer<\/h3>\n\n\n\n<p>In addition to scraping data from web pages, Puppeteer can be used for a variety of automation tasks. One common use case is to generate a PDF of a webpage. Let&#8217;s walk through an example where Puppeteer is used to generate a PDF from a web page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Generating a PDF with Puppeteer<\/h3>\n\n\n\n<p><strong>Step 1: Import Puppeteer and Create an Asynchronous Function<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const puppeteer = require(&#39;puppeteer&#39;);\n\nasync function generatePDF() {\n    \/\/ Launch a headless browser instance\n    const browser = await puppeteer.launch({ headless: true });\n\n    \/\/ Create a new page object\n    const page = await browser.newPage();\n\n    \/\/ Navigate to the target URL\n    await page.goto(&#39;https:\/\/example.com&#39;, { waitUntil: &#39;networkidle2&#39; });\n\n    \/\/ Generate a PDF from the web page\n    await page.pdf({\n        path: &#39;example.pdf&#39;, \/\/ Output file path\n        format: &#39;A4&#39;,        \/\/ Paper format\n        printBackground: true, \/\/ Include background graphics\n    });\n\n    \/\/ Close the page and browser respectively\n    await page.close();\n    await browser.close();\n}\n\n\/\/ Run the function to generate the PDF\ngeneratePDF();<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Additional Puppeteer Options<\/h3>\n\n\n\n<p>Puppeteer provides several options for generating PDFs that can be customized to suit your needs. 
Here are some of the options you can use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>path<\/code>: The file path to save the PDF.<\/li>\n\n\n\n<li><code>format<\/code>: The paper format (e.g., &#8216;A4&#8217;, &#8216;Letter&#8217;).<\/li>\n\n\n\n<li><code>printBackground<\/code>: Whether to include the background graphics.<\/li>\n\n\n\n<li><code>landscape<\/code>: Set to <code>true<\/code> for landscape orientation.<\/li>\n\n\n\n<li><code>margin<\/code>: Specify margins for the PDF (top, right, bottom, left).<\/li>\n<\/ul>\n\n\n\n<p><strong>Example with Additional Options:<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const puppeteer = require(&#39;puppeteer&#39;);\n\nasync function generatePDF() {\n    const browser = await puppeteer.launch({ headless: true });\n    const page = await browser.newPage();\n    await page.goto(&#39;https:\/\/example.com&#39;, { waitUntil: &#39;networkidle2&#39; });\n\n    await page.pdf({\n        path: &#39;example.pdf&#39;,\n        format: &#39;A4&#39;,\n        printBackground: true,\n        landscape: true,\n        margin: {\n            top: &#39;20px&#39;,\n            right: &#39;20px&#39;,\n            bottom: &#39;20px&#39;,\n            left: &#39;20px&#39;,\n        },\n    });\n\n    await page.close();\n    await browser.close();\n}\n\ngeneratePDF();<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Example Output<\/h3>\n\n\n\n<p>Running the above code will create a PDF file named <code>example.pdf<\/code> in the current directory with the contents of the web page <code>https:\/\/example.com<\/code>.<\/p>\n\n\n\n<p>Puppeteer is a versatile tool for web automation tasks, from scraping data to generating PDFs. Its ease of use and powerful features make it an excellent choice for automating a wide range of browser activities. 
Whether you&#8217;re scraping data, generating reports, or testing web pages, Puppeteer provides the tools you need to get the job done efficiently.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"2048\" height=\"501\" src=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-2048x501.png\" alt=\"Selenium Logo\" class=\"wp-image-505387\" title=\"\" srcset=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-2048x501.png 2048w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-1280x313.png 1280w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-150x37.png 150w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-768x188.png 768w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-1536x376.png 1536w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/Selenium_Logo-18x4.png 18w\" sizes=\"auto, (max-width: 2048px) 100vw, 2048px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Selenium<\/h2>\n\n\n\n<p><a href=\"https:\/\/www.selenium.dev\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Selenium<\/a> is an open-source end-to-end testing and web automation tool often used for web scraping. 
Its main components include Selenium IDE, Selenium WebDriver, and Selenium Grid.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Selenium IDE<\/strong>: Used to record actions before automating them.<\/li>\n\n\n\n<li><strong>Selenium WebDriver<\/strong>: Executes commands in the browser.<\/li>\n\n\n\n<li><strong>Selenium Grid<\/strong>: Enables parallel execution.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Advantages of Selenium<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ease of Use<\/strong>: Simple and straightforward to use.<\/li>\n\n\n\n<li><strong>Language Support<\/strong>: Supports various programming languages such as Python, Java, JavaScript, Ruby, and C#.<\/li>\n\n\n\n<li><strong>Browser Automation<\/strong>: Can automate browsers like Firefox, Edge, Safari, and even custom QtWebKit browsers.<\/li>\n\n\n\n<li><strong>Scalability<\/strong>: Possible to scale Selenium to hundreds of instances using cloud servers with different browser settings.<\/li>\n\n\n\n<li><strong>Cross-Platform<\/strong>: Operates on Windows, macOS, and Linux.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Disadvantages of Selenium<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Complex Setup<\/strong>: Selenium setup methods can be complex.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Web Scraping Sample with Selenium<\/h3>\n\n\n\n<p>As with Puppeteer, let&#8217;s go through a tutorial on web scraping with Selenium using the same target site. 
We&#8217;ll extract the book previews from the Crime and Thriller category of the Danube website.<\/p>\n\n\n\n<p><strong>Danube Store: Crime and Thrillers<\/strong><\/p>\n\n\n\n<p><strong>Step 1: Import the Necessary Modules and Configure Selenium<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import time\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\n\noptions = webdriver.ChromeOptions()\noptions.add_argument(&quot;--headless&quot;)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 2: Initialize the Chrome WebDriver<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>driver = webdriver.Chrome(options=options)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 3: Navigate to the Target Website<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>url = &quot;https:\/\/danube-webshop.herokuapp.com\/&quot;\ndriver.get(url)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 4: Click on the Crime &amp; Thrillers Category and Extract Book Previews<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code># Give the sidebar a moment to render, then click the first category\ntime.sleep(1)\ncrime_n_thrillers = driver.find_element(By.CSS_SELECTOR, &quot;ul[class=&#39;sidebar-list&#39;] &gt; li&quot;)\ncrime_n_thrillers.click()\n\n# Give the category page a moment to load before collecting the previews\ntime.sleep(1)\nbooks = driver.find_elements(By.CSS_SELECTOR, &quot;div.shop-content li.preview&quot;)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 5: Define a Function to Extract Data from Each Book Preview<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>def extract(element):\n    title = 
element.find_element(By.CSS_SELECTOR, &quot;div.preview-title&quot;).text\n    author = element.find_element(By.CSS_SELECTOR, &quot;div.preview-author&quot;).text\n    rating = element.find_element(By.CSS_SELECTOR, &quot;div.preview-details p.preview-rating&quot;).text\n    price = element.find_element(By.CSS_SELECTOR, &quot;div.preview-details p.preview-price&quot;).text\n    return {&quot;title&quot;: title, &quot;author&quot;: author, &quot;rating&quot;: rating, &quot;price&quot;: price}<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 6: Loop Through the Previews, Extract the Data, and Quit the Driver<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>extracted_data = []\nfor element in books:\n    data = extract(element)\n    extracted_data.append(data)\n\nprint(extracted_data)\ndriver.quit()<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Expected Output<\/h3>\n\n\n\n<p>Running the above code will produce an output similar to the following:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>[\n    {&#39;title&#39;: &#39;Does the Sun Also Rise?&#39;, &#39;author&#39;: &#39;Ernst Doubtingway&#39;, &#39;rating&#39;: &#39;\u2605\u2605\u2605\u2605\u2606&#39;, &#39;price&#39;: &#39;$9.95&#39;},\n    {&#39;title&#39;: &#39;The Insiders&#39;, &#39;author&#39;: &#39;E. S. Hilton&#39;, &#39;rating&#39;: &#39;\u2605\u2605\u2605\u2605\u2606&#39;, &#39;price&#39;: &#39;$9.95&#39;},\n    {&#39;title&#39;: &#39;A Citrussy Clock&#39;, &#39;author&#39;: &#39;Bethany Urges&#39;, &#39;rating&#39;: &#39;\u2605\u2605\u2605\u2605\u2605&#39;, &#39;price&#39;: &#39;$9.95&#39;}\n]<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Additional Selenium Example: Taking a Screenshot<\/h3>\n\n\n\n<p>In addition to scraping data, Selenium can also be used to take screenshots of web pages. 
Here&#8217;s an example of how to take a screenshot of a web page using Selenium.<\/p>\n\n\n\n<p><strong>Step 1: Import the Necessary Modules and Configure Selenium<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from selenium import webdriver\n\noptions = webdriver.ChromeOptions()\noptions.add_argument(&quot;--headless&quot;)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 2: Initialize the Chrome WebDriver<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>driver = webdriver.Chrome(options=options)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 3: Navigate to the Target Website<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>url = &quot;https:\/\/example.com&quot;\ndriver.get(url)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 4: Take a Screenshot<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>driver.save_screenshot(&quot;example_screenshot.png&quot;)<\/code><\/pre><\/div>\n\n\n\n<p><strong>Step 5: Quit the Driver<\/strong><\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>driver.quit()<\/code><\/pre><\/div>\n\n\n\n<p>Selenium is a versatile tool for web automation tasks, including web scraping and taking screenshots. Its support for multiple programming languages and browsers, along with its scalability, makes it a powerful choice for various automation needs. Whether you&#8217;re extracting data or generating reports, Selenium provides the capabilities to automate your tasks efficiently.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Puppeteer vs. Selenium: Speed Comparison<\/h2>\n\n\n\n<p>Is Puppeteer faster than Selenium? 
The answer is yes: in our test, Puppeteer was faster than Selenium.<\/p>\n\n\n\n<p>To compare the speed of Puppeteer and Selenium, we used the Danube-store sandbox and ran the scripts presented above 20 times, averaging the execution times.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Selenium Speed Test<\/h3>\n\n\n\n<p>We used the <code>time<\/code> module in Python to measure the execution time of the Selenium script. We recorded the start time before launching the browser and the end time after printing the extracted data. The difference between the two is the total execution time.<\/p>\n\n\n\n<p>Here is the complete script used for Selenium:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import time\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\n\ndef extract(element):\n    title = element.find_element(By.CSS_SELECTOR, &quot;div.preview-title&quot;).text\n    author = element.find_element(By.CSS_SELECTOR, &quot;div.preview-author&quot;).text\n    rating = element.find_element(By.CSS_SELECTOR, &quot;div.preview-details p.preview-rating&quot;).text\n    price = element.find_element(By.CSS_SELECTOR, &quot;div.preview-details p.preview-price&quot;).text\n    return {&quot;title&quot;: title, &quot;author&quot;: author, &quot;rating&quot;: rating, &quot;price&quot;: price}\n\n# Start the timer\nstart_time = time.time()\n\noptions = webdriver.ChromeOptions()\noptions.add_argument(&quot;--headless&quot;)\n\n# Create a new instance of the Chrome driver\ndriver = webdriver.Chrome(options=options)\n\nurl = &quot;https:\/\/danube-webshop.herokuapp.com\/&quot;\ndriver.get(url)\n\n# Click on the Crime &amp; Thrillers category\ntime.sleep(1)\ncrime_n_thrillers = driver.find_element(By.CSS_SELECTOR, &quot;ul[class=&#39;sidebar-list&#39;] &gt; 
li&quot;)\ncrime_n_thrillers.click()\ntime.sleep(1)\n\n# Extract the book previews\nbooks = driver.find_elements(By.CSS_SELECTOR, &quot;div.shop-content li.preview&quot;)\n\nextracted_data = []\nfor element in books:\n    data = extract(element)\n    extracted_data.append(data)\n\nprint(extracted_data)\n\n# End the timer\nend_time = time.time()\nprint(f&quot;The whole script took: {end_time - start_time:.4f} seconds&quot;)\n\ndriver.quit()<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Puppeteer Speed Test<\/h3>\n\n\n\n<p>For the Puppeteer script, we used the <code>Date<\/code> object to measure the execution time. The start time was recorded at the beginning and the end time at the end of the script. The difference between these times provided the total execution duration.<\/p>\n\n\n\n<p>Here is the complete script used for Puppeteer:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const puppeteer = require(&#39;puppeteer&#39;);\n\nasync function main() {\n    const start = Date.now();\n\n    const browser = await puppeteer.launch({ headless: true });\n    const page = await browser.newPage();\n    await page.goto(&#39;https:\/\/danube-webshop.herokuapp.com\/&#39;, { waitUntil: &#39;networkidle2&#39; });\n\n    await page.waitForSelector(&#39;ul.sidebar-list&#39;);\n\n    await Promise.all([\n        page.waitForNavigation(),\n        page.click(&quot;ul[class=&#39;sidebar-list&#39;] &gt; li &gt; a&quot;),\n    ]);\n\n    await page.waitForSelector(&quot;li[class=&#39;preview&#39;]&quot;);\n    const books = await page.evaluateHandle(\n        () =&gt; [...document.querySelectorAll(&quot;li[class=&#39;preview&#39;]&quot;)]\n    );\n\n    const processed_data = await page.evaluate(elements =&gt; {\n        let data = [];\n        elements.forEach(element =&gt; {\n            let title = element.querySelector(&quot;div.preview-title&quot;).innerHTML;\n            let author = 
element.querySelector(&quot;div.preview-author&quot;).innerHTML;\n            let rating = element.querySelector(&quot;div.preview-details &gt; p.preview-rating&quot;).innerHTML;\n            let price = element.querySelector(&quot;div.preview-details &gt; p.preview-price&quot;).innerHTML;\n\n            let result = { title, author, rating, price };\n            data.push(result);\n        });\n        return data;\n    }, books);\n\n    console.log(processed_data);\n    await page.close();\n    await browser.close();\n\n    const end = Date.now();\n    console.log(`Execution time: ${(end - start) \/ 1000} seconds`);\n}\n\nmain();<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Performance Test Results<\/h3>\n\n\n\n<p>In our runs, Puppeteer was about 60% faster than Selenium. Part of that gap comes from the scripts themselves: the Selenium script pauses for two fixed one-second <code>time.sleep<\/code> calls, while the Puppeteer script waits on events. With that caveat, the result makes Puppeteer a better fit for projects requiring high-speed web scraping and automation, especially when working with Chromium-based browsers.<\/p>\n\n\n\n<p><strong>Speed Results Summary:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"398\" src=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/pupeteer-vs-selenium.png\" alt=\"Puppeteer vs. Selenium Speed Test\" class=\"wp-image-505388\" title=\"\" srcset=\"https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/pupeteer-vs-selenium.png 600w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/pupeteer-vs-selenium-150x100.png 150w, https:\/\/oneproxy.pro\/wp-content\/uploads\/2024\/05\/pupeteer-vs-selenium-18x12.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<p>The chart above illustrates the performance difference between Puppeteer and Selenium.<\/p>\n\n\n\n<p>For projects that need fast, efficient web scraping on Chromium-based browsers, Puppeteer is the better candidate for scaling up.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Puppeteer vs. 
Selenium: Which Is Better?<\/h2>\n\n\n\n<p>So which one is better between Selenium and Puppeteer for scraping? There isn&#8217;t a direct answer to that question since it depends on multiple factors, such as long-term library support, cross-browser support, and your web scraping needs.<\/p>\n\n\n\n<p>Puppeteer is faster, but compared to Selenium, it supports fewer browsers. Selenium also supports more programming languages compared to Puppeteer.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Although using Puppeteer or Selenium is a good option for web scraping, scaling up and optimizing your web scraping project can be challenging because advanced anti-bot measures can detect and block these libraries. The best way to avoid this is by using a web scraping API, like OneProxy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Using Puppeteer with Proxy Servers<\/h3>\n\n\n\n<p>To use Puppeteer with a proxy server, you can pass the proxy settings in the <code>args<\/code> option when launching the browser instance. Here\u2019s an example:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-js\" data-lang=\"JavaScript\"><code>const puppeteer = require(&#39;puppeteer&#39;);\n\nasync function main() {\n    const proxyServer = &#39;http:\/\/your-proxy-server:port&#39;;\n    \n    const browser = await puppeteer.launch({\n        headless: true,\n        args: [`--proxy-server=${proxyServer}`]\n    });\n\n    const page = await browser.newPage();\n    await page.goto(&#39;https:\/\/example.com&#39;, { waitUntil: &#39;networkidle2&#39; });\n\n    \/\/ Perform your web scraping tasks here\n\n    await browser.close();\n}\n\nmain();<\/code><\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Using Selenium with Proxy Servers<\/h3>\n\n\n\n<p>To use Selenium with a proxy server, you can set the proxy options using the <code>webdriver.Proxy<\/code> class. 
Here\u2019s an example:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from selenium import webdriver\nfrom selenium.webdriver.common.proxy import Proxy, ProxyType\n\n# Describe a manually configured HTTP\/HTTPS proxy\nproxy = Proxy()\nproxy.proxy_type = ProxyType.MANUAL\nproxy.http_proxy = &quot;your-proxy-server:port&quot;\nproxy.ssl_proxy = &quot;your-proxy-server:port&quot;\n\noptions = webdriver.ChromeOptions()\noptions.add_argument(&quot;--headless&quot;)\n\n# Attach the proxy through the options object; recent Selenium 4\n# releases no longer accept the desired_capabilities argument\noptions.proxy = proxy\n\ndriver = webdriver.Chrome(options=options)\ndriver.get(&quot;https:\/\/example.com&quot;)\n\n# Perform your web scraping tasks here\n\ndriver.quit()<\/code><\/pre><\/div>\n\n\n\n<p>Using proxy servers with Puppeteer and Selenium can help bypass IP-based restrictions and reduce the risk of getting blocked, enhancing the efficiency of your web scraping tasks. <a href=\"https:\/\/oneproxy.pro\/services\/rotating-proxies\/\">OneProxy&#8217;s rotating proxies<\/a> can further optimize this process, providing a seamless scraping experience.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Are you trying to decide between Puppeteer and Selenium for web scraping? Both are powerful browser automation frameworks, and making the right choice depends on your specific scraping needs and available resources. To help you make an informed decision, we&#8217;ve highlighted the key differences between Puppeteer and Selenium in the table below. 
Afterward, we will [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":505389,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","inline_featured_image":false,"footnotes":""},"categories":[92],"tags":[],"class_list":["post-505385","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-info"],"acf":{"faq_title":"Frequently Asked Questions (FAQ)","faq_items":[{"question":"What are Puppeteer and Selenium?","answer":"<span>Puppeteer and Selenium are both browser automation frameworks used for web scraping, testing, and automating browser tasks. Puppeteer is a Node.js library that controls Chrome or Chromium over the DevTools Protocol, while Selenium is an open-source tool that supports various browsers and programming languages through its WebDriver API.<\/span>"},{"question":"Which is faster, Puppeteer or Selenium?","answer":"<span>Puppeteer is generally faster than Selenium. However, the speed difference can vary depending on the specific tasks and configurations used in your web scraping or automation projects.<\/span>"},{"question":"What are the main advantages of Puppeteer?","answer":"<span>Puppeteer is known for its ease of use, speed, and ability to automate tasks in headless mode by default. 
It supports Chromium and has a strong event-driven architecture that eliminates the need for manual sleep calls in code.<\/span>"},{"question":"What are the limitations of Puppeteer?","answer":"<span>Puppeteer supports fewer browsers compared to Selenium and primarily focuses on JavaScript, though unofficial ports for other languages like Python and PHP exist.<\/span>"},{"question":"How can I use Puppeteer with a proxy server?","answer":"<span>You can configure Puppeteer to use a proxy server by passing the proxy settings in the <\/span><code>args<\/code><span> option when launching the browser.<\/span>"},{"question":"What are the main advantages of Selenium?","answer":"<span>Selenium supports multiple programming languages (Python, Java, JavaScript, Ruby, C#) and can automate various browsers, including Firefox, Edge, Safari, and custom browsers like QtWebKit. It also allows for extensive scalability through techniques like setting up cloud servers with different browser settings.<\/span>"},{"question":"What are the limitations of Selenium?","answer":"<span>Selenium can be more complex to set up compared to Puppeteer, especially when configuring it for different browsers and environments.<\/span>"},{"question":"How can I use Selenium with a proxy server?","answer":"<span>You can set up a proxy server in Selenium using the <\/span><code>webdriver.Proxy<\/code> <span>class.<\/span>"},{"question":"How was the speed comparison between Puppeteer and Selenium conducted?","answer":"<span>We ran the same web scraping tasks on the Danube-store sandbox using both Puppeteer and Selenium. 
Each script was executed 20 times, and the average execution times were calculated to compare the performance.<\/span>"},{"question":"What were the results of the speed comparison?","answer":"<span>The results showed that Puppeteer is about 60% faster than Selenium, making it a better choice for high-speed web scraping and automation tasks.<\/span>"},{"question":"How can I avoid getting blocked by anti-bot measures?","answer":"<span>OneProxy can help you avoid getting blocked. OneProxy handles anti-bot bypassing, provides rotating proxies, headless browsers, automatic retries, and more, ensuring a seamless web scraping experience.<\/span>"}]},"_links":{"self":[{"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/posts\/505385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/comments?post=505385"}],"version-history":[{"count":3,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/posts\/505385\/revisions"}],"predecessor-version":[{"id":505393,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/posts\/505385\/revisions\/505393"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/media\/505389"}],"wp:attachment":[{"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/media?parent=505385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/categories?post=505385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/oneproxy.pro\/in\/wp-json\/wp\/v2\/tags?post=505385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}