Sunday, May 31, 2020

Web Scraping Using Selenium - Explicit Wait for Presence of An Element

Selenium Wait - Explicit Wait For Presence of An Element

Selenium supports several ways of waiting. In this tutorial, we will use an explicit wait that checks for the presence of an element given its locator. This is a way to check whether the element has already been loaded into the page.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Presence of An Element Given Its Locator

  1. Create a file seleniumwaitpresenceone.py and paste the following code
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this function as is
    def wait_for_the_element(wait_time: int, el_id: str):
        try:
            print("[{}] Finding element {}".format(str(datetime.now()), el_id))
            WebDriverWait(driver, wait_time).until(
                EC.presence_of_element_located((By.ID, el_id))
            )
            print("[{}] Element found".format(str(datetime.now())))
        except TimeoutException as e:
            print("[{}] Element did not load".format(str(datetime.now())))
    
    This function waits up to the given number of seconds for the element matching the locator to be present. It prints a message saying whether the element was found or not.
  5. Add this line
    wait_for_the_element(3, "navMenuId")
    
    This line will call the method we created and will display Element found.
  6. Add this line
    wait_for_the_element(6, "noneExistentId")
    
    This line calls the function with an id that does not exist, so it waits the full 6 seconds and then prints Element did not load.
  7. Add this line
    wait_for_the_element(9, "anotherNoneExistentId")
    
    This calls the function again with another non-existent id, this time waiting 9 seconds.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumwaitpresenceone.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Call the Method 3 times which prints messages in the console
    • Closes the browser
 

Program Sample Output

[2020-06-01 13:52:53.817328] Finding element navMenuId
[2020-06-01 13:52:53.886479] Element found
[2020-06-01 13:52:53.886479] Finding element noneExistentId
[2020-06-01 13:53:00.019786] Element did not load
[2020-06-01 13:53:00.019786] Finding element anotherNoneExistentId
[2020-06-01 13:53:09.055716] Element did not load
Output explanations
  1. The code looks for an element given the id navMenuId
  2. The code found the element
  3. The code looks for an element given the id noneExistentId
  4. The code does not find the element within 6 seconds
  5. The code looks for an element given the id anotherNoneExistentId
  6. The code does not find the element within 9 seconds
As you may have noticed, the time spent waiting for a missing element closely matches the timeout you supplied to the WebDriverWait class.
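To make the timing above easier to reason about, here is a minimal sketch of the polling idea behind an explicit wait. It uses only the standard library and is a simplified stand-in, not Selenium's actual code. The names wait_until, WaitTimeout, and the fake lookups are made up for illustration, and the 0.5 second default simply mirrors the default poll interval of WebDriverWait.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within {} s".format(timeout))
        time.sleep(poll)


# A fake "element lookup" that succeeds on the third attempt.
attempts = {"count": 0}


def element_appears_on_third_try():
    attempts["count"] += 1
    return "navMenuId" if attempts["count"] >= 3 else None


found = wait_until(element_appears_on_third_try, timeout=1, poll=0.05)
print("Element found:", found)

# A lookup that never succeeds uses up the whole timeout, like noneExistentId.
start = time.monotonic()
try:
    wait_until(lambda: None, timeout=0.3, poll=0.05)
    timed_out = False
except WaitTimeout:
    timed_out = True
elapsed = time.monotonic() - start
print("Timed out:", timed_out)
```

In this model a successful lookup returns almost immediately, while a lookup that never succeeds always costs at least the full timeout, which is the same pattern the sample output shows.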
 

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_element(3, "navMenuId")
wait_for_the_element(6, "noneExistentId")
wait_for_the_element(9, "anotherNoneExistentId")
driver.close()

 

Conclusion

Waiting time in selenium can be set to wait for an element given its locator.
 

Web Scraping Using Selenium - Explicit Wait for Presence of Any Elements

Selenium Wait - Explicit Wait For Presence of Any Elements

Selenium supports several ways of waiting. In this tutorial, we will use an explicit wait that checks for the presence of any elements matching a locator. This is a way to check whether at least one matching element has already been loaded into the page.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Presence of Any Elements Given a Locator

  1. Create a file seleniumwaitpresenceall.py and paste the following code
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this function as is
    def wait_for_the_elements(wait_time: int, el_name: str):
        try:
            print("[{}] Finding element {}".format(str(datetime.now()), el_name))
            WebDriverWait(driver, wait_time).until(
                EC.presence_of_all_elements_located((By.TAG_NAME, el_name))
            )
            print("[{}] Element found".format(str(datetime.now())))
        except TimeoutException as e:
            print("[{}] Element did not load".format(str(datetime.now())))
    
    This function waits up to the given number of seconds for at least one element matching the locator to be present. It prints a message saying whether any element was found or not.
  5. Add this line
    wait_for_the_elements(3, "nav")
    
    This line will call the method we created and will display Element found.
  6. Add this line
    wait_for_the_elements(6, "table")
    
    This line calls the function with the table tag, which is not found on the page, so it waits the full 6 seconds and then prints Element did not load.
  7. Add this line
    wait_for_the_elements(9, "ul")
    
    This calls the function again with another tag that is not found on the page (ul), this time waiting 9 seconds.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumwaitpresenceall.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Call the Method 3 times which prints messages in the console
    • Closes the browser
 

Program Sample Output

[2020-06-01 01:02:57.360303] Finding element nav
[2020-06-01 01:02:57.375942] Element found
[2020-06-01 01:02:57.375942] Finding element table
[2020-06-01 01:03:03.456108] Element did not load
[2020-06-01 01:03:03.456108] Finding element ul
[2020-06-01 01:03:12.497395] Element did not load
Output explanations
  1. The code looks for any element given the tag nav
  2. The code found the element
  3. The code looks for any element given the tag table
  4. The code does not find the element within 6 seconds
  5. The code looks for any element given the tag ul
  6. The code does not find the element within 9 seconds
As you may have noticed, the time spent waiting for missing elements closely matches the timeout you supplied to the WebDriverWait class.
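Why a tag with no matches runs into the timeout can be sketched without a browser. In Selenium's Python bindings, presence_of_all_elements_located hands back the list of matching elements, and an empty list counts as "not yet" for the wait. The snippet below imitates that with a made-up list of tag names standing in for a page; fake_page and find_all_by_tag are hypothetical names, and the list is not the real page's content.

```python
# Hypothetical stand-in for the tags on a page; not the real page's DOM.
fake_page = ["html", "body", "nav", "div", "a", "a", "div"]


def find_all_by_tag(page, tag_name):
    """Return every entry matching tag_name (an empty list if none match)."""
    return [tag for tag in page if tag == tag_name]


for tag_name in ("nav", "a", "table"):
    matches = find_all_by_tag(fake_page, tag_name)
    # bool([]) is False, so a wait built on this result would keep polling.
    status = "found" if matches else "keep waiting"
    print("{}: {} match(es), {}".format(tag_name, len(matches), status))
```

One match is enough for the wait to succeed, so this condition answers "is there at least one?" rather than "have all of them loaded?".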
 

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_elements(wait_time: int, el_name: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_name))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, el_name))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_elements(3, "nav")
wait_for_the_elements(6, "table")
wait_for_the_elements(9, "ul")
driver.close()

 

Conclusion

Waiting time in selenium can be set to wait for any element given a locator.
 

Web Scraping Using Selenium - Explicit Wait for Title Contains Words

Selenium Wait - Explicit Wait For Title Contains

Selenium supports several ways of waiting. In this tutorial, we will use an explicit wait that checks whether the page title contains a given word or phrase.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Title That Contains Words

  1. Create a file seleniumwaittitlecontains.py and paste the following code
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this function as is
    def wait_for_the_title(wait_time: int, doc_title: str):
        try:
            print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
            WebDriverWait(driver, wait_time).until(EC.title_contains(doc_title))
            print("[{}] Title found".format(str(datetime.now())))
        except TimeoutException as e:
            print("[{}] Error waiting for title".format(str(datetime.now())))
    
    This function waits up to the given number of seconds for a document title that contains a word or a phrase. It prints a message saying whether the document title loaded or not.
  5. Add this line
    wait_for_the_title(3, "Do It Simpler")
    
    This line will call the method we created and will display Title found.
  6. Add this line
    wait_for_the_title(6, "Not the title")
    
    This line calls the function with a phrase that is not in the title, so it waits the full 6 seconds and then prints Error waiting for title.
  7. Add this line
    wait_for_the_title(9, "Another wrong title")
    
    This calls the function again with another phrase that is not in the title, this time waiting 9 seconds.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumwaittitlecontains.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Call the Method 3 times which prints messages in the console
    • Closes the browser
 

Program Sample Output

[2020-06-01 00:13:13.537620] Waiting for title Do It Simpler
[2020-06-01 00:13:13.553244] Title found
[2020-06-01 00:13:13.553244] Waiting for title Not the title
[2020-06-01 00:13:19.581967] Error waiting for title
[2020-06-01 00:13:19.581967] Waiting for title Another wrong title
[2020-06-01 00:13:28.635654] Error waiting for title
Output explanations
  1. The code looks for the title that contains Do It Simpler
  2. The code found the title
  3. The code looks for the title that contains Not the title
  4. The code does not find the title within 6 seconds
  5. The code looks for the title that contains Another wrong title
  6. The code does not find the title within 9 seconds
As you may have noticed, the time spent waiting for a title that never matches closely matches the timeout you supplied to the WebDriverWait class.
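The difference between a "contains" check and an exact match comes down to two string comparisons. The sketch below uses plain Python functions that are named after Selenium's title_contains and title_is conditions but are not the Selenium classes themselves; page_title is the full title used elsewhere in this series, hard-coded here so that no browser is needed.

```python
def title_contains(current_title, fragment):
    """True when fragment appears anywhere in the title (case-sensitive)."""
    return fragment in current_title


def title_is(current_title, expected):
    """True only when the title matches expected exactly."""
    return current_title == expected


page_title = "Do It Simpler - VUE - Bulma For Scraping"

print(title_contains(page_title, "Do It Simpler"))   # True
print(title_is(page_title, "Do It Simpler"))          # False
print(title_contains(page_title, "do it simpler"))    # False
print(title_contains(page_title, "Not the title"))    # False
```

Because the comparison is case-sensitive, a phrase with different capitalization would also run into the timeout.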
 

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_contains(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

wait_for_the_title(3, "Do It Simpler")
wait_for_the_title(6, "Not the title")
wait_for_the_title(9, "Another wrong title")
driver.close()

 

Conclusion

Waiting time in selenium can be set to wait for a title that contains a word or phrase.
 

Web Scraping Using Selenium - Navigation Using Browser History

Selenium Navigation - History

Navigation in selenium can be done in different ways. In this tutorial, we will use the back and forward buttons of the browser as a form of navigation.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation History - Back and Forward

  1. Create a file seleniumnavhistory.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the webpage https://slackingslacker.github.io/seleniumindex.
  4. Add this line
    time.sleep(3)
    
    We will pause the program for 3 seconds.
  5. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex#/about")
    
    This line will go to the webpage https://slackingslacker.github.io/seleniumindex#/about.
  6. Add this line
    time.sleep(3)
    
    We will pause the program for 3 seconds. At this time we have history in the browser.
  7. Add this line
    driver.back()
    
    The line will go back to the previous page which is https://slackingslacker.github.io/seleniumindex.
  8. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds. You may have noticed that the current page is the main page.
  9. Add this line
    driver.forward()
    
    The line will go forward to the next page which is https://slackingslacker.github.io/seleniumindex#/about.
  10. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds. You may have noticed that the current page is the about page.
  11. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  12. Run the seleniumnavhistory.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Halts for 3 seconds
    • Browser goes to https://slackingslacker.github.io/seleniumindex#/about
    • Halts for 3 seconds
    • Browser goes to https://slackingslacker.github.io/seleniumindex using the back of history
    • Halts for 10 seconds
    • Browser goes to https://slackingslacker.github.io/seleniumindex#/about using the forward of history
    • Halts for 10 seconds
    • Closes the browser
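The back and forward behaviour in the steps above can be pictured as a list of visited URLs plus a pointer to the current one. The class below is a toy model for illustration only: BrowserHistory and its attributes are hypothetical names, and a real browser's history handling is more involved.

```python
class BrowserHistory:
    """Toy model of a tab's session history (hypothetical, for illustration)."""

    def __init__(self):
        self.entries = []
        self.index = -1

    def get(self, url):
        # Visiting a new page drops any "forward" entries, then appends.
        self.entries = self.entries[: self.index + 1]
        self.entries.append(url)
        self.index += 1

    def back(self):
        # Move one entry back, unless we are already at the oldest one.
        if self.index > 0:
            self.index -= 1

    def forward(self):
        # Move one entry forward, unless we are already at the newest one.
        if self.index < len(self.entries) - 1:
            self.index += 1

    @property
    def current(self):
        return self.entries[self.index] if self.entries else None


main_page = "https://slackingslacker.github.io/seleniumindex"
about_page = "https://slackingslacker.github.io/seleniumindex#/about"

history = BrowserHistory()
history.get(main_page)
history.get(about_page)

history.back()
after_back = history.current
history.forward()
after_forward = history.current

print("After back:   ", after_back)
print("After forward:", after_forward)
```

The model also shows why the second driver.get call matters: with only one entry in the history, back has nowhere to go.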
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(3)
driver.get("https://slackingslacker.github.io/seleniumindex#/about")
time.sleep(3)

driver.back()
time.sleep(10)

driver.forward()
time.sleep(10)

driver.close()

 

Conclusion

Navigation in selenium can be done using browser history.
 

Web Scraping Using Selenium - Navigation Using Script Execution

Selenium Navigation - Using Javascript

Navigation in selenium can be done in different ways. In this tutorial, we will execute a script to navigate to a different page.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Executing a Script

  1. Create a file seleniumnavscript.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.execute_script("window.location.href='https://slackingslacker.github.io/seleniumindex#/about';")
    
    The line will execute a script to go to the webpage https://slackingslacker.github.io/seleniumindex#/about.
  4. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds so that you can see that the page loaded.
  5. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  6. Run the seleniumnavscript.py. It should do the following:
    • Open the firefox browser
    • Execute a script to navigate to a page
    • Halts for 10 seconds
    • Closes the browser
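When the target URL comes from a variable instead of being typed into the script by hand, the quoting inside the JavaScript string needs some care. The sketch below builds the same kind of statement used above with json.dumps, which adds the surrounding quotes and escapes any quotes inside the URL. The helper name build_navigation_script is made up for illustration; the snippet only builds the string and does not call Selenium.

```python
import json


def build_navigation_script(url):
    """Return a JavaScript statement that sets window.location.href to url."""
    # json.dumps adds the surrounding quotes and escapes quotes/backslashes.
    return "window.location.href = {};".format(json.dumps(url))


script = build_navigation_script(
    "https://slackingslacker.github.io/seleniumindex#/about"
)
print(script)
```

The resulting string could then be passed to driver.execute_script(script). As an alternative, execute_script also accepts extra arguments after the script, which the JavaScript side can read as arguments[0], so the URL would not need to be embedded in the string at all.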
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.execute_script("window.location.href='https://slackingslacker.github.io/seleniumindex#/about';")
time.sleep(10)

driver.close()

 

Conclusion

Navigation in selenium can be done using script execution.
 

Web Scraping Using Selenium - Navigation Using Form Submit

Selenium Navigation - Form Submission

Navigation in selenium can be done in different ways. In this tutorial, we will navigate by submitting a form.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Using Form Element Submit

  1. Create a file seleniumnavform.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex#/forms")
    
    The line will go to the website (https://slackingslacker.github.io/seleniumindex#/forms).
  4. Add this line
    time.sleep(5)
    
    We will pause the program for 5 seconds.
  5. Add this line
    el = driver.find_element_by_css_selector("div > form")
    
    This code will find an element using a CSS selector. The code will find a form element that is a direct child of a div.
  6. Add this line
    el.submit()
    
    This will submit the form.
  7. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds to see that it submitted the form.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumnavform.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex#/forms
    • Halts for 5 seconds
    • Finds the form tag
    • Submits the form without clicking a button
    • Halts for 10 seconds
    • Closes the browser
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex#/forms")
time.sleep(5)

el = driver.find_element_by_css_selector("div > form")
el.submit()
time.sleep(10)

driver.close()

 

Conclusion

Navigation in selenium can be done using submission of a form element.
 

Web Scraping Using Selenium - Navigation with Anchor Element

Selenium Navigation - Anchor Navigation

Navigation in selenium can be done in different ways. In this tutorial, we will navigate by clicking an anchor tag.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Using Anchor Element

  1. Create a file seleniumnavachor.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this line
    time.sleep(5)
    
    We will pause the program for 5 seconds.
  5. Add this line
    el = driver.find_element_by_css_selector("div[class='navbar-start'] > a:last-of-type")
    
    This code will find an element using a CSS selector. We will tackle this more in the future. The element that we are looking for is the About link in the Menu Bar at the top.
  6. Add this line
    el.click()
    
    This will click the About link in the menu.
  7. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds to see that it loaded the about page.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumnavachor.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Halts for 5 seconds
    • Finds the anchor tag for the About in the top menu
    • Clicks the link
    • Halts for 10 seconds
    • Closes the browser
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)

el = driver.find_element_by_css_selector("div[class='navbar-start'] > a:last-of-type")
el.click()
time.sleep(10)

driver.close()

 

Conclusion

Navigation in selenium can be done using an anchor element.
 

Tuesday, May 26, 2020

Web Scraping Using Selenium - Explicit Wait for Title Element

Selenium Wait - Explicit Wait For Title

Selenium supports several ways of waiting. In this tutorial, we will use an explicit wait that checks for an exact page title. A good example is loading a blog or a product page with a specific title.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Specific Title

  1. Create a file seleniumwaittitle.py and paste the following code
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this function as is
    def wait_for_the_title(wait_time: int, doc_title: str):
        try:
            print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
            WebDriverWait(driver, wait_time).until(EC.title_is(doc_title))
            print("[{}] Title found".format(str(datetime.now())))
        except TimeoutException as e:
            print("[{}] Error waiting for title".format(str(datetime.now())))
    
    This function waits up to the given number of seconds for a specific document title. It prints a message saying whether the document title loaded or not.
  5. Add this line
    wait_for_the_title(3, "Do It Simpler - VUE - Bulma For Scraping")
    
    This line will call the method we created and will display Title found.
  6. Add this line
    wait_for_the_title(6, "Not the title")
    
    This line calls the function with a title that does not match, so it waits the full 6 seconds and then prints Error waiting for title.
  7. Add this line
    wait_for_the_title(9, "Another wrong title")
    
    This calls the function again with another title that does not match, this time waiting 9 seconds.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumwaittitle.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Call the Method 3 times which prints messages in the console
    • Closes the browser
 

Program Sample Output

[2020-05-31 23:58:37.029019] Waiting for title Do It Simpler - VUE - Bulma For Scraping
[2020-05-31 23:58:37.044668] Title found
[2020-05-31 23:58:37.044668] Waiting for title Not the title
[2020-05-31 23:58:43.274813] Error waiting for title
[2020-05-31 23:58:43.274813] Waiting for title Another wrong title
[2020-05-31 23:58:52.510780] Error waiting for title
Output explanations
  1. The code looks for the title that is equal to Do It Simpler - VUE - Bulma For Scraping
  2. The code found the title
  3. The code looks for the title that is equal to Not the title
  4. The code does not find the title within 6 seconds
  5. The code looks for the title that is equal to Another wrong title
  6. The code does not find the title within 9 seconds
As you may have noticed, the time spent waiting for a title that never matches closely matches the timeout you supplied to the WebDriverWait class.
 

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_is(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

wait_for_the_title(3, "Do It Simpler - VUE - Bulma For Scraping")
wait_for_the_title(6, "Not the title")
wait_for_the_title(9, "Another wrong title")
driver.close()
 

Conclusion

Waiting time in selenium can be set to wait for a specific title.
 

Monday, May 25, 2020

Web Scraping Using Selenium - Explicit Wait

Selenium Wait - Explicit Wait

Selenium supports several ways of waiting. In this tutorial, we will use the explicit wait functionality.
Before that, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait

  1. Create a file seleniumexplicitwait.py and paste the following code
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    This line will go to the website (https://slackingslacker.github.io/seleniumindex).
  4. Add this function as is
    def wait_for_the_element(wait_time: int, el_id: str):
        try:
            print("[{}] Finding element {}".format(str(datetime.now()), el_id))
            WebDriverWait(driver, wait_time).until(
                EC.presence_of_element_located((By.ID, el_id))
            )
            print("[{}] Element found".format(str(datetime.now())))
        except Exception as e:
            print("[{}] Element did not load".format(str(datetime.now())))
    
    This function waits up to the given number of seconds for the element to load. It prints a message saying whether the element loaded or not.
  5. Add this line
    wait_for_the_element(3, "navMenuId")
    
    This line will call the method we created and will display Element found.
  6. Add this line
    wait_for_the_element(6, "noneExistentId")
    
    This line calls the function with an id that does not exist, so it waits the full 6 seconds and then prints Element did not load.
  7. Add this line
    wait_for_the_element(9, "anotherNoneExistentId")
    
    This calls the function again with another non-existent id, this time waiting 9 seconds.
  8. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  9. Run the seleniumexplicitwait.py. It should do the following:
    • Open the firefox browser
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Call the Method 3 times which prints messages in the console
    • Closes the browser
 

Program Sample Output

[2020-05-31 23:55:38.889027] Finding element navMenuId
[2020-05-31 23:55:38.920301] Element found
[2020-05-31 23:55:38.920301] Finding element noneExistentId
[2020-05-31 23:55:44.931274] Element did not load
[2020-05-31 23:55:44.931274] Finding element anotherNoneExistentId
[2020-05-31 23:55:53.968452] Element did not load
Output explanations
  1. The code looks for an element with id navMenuId
  2. The code found the element
  3. The code looks for an element with id noneExistentId
  4. The code does not find the element within 6 seconds
  5. The code looks for an element with id anotherNoneExistentId
  6. The code does not find the element within 9 seconds
As you may have noticed, the time spent waiting for a missing element closely matches the timeout you supplied to the WebDriverWait class.
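The observation about waiting times can be checked by subtracting the timestamps in the sample output. The sketch below copies four lines from that output and computes the elapsed time for each failed wait. The helper names parse_timestamp and elapsed_seconds are made up for illustration, and the parsing assumes every timestamp includes microseconds, as it does in the sample.

```python
from datetime import datetime

# Timestamps copied from the sample output above.
log_lines = [
    "[2020-05-31 23:55:38.920301] Finding element noneExistentId",
    "[2020-05-31 23:55:44.931274] Element did not load",
    "[2020-05-31 23:55:44.931274] Finding element anotherNoneExistentId",
    "[2020-05-31 23:55:53.968452] Element did not load",
]


def parse_timestamp(line):
    """Extract the datetime between the leading '[' and the first ']'."""
    stamp = line[1 : line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two log lines."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    return delta.total_seconds()


first_wait = elapsed_seconds(log_lines[0], log_lines[1])
second_wait = elapsed_seconds(log_lines[2], log_lines[3])
print("6 second timeout took {:.2f} s".format(first_wait))
print("9 second timeout took {:.2f} s".format(second_wait))
```

Both waits come out slightly longer than the requested timeout, which is expected since each lookup takes a little time on top of the wait itself.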
 

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_element(3, "navMenuId")
wait_for_the_element(6, "noneExistentId")
wait_for_the_element(9, "anotherNoneExistentId")
driver.close()
 

Conclusion

With an explicit wait, the waiting time in Selenium can be set per element.
 

Web Scraping Using Selenium - Implicit Wait

Selenium Wait - Implicit Wait

Waiting in selenium can be done in different ways. In this tutorial, we will use the implicit wait functionality.
But before that, please make sure you have read the first blog on this series to do the prerequisites.

Selenium Implicit Wait

  1. Create a file seleniumimplicitwait.py and paste the following code
    from selenium import webdriver
    from datetime import datetime
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.implicitly_wait(5)
    
    This line will set the waiting time for the element to load. It is set to 5 seconds.
  4. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    The line will go to the website (https://slackingslacker.github.io/seleniumindex).
  5. Add this function as is
    def find_the_element(selector: str):
        try:
            print("[{}] Finding element {}".format(str(datetime.now()), selector))
            driver.find_element_by_css_selector(selector)
            print("[{}] Element found".format(str(datetime.now())))
        except Exception as e:
            print("[{}] Error is {}".format(str(datetime.now()), str(e)))
    
    This function looks for the element and handles the error if the element does not load or does not exist within the given waiting time.
  6. Add this line
    find_the_element("nav[role='navigation']")
    
    This line calls the function we created. It returns as soon as an element matching the CSS selector exists, or waits up to the given time for it.
  7. Add this line
    find_the_element("#noneExistentId")
    
    This calls the function again, this time with a CSS selector that does not exist.
  8. Add this line
    find_the_element("#anotherNoneExistentId")
    
    This calls the function once more with another CSS selector that does not exist.
  9. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  10. Run seleniumimplicitwait.py. It should do the following:
    • Open the Firefox browser
    • Set the implicit wait to 5 seconds
    • Go to https://slackingslacker.github.io/seleniumindex
    • Call the function 3 times, printing messages in the console
    • Close the browser
 

Program Sample Output

[2020-05-31 22:20:55.018838] Finding element nav[role='navigation']
[2020-05-31 22:20:55.034466] Element found
[2020-05-31 22:20:55.034466] Finding element #noneExistentId
[2020-05-31 22:21:00.058297] Error is Message: Unable to locate element: #noneExistentId
[2020-05-31 22:21:00.058297] Finding element #anotherNoneExistentId
[2020-05-31 22:21:05.074049] Error is Message: Unable to locate element: #anotherNoneExistentId
Output explanations
  1. The code looks for the selector nav[role='navigation']
  2. The code found the CSS selector
  3. The code looks for the selector #noneExistentId
  4. The code does not find the CSS selector within 5 seconds
  5. The code looks for the selector #anotherNoneExistentId
  6. The code does not find the CSS selector within 5 seconds
As you may have noticed, the waiting time for each missing element is a constant 5 seconds, as we set earlier.
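
The 5 second figure can be checked directly from the timestamps in the sample output. The sketch below only does that arithmetic with the standard datetime module; elapsed_seconds is a hypothetical helper name, and the timestamps are copied from the output above.

```python
from datetime import datetime

# Format of the timestamps printed by str(datetime.now()) in the sample output.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two timestamp strings."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return delta.total_seconds()


# Element that exists: found almost immediately.
found = elapsed_seconds("2020-05-31 22:20:55.018838", "2020-05-31 22:20:55.034466")

# Elements that do not exist: each lookup takes the full implicit wait.
first_miss = elapsed_seconds("2020-05-31 22:20:55.034466", "2020-05-31 22:21:00.058297")
second_miss = elapsed_seconds("2020-05-31 22:21:00.058297", "2020-05-31 22:21:05.074049")

print("found after  {:.3f} s".format(found))
print("first miss   {:.3f} s".format(first_miss))
print("second miss  {:.3f} s".format(second_miss))
```

Both failed lookups come out at slightly over 5 seconds, the small excess being the overhead of the calls themselves.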
 

Final Selenium Code

from selenium import webdriver
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.implicitly_wait(5)
driver.get("https://slackingslacker.github.io/seleniumindex")

def find_the_element(selector: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), selector))
        driver.find_element_by_css_selector(selector)
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Error is {}".format(str(datetime.now()), str(e)))

find_the_element("nav[role='navigation']")
find_the_element("#noneExistentId")
find_the_element("#anotherNoneExistentId")
driver.close()
 

Conclusion

With an implicit wait, the waiting time in Selenium can be set to a constant value that applies for the whole lifetime of the driver.
 

Saturday, May 23, 2020

Web Scraping Using Selenium - Navigation

Selenium Navigation - driver.get

Navigation in Selenium can be done in different ways. In this tutorial, we will use the driver.get functionality.
But before that, please make sure you have read the first blog on this series to do the prerequisites.

Selenium Navigation Code Using driver.get

  1. Create a file seleniumnav.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  2. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  3. Add this line
    driver.get("https://www.google.com")
    
    The line will go to the website (https://www.google.com).
  4. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds.
  5. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    The line will go to the website (https://slackingslacker.github.io/seleniumindex). This is a way to navigate to different pages of a site or to different websites.
  6. Add this line
    time.sleep(10)
    
    We will pause the program for 10 seconds.
  7. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  8. Run seleniumnav.py. It should do the following:
    • Open the Firefox browser
    • Browser goes to www.google.com
    • Halts for 10 seconds
    • Browser goes to https://slackingslacker.github.io/seleniumindex
    • Halts for 10 seconds
    • Closes the browser
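
The sequence above generalises to any list of pages. As a rough sketch, the loop below calls get on a driver for each URL in turn. visit_all and RecordingDriver are hypothetical names for this illustration; RecordingDriver is a stand-in that only records URLs so the example runs without a browser.

```python
import time


def visit_all(driver, urls, pause_seconds=10):
    """Load each URL in order in the same browser, pausing after each page."""
    for url in urls:
        driver.get(url)  # replaces the current page with the new one
        time.sleep(pause_seconds)


class RecordingDriver:
    """Hypothetical stand-in that records URLs instead of opening a browser."""

    def __init__(self):
        self.history = []

    def get(self, url):
        self.history.append(url)


driver = RecordingDriver()
visit_all(
    driver,
    ["https://www.google.com", "https://slackingslacker.github.io/seleniumindex"],
    pause_seconds=0,  # no pause needed when nothing is displayed
)
print(driver.history)
```

With a real browser, you would pass the webdriver.Firefox instance created in step 2 instead of the stand-in and keep a pause long enough to see each page.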
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://www.google.com")
time.sleep(10)

driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(10)

driver.close()

 

Conclusion

Navigation from page to page or website to website can be done using the driver.get functionality.
 

Saturday, May 16, 2020

Web Scraping Using Selenium - Intro

Selenium Introduction

The purpose of this series is to try out the different functionalities in the Selenium documentation. Each functionality will have its own example so that we can better understand when and where to use it.
 

Background

Selenium is usually used for automated testing but can also be used for scraping websites. Since many modern websites are created as Single-Page Applications (SPA), the page is lazy loaded. This means that the basic structure (HTML and CSS) is loaded first, and the data is loaded afterwards through API calls and JavaScript. Scrapers such as the requests library for Python (tutorials can be found here) or Guzzle for PHP cannot execute JavaScript. There can be workarounds to handle those JavaScript interactions, but sometimes they lead to a dead end. Selenium solves this kind of problem by interacting with the website through a real browser, so it behaves just like a person using the website.
 

Getting the Softwares required for Selenium

We will now start to code. But first, let's make sure that we have the required software.
  1. Install Python. You can download Python here depending on your OS. The installation steps will depend on the OS that you are using.
  2. Install the required library, in this case selenium. You can run the command
    pip install selenium
    or
    python -m pip install selenium
  3. (Optional) You can download PyCharm here to make your coding faster. I will be using PyCharm for these tutorials, but you can also use Notepad and the command line.
  4. Download the browser drivers and put them in a directory that is accessible to the app. You can copy them into place later. I will be using Firefox for most of the tutorials, as geckodriver and Firefox stay compatible even when the browser is updated.
  5. (Optional) Install the Chrome and Firefox browsers.
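
Before moving on, the setup can be sanity-checked with a short script. This is only a sketch using the standard library: is_installed and find_driver are hypothetical helper names, and the script assumes the drivers are either on the PATH or in the current directory under the names geckodriver and chromedriver (with or without .exe).

```python
import importlib.util
import os
import shutil


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


def find_driver(name, folder="."):
    """Return the path of a driver found on PATH or in `folder`, else None."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in (name, name + ".exe"):
        path = os.path.join(folder, candidate)
        if os.path.isfile(path):
            return path
    return None


print("selenium installed:", is_installed("selenium"))
for driver_name in ("geckodriver", "chromedriver"):
    print(driver_name, "->", find_driver(driver_name) or "not found")
```

If selenium is reported as missing, rerun the pip command from step 2. If a driver is not found, copy it next to your script as described in the next section.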
 

Coding My First Selenium Program

  1. Create a directory where you will put your code.
  2. Copy the drivers that you have downloaded and paste them in the directory you've created.
  3. Create a file seleniumintro.py and paste the following code
    from selenium import webdriver
    import time
    
    The code above imports the required libraries that we will use.
  4. Add this line
    driver = webdriver.Firefox(executable_path="geckodriver.exe")
    
    The code above will create a webdriver instance for Firefox.
  5. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    The line will go to the website (https://slackingslacker.github.io/seleniumindex).
  6. Add this line
    time.sleep(5)
    
    We will pause the program for 5 seconds. This is to ensure that you can see what's happening in the browser.
  7. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  8. Add this line
    driver = webdriver.Chrome(executable_path="chromedriver.exe")
    
    This time we are going to use the Chrome browser to access the website.
  9. Add this line
    driver.get("https://slackingslacker.github.io/seleniumindex")
    
    The line will go to the website (https://slackingslacker.github.io/seleniumindex) in Chrome.
  10. Add this line
    time.sleep(5)
    
    We will again pause the program for 5 seconds.
  11. Add this line
    driver.close()
    
    The line will close the webdriver as well as the browser.
  12. Run seleniumintro.py.
    python seleniumintro.py
    or run it in PyCharm
     
    It should do the following:
    1. Opens the Firefox browser (assuming you have Firefox installed).
    2. Browser goes to the website https://slackingslacker.github.io/seleniumindex
    3. Halts for 5 seconds
    4. Closes the Firefox browser
    5. Opens the Chrome browser (assuming you have Chrome installed).
    6. Browser goes to the website https://slackingslacker.github.io/seleniumindex
    7. Halts for 5 seconds
    8. Closes the Chrome browser
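
Since the same four steps run once per browser, they can also be written as a loop. The sketch below is one possible way to do that, not the tutorial's code: open_in_each_browser and FakeDriver are hypothetical names, and FakeDriver is a stand-in that logs calls so the example runs without any browser installed. The try/finally makes sure the browser is shut down even if loading the page fails.

```python
import time


def open_in_each_browser(driver_factories, url, pause_seconds=5):
    """Open `url` in each browser in turn, always shutting the browser down."""
    for make_driver in driver_factories:
        driver = make_driver()
        try:
            driver.get(url)
            time.sleep(pause_seconds)
        finally:
            driver.quit()  # runs even if get() raised an error


class FakeDriver:
    """Hypothetical stand-in so the sketch runs without a browser."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def get(self, url):
        self.log.append((self.name, "get", url))

    def quit(self):
        self.log.append((self.name, "quit"))


log = []
open_in_each_browser(
    [lambda: FakeDriver("firefox", log), lambda: FakeDriver("chrome", log)],
    "https://slackingslacker.github.io/seleniumindex",
    pause_seconds=0,
)
for entry in log:
    print(entry)
```

With real browsers, the factories would be lambda: webdriver.Firefox(executable_path="geckodriver.exe") and lambda: webdriver.Chrome(executable_path="chromedriver.exe"), matching the calls used in this tutorial. Note that the sketch calls quit(), which ends the whole browser session, whereas the tutorial's close() closes the current window.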
 

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)
driver.close()

driver = webdriver.Chrome(executable_path="chromedriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)
driver.close()

 

Conclusion

Using Selenium, we can open the browser and automatically use the functionalities of a website, and it takes just a few lines of code to get started.
 

Tuesday, May 5, 2020

Web Scraping Using Requests - Query Selectors/CSS Selectors

Requests and Beautifulsoup Searching Using CSS Selectors

We will use BeautifulSoup to get different tags using different selectors.
 
Before we proceed, please make sure you have read the first and second blogs on this series to do the prerequisites.
 

Let's start to code.

  1. Create a file simpleselectors.py and paste the following code. This time we will use a helper function to print the tags that we find.
    from bs4 import BeautifulSoup
    import requests
    
    url = "https://slackingslacker.github.io/simpleselectors.html"
    html_doc = requests.get(url).text
    soup = BeautifulSoup(html_doc, "html.parser")
    
    
    def find_by_selector(selector: str):
        elements = soup.select(selector)
        for el in elements:
            print("============================")
            print("name : " + el.name)
            print("attributes : " + str(el.attrs))
            print(el)
    
  2. Add the code below.
    find_by_selector(".column")
    
    Run the code. It should print all the div tags that have the class attribute with value column.
  3. Comment out the recently added line by putting # at the start of the line, add the line below, then run the code.
    find_by_selector(".table.is-narrow")
    
    It should print all table tags whose class attribute contains both table and is-narrow.
  4. Comment the recently added line and add the line below then run the code.
    find_by_selector(".columns .table")
    
    It should print all table tags that have the class table and are descendants (at any depth, not only direct children) of an element (div) with the class columns.
  5. Comment the recently added line and add the line below then run the code.
    find_by_selector("#total")
    
    It should print a span tag that has the id attribute with value total.
  6. Comment the recently added line and add the line below then run the code.
    find_by_selector("div")
    
    It should print all div tags.
  7. Comment the recently added line and add the line below then run the code.
    find_by_selector("td.has-text-link")
    
    It should print all td tags that have the class attribute with value has-text-link.
  8. Comment the recently added line and add the line below then run the code.
    find_by_selector("b,i")
    
    It should print all b and i tags.
  9. Comment the recently added line and add the line below then run the code.
    find_by_selector("table th")
    
    It should print all th tags under the table tags.
  10. Comment the recently added line and add the line below then run the code.
    find_by_selector("div > span")
    
    It should print all span tags that are direct children of div tags.
  11. Comment the recently added line and add the line below then run the code.
    find_by_selector("span~p")
    
    It should print all p tags that are preceded by a sibling span tag.
  12. Comment the recently added lines and add the line below then run the code.
    find_by_selector("[colspan]")
    
    It should print all tags with colspan attribute.
  13. Comment the recently added lines and add the line below then run the code.
    find_by_selector("[colspan='2']")
    
    It should print all tags with colspan attribute that has a value of 2.
  14. Comment the recently added lines and add the line below then run the code.
    find_by_selector("[class^='has']")
    
    It should print all tags with class attribute that has a value that starts with has.
  15. Comment the recently added lines and add the line below then run the code.
    find_by_selector("[class$='link']")
    
    It should print all tags with class attribute that has a value that ends with link.
  16. Comment the recently added lines and add the line below then run the code.
    find_by_selector("[class*='text']")
    
    It should print all tags with class attribute that has a value that contains text.
  17. Comment the recently added lines and add the line below then run the code.
    find_by_selector("span:empty")
    
    It should print all span tags that have no children.
  18. Comment the recently added lines and add the line below then run the code.
    find_by_selector("tr:first-child")
    
    It should print all tr tags that are the first child of their parent tag.
  19. Comment the recently added lines and add the line below then run the code.
    find_by_selector("tr:last-child")
    
    It should print all tr tags that are the last child of their parent tag.
  20. Comment the recently added lines and add the line below then run the code.
    find_by_selector("td:nth-child(3)")
    
    It should print all td tags that are the 3rd child of their parent tag.
  21. Comment the recently added lines and add the line below then run the code.
    find_by_selector("a:first-of-type")
    
    It should print all a tags that are the first a tag (first of type anchor) among the children of their parent tag, even if other tags come before them.
  22. Comment the recently added lines and add the line below then run the code.
    find_by_selector("a:last-of-type")
    
    It should print all a tags that are the last a tag (last of type anchor) among the children of their parent tag.
  23. Comment the recently added lines and add the line below then run the code.
    find_by_selector("a:nth-of-type(2)")
    
    It should print all a tags that are the 2nd a tag (2nd of type anchor) among the children of their parent tag.
  24. Comment the recently added lines and add the line below then run the code.
    find_by_selector("span:only-child")
    
    It should print all span tags that are the only child of their parent tag.
  25. Comment the recently added lines and add the line below then run the code.
    find_by_selector("table > tr:not(:first-child)")
    
    It should print all tr tags that are not the first child of the table. In other words, it prints every row except the one that contains the column titles.
  26. Comment the recently added lines and add the line below then run the code.
    find_by_selector("div.column:nth-of-type(1) > table > tr:nth-child(4) > td:nth-child(2)")
    
    It should print all td tags that are the 2nd child of a tr that is the 4th child of a table, where the table is a direct child of a div with class column that is the first div (first of type) of its parent tag.
  27. Comment the recently added lines and add the line below then run the code.
    print(soup.select_one("div.column:nth-of-type(1) > table > tr:nth-child(4) > td:nth-child(2)").text)
    
    Using the query selector from the previous example with select_one, we get only the first tag matching the query and display its text, Chicken Legs.
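
Some of the selectors above are easy to mix up, so here is a small sketch that runs a few of them against an inline page instead of the live site. The HTML is a hypothetical miniature page made up for this illustration (it is not the tutorial's simpleselectors.html, so the results differ from the steps above), and count and texts are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page, used only for this illustration.
html_doc = """
<div class="columns">
  <div class="column">
    <span id="total">3</span>
    <p>first paragraph</p>
    <table class="table is-narrow">
      <tr><th>Item</th><th>Qty</th></tr>
      <tr><td class="has-text-link">Eggs</td><td>2</td></tr>
      <tr><td class="has-text-link">Milk</td><td>1</td></tr>
    </table>
  </div>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")


def count(selector):
    """Number of tags matching the selector."""
    return len(soup.select(selector))


def texts(selector):
    """Stripped text of every tag matching the selector."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


# A space matches descendants at any depth; ">" matches direct children only.
print(count(".columns .table"))    # the table is nested two levels down
print(count(".columns > .table"))  # its direct parent is .column, not .columns

# Skip the header row, then take the 2nd cell of each remaining row.
print(texts("table > tr:not(:first-child) > td:nth-child(2)"))

# select_one returns only the first match.
print(soup.select_one("table > tr:nth-child(3) > td:nth-child(1)").text)
```

Running selectors against a small inline page like this is a quick way to check what a selector matches before pointing it at a real site.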
 

Conclusion

We used the select and select_one to get all or one of the Tags in different ways.
 
