Basic Programming - Do It Simpler: May 2020

Sunday, May 31, 2020

Web Scraping Using Selenium - Explicit Wait for Presence of An Element

Selenium Wait - Explicit Wait For Presence of An Element

Selenium offers several ways to wait. In this tutorial, we will use an explicit wait that checks for the presence of an element given its locator. This is one way to check whether an element has already loaded into the page, even if it is not visible yet.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Presence of An Element Given Its Locator

Create a file named seleniumwaitpresenceone.py and paste the following code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

The code above imports the modules and classes that we will use.

Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

This function waits up to wait_time seconds for the element with the given id to be present on the page. It prints Element found if the element loads in time and Element did not load if the wait times out.
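
To make the waiting behaviour more concrete, here is a minimal sketch of the polling idea behind an explicit wait. It is not Selenium's own implementation and it runs without a browser. The names wait_until, poll_interval, page_ids and id_is_present are hypothetical and exist only for this illustration.

```python
import time


def wait_until(condition, timeout, poll_interval=0.05):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    This only imitates the idea of an explicit wait; it is not Selenium code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll_interval)


# Hypothetical stand-in for the ids present on a loaded page.
page_ids = {"navMenuId"}


def id_is_present(el_id):
    # Returns a zero-argument callable, similar in spirit to an expected condition.
    return lambda: el_id in page_ids


print(wait_until(id_is_present("navMenuId"), timeout=0.3))
try:
    wait_until(id_is_present("noneExistentId"), timeout=0.3)
except TimeoutError as error:
    print("Timed out:", error)
```

The first call returns as soon as the condition is true, while the second keeps polling until the timeout is reached. That is the same pattern the function above relies on.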

Add this line
```
wait_for_the_element(3, "navMenuId")
```
This line will call the method we created and will display Element found.
Add this line
```
wait_for_the_element(6, "noneExistentId")
```
This line will call the function with an id that does not exist. It waits for 6 seconds and then prints Element did not load.
Add this line
```
wait_for_the_element(9, "anotherNoneExistentId")
```
This line will call the function again with another id that does not exist, this time waiting for 9 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumwaitpresenceone.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Call the Method 3 times which prints messages in the console
- Closes the browser

Program Sample Output

[2020-06-01 13:52:53.817328] Finding element navMenuId
[2020-06-01 13:52:53.886479] Element found
[2020-06-01 13:52:53.886479] Finding element noneExistentId
[2020-06-01 13:53:00.019786] Element did not load
[2020-06-01 13:53:00.019786] Finding element anotherNoneExistentId
[2020-06-01 13:53:09.055716] Element did not load

Output explanations

The code looks for an element given the id navMenuId
The code found the element
The code looks for an element given the id noneExistentId
The code does not find the element within 6 seconds
The code looks for an element given the id anotherNoneExistentId
The code does not find the element within 9 seconds

As you may have noticed, the waiting time matches the value you supplied to the WebDriverWait class, plus a small overhead.
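
The waiting times can be checked directly from the sample output. The sketch below parses the timestamps printed above and subtracts them. The helper name elapsed_seconds is an assumption made for this example.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two timestamps in the log format."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return delta.total_seconds()


# Timestamps copied from the sample output above.
second_wait = elapsed_seconds("2020-06-01 13:52:53.886479", "2020-06-01 13:53:00.019786")
third_wait = elapsed_seconds("2020-06-01 13:53:00.019786", "2020-06-01 13:53:09.055716")

print(round(second_wait, 2))  # about 6.13 seconds for the 6-second wait
print(round(third_wait, 2))   # about 9.04 seconds for the 9-second wait
```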

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_element(3, "navMenuId")
wait_for_the_element(6, "noneExistentId")
wait_for_the_element(9, "anotherNoneExistentId")
driver.close()

Conclusion

Waiting time in selenium can be set to wait for an element given its locator.

Web Scraping Using Selenium - Explicit Wait for Presence of Any Elements

Selenium Wait - Explicit Wait For Presence of Any Elements

Selenium offers several ways to wait. In this tutorial, we will use an explicit wait that checks for the presence of any elements matching a locator. This is one way to check whether the elements have already loaded.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Presence of Any Elements Given a Locator

Create a file named seleniumwaitpresenceall.py and paste the following code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

The code above imports the modules and classes that we will use.

Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def wait_for_the_elements(wait_time: int, el_name: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_name))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, el_name))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

This function waits up to wait_time seconds for at least one element with the given tag name to be present on the page. It prints Element found if any matching element loads in time and Element did not load if the wait times out.
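
A condition that looks for all matching elements yields a list, and an empty list counts as "not ready yet", so the wait keeps retrying. The sketch below imitates that with plain Python. FAKE_PAGE_TAGS and all_with_tag are hypothetical stand-ins, not real page data or Selenium calls.

```python
# Hypothetical stand-in for the tags found on a page; not real Selenium data.
FAKE_PAGE_TAGS = ["nav", "div", "div", "a", "a", "a"]


def all_with_tag(tag_name):
    """Return every matching tag, or an empty list when there is none."""
    return [tag for tag in FAKE_PAGE_TAGS if tag == tag_name]


for tag_name in ("nav", "a", "table"):
    matches = all_with_tag(tag_name)
    # An empty list is falsy, so a polling wait would keep retrying for it.
    status = "found {}".format(len(matches)) if matches else "nothing yet"
    print(tag_name, "->", status)
```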

Add this line
```
wait_for_the_elements(3, "nav")
```
This line will call the method we created and will display Element found.
Add this line
```
wait_for_the_elements(6, "table")
```
This line will call the function with the table tag, which does not appear on the page. It waits for 6 seconds and then prints Element did not load.
Add this line
```
wait_for_the_elements(9, "ul")
```
This line will call the function again with the ul tag, which is also not on the page, this time waiting for 9 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumwaitpresenceall.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Call the Method 3 times which prints messages in the console
- Closes the browser

Program Sample Output

[2020-06-01 01:02:57.360303] Finding element nav
[2020-06-01 01:02:57.375942] Element found
[2020-06-01 01:02:57.375942] Finding element table
[2020-06-01 01:03:03.456108] Element did not load
[2020-06-01 01:03:03.456108] Finding element ul
[2020-06-01 01:03:12.497395] Element did not load

Output explanations

The code looks for any element given the tag nav
The code found the element
The code looks for any element given the tag table
The code does not find the element within 6 seconds
The code looks for any element given the tag ul
The code does not find the element within 9 seconds

As you may have noticed, the waiting time matches the value you supplied to the WebDriverWait class, plus a small overhead.

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_elements(wait_time: int, el_name: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_name))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, el_name))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_elements(3, "nav")
wait_for_the_elements(6, "table")
wait_for_the_elements(9, "ul")
driver.close()

Conclusion

Waiting time in selenium can be set to wait for any element given a locator.

Web Scraping Using Selenium - Explicit Wait for Title Contains Words

Selenium Wait - Explicit Wait For Title Contains

Selenium offers several ways to wait. In this tutorial, we will use an explicit wait that checks whether the page title contains a word or phrase.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Title That Contains Words

Create a file named seleniumwaittitlecontains.py and paste the following code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

The code above imports the modules and classes that we will use.

Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_contains(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

This function waits up to wait_time seconds for the document title to contain the given word or phrase. It prints Title found if the title matches in time and Error waiting for title if the wait times out.
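
The difference between a "contains" check and an exact match can be shown with plain string comparisons. This is only a sketch: contains_title_text and is_exact_title are hypothetical helpers, and page_title is assumed to be the title of the practice page.

```python
def contains_title_text(actual_title: str, expected: str) -> bool:
    """Substring check, case-sensitive."""
    return expected in actual_title


def is_exact_title(actual_title: str, expected: str) -> bool:
    """Exact match check."""
    return expected == actual_title


# Assumed title of the practice page.
page_title = "Do It Simpler - VUE - Bulma For Scraping"

print(contains_title_text(page_title, "Do It Simpler"))   # True
print(is_exact_title(page_title, "Do It Simpler"))        # False
print(contains_title_text(page_title, "do it simpler"))   # False, the check is case-sensitive
```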

Add this line
```
wait_for_the_title(3, "Do It Simpler")
```
This line will call the method we created and will display Title found.
Add this line
```
wait_for_the_title(6, "Not the title")
```
This line will call the function with text that is not in the title. It waits for 6 seconds and then prints Error waiting for title.
Add this line
```
wait_for_the_title(9, "Another wrong title")
```
This line will call the function again with another phrase that is not in the title, this time waiting for 9 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumwaittitlecontains.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Call the Method 3 times which prints messages in the console
- Closes the browser

Program Sample Output

[2020-06-01 00:13:13.537620] Waiting for title Do It Simpler
[2020-06-01 00:13:13.553244] Title found
[2020-06-01 00:13:13.553244] Waiting for title Not the title
[2020-06-01 00:13:19.581967] Error waiting for title
[2020-06-01 00:13:19.581967] Waiting for title Another wrong title
[2020-06-01 00:13:28.635654] Error waiting for title

Output explanations

The code looks for the title that contains Do It Simpler
The code found the title
The code looks for the title that contains Not the title
The code does not find the title within 6 seconds
The code looks for the title that contains Another wrong title
The code does not find the title within 9 seconds

As you may have noticed, the waiting time matches the value you supplied to the WebDriverWait class, plus a small overhead.

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_contains(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

wait_for_the_title(3, "Do It Simpler")
wait_for_the_title(6, "Not the title")
wait_for_the_title(9, "Another wrong title")
driver.close()

Conclusion

Waiting time in Selenium can be set to wait for a title that contains a word or phrase.

Web Scraping Using Selenium - Navigation Using Browser History

Selenium Navigation - History

Navigation in Selenium can be done in different ways. In this tutorial, we will use the browser's back and forward buttons as a form of navigation.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation History - Back and Forward

Create a file named seleniumnavhistory.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the modules that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the webpage https://slackingslacker.github.io/seleniumindex.
Add this line
```
time.sleep(3)
```
We will pause the program for 3 seconds.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex#/about")
```
This line will go to the webpage https://slackingslacker.github.io/seleniumindex#/about.
Add this line
```
time.sleep(3)
```
We will pause the program for 3 seconds. At this point the browser has a history of two pages.
Add this line
```
driver.back()
```
The line will go back to the previous page which is https://slackingslacker.github.io/seleniumindex.
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds. You should see that the current page is the main page.
Add this line
```
driver.forward()
```
This line will go forward to the next page, which is https://slackingslacker.github.io/seleniumindex#/about.
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds. You should see that the current page is the about page.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumnavhistory.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Halts for 3 seconds
- Browser goes to https://slackingslacker.github.io/seleniumindex#/about
- Halts for 3 seconds
- Browser goes to https://slackingslacker.github.io/seleniumindex using the back of history
- Halts for 10 seconds
- Browser goes to https://slackingslacker.github.io/seleniumindex#/about using the forward of history
- Halts for 10 seconds
- Closes the browser
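
The back and forward steps can be pictured as moving an index through a list of visited pages. The class below is a toy model for illustration only. BrowserHistory and its methods are hypothetical and are not part of Selenium.

```python
class BrowserHistory:
    """Tiny model of a browser's back/forward history (illustration only)."""

    def __init__(self):
        self.pages = []
        self.index = -1

    def current(self):
        # None until at least one page has been visited.
        return self.pages[self.index] if self.pages else None

    def get(self, url):
        # Visiting a new page discards any forward entries.
        self.pages = self.pages[: self.index + 1]
        self.pages.append(url)
        self.index += 1
        return self.current()

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.pages) - 1:
            self.index += 1
        return self.current()


history = BrowserHistory()
history.get("https://slackingslacker.github.io/seleniumindex")
history.get("https://slackingslacker.github.io/seleniumindex#/about")
print(history.back())     # main page
print(history.forward())  # about page
```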

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(3)
driver.get("https://slackingslacker.github.io/seleniumindex#/about")
time.sleep(3)

driver.back()
time.sleep(10)

driver.forward()
time.sleep(10)

driver.close()

Conclusion

Navigation in selenium can be done using browser history.

Web Scraping Using Selenium - Navigation Using Script Execution

Selenium Navigation - Using Javascript

Navigation in Selenium can be done in different ways. In this tutorial, we will execute a JavaScript snippet to navigate to a different page.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Executing a Script

Create a file named seleniumnavscript.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the modules that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.execute_script("window.location.href='https://slackingslacker.github.io/seleniumindex#/about';")
```
The line will execute a script to go to the webpage https://slackingslacker.github.io/seleniumindex#/about.
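
If the target URL comes from a variable, one possible way to build the script string is to let json.dumps quote it, so that quotes inside the URL cannot break the JavaScript. This is a sketch under that assumption, and build_navigation_script is a hypothetical helper name.

```python
import json


def build_navigation_script(url: str) -> str:
    """Return a JavaScript statement that sets window.location.href to url."""
    # json.dumps produces a double-quoted, escaped string that is also
    # a valid JavaScript string literal.
    return "window.location.href = {};".format(json.dumps(url))


script = build_navigation_script("https://slackingslacker.github.io/seleniumindex#/about")
print(script)
```

The resulting string could then be passed to driver.execute_script in place of the hard-coded one.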
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds so that you can see that the page loaded.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumnavscript.py. It should do the following:
- Open the firefox browser
- Execute a script to navigate to a page
- Halts for 10 seconds
- Closes the browser

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.execute_script("window.location.href='https://slackingslacker.github.io/seleniumindex#/about';")
time.sleep(10)

driver.close()

Conclusion

Navigation in selenium can be done using script execution.

Web Scraping Using Selenium - Navigation Using Form Submit

Selenium Navigation - Form Submission

Navigation in Selenium can be done in different ways. In this tutorial, we will use the form submission functionality.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Using Form Element Submit

Create a file named seleniumnavform.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the modules that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex#/forms")
```
The line will go to the website (https://slackingslacker.github.io/seleniumindex#/forms).
Add this line
```
time.sleep(5)
```
We will pause the program for 5 seconds.
Add this line
```
el = driver.find_element_by_css_selector("div > form")
```
This code will find an element using a CSS selector. The code will find a form element that is a direct child of a div.
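
To show what "direct child" means in the selector div > form, here is a small sketch that counts such forms using only the standard library. It assumes well-formed markup. DirectChildFinder, count_div_child_forms and the sample HTML strings are hypothetical and are not the real page source.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class DirectChildFinder(HTMLParser):
    """Count <form> tags whose immediate parent is a <div>, like "div > form"."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.open_tags and self.open_tags[-1] == "div":
            self.matches += 1
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def count_div_child_forms(html: str) -> int:
    finder = DirectChildFinder()
    finder.feed(html)
    return finder.matches


# Hypothetical markup, not the real page source.
print(count_div_child_forms("<div><form><input></form></div>"))              # 1
print(count_div_child_forms("<div><section><form></form></section></div>"))  # 0
```

In the second sample the form is nested inside a section, so it is a descendant of the div but not a direct child, and the selector would not match it.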
Add this line
```
el.submit()
```
This will submit the form.
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds to see that it submitted the form.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumnavform.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex#/forms
- Halts for 5 seconds
- Finds the form tag
- Submits the form without clicking a button
- Halts for 10 seconds
- Closes the browser

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex#/forms")
time.sleep(5)

el = driver.find_element_by_css_selector("div > form")
el.submit()
time.sleep(10)

driver.close()

Conclusion

Navigation in selenium can be done using submission of a form element.

Web Scraping Using Selenium - Navigation with Anchor Element

Selenium Navigation - Anchor Navigation

Navigation in Selenium can be done in different ways. In this tutorial, we will use the click functionality of an anchor tag.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Navigation Using Anchor Element

Create a file named seleniumnavachor.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the modules that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).
Add this line
```
time.sleep(5)
```
We will pause the program for 5 seconds.
Add this line
```
el = driver.find_element_by_css_selector("div[class='navbar-start'] > a:last-of-type")
```
This code will find an element using a CSS selector. We will cover CSS selectors in more detail in a future post. The element that we are looking for is the About link in the menu bar at the top.
Add this line
```
el.click()
```
This will click the About link in the menu.
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds to see that it loaded the about page.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumnavachor.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Halts for 5 seconds
- Finds the anchor tag for the About in the top menu
- Clicks the link
- Halts for 10 seconds
- Closes the browser

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)

el = driver.find_element_by_css_selector("div[class='navbar-start'] > a:last-of-type")
el.click()
time.sleep(10)

driver.close()

Conclusion

Navigation in selenium can be done using an anchor element.

Tuesday, May 26, 2020

Web Scraping Using Selenium - Explicit Wait for Title Element

Selenium Wait - Explicit Wait For Title

Selenium offers several ways to wait. In this tutorial, we will use an explicit wait that checks for an exact page title. A good example is waiting for a blog post or a product page with a specific title to load.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait For Specific Title

Create a file named seleniumwaittitle.py and paste the following code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

The code above imports the modules and classes that we will use.

Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_is(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

This function waits up to wait_time seconds for the document title to equal the given title exactly. It prints Title found if the title matches in time and Error waiting for title if the wait times out.

Add this line
```
wait_for_the_title(3, "Do It Simpler - VUE - Bulma For Scraping")
```
This line will call the method we created and will display Title found.
Add this line
```
wait_for_the_title(6, "Not the title")
```
This line will call the function with a title that does not match. It waits for 6 seconds and then prints Error waiting for title.
Add this line
```
wait_for_the_title(9, "Another wrong title")
```
This line will call the function again with another title that does not match, this time waiting for 9 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumwaittitle.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Call the Method 3 times which prints messages in the console
- Closes the browser

Program Sample Output

[2020-05-31 23:58:37.029019] Waiting for title Do It Simpler - VUE - Bulma For Scraping
[2020-05-31 23:58:37.044668] Title found
[2020-05-31 23:58:37.044668] Waiting for title Not the title
[2020-05-31 23:58:43.274813] Error waiting for title
[2020-05-31 23:58:43.274813] Waiting for title Another wrong title
[2020-05-31 23:58:52.510780] Error waiting for title

Output explanations

The code looks for the title that is equal to Do It Simpler - VUE - Bulma For Scraping
The code found the title
The code looks for the title that is equal to Not the title
The code does not find the title within 6 seconds
The code looks for the title that is equal to Another wrong title
The code does not find the title within 9 seconds

As you may have noticed, the waiting time matches the value you supplied to the WebDriverWait class, plus a small overhead.

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_title(wait_time: int, doc_title: str):
    try:
        print("[{}] Waiting for title {}".format(str(datetime.now()), doc_title))
        WebDriverWait(driver, wait_time).until(EC.title_is(doc_title))
        print("[{}] Title found".format(str(datetime.now())))
    except TimeoutException as e:
        print("[{}] Error waiting for title".format(str(datetime.now())))

wait_for_the_title(3, "Do It Simpler - VUE - Bulma For Scraping")
wait_for_the_title(6, "Not the title")
wait_for_the_title(9, "Another wrong title")
driver.close()

Conclusion

Waiting time in selenium can be set to wait for a specific title.

Monday, May 25, 2020

Web Scraping Using Selenium - Explicit Wait

Selenium Wait - Explicit Wait

Selenium offers several ways to wait. In this tutorial, we will use the explicit wait functionality, which waits for a condition with a timeout that you choose for each call.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Explicit Wait

Create a file named seleniumexplicitwait.py and paste the following code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

The code above imports the modules and classes that we will use.

Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Element did not load".format(str(datetime.now())))

This function waits up to wait_time seconds for the element with the given id to load. It prints Element found if the element loads in time and Element did not load otherwise.

Add this line
```
wait_for_the_element(3, "navMenuId")
```
This line will call the method we created and will display Element found.
Add this line
```
wait_for_the_element(6, "noneExistentId")
```
This line will call the function with an id that does not exist. It waits for 6 seconds and then prints Element did not load.
Add this line
```
wait_for_the_element(9, "anotherNoneExistentId")
```
This line will call the function again with another id that does not exist, this time waiting for 9 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumexplicitwait.py. It should do the following:
- Open the firefox browser
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Call the Method 3 times which prints messages in the console
- Closes the browser

Program Sample Output

[2020-05-31 23:55:38.889027] Finding element navMenuId
[2020-05-31 23:55:38.920301] Element found
[2020-05-31 23:55:38.920301] Finding element noneExistentId
[2020-05-31 23:55:44.931274] Element did not load
[2020-05-31 23:55:44.931274] Finding element anotherNoneExistentId
[2020-05-31 23:55:53.968452] Element did not load

Output explanations

The code looks for an element with id navMenuId
The code found the element
The code looks for an element with id noneExistentId
The code does not find the element within 6 seconds
The code looks for an element with id anotherNoneExistentId
The code does not find the element within 9 seconds

As you may have noticed, the waiting time matches the value you supplied to the WebDriverWait class, plus a small overhead.

Final Selenium Code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")

def wait_for_the_element(wait_time: int, el_id: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), el_id))
        WebDriverWait(driver, wait_time).until(
            EC.presence_of_element_located((By.ID, el_id))
        )
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Element did not load".format(str(datetime.now())))

wait_for_the_element(3, "navMenuId")
wait_for_the_element(6, "noneExistentId")
wait_for_the_element(9, "anotherNoneExistentId")
driver.close()

Conclusion

Waiting time in selenium can be set per element.

Web Scraping Using Selenium - Implicit Wait

Selenium Wait - Implicit Wait

Selenium offers several ways to wait. In this tutorial, we will use the implicit wait functionality.

Before starting, please make sure you have read the first post in this series and completed the prerequisites.

Selenium Implicit Wait

Create a file named seleniumimplicitwait.py and paste the following code
```
from selenium import webdriver
from datetime import datetime
```
The code above imports the modules and classes that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.implicitly_wait(5)
```
This line sets the implicit wait to 5 seconds. Every element lookup on this driver will wait up to that long for the element to load before failing.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
This line will go to the website (https://slackingslacker.github.io/seleniumindex).

Add this function as is

def find_the_element(selector: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), selector))
        driver.find_element_by_css_selector(selector)
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Error is {}".format(str(datetime.now()), str(e)))

This function looks for the element and handles the error raised if the element does not load or does not exist within the waiting time.

Add this line
```
find_the_element("nav[role='navigation']")
```
This line calls the function we created. It checks whether an element matching the CSS selector exists, waiting up to the given time for it to appear.
Add this line
```
find_the_element("#noneExistentId")
```
This calls the function again, this time with a CSS selector that matches nothing.
Add this line
```
find_the_element("#anotherNoneExistentId")
```
This calls the function once more with another CSS selector that matches nothing.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumimplicitwait.py. It should do the following:
- Opens the Firefox browser
- Sets the implicit wait to 5 seconds
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Calls the function 3 times, which prints messages in the console
- Closes the browser

Program Sample Output

[2020-05-31 22:20:55.018838] Finding element nav[role='navigation']
[2020-05-31 22:20:55.034466] Element found
[2020-05-31 22:20:55.034466] Finding element #noneExistentId
[2020-05-31 22:21:00.058297] Error is Message: Unable to locate element: #noneExistentId
[2020-05-31 22:21:00.058297] Finding element #anotherNoneExistentId
[2020-05-31 22:21:05.074049] Error is Message: Unable to locate element: #anotherNoneExistentId

Output explanations

The code looks for the selector nav[role='navigation']
The code finds a matching element right away
The code looks for the selector #noneExistentId
The code does not find a matching element within 5 seconds
The code looks for the selector #anotherNoneExistentId
The code does not find a matching element within 5 seconds

As you may have noticed, the waiting time is a constant 5 seconds for each missing element, as we set earlier.
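
The 5-second gap can also be measured directly instead of being read from two timestamps. The sketch below is one possible way to do that with the standard library. The helper name timed and the slow_lookup stand-in are assumptions made for this illustration, and the stand-in sleeps for only 0.2 seconds so the example runs quickly without a browser.

```python
import time


def timed(action):
    """Run action() and return (outcome, elapsed_seconds); an exception becomes the outcome."""
    start = time.perf_counter()
    try:
        outcome = action()
    except Exception as error:  # broad on purpose, like the except in the tutorial
        outcome = error
    return outcome, time.perf_counter() - start


def slow_lookup():
    time.sleep(0.2)  # stand-in for a lookup that waits before giving up
    raise LookupError("Unable to locate element")


outcome, elapsed = timed(slow_lookup)
print("{!r} after {:.1f} seconds".format(outcome, elapsed))
```

In the tutorial script, the same helper could wrap a real lookup, for example timed(lambda: driver.find_element_by_css_selector("#noneExistentId")), which should report roughly 5 seconds.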

Final Selenium Code

from selenium import webdriver
from datetime import datetime

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.implicitly_wait(5)
driver.get("https://slackingslacker.github.io/seleniumindex")

def find_the_element(selector: str):
    try:
        print("[{}] Finding element {}".format(str(datetime.now()), selector))
        driver.find_element_by_css_selector(selector)
        print("[{}] Element found".format(str(datetime.now())))
    except Exception as e:
        print("[{}] Error is {}".format(str(datetime.now()), str(e)))

find_the_element("nav[role='navigation']")
find_the_element("#noneExistentId")
find_the_element("#anotherNoneExistentId")
driver.close()

Conclusion

With an implicit wait, the waiting time in Selenium is set once to a constant value that applies for the whole life of the webdriver session.

Saturday, May 23, 2020

Web Scraping Using Selenium - Navigation

Selenium Navigation - driver.get

Navigation in Selenium can be done in different ways. In this tutorial, we will use the driver.get functionality.

But before that, please make sure you have read the first blog on this series to do the prerequisites.

Selenium Navigation Code Using driver.get

Create a file seleniumnav.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the libraries that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://www.google.com")
```
The line will go to the website (https://www.google.com).
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
The line will go to the website (https://slackingslacker.github.io/seleniumindex). This is a way to navigate to different pages of a site or to different websites.
Add this line
```
time.sleep(10)
```
We will pause the program for 10 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumnav.py. It should do the following:
- Opens the Firefox browser
- Browser goes to www.google.com
- Halts for 10 seconds
- Browser goes to https://slackingslacker.github.io/seleniumindex
- Halts for 10 seconds
- Closes the browser

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://www.google.com")
time.sleep(10)

driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(10)

driver.close()

Conclusion

Navigation from page to page or from website to website can be done using the driver.get functionality.
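
When moving between pages of the same site, it can help to build each address from one base URL before passing it to driver.get. The sketch below uses urljoin from the standard library. The helper name page_url is an assumption for this example, and simpleselectors.html is the page used in the requests tutorial on this blog.

```python
from urllib.parse import urljoin

BASE_URL = "https://slackingslacker.github.io/seleniumindex"


def page_url(relative_path: str) -> str:
    """Resolve a relative path against BASE_URL; absolute URLs are returned unchanged."""
    return urljoin(BASE_URL, relative_path)


print(page_url("simpleselectors.html"))    # https://slackingslacker.github.io/simpleselectors.html
print(page_url("https://www.google.com"))  # https://www.google.com
```

The result can then be used as driver.get(page_url("simpleselectors.html")).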

Saturday, May 16, 2020

Web Scraping Using Selenium - Intro

Selenium Introduction

The purpose of this series is to try out the different functionalities described in the Selenium documentation. Each functionality will have its own example so that we can better understand when and where to use it.

Background

Selenium is usually used for automated testing but can also be used for scraping websites. Since many modern websites are built as Single-Page Applications (SPA), the page is lazy loaded. This means that the basic structure, such as the HTML and CSS, is loaded first, and the data is loaded afterwards through API calls and JavaScript. Scrapers such as the requests library for Python (tutorials can be found here) or Guzzle for PHP cannot run JavaScript. There can be workarounds to handle those JavaScript interactions, but sometimes it is a dead end. Selenium solves this kind of problem by interacting with the website through a real browser, so it is just like a person controlling the website.
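
To illustrate the lazy-loading point, here is a small sketch using only Python's standard library. The page in RAW_HTML is made up for this example: its list is empty in the markup and would only be filled in by the script when a browser runs it. A scraper that reads just the HTML therefore sees no list items.

```python
from html.parser import HTMLParser

# Hypothetical page: the list is empty in the HTML and is filled by JavaScript in a browser.
RAW_HTML = """
<html><body>
  <ul id="items"></ul>
  <script>
    document.getElementById("items").innerHTML = "<li>Chicken Legs</li>";
  </script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Count <li> tags that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1


parser = ListItemCounter()
parser.feed(RAW_HTML)  # the script body is treated as plain text, not as tags
print(parser.li_count)  # 0
```

A browser driven by Selenium would execute the script first, so the list item would exist by the time the page is inspected.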

Getting the Software Required for Selenium

We will now start to code. But first, let's make sure that we have everything required.

Install Python. You can download Python here depending on your OS. The installation steps will depend on the OS that you are using.
Install the required library, in this case selenium. You can run the command
```
pip install selenium
```
or
```
python -m pip install selenium
```
(Optional) You can download PyCharm here to make your coding faster. I will be using PyCharm in these tutorials, but you can also use Notepad and the command line.
Download the browser drivers and put them in a directory that is accessible to the app. You can also paste them in later. I will be using Firefox for most of the tutorials, as geckodriver and Firefox stay compatible even when the browser is updated.
- chrome driver download
- firefox driver download
(Optional) Install the chrome and firefox browsers.

Coding My First Selenium Program

Create a directory where you will put your code.
Copy the drivers that you have downloaded and paste them in the directory you've created.
Create a file seleniumintro.py and paste the following code
```
from selenium import webdriver
import time
```
The code above imports the libraries that we will use.
Add this line
```
driver = webdriver.Firefox(executable_path="geckodriver.exe")
```
The code above will create a webdriver instance for Firefox.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
The line will go to the website (https://slackingslacker.github.io/seleniumindex).
Add this line
```
time.sleep(5)
```
We will pause the program for 5 seconds. This is to ensure that you can see what's happening in the browser.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Add this line
```
driver = webdriver.Chrome(executable_path="chromedriver.exe")
```
This time we are going to use the Chrome browser to access the website.
Add this line
```
driver.get("https://slackingslacker.github.io/seleniumindex")
```
The line will go to the website (https://slackingslacker.github.io/seleniumindex) in Chrome.
Add this line
```
time.sleep(5)
```
We will again pause the program for 5 seconds.
Add this line
```
driver.close()
```
The line will close the webdriver as well as the browser.
Run seleniumintro.py.
```
python seleniumintro.py
```
or run it in PyCharm

It should do the following:
1. Opens the Firefox browser (assuming you have Firefox installed)
2. Browser goes to the website https://slackingslacker.github.io/seleniumindex
3. Halts for 5 seconds
4. Closes the Firefox browser
5. Opens the Chrome browser (assuming you have Chrome installed)
6. Browser goes to the website https://slackingslacker.github.io/seleniumindex
7. Halts for 5 seconds
8. Closes the Chrome browser

Final Selenium Code

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path="geckodriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)
driver.close()

driver = webdriver.Chrome(executable_path="chromedriver.exe")
driver.get("https://slackingslacker.github.io/seleniumindex")
time.sleep(5)
driver.close()

Conclusion

Using Selenium, we can open the browser and automatically use the functionalities of a website. With just a few lines of code, we can easily use Selenium.
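
The Firefox and Chrome halves of the final code repeat the same four steps, so they can be folded into one helper. The sketch below is one possible shape for it. The names visit and FakeDriver are assumptions made for this illustration. FakeDriver is only a stand-in that records calls so the example runs without a browser, and the try/finally is an addition that closes the browser even if loading the page fails.

```python
import time


def visit(make_driver, url: str, pause_seconds: float = 5) -> None:
    """Open a browser with make_driver(), load url, pause, and always close the browser."""
    driver = make_driver()
    try:
        driver.get(url)
        time.sleep(pause_seconds)
    finally:
        driver.close()  # runs even if driver.get raises


class FakeDriver:
    """Stand-in used only to show the call order without opening a browser."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append("get " + url)

    def close(self):
        self.calls.append("close")


fake = FakeDriver()
visit(lambda: fake, "https://slackingslacker.github.io/seleniumindex", pause_seconds=0)
print(fake.calls)
```

With the real drivers from this tutorial, the first argument would be something like lambda: webdriver.Firefox(executable_path="geckodriver.exe"), and the same helper would be called a second time with the Chrome driver.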

Tuesday, May 5, 2020

Web Scraping Using Requests - Query Selectors/CSS Selectors

Requests and Beautifulsoup Searching Using CSS Selectors

We will use BeautifulSoup to get different tags using different selectors.

Before we proceed, please make sure you have read the first and second blogs on this series to do the prerequisites.

Let's start to code.

Create a file simpleselectors.py and paste the following code. This time we will do it differently and use a function to print the tags that we find.

from bs4 import BeautifulSoup
import requests

url = "https://slackingslacker.github.io/simpleselectors.html"
html_doc = requests.get(url).text
soup = BeautifulSoup(html_doc, "html.parser")


def find_by_selector(selector: str):
    elements = soup.select(selector)
    for el in elements:
        print("============================")
        print("name : " + el.name)
        print("attributes : " + str(el.attrs))
        print(el)

Add the code below.
```
find_by_selector(".column")
```
Run the code. It should print all the div tags that have the class attribute with value column.
Comment out the recently added line by putting # at the start of the line, then add the line below and run the code.
```
find_by_selector(".table.is-narrow")
```
It should print all table tags whose class attribute has both of the values table and is-narrow.
Comment the recently added line and add the line below then run the code.
```
find_by_selector(".columns .table")
```
It should print all table tags that have the class attribute with value table and are descendants (at any depth, not only direct children) of an element (div) that has the class attribute with value columns.
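
As a standalone aside (not part of simpleselectors.py), the sketch below compares the space combinator with the > combinator. The markup is made up for this example: one table sits inside a nested div, the other is a direct child of the element with class columns.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one table is nested inside a .column div, one is a direct child.
html_doc = """
<div class="columns">
  <div class="column"><table class="table" id="nested"></table></div>
  <table class="table" id="direct"></table>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

descendants = [el["id"] for el in soup.select(".columns .table")]
children = [el["id"] for el in soup.select(".columns > .table")]
print(descendants)  # both tables
print(children)     # only the direct child
```

The space matches at any depth, while > matches only one level down.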
Comment the recently added line and add the line below then run the code.
```
find_by_selector("#total")
```
It should print a span tag that has the id attribute with value total.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("div")
```
It should print all div tags.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("td.has-text-link")
```
It should print all td tags that have the class attribute with value has-text-link.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("b,i")
```
It should print all b and i tags.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("table th")
```
It should print all th tags under the table tags.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("div > span")
```
It should print all span tags that are direct children of div tags.
Comment the recently added line and add the line below then run the code.
```
find_by_selector("span~p")
```
It should print all p tags that are preceded by a span tag with the same parent.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("[colspan]")
```
It should print all tags with colspan attribute.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("[colspan='2']")
```
It should print all tags with colspan attribute that has a value of 2.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("[class^='has']")
```
It should print all tags whose class attribute value starts with has.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("[class$='link']")
```
It should print all tags whose class attribute value ends with link.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("[class*='text']")
```
It should print all tags whose class attribute value contains text.
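
As a standalone aside (not part of simpleselectors.py), the sketch below shows that ^=, $= and *= compare against the whole class attribute string, not against each class name separately. The markup and the helper name ids are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the same class name in different positions.
html_doc = """
<span id="a" class="has-text-link"></span>
<span id="b" class="column has-text-link"></span>
<span id="c" class="has-text-link is-narrow"></span>
"""
soup = BeautifulSoup(html_doc, "html.parser")


def ids(selector: str) -> list:
    """Return the id of every element matching selector, in document order."""
    return [el["id"] for el in soup.select(selector)]


print(ids("[class^='has']"))   # attribute string starts with has
print(ids("[class$='link']"))  # attribute string ends with link
print(ids("[class*='text']"))  # attribute string contains text
print(ids(".has-text-link"))   # class selector, position does not matter
```

When the goal is to match a class name wherever it appears in the attribute, a class selector such as .has-text-link is usually the safer choice.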
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("span:empty")
```
It should print all span tags that have no children.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("tr:first-child")
```
It should print all tr tags that are the first child of their parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("tr:last-child")
```
It should print all tr tags that are the last child of their parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("td:nth-child(3)")
```
It should print all td tags that are the 3rd child of their parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("a:first-of-type")
```
It should print every a tag that is the first a tag (first of type anchor) among the children of its parent tag. It does not have to be the first child overall.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("a:last-of-type")
```
It should print every a tag that is the last a tag (last of type anchor) among the children of its parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("a:nth-of-type(2)")
```
It should print every a tag that is the 2nd a tag (2nd of type anchor) among the children of its parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("span:only-child")
```
It should print all span tags that are the only child of their parent tag.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("table > tr:not(:first-child)")
```
It should print all tr tags that are not the first child of the table. In other words, it prints every tr tag except the one that contains the titles for the columns.
Comment the recently added lines and add the line below then run the code.
```
find_by_selector("div.column:nth-of-type(1) > table > tr:nth-child(4) > td:nth-child(2)")
```
It should print all td tags that are the 2nd child of a parent tr, which is the 4th child of a table tag, which is a child of a div with class column that is the first of its type (first div) within its parent tag.
Comment the recently added lines and add the line below then run the code.
```
print(soup.select_one("div.column:nth-of-type(1) > table > tr:nth-child(4) > td:nth-child(2)").text)
```
Using the query selector from the previous example, we get a single tag with select_one, which finds the first tag matching the query. The line prints its text, Chicken Legs.
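
One thing to watch with select_one is that it returns None when nothing matches, so reading .text directly would raise an AttributeError. The sketch below is one way to guard against that. The helper name text_of and the one-row table are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table used only for this illustration.
html_doc = "<table><tr><td>Item</td><td>Chicken Legs</td></tr></table>"
soup = BeautifulSoup(html_doc, "html.parser")


def text_of(selector: str, default: str = "") -> str:
    """Return the text of the first match, or default when nothing matches."""
    element = soup.select_one(selector)
    return element.text if element is not None else default


print(text_of("td:nth-child(2)"))                         # Chicken Legs
print(text_of("td:nth-child(5)", default="(not found)"))  # (not found)
```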

Conclusion

We used select and select_one to get all or one of the tags in different ways.