Saturday, April 25, 2020

Web Scraping Using Requests - Intro

Intro to Requests Scraper

We will write a very simple web scraper, using Python as our base language.
 

What is Web Scraping?

Web scraping is the automated extraction of data from a website.
 

What to do next?

We will start coding shortly, but first let's make sure that we have everything required.
  1. Install Python. You can download it from the official Python website; the installation steps depend on the OS that you are using.
  2. Install the required library, in this case requests. You can run the command
    pip install requests
    or
    python -m pip install requests
  3. (Optional) Download PyCharm to make coding faster. I will be using PyCharm in these tutorials, but you can also use Notepad and the command line.
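
A quick way to confirm that step 2 worked is to import the library and print its version. This is only a sanity check, assuming requests was installed into the same Python interpreter that you run the script with; the file name check_install.py is just a suggestion.

```python
# check_install.py (hypothetical file name, not part of the tutorial files)
import requests

# If the install succeeded, this prints a version string.
# The exact value depends on what pip installed on your machine.
print(requests.__version__)
```

If this raises ModuleNotFoundError instead, pip most likely installed requests for a different Python than the one running the script; the python -m pip install requests form usually avoids that.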
 

Let's start coding.

  1. Create a directory where you will put your code.
  2. Create a file named simple.py and paste in the following code:
    import requests
    url = "https://slackingslacker.github.io/simple.html"
    print(requests.get(url).text)
    
  3. Run simple.py:
    python simple.py
    or run it in PyCharm.
     
    It should print the HTML:
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="UTF-8">
        <title>Tutorials</title>
    </head>
    <body>
        <div id="d">
            Inside the div tag but outside the p tag.
            <p id="p">This is inside the p tag.</p>
        </div>
    </body>
    </html>
    
 

What did we do?

  • We import the requests library so that we can use it.
    import requests
    
  • We then declare the URL that we will scrape.
    url = "https://slackingslacker.github.io/simple.html"
    
  • On the third line, we access the URL using
    requests.get(url)
    
    then read the response body as text with
    .text
    
    and print the HTML using the
    print()
    
    function.
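
The three steps above can also be written out one at a time, which makes it easier to see where a request might fail. The following is a minimal sketch, not a replacement for simple.py: the function name fetch_html, the 10-second timeout, and the get parameter (there only so the function can be tried without a network connection) are all assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the body of `url` as text (hypothetical helper)."""
    # Send the GET request; the timeout stops the call from waiting forever.
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError if the server answered with a 4xx or 5xx status.
    response.raise_for_status()
    # .text is the response body decoded to a str.
    return response.text


# Example call (needs network access):
# print(fetch_html("https://slackingslacker.github.io/simple.html"))
```

Checking the status before using .text matters because an error page also has a body, so the one-line version would print it as if the request had succeeded.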
 

Conclusion

With just 3 lines of code, we successfully scraped a website by fetching its whole page.
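
The page we fetched is still raw HTML, so a natural next step is pulling one value out of it. Below is a rough sketch of that idea using only Python's built-in html.parser module; the class name ParagraphText is made up for this example, and the HTML is pasted in as a string (copied from the output above) so the snippet runs without a network connection.

```python
from html.parser import HTMLParser

# Copy of the page printed by simple.py, inlined so no request is needed.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Tutorials</title>
</head>
<body>
    <div id="d">
        Inside the div tag but outside the p tag.
        <p id="p">This is inside the p tag.</p>
    </div>
</body>
</html>
"""


class ParagraphText(HTMLParser):
    """Collect the text inside the element <p id="p"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "p" and dict(attrs).get("id") == "p":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        # Keep only text seen while we are inside the target <p>.
        if self.inside:
            self.chunks.append(data)


parser = ParagraphText()
parser.feed(SAMPLE_HTML)
paragraph = "".join(parser.chunks).strip()
print(paragraph)  # This is inside the p tag.
```

In a real scraper the string would come from requests.get(url).text instead of SAMPLE_HTML.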
 

Just an update to our simple scraper. The following are samples of the different HTTP methods.

  1. Create a directory where you will put your code.
  2. Create a file named simple.py and paste in the following code:
    import requests
    url = "https://slackingslacker.github.io/simple.html"
    print(requests.get(url).text)
    
    url = "http://slackingslacker.pythonanywhere.com"
    # Using GET Method
    print(requests.get(url+"/get").text)
    # Using POST Method
    print(requests.post(url+"/post").text)
    # Using PUT Method
    print(requests.put(url+"/put").text)
    # Using DELETE Method
    print(requests.delete(url+"/delete").text)
    # Using POST Method With FORM Submission
    print(requests.post(url+"/postdata",
                        data={"name":"slackingslacker",
                              "location": "earth",
                              "height": "normal human"}).text)
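
To see what the form submission in the last call actually puts on the wire, the same request can be built and inspected without sending it. This is a small sketch, assuming the same URL and form fields as above; it uses requests.Request and its prepare() method, so nothing is sent to the server.

```python
import requests

url = "http://slackingslacker.pythonanywhere.com"

# Build the form-submission request but do not send it.
request = requests.Request(
    "POST",
    url + "/postdata",
    data={"name": "slackingslacker",
          "location": "earth",
          "height": "normal human"},
)
prepared = request.prepare()

print(prepared.method)                   # POST
print(prepared.url)                      # http://slackingslacker.pythonanywhere.com/postdata
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # name=slackingslacker&location=earth&height=normal+human
```

Passing a dict as data makes requests encode it like an HTML form, which is why the space in "normal human" shows up as a plus sign in the body.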
    
 
