Requests and BeautifulSoup Using Tag
We will use BeautifulSoup to get the name of a tag and some of its attributes.
Before we proceed, please make sure you have read the first and second posts in this series and completed the prerequisites.
Let's start coding.
- Create a file simpletagattr.py and paste the following code:
from bs4 import BeautifulSoup
import requests
url = "https://slackingslacker.github.io/simpletags.html"
html_doc = requests.get(url).text
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.find(id="hDiv").name)
image_tag = soup.find(id="imgId")
print(image_tag["src"])
print(image_tag["width"])
print(image_tag["height"])
print(image_tag["alt"])
- Run simpletagattr.py. It should print the following text:
div
simple.png
200
200
Sample Image
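To try the same lookups without a network connection, here is a minimal sketch that parses an inline HTML string instead of fetching the page. The markup in sample_html is an assumption, written only to match the output above; the real simpletags.html may contain different or additional elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption chosen to match the expected output,
# not a copy of the real simpletags.html page.
sample_html = """
<div id="hDiv">
  <img id="imgId" src="simple.png" width="200" height="200" alt="Sample Image">
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# .name is the tag's name as a string.
print(soup.find(id="hDiv").name)

# Attributes are read with dictionary-style indexing and come back as strings.
image_tag = soup.find(id="imgId")
for attribute in ("src", "width", "height", "alt"):
    print(image_tag[attribute])
```

Note that width and height are returned as the strings "200", not as integers, so they would need int() before any arithmetic.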
What did we do?
- We import the BeautifulSoup class from the bs4 library.
from bs4 import BeautifulSoup
- We import the requests library.
import requests
- We declare the URL that we will use for scraping.
url = "https://slackingslacker.github.io/simpletags.html"
- On the 4th line, we request the page and take the HTML text of the response, assigning it to html_doc.
html_doc = requests.get(url).text
- We create a BeautifulSoup instance from the HTML text in the variable html_doc and assign it to the
variable soup. We also tell it to use html.parser to parse the HTML.
soup = BeautifulSoup(html_doc, "html.parser")
- On the 6th line,
print(soup.find(id="hDiv").name)
we print the name of the Tag. The call soup.find(id="hDiv") returns the Tag object for the first element it finds with the id hDiv, and .name gives the name of that tag (here, div).
- On the 7th line,
image_tag = soup.find(id="imgId")
we assign the Tag object to image_tag. We will use this object to print the element's attributes.
- On the succeeding lines, we print the src, width, height and alt attributes, in that order.
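The script above assumes that the request succeeds and that both ids exist on the page. Here is a hedged sketch of a more defensive variant. It relies on behaviour the two libraries document: requests.get accepts a timeout, raise_for_status() raises on HTTP error codes, find() returns None when nothing matches, and a Tag's get() returns None for a missing attribute instead of raising KeyError. The helper names fetch_html and describe_image are hypothetical, and so is the inline markup in the usage lines.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def describe_image(html_doc: str, element_id: str) -> Optional[dict]:
    """Return selected attributes of the element with the given id, or None."""
    soup = BeautifulSoup(html_doc, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:  # find() returns None when no element matches
        return None
    # get() returns None for a missing attribute instead of raising KeyError
    return {name: tag.get(name) for name in ("src", "width", "height", "alt")}


# Usage on assumed inline markup (no network needed); this img has no alt.
sample_html = '<img id="imgId" src="simple.png" width="200" height="200">'
print(describe_image(sample_html, "imgId"))
print(describe_image(sample_html, "missingId"))
```

With the URL from this post, the two helpers would be combined as describe_image(fetch_html(url), "imgId").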