Six lines of Python (not counting imports) are all it takes to scrape onthisday.com for its “persons of interest.” Impressive!
pip install lxml
pip install cssselect
pip install requests
The Python code:
import requests
from lxml import html
from lxml.cssselect import CSSSelector

page = requests.get("http://www.onthisday.com/birthdays/august/19")
tree = html.fromstring(page.content)
sel = CSSSelector('.section--person-of-interest')
pois = sel(tree)
for poi in pois:
    # xpath() returns a list of matching elements, so take the first <p>
    print(poi.xpath("div/div/div/p")[0].text_content())
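To scrape a date other than August 19, only the URL has to change. The sketch below assumes every date follows the same /birthdays/<month>/<day> pattern as the URL above, with a lowercase English month name and no zero-padding on the day; `birthdays_url` and `MONTHS` are hypothetical helpers, not part of the original code.

```python
import datetime

# Lowercase English month names, matching the "august" in the URL above.
# Assumption: every date uses the same /birthdays/<month>/<day> pattern.
MONTHS = ("january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december")


def birthdays_url(day: datetime.date) -> str:
    """Build the onthisday.com birthdays URL for a date (hypothetical helper).

    The day is written without zero-padding (e.g. "5", not "05"), which is
    an assumption based on the single example URL.
    """
    return f"http://www.onthisday.com/birthdays/{MONTHS[day.month - 1]}/{day.day}"


if __name__ == "__main__":
    # Same URL as the scraper above uses for August 19
    print(birthdays_url(datetime.date(2024, 8, 19)))
    # Page for today's date
    print(birthdays_url(datetime.date.today()))
```

The result can be passed straight to `requests.get()` in place of the hard-coded string.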
The results (for my birthday):
'1871 Orville Wright, aviator (Wright Brothers), born in Dayton, Ohio (d. 1912)'
'1878 Manuel Luis Quezon y Molina, Second President of the Philippines (1935-42), born in Baler, Aurora, Philippines (d. 1944)'
'1919 Malcolm Forbes, American publisher of Forbes Magazine, born in Brooklyn, New York (d. 1990)'
'1946 Bill Clinton [William Jefferson], 42nd US President (Democrat, 1993-2001), born in Hope, Arkansas'
'1967 Satya Nadella, Indian-American businessman (CEO of Microsoft), born in Hyderabad'
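Each scraped line appears to follow the same shape: a four-digit birth year, the name up to the first comma, free-form details, and sometimes a trailing "(d. YYYY)". Assuming that format holds, a small parser can turn the strings into structured records. `Person` and `parse_entry` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional


class Person(NamedTuple):
    born: int
    name: str
    details: str
    died: Optional[int]


# Matches a trailing "(d. YYYY)" death year, if one is present
DEATH_RE = re.compile(r"\(d\. (\d{4})\)\s*$")


def parse_entry(entry: str) -> Person:
    """Split one scraped line into fields (hypothetical helper).

    Assumes the format seen in the output above: birth year, a space,
    the name up to the first ", ", then the remaining details.
    """
    year, _, rest = entry.strip().partition(" ")
    name, _, details = rest.partition(", ")
    match = DEATH_RE.search(details)
    died = int(match.group(1)) if match else None
    return Person(int(year), name, details, died)


if __name__ == "__main__":
    print(parse_entry("1967 Satya Nadella, Indian-American businessman "
                      "(CEO of Microsoft), born in Hyderabad"))
```

Entries that break the assumed pattern (for example, a name containing a comma) would be split incorrectly, so this is only a starting point.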