Topic: Analysis of Social Media Data
Sub-topic: Web Scraping with Python
Program Length: One-day workshop
Date: December 19th
About this workshop
Have you ever visited a website rich with the data you want, but the website provided no "download" button for you to retrieve it in a convenient way? Or have you ever found the perfect source for the CSV or other data files you needed, but were first required to meticulously click the "next" and "download" buttons dozens or even hundreds of times to get what you need?
In this workshop, you'll learn how to build Python web scraping programs that navigate a website programmatically and retrieve its data in a structured format. You'll learn how to use Python to automate and streamline data collection from sites that require logins, present data in tables, and more, making your work easier and more efficient. We'll also discuss the ethics of these practices, so you understand when scraping is appropriate and when you need to find an alternative route.
Whether you work with data for personal, professional, or academic reasons, you'll walk away with a concrete new skill that helps you automate and streamline tasks.
Takeaways
- Explore the ethical debate surrounding web scraping
- Understand how web scraping works and why Python is an excellent tool for programmatically extracting data from websites
- Gain practice scraping web pages with Python using Requests, BeautifulSoup, and Selenium
- Learn how to properly format and store scraped data as CSV files
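To illustrate the last two takeaways, here is a minimal sketch of the "extract a table, then save it as CSV" step. The workshop itself uses Requests and BeautifulSoup; this sketch instead swaps in Python's standard-library `html.parser` and `csv` modules so it runs with no extra installs and no network access. The HTML snippet and the names `TableParser`, `table_to_csv`, and `SAMPLE_HTML` are hypothetical; in a real scraper the HTML would be the text of an HTTP response (for example, `requests.get(url).text`).

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html, out):
    """Parse table rows from `html` and write them as CSV to the file-like `out`."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return parser.rows


# Hypothetical stand-in for a page downloaded by a scraper.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.99</td></tr>
  <tr><td>Gadget</td><td>12.50</td></tr>
</table>
"""

buffer = io.StringIO()
rows = table_to_csv(SAMPLE_HTML, buffer)
print(buffer.getvalue())
```

To save to disk instead of an in-memory buffer, pass a file opened with `open("scraped.csv", "w", newline="")`; the `csv` module documentation recommends `newline=""` so line endings are not doubled on some platforms.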
Prereqs & Preparation
This workshop is designed for students with a basic knowledge of Python or experience programming in another language. Anyone who has taken a Python course online will be well-equipped for this workshop, but self-taught learners and anyone willing to follow along are welcome! Knowledge of basic HTML syntax will also be very helpful, but it will not be assumed.
What to bring to class:
All students must bring their own laptops with an installation of Anaconda (Python 3.6), a free distribution of Python that bundles many open-source Python libraries. In case of technical difficulties on your local computer, setting up an account on Google Colaboratory, a cloud-based Python environment, is highly encouraged.
Lead Instructor & Mentors
Jeremy Banning is the Co-founder and Chief Data Officer of Blossom Academy. He launched his career as an electrical engineer at top companies such as SAIC and Sikorsky Aircraft. He has since transitioned to a Data Science and Analytics Senior Manager role at United Technologies Digital, where he primarily develops machine learning models that identify at-risk engines susceptible to premature failure as part of Pratt and Whitney’s Engine Health Monitoring (EHM) and predictive maintenance efforts. His work has helped save airlines approximately $2.6M in unscheduled maintenance fees and has led to one pending patent and a company trade secret in the predictive analytics space.
Students from our Immersive Program will assist Jeremy as instructors and mentors throughout the Web Scraping workshop. #comeblossom