This article is a quick tutorial on scraping job listings from Indeed. Using Python, bs4 (Beautiful Soup), Selenium, and pandas, we'll extract information from indeed.com and build a pandas DataFrame. Before we begin, let's briefly look at what web scraping is.
Imagine you need to gather a lot of information about a topic from various web pages and articles and store it in a suitable format, for instance an Excel file. One way is to visit each website and copy the useful information into a spreadsheet by hand. Programmers tend to take an easier route: web scraping. Web scraping is the technique of automatically extracting large amounts of data from web pages so that it can be stored in a suitable format.
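To make the idea concrete, here is a minimal sketch of "extract, then store" that uses only Python's standard library, so it runs without installing anything. The HTML in PAGE is made up for illustration, and the names ItemParser and scrape_to_csv are hypothetical helpers, not part of any library. The tutorial itself relies on bs4 and pandas for these two jobs.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page content standing in for a downloaded web page.
PAGE = """
<ul>
  <li class="item">Python Developer</li>
  <li class="item">Data Analyst</li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collect the text of every <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "item") in attrs:
            self._inside = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.items[-1] += data


def scrape_to_csv(html):
    """Extract the items from html and return them as CSV text."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title"])
    writer.writerows([item.strip()] for item in parser.items)
    return out.getvalue()


csv_text = scrape_to_csv(PAGE)
print(csv_text)
```

The resulting CSV text can be written to a file and opened in Excel, which is the manual copy-and-paste workflow described above, done by a program.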
Scraping job details from Indeed
Indeed is one of the largest American job listing portals, with millions of listings from around the world posted by companies of every size, from startups to large enterprises. Scraping job details from Indeed gives you a large amount of information about jobs, locations, actively hiring companies, ratings, and more.
Here are the steps involved:
1. Install and import the necessary modules.
2. Send basic queries, such as a job title or company name and a location, to the Indeed website using Selenium.
3. Fetch the current URL after submitting the queries, again using Selenium.
4. Parse the page using requests and Beautiful Soup.
5. Extract the job title, company name, rating, location, short description, date of posting, etc.
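Steps 2 to 5 can be sketched roughly as follows. This is an outline under stated assumptions, not a finished scraper: the form field names "q" and "l" are assumptions about Indeed's search form, and the CSS classes used in parse_jobs ("job-card" and one class per field) are purely hypothetical placeholders that would have to be replaced with selectors found by inspecting the live page. The function names are also made up for this sketch. Third-party imports sit inside the functions so that each piece can be read and loaded on its own.

```python
import re

# Columns of the final table, taken from step 5.
FIELDS = ["title", "company", "rating", "location", "summary", "date"]


def clean(text):
    """Collapse whitespace so a scraped value fits on one line."""
    return re.sub(r"\s+", " ", text or "").strip()


def get_results_url(driver, job, location, timeout=10):
    """Steps 2-3: submit the search with Selenium and return the results URL."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get("https://www.indeed.com")
    start_url = driver.current_url
    # Assumption: the "what" and "where" inputs are named "q" and "l".
    driver.find_element(By.NAME, "q").send_keys(job)
    where = driver.find_element(By.NAME, "l")
    where.clear()
    where.send_keys(location, Keys.RETURN)
    # Wait until the browser has left the start page before reading the URL.
    WebDriverWait(driver, timeout).until(EC.url_changes(start_url))
    return driver.current_url


def parse_jobs(html):
    """Steps 4-5: build one dict per job card found in the page."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Hypothetical selectors: replace with the real ones from the page source.
    for card in soup.select(".job-card"):
        row = {}
        for field in FIELDS:
            node = card.select_one("." + field)
            row[field] = clean(node.get_text()) if node else None
        rows.append(row)
    return rows


def scrape(driver, job, location):
    """Run steps 2-5 and return the results as a pandas DataFrame."""
    import pandas as pd
    import requests

    url = get_results_url(driver, job, location)
    html = requests.get(url, timeout=30).text
    return pd.DataFrame(parse_jobs(html), columns=FIELDS)
```

With a Selenium driver already started, a call such as scrape(driver, "data scientist", "New York") would return the table. Missing fields are stored as None rather than raising, since not every listing carries a rating or a date.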
Installing and importing libraries
First of all, we need to install a few modules, along with ChromeDriver for Selenium. You can find ChromeDriver builds for different operating systems at this link. Check which version of Chrome you have installed so that you download the matching version of ChromeDriver.
After downloading ChromeDriver, move it to your working directory.
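Once the driver binary is in the working directory, Selenium can be pointed at it explicitly. The sketch below assumes Selenium 4, where the driver location is passed through a Service object. The helper names chromedriver_path and make_driver are hypothetical, and the file name is assumed to be chromedriver (chromedriver.exe on Windows).

```python
import platform
from pathlib import Path


def chromedriver_path(directory="."):
    """Return the expected location of the driver binary inside directory."""
    name = "chromedriver.exe" if platform.system() == "Windows" else "chromedriver"
    return Path(directory).resolve() / name


def make_driver(directory="."):
    """Start Chrome through the driver binary stored in directory."""
    # Imported here so the path helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = chromedriver_path(directory)
    if not path.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {path}")
    return webdriver.Chrome(service=Service(executable_path=str(path)))
```

Calling make_driver() opens a Chrome window controlled by Selenium; call driver.quit() on the returned object when finished. If the Chrome and ChromeDriver versions do not match, Selenium typically reports the mismatch when the session starts.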
Now we can install the libraries with pip, for example by running pip install selenium beautifulsoup4 pandas requests.
After installation, import the modules.
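A small sketch of this step: it first checks that each library can be found, then imports them. The mapping of import names to pip package names reflects the libraries named in this article (note that bs4 is installed as beautifulsoup4). The helper missing_packages is a hypothetical name, and the import lines are one plausible set for these libraries rather than a fixed requirement.

```python
import importlib.util

# Import name -> name of the package to install with pip.
REQUIRED = {
    "selenium": "selenium",
    "bs4": "beautifulsoup4",
    "pandas": "pandas",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    # Nothing is installed automatically; the command is only suggested.
    print("Install first: pip install " + " ".join(missing))
else:
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from selenium import webdriver

    print("All modules imported.")
```

Running the check before the imports turns a bare ImportError into a message that names the exact packages to install.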