
Python Assignment

This assignment involves writing Python code to extract information about jobs, people and companies from data files and load them into a consistent SQL database. It is an example of an Extract-Transform-Load (ETL) task.

You have been asked to generate normalised data on job postings from several data files in different formats. You are given:

  • An HTML file downloaded from the Jobs! website that lists 50 jobs
  • A CSV spreadsheet containing details of companies
  • A CSV spreadsheet containing details of people

The HTML job listing gives the title of each job and the company offering it. The CSV companies list includes the company name and some contact details, including the name of a contact person. The CSV people list includes more complete details of those contacts, plus some other people. Your task is to read the data from all of these files and add it to an SQL database.

The schema for the SQL database is provided for you in the file database.py. You can run this file to create the database. Your code will then add data to it. Note that:

The companies and people tables are related through the contact field. In the companies table the value of contact should be the id of the corresponding person.

The positions and companies tables are related through the company field. In the positions table the value of company should be the id of the corresponding company.

In the companies CSV file, contact names are given in full but in the people CSV file they are split into first, last and middle names. You need to match up these records.
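One way to match the records is to rebuild a full name from each person's parts and look contacts up in a dictionary. This is a minimal sketch. It assumes the people rows have already been read into dicts with the hypothetical keys `id`, `first`, `middle` and `last`. It also assumes that names differ only in case, spacing, or a missing middle name. Real data may need more cleaning than this.

```python
def normalise(name):
    """Lower-case a name and collapse runs of whitespace to single spaces."""
    return " ".join(name.split()).lower()


def build_name_index(people):
    """Map normalised full names to person ids.

    Each person is indexed both with and without the middle name, in case the
    companies file omits it (an assumption about the data). Keys 'id',
    'first', 'middle' and 'last' are hypothetical column names.
    """
    index = {}
    for person in people:
        middle = person.get("middle") or ""
        with_middle = normalise(f"{person['first']} {middle} {person['last']}")
        without_middle = normalise(f"{person['first']} {person['last']}")
        # setdefault keeps the first person seen if two names collide
        index.setdefault(with_middle, person["id"])
        index.setdefault(without_middle, person["id"])
    return index


def match_contact(contact_name, index):
    """Return the id of the person matching a full contact name, or None."""
    return index.get(normalise(contact_name))
```

Because `setdefault` silently keeps the first match, it may be worth logging any name that maps to more than one person and any contact that returns `None`.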

Useful Python Modules

Python has many modules that are useful for this task. In particular, look at:

  • the __csv__ module for reading and writing CSV files
  • the __bs4__ module (BeautifulSoup) for reading HTML files

and of course you will use the __sqlite3__ module for handling the database.
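Here is a minimal sketch of the extract and load steps using `csv` and `sqlite3`. database.py is not reproduced here, so the schema, the table columns and the CSV headers below are all assumptions and should be replaced with the real ones. The sketch also shows one way a position can store its company's id rather than the company name.

```python
import csv
import sqlite3

# Hypothetical schema: the real one is defined in database.py (not shown),
# so every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS people (
    id INTEGER PRIMARY KEY, first TEXT, middle TEXT, last TEXT, email TEXT);
CREATE TABLE IF NOT EXISTS companies (
    id INTEGER PRIMARY KEY, name TEXT, location TEXT,
    contact INTEGER REFERENCES people(id));     -- id of the contact person
CREATE TABLE IF NOT EXISTS positions (
    id INTEGER PRIMARY KEY, title TEXT,
    company INTEGER REFERENCES companies(id));  -- id of the hiring company
"""


def create_tables(conn):
    """Create the hypothetical people, companies and positions tables."""
    conn.executescript(SCHEMA)


def read_csv_records(stream):
    """Read an open CSV text stream into a list of dicts keyed by the header row."""
    return list(csv.DictReader(stream))


def load_people(conn, records):
    """Insert people records (hypothetical headers: first, middle, last, email)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO people (first, middle, last, email) VALUES (?, ?, ?, ?)",
            [(r["first"], r["middle"], r["last"], r["email"]) for r in records],
        )


def company_id_for(conn, company_name):
    """Return the id of the named company, or None if it is not in the table."""
    row = conn.execute(
        "SELECT id FROM companies WHERE name = ?", (company_name,)
    ).fetchone()
    return row[0] if row else None


def insert_position(conn, title, company_name):
    """Insert a job, storing its company's id rather than the company name."""
    with conn:
        conn.execute(
            "INSERT INTO positions (title, company) VALUES (?, ?)",
            (title, company_id_for(conn, company_name)),
        )
```

For real files, something like `open(path, newline="", encoding="utf-8")` is a reasonable way to open them. The `csv` documentation recommends `newline=""`, but the encoding is a guess.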

Required Output

To show that you have completed the task successfully, you will generate a single CSV report file that contains the following fields:

  • company name
  • position title
  • company location
  • contact first name
  • contact last name
  • contact email

There should be one row in your output CSV file for every job in the HTML file.
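The report can be produced with a single SQL join. This sketch uses hypothetical column names (`companies.name`, `companies.location`, `people.first` and so on) because database.py is not shown. It uses `LEFT JOIN` so that a job whose company or contact could not be matched still produces a row with empty fields. This keeps the output at one row per job.

```python
import csv
import sqlite3

# Hypothetical column names; adjust them to the schema in database.py.
REPORT_QUERY = """
SELECT companies.name, positions.title, companies.location,
       people.first, people.last, people.email
FROM positions
LEFT JOIN companies ON positions.company = companies.id
LEFT JOIN people ON companies.contact = people.id
ORDER BY positions.id
"""

HEADER = ["company name", "position title", "company location",
          "contact first name", "contact last name", "contact email"]


def write_report(conn, stream):
    """Write a header plus one CSV row per position and return the row count."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    count = 0
    for row in conn.execute(REPORT_QUERY):
        writer.writerow(row)  # csv writes NULL (None) values as empty strings
        count += 1
    return count
```

Comparing the returned count with the number of jobs found in the HTML file is a cheap check that no job was lost.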

You will also submit the code you have written to solve this problem. Your code **must** use functions and every function **must** include a suitable docstring that describes what it does. Each function should implement a logical part of the overall ETL process.

Additional Report

You should also submit a brief (one-page) report on the following topic: this is just a trial data set for this task. The real data is much bigger, with around 10,000 people, 5,000 companies and 50,000 job listings. Thinking about how you have implemented the ETL process, describe any problems that might arise with such a large data set and how you might have to modify your implementation to address them.
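Two changes often matter at this scale. The first is to resolve ids from an in-memory dictionary instead of issuing one `SELECT` per row. The second is to insert rows in batches with `executemany`, committing once per batch rather than once per row. This is a sketch under stated assumptions: it assumes a hypothetical `positions(title, company)` table and jobs supplied as `(title, company_name)` pairs. The batch size of 1000 is an arbitrary starting point. The helper `chunks` is also hypothetical.

```python
import itertools
import sqlite3


def chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def load_positions_in_batches(conn, jobs, company_ids, batch_size=1000):
    """Insert (title, company_name) pairs in batches and return the row count.

    company_ids is a dict mapping company names to ids, built once up front,
    so no per-job SELECT is needed; unknown companies get a NULL company id.
    """
    rows = ((title, company_ids.get(name)) for title, name in jobs)
    inserted = 0
    for batch in chunks(rows, batch_size):
        with conn:  # one transaction (and one commit) per batch
            conn.executemany(
                "INSERT INTO positions (title, company) VALUES (?, ?)", batch)
        inserted += len(batch)
    return inserted
```

If lookups have to stay in SQL instead, adding an index on the lookup column (for example `CREATE INDEX ... ON companies(name)`) avoids scanning the whole table for every query.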

Attachment:- Assignment Files.rar

