Python Assignment -

This assignment involves writing Python code to extract information about jobs, people and companies from data files and load them into a consistent SQL database. It is an example of an Extract-Transform-Load (ETL) task.

You have been given the task of generating some normalised data on job postings given some data files in different formats. You are given:

  • An HTML file downloaded from the Jobs! website that lists 50 jobs
  • A CSV spreadsheet containing details of companies
  • A CSV spreadsheet containing details of people

The HTML job listing mentions the title of each job and the company it is with. The CSV companies list includes the company name and some contact details including the name of a contact person. The CSV people list includes more complete details of those contacts plus some other people. Your task is to read the data from all of these files and add it into an SQL database.
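As a rough illustration of the extract step for the job listing, the sketch below pulls (title, company) pairs out of a small HTML snippet. The markup of the real Jobs! page is not shown here, so the tags and class names are hypothetical. The sketch also uses the standard library's html.parser rather than bs4, purely so that it runs without extra installs; the same logic can be written more briefly with BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Jobs! page is not shown, so these tags and
# class names are assumptions made purely for illustration.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Analyst</h2><span class="company">Acme Pty Ltd</span></div>
<div class="job"><h2 class="title">Web Developer</h2><span class="company">Globex</span></div>
"""


class JobParser(HTMLParser):
    """Collect (title, company) pairs from the assumed job markup."""

    def __init__(self):
        super().__init__()
        self.jobs = []        # completed (title, company) tuples
        self._field = None    # field the parser is currently inside, if any
        self._current = {}    # fields gathered for the job in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("title", "company"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field:
            self._field = None
            # A job is complete once both of its fields have been seen.
            if "title" in self._current and "company" in self._current:
                self.jobs.append(
                    (self._current["title"], self._current["company"])
                )
                self._current = {}


def extract_jobs(html_text):
    """Return a list of (title, company) tuples found in html_text."""
    parser = JobParser()
    parser.feed(html_text)
    return parser.jobs


jobs = extract_jobs(SAMPLE_HTML)
print(jobs)
```

Keeping the parsing inside one function means the rest of the ETL code only ever sees plain tuples, whichever HTML library does the work.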

The schema for the SQL database is provided for you in the file database.py. You can run this file to create the database. Your code will then add data to it. Note that:

  • The companies and people tables are related through the contact field. In the companies table the value of contact should be the id of the corresponding person.
  • The positions and companies tables are related through the company field. In the positions table the value of company should be the id of the corresponding company.
  • In the companies CSV file, contact names are given in full but in the people CSV file they are split into first, last and middle names. You need to match up these records.
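One possible way to match a full contact name against the split names is sketched below. It assumes a full name is written "First [Middle ...] Last" and compares only the first and last tokens, ignoring case. The dictionary keys (id, first, middle, last) and the sample people are hypothetical, not taken from the assignment files.

```python
def name_key(first, last):
    """Build a case-insensitive lookup key from a first and last name."""
    return (first.strip().lower(), last.strip().lower())


def index_people(people):
    """Map name keys to person ids for a list of person dicts."""
    return {name_key(p["first"], p["last"]): p["id"] for p in people}


def find_contact_id(full_name, people_index):
    """Return the id of the person matching full_name, or None.

    Assumes the full name is "First [Middle ...] Last": the first and last
    tokens are compared and any middle names are ignored.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    return people_index.get(name_key(parts[0], parts[-1]))


# Hypothetical sample records standing in for rows of the people CSV.
people = [
    {"id": 1, "first": "Jane", "middle": "Q", "last": "Citizen"},
    {"id": 2, "first": "Ravi", "middle": "", "last": "Patel"},
]
index = index_people(people)
print(find_contact_id("Jane Q Citizen", index))
```

This simple key has limits: two people who share a first and last name would collide, and multi-word surnames would not match, so real data may need the middle name in the key or a check for unmatched contacts.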

Useful Python Modules - Python has many useful modules for this task. You will want to look at:

  • the csv module for reading and writing CSV files
  • the bs4 module (BeautifulSoup) for reading HTML files

and of course you will use the sqlite3 module for handling the database.
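A minimal sketch of how csv and sqlite3 might work together for the load step follows. The table definition and column names are assumptions standing in for the real schema in database.py, and the CSV text is made-up sample data read from a string rather than a file.

```python
import csv
import io
import sqlite3

# Stand-in for the people CSV; the real column names may differ.
PEOPLE_CSV = """first_name,middle_name,last_name,email
Jane,Q,Citizen,jane@example.com
Ravi,,Patel,ravi@example.com
"""


def read_people(csv_file):
    """Return the rows of an open people CSV file as a list of dicts."""
    return list(csv.DictReader(csv_file))


def load_people(conn, rows):
    """Insert person rows into the (assumed) people table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO people (first_name, middle_name, last_name, email) "
            "VALUES (:first_name, :middle_name, :last_name, :email)",
            rows,
        )


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real one comes from database.py.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, "
    "middle_name TEXT, last_name TEXT, email TEXT)"
)
load_people(conn, read_people(io.StringIO(PEOPLE_CSV)))
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)
```

Using named placeholders lets each DictReader row be passed straight to the INSERT, and avoids building SQL strings by hand.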

Required Output

To show that you have completed the task successfully, you will generate a single CSV file report that contains the following fields:

  • company name
  • position title
  • company location
  • contact first name
  • contact last name
  • contact email

There should be one row in your output CSV file for every job in the HTML file.
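The report could be produced with a single join query, sketched below. Only the contact and company links come from the task description; the other column names (name, title, location, first_name, last_name, email) and the sample rows are assumptions. LEFT JOIN is used so that a job still appears in the output even if its company or contact could not be matched.

```python
import csv
import io
import sqlite3

# Column names are assumptions; only positions.company -> companies.id and
# companies.contact -> people.id are stated in the task description.
REPORT_SQL = """
SELECT c.name, p.title, c.location, pe.first_name, pe.last_name, pe.email
FROM positions AS p
LEFT JOIN companies AS c ON p.company = c.id
LEFT JOIN people AS pe ON c.contact = pe.id
ORDER BY p.id
"""

HEADER = [
    "company name", "position title", "company location",
    "contact first name", "contact last name", "contact email",
]


def write_report(conn, out_file):
    """Write one CSV row per position to out_file and return the row count."""
    writer = csv.writer(out_file)
    writer.writerow(HEADER)
    rows = conn.execute(REPORT_SQL).fetchall()
    writer.writerows(rows)
    return len(rows)


# Tiny in-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, location TEXT, contact INTEGER);
CREATE TABLE positions (id INTEGER PRIMARY KEY, title TEXT, company INTEGER);
INSERT INTO people VALUES (1, 'Jane', 'Citizen', 'jane@example.com');
INSERT INTO companies VALUES (1, 'Acme', 'Sydney', 1);
INSERT INTO positions VALUES (1, 'Data Analyst', 1), (2, 'Web Developer', 1);
""")
buffer = io.StringIO()
row_count = write_report(conn, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a StringIO buffer, open it with newline="" as the csv module documentation recommends, so that row endings are not doubled on Windows.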

You will also submit the code you have written to solve this problem. Your code **must** use functions and every function **must** include a suitable docstring that describes what it does. Each function should implement a logical part of the overall ETL process.

Additional Report -

You should also submit a brief (1 page) report on the following topic:  This is just a trial data set for this task. The real data is much bigger, with around 10,000 people, 5,000 companies and 50,000 job listings. Thinking about how you have implemented the ETL process, describe any problems that might arise with such a large data set and how you might have to modify your implementation to address these problems.
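To make the scale question concrete, the sketch below shows two adjustments that are often considered for larger loads: inserting rows in batches so that the whole data set never has to sit in one list, and indexing the column used for joins. The table layout, batch size and generated rows are all hypothetical, and whether either change is actually needed depends on how the original implementation was written.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most size items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_positions(conn, rows, batch_size=1000):
    """Insert (title, company_id) pairs in batches, one transaction each."""
    total = 0
    for chunk in chunks(rows, batch_size):
        with conn:  # one commit per batch rather than one per row
            conn.executemany(
                "INSERT INTO positions (title, company) VALUES (?, ?)", chunk
            )
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real one comes from database.py.
conn.execute(
    "CREATE TABLE positions (id INTEGER PRIMARY KEY, title TEXT, company INTEGER)"
)
# An index on the join column keeps report queries fast as the table grows.
conn.execute("CREATE INDEX idx_positions_company ON positions (company)")

# A generator stands in for 50,000 parsed job listings.
rows = ((f"Job {i}", i % 5000 + 1) for i in range(50000))
inserted = load_positions(conn, rows)
print(inserted)
```

The same idea applies to matching: looking up companies and contacts in a dictionary keyed by name avoids rescanning a list for every one of the 50,000 jobs.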

Attachment:- Assignment Files.rar

Python, Programming

  • Category:- Python
  • Reference No.:- M92792959
