
Goal:

Design a program that performs square matrix multiplication on a GPU using CUDA. Write the code in C. In particular, your implementation should obey the following requirements:

1. The program must be general enough to handle matrix sizes beyond the GPU capacity.

2. The GPU capacity should not be hardcoded, but should be queried during execution.

3. The kernel implementation should be such that the execution configuration (number of blocks and threads/block) affects the performance but not the results of the kernel invocation.
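One common way to satisfy requirement 3 is a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads until all output elements are covered. The sketch below is a host-only emulation of that indexing in plain C++ (no CUDA calls), so the idea can be checked with g++ alone. The function name matmul_emulated and the 1-D launch layout are assumptions made for illustration; in a real kernel the two outer loops disappear and b and t come from blockIdx.x and threadIdx.x.

```cpp
#include <cstddef>
#include <vector>

// Emulates a 1-D launch of `blocks` x `threads` on the CPU.
// Each emulated thread computes output elements idx, idx + stride, ...
// so every element of C is produced exactly once for any configuration.
std::vector<float> matmul_emulated(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t n,
                                   std::size_t blocks,
                                   std::size_t threads) {
    std::vector<float> C(n * n, 0.0f);
    const std::size_t stride = blocks * threads;        // gridDim.x * blockDim.x
    for (std::size_t b = 0; b < blocks; ++b) {          // plays the role of blockIdx.x
        for (std::size_t t = 0; t < threads; ++t) {     // plays the role of threadIdx.x
            for (std::size_t idx = b * threads + t; idx < n * n; idx += stride) {
                const std::size_t row = idx / n;
                const std::size_t col = idx % n;
                float sum = 0.0f;
                for (std::size_t k = 0; k < n; ++k)
                    sum += A[row * n + k] * B[k * n + col];
                C[idx] = sum;
            }
        }
    }
    return C;
}
```

Because the loop bound depends only on n, launching more threads than elements leaves the extra threads idle, and launching fewer makes each thread do more work. The configuration changes the running time but not the contents of C.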

Methodology:

NOTE: In this lab, you will not use SHARED MEMORY!

The program will be tested on the workstations and the CUDA server using the following matrix sizes:

1. Use a C++ compiler (g++) to compile your code, and use the new operator instead of malloc() to allocate memory dynamically (malloc may fail on very large memory allocations).

2. Try to design the data structures so as to minimize the number of memory transfers between host and device (CPU and GPU).

3. DO NOT START TO CODE IMMEDIATELY. Spend some time designing your solution.

i. How do you handle matrix sizes exceeding the GPU capacity?

ii. How do you represent the matrices?

iii. Which memory transfers are involved with matrices within and beyond the GPU capacity?

iv. When designing the kernel, which work is performed by each thread? How do you correlate each thread with the data it processes?

4. Kernels can use shared memory and registers, but your kernel should not use shared memory. To see how many registers each thread uses, you can look at the GPU assembly file. The assembly file (called a PTX file) can be generated by calling:

nvcc -ptx myfile.cu

This will generate myfile.ptx.

The PTX file will show you the assembly representation of your kernel. In particular, it will show you the code executed by each thread (as you know, all threads execute the same code!). The PTX file will include an area where the registers are declared. For example:

.reg .u16 %rh<4>;  // 16-bit registers

.reg .u32 %r<9>;   // 32-bit registers

.reg .u64 %rd<10>; // 64-bit registers

.reg .pred %p<3>;  // registers used for predication

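If you know how many registers each thread uses and how many registers are available on the GPU, you can determine the maximum number of threads that you can run for your particular kernel. Keep in mind that the registers declared in PTX are virtual: the number actually allocated per thread is decided by the ptxas assembler, which reports it when you compile with nvcc --ptxas-options=-v.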

5. Use the occupancy calculator to calculate the optimal point for configuring the kernel.
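For design question 3.i, one possible scheme (an assumption, not the only valid design) is to split the matrices into square tiles and keep three tiles on the device at a time: one of A, one of B, and one of the partial result C. The host-only sketch below computes the largest tile edge that fits in a given number of free bytes. The helper names largest_tile_dim and tiles_per_edge are hypothetical; at run time the free_bytes argument would come from a query such as cudaMemGetInfo rather than from a constant.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: largest tile edge t (in elements) such that three
// t x t float tiles (one each of A, B and C) fit in `free_bytes`.
// The result is capped at n, the full matrix edge.
std::size_t largest_tile_dim(std::size_t free_bytes, std::size_t n) {
    const std::size_t max_elems = free_bytes / (3 * sizeof(float));
    std::size_t t = static_cast<std::size_t>(
        std::sqrt(static_cast<double>(max_elems)));
    while (t * t > max_elems) --t;               // correct if sqrt rounded up
    while ((t + 1) * (t + 1) <= max_elems) ++t;  // correct if sqrt rounded down
    return t < n ? t : n;
}

// Number of tiles along one matrix edge (ceiling division).
// t must be non-zero.
std::size_t tiles_per_edge(std::size_t n, std::size_t t) {
    return (n + t - 1) / t;
}
```

When the whole problem fits, largest_tile_dim returns n and a single tile per edge is enough, so the same code path covers matrices within and beyond the GPU capacity. In practice some memory should be left free as a safety margin, which this sketch does not model.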

Questions:

1. Run your kernel with different numbers of blocks and threads per block and see how this affects performance. Report GPU occupancy and execution time, and discuss the results. Consider execution configurations that are trivially bad, and compare them with good execution configurations. How do you know in advance that some configurations are "bad"? And what is a "good" execution configuration?

2. What is the largest execution configuration that you can use without exceeding the resources available on the GPU in use?
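The register arithmetic described in the methodology can be sketched as a small host-only function. This is a simplified estimate under stated assumptions: it considers registers only, ignores the hardware's register allocation granularity, and rounds down to whole warps. The function name max_threads_per_block is hypothetical; the inputs would normally be read at run time from the regsPerBlock, maxThreadsPerBlock and warpSize fields returned by cudaGetDeviceProperties, and the per-thread register count from the compiler's report.

```cpp
#include <algorithm>
#include <cstddef>

// Upper bound on threads per block imposed by registers alone.
// regs_per_block  : registers available to one block
// regs_per_thread : registers the kernel uses per thread
// hw_max_threads  : hardware limit on threads per block
// warp_size       : threads per warp (32 on current NVIDIA GPUs)
std::size_t max_threads_per_block(std::size_t regs_per_block,
                                  std::size_t regs_per_thread,
                                  std::size_t hw_max_threads,
                                  std::size_t warp_size = 32) {
    // A kernel that uses no registers is limited only by the hardware cap.
    const std::size_t by_regs = regs_per_thread == 0
                                    ? hw_max_threads
                                    : regs_per_block / regs_per_thread;
    const std::size_t limit = std::min(by_regs, hw_max_threads);
    return (limit / warp_size) * warp_size;  // round down to whole warps
}
```

As an illustration with made-up device numbers, 65536 registers per block and 100 registers per thread allow 655 threads, which rounds down to 640 (20 warps). The occupancy calculator remains the authoritative tool, since it also accounts for limits this sketch leaves out.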


  • Category:- C/C++
  • Reference No.:- M9523448
