General-Purpose Programming of Massively Parallel Graphics Processors

Spring 2010


Administrative:

Instructor: Reza Azimi
Graders: Amin Abbassi and Reza Mokhtari
Classes: Saturday and Monday, 9-10am, Room 17
Grading Scheme:
  • Programming Exercises: 30%
  • Project: 40%
  • Final Exam: 30%
Office Hours: Monday 1-3pm

Announcements:

DEADLINE EXTENSION:
  • The deadline for the first programming exercise is extended to Saturday, Esfand 8, 11:59pm

    EXERCISE SUBMISSION INSTRUCTIONS:
  • 1. Pack all of your files into one file using the tar command:
    % tar cvf a.tar *

  • 2. Copy a.tar to wave (192.168.0.3) using scp or sftp (if the files are on another machine).
  • 3. On wave type:
    % submitgpu 1 a.tar

    1 is the exercise number
    a.tar is the name of the tar file containing your programs and results

  • 4. You may resubmit a.tar repeatedly as long as you do this before the deadline.
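
    The packing step above can be checked locally before anything is copied to wave. The following is a minimal sketch, not part of the official instructions: the scratch directory and the file names sort.c and results.txt are placeholders standing in for your own exercise files. It packs the files with the same tar command and then lists the archive so you can confirm what will be submitted.

```shell
#!/bin/sh
# Sketch only: pack an exercise directory and inspect the archive before submitting.
set -e

# Hypothetical scratch directory standing in for your exercise folder.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder files; replace with your real programs and results.
printf 'int main(void) { return 0; }\n' > sort.c
printf 'placeholder results\n' > results.txt

# Step 1 of the instructions: pack everything into a.tar.
tar cvf a.tar *

# List the archive contents to confirm every file made it in.
tar tf a.tar > listing.txt
cat listing.txt

# Steps 2 and 3 need the course server, so they are shown as comments only:
#   scp a.tar <userid>@192.168.0.3:
#   submitgpu 1 a.tar        (typed on wave)
```

    When you resubmit, a.tar already exists in the directory and is matched by *. GNU tar skips the archive itself with a warning, but listing the contents with tar tf is a quick way to be sure.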

    Instructions for logging in to the server (wave):
  • 1. Log in to pasargad
  • 2. ssh <userid>@192.168.0.3 (<userid> is your student id prefixed by a "g";
    for instance, if your student id is 887887, your userid will be g887887)
  • 3. Your temporary password is your student id; change it the first time you log in (using the passwd command).


    Lecture Notes:

  • Saturday, Bahman 10: Introduction [SLIDES]
  • Monday, Bahman 12: Basics of C Programming in the Linux Environment [SLIDES]
  • Wednesday, Bahman 14: Basics of Parallel Computing [SLIDES]
  • Saturday, Bahman 17 & Monday, Bahman 19: Multithreaded Computing [SLIDES]
  • Saturday, Esfand 1 & Monday, Esfand 3: Basics of Programming in CUDA [SLIDES]
  • Saturday, Esfand 8: Programming in CUDA (further details) [SLIDES]
  • Monday, Esfand 10: Matrix Multiplication in CUDA [SLIDES]
  • Wednesday, Esfand 12: An Overview of the GPU Architecture [SLIDES]
  • Saturday, Esfand 15 & Monday, Esfand 17 & Wednesday, Esfand 19: An Overview of the GPU Architecture (continued from previous week) [SLIDES]
  • Saturday, Esfand 22 & Monday, Esfand 24: Programming for Performance [SLIDES]
  • Saturday, Farvardin 14 & Monday, Farvardin 16 & Wednesday, Farvardin 18: Programming for Performance [SLIDES]
  • Saturday, Farvardin 21 & Monday, Farvardin 23 & Wednesday, Farvardin 25: Case Study: Reduction [SLIDES]
  • Saturday, Farvardin 28 & Monday, Farvardin 30: Sorting Networks [SLIDES]
  • Wednesday, Ordibehesht 1: Debugging Tutorial [TEST PROGRAMS]
  • Saturday, Ordibehesht 4 & Monday, Ordibehesht 6: Scan [SLIDES]
  • Wednesday, Ordibehesht 8: Debugging CUDA programs using cuda-gdb [NVIDIA CUDA-GDB GUIDE]
  • Ordibehesht 11-18: Midterm Week
  • Monday, Ordibehesht 19 & Wednesday, Ordibehesht 22: Histograms and Sparse Matrix Multiplication [SLIDES]
  • Saturday, Ordibehesht 25 & Wednesday, Ordibehesht 29: CUDA Advanced Features [SLIDES]
  • Saturday, Khordad 1 & Monday, Khordad 3: Floating Point Issues [SLIDES]

    Useful Links:

  • Guy Blelloch's Course on Parallel Algorithms at CMU [LINK]
  • Kathy Yelick's Course on Parallel Computing at Berkeley [LINK]
  • Wen-mei Hwu's Course on CUDA at UIUC [LINK]

    Tutorials:

  • Debugging Under Unix: gdb Tutorial (CMU Web Site)
  • Debugging Programs with Multiple Threads [LINK]
  • UNIX Tutorial for Beginners (Surrey University Web Site)
  • To Download PuTTY go to PuTTY's Web Site
  • A PuTTY step-by-step tutorial can be found here
  • A tutorial on PSFTP and PScP can be found here
  • Full documentation for pscp can be found here
  • GNU Make Utility Manual here
  • POSIX Thread (pthread) programming here

    Exercises:

  • Exercise 1: Parallel sort with pthreads [LINK]
  • Exercise 2: Gauss-Jordan Elimination with CUDA [LINK]
  • Exercise 3: Parallel Sorting with CUDA [LINK]
  • Sample Exam Questions [LINK]

    Projects

    Schedule:
    Milestone: Proposal
    Deliverable: A 3-5 page document containing:
  • Project idea, motivations, and goals
  • The specific algorithm you intend to implement in CUDA or optimize
  • The source and description of the implementation you use as the base
  • A description of the previous parallel implementation on CPU, if any
  • A description of the previous CUDA work, if any
    Deadline: Ordibehesht 13
    Percentage: 20%

    Milestone: Progress Report
    Deliverable: A 2-page report describing the effort made and the intermediate results achieved by that point in the project.
    Deadline: Ordibehesht 29
    Percentage: 15%

    Milestone: Presentation
    Deliverable: A 20-minute slide presentation (possibly with a demo) containing:
  • A clear description of the problem you've been working on
  • The basic CUDA implementation you started with
  • The optimization steps (both algorithmic and architectural) you've taken
  • Performance results
  • A description of the steps that need to be taken for the project to be completed
  • Conclusions and future directions
    Deadline: Khordad 17
    Percentage: 30%

    Milestone: Technical Report
    Deliverable: A complete report (10-15 pages, single-column, single-spaced, 11pt font) containing project goals, related work, design and implementation, evaluation methodology, results and analysis, future directions, and a list of referenced papers.
    Deadline: Tir 5
    Percentage: 35%