This project should be very easy for any CUDA developer. Its purpose is to demonstrate the difference in run time of a simple matrix multiplication program written twice: once without shared memory and once with shared memory… (Budget: $30-$250 USD, Jobs: C Programming, CUDA, GPGPU)
