Graphics Processing Units (GPUs) are becoming an indispensable part of every computing system because of their ability to accelerate applications with abundant data-level parallelism. They are not only used to accelerate big-data applications in data centers and high-performance computing (HPC) systems, but are also employed in mobile and wearable devices for efficient execution of multimedia-rich applications and smooth display rendering. However, despite their massively parallel architecture and their ability to execute many threads concurrently, GPUs fall far short of their theoretical peak performance. In this talk, I will discuss the primary reasons behind this performance gap and some solutions to bridge it. In particular, I will show how contention for shared resources (e.g., caches and memory) can severely degrade overall GPU performance. I will also discuss how this contention problem can cause unfairness and low system throughput when multiple applications execute concurrently on a GPU. To address these inefficiencies, my doctoral research has focused on developing thread and memory scheduling techniques. I will present some of these techniques and conclude by briefly discussing my future research directions, which are inspired by my belief that future computing systems will treat GPUs as first-class computing citizens rather than mere coprocessors.
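
To make the shared-resource contention problem concrete, below is a minimal CUDA microbenchmark sketch; it is not from the talk, and the kernel name, constants, and sizes are all illustrative assumptions. Each thread repeatedly re-reads a small private slice of a global array: with few resident thread blocks the combined working set can fit in the GPU's shared L2 cache, while launching many blocks grows the aggregate footprint past L2 capacity, so the same per-thread work sees more cache misses and more pressure on DRAM bandwidth.

```cuda
// Hypothetical contention microbenchmark (illustrative sketch only).
#include <cstdio>
#include <cuda_runtime.h>

constexpr int SLICE = 256;  // 32-bit words per thread slice (1 KB, assumed)
constexpr int ITERS = 64;   // reuse passes over each slice

__global__ void walk_slices(const int *data, long long *sink) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const int *slice = data + (long long)tid * SLICE;
    long long acc = 0;
    for (int it = 0; it < ITERS; ++it)      // re-reads hit in cache if resident
        for (int i = 0; i < SLICE; ++i)
            acc += slice[i];
    sink[tid] = acc;  // keep the loads from being optimized away
}

static float time_launch(int blocks, int threads,
                         const int *d_data, long long *d_sink) {
    cudaEvent_t beg, end;
    cudaEventCreate(&beg); cudaEventCreate(&end);
    cudaEventRecord(beg);
    walk_slices<<<blocks, threads>>>(d_data, d_sink);
    cudaEventRecord(end);
    cudaEventSynchronize(end);
    float ms = 0.f;
    cudaEventElapsedTime(&ms, beg, end);
    cudaEventDestroy(beg); cudaEventDestroy(end);
    return ms;
}

int main() {
    const int threads = 256, max_blocks = 512;
    size_t n = (size_t)max_blocks * threads * SLICE;  // ~128 MB of ints
    int *d_data; long long *d_sink;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_sink, (size_t)max_blocks * threads * sizeof(long long));
    cudaMemset(d_data, 1, n * sizeof(int));
    for (int blocks = 8; blocks <= max_blocks; blocks *= 4) {
        float ms = time_launch(blocks, threads, d_data, d_sink);
        // Per-thread work is constant across launches, so the timing trend
        // hints at contention for shared cache/DRAM (occupancy matters too).
        printf("%4d blocks: %8.3f ms\n", blocks, ms);
    }
    cudaFree(d_data); cudaFree(d_sink);
    return 0;
}
```

In this toy setup, more concurrency does not translate into proportionally better throughput once the combined footprint exceeds the shared cache, which is the same tension between thread-level parallelism and shared-resource pressure that the scheduling techniques above aim to manage.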