Data Science with RAPIDS

Conference Experience

This past week, I attended NVIDIA’s GTC to learn more about machine learning and artificial intelligence. I heard about GTC through Women in Data, an organization I belong to whose mission is to “close the gender gap and increase diversity in data careers.” There were so many fascinating talks and presentations covering a variety of industries, but for me the most rewarding part of the conference was the hands-on training session I attended.

It was exciting to get my hands on a real-world project in the Fundamentals of Accelerated Data Science with RAPIDS workshop. Through a mixture of instructor-led and self-paced lab sections, I learned about GPU-accelerated data manipulation and machine learning, and I conducted an in-depth biodefense simulation analysis on population data.

Learning RAPIDS

During this workshop, the focus was on performing large data analyses using RAPIDS, which is a collection of data science libraries and APIs built on CUDA. Specifically, the libraries allow for end-to-end GPU acceleration of common, everyday data science workflows. One library I learned about was cuDF, whose API is almost identical to Pandas. In some of the labs, I was able to compare processing times for cuDF and Pandas and was blown away by how much faster cuDF was. For example, in one scenario I tested, there were over 60 million rows of data to process, and while Pandas took about 30 seconds, cuDF took only 3 seconds. Seriously fast!
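To give a feel for that kind of comparison, here is a minimal sketch rather than the workshop's actual lab code. The column names, row count, and the `time_groupby` helper are all made up for illustration. It relies on cuDF accepting the same `DataFrame`, `groupby`, and `mean` calls as Pandas, and it only runs the cuDF half if the library can be imported (which needs an NVIDIA GPU with RAPIDS installed).

```python
import time

import numpy as np
import pandas as pd

try:
    import cudf  # assumption: only importable where RAPIDS and a GPU are available
except ImportError:
    cudf = None


def time_groupby(df_lib, n_rows=1_000_000, seed=0):
    """Hypothetical helper: build a synthetic frame with df_lib and time a groupby-mean."""
    rng = np.random.default_rng(seed)
    df = df_lib.DataFrame({
        "county": rng.integers(0, 100, n_rows),  # made-up grouping column
        "age": rng.integers(0, 90, n_rows),      # made-up value column
    })
    start = time.perf_counter()
    result = df.groupby("county")["age"].mean()
    elapsed = time.perf_counter() - start
    return result, elapsed


pandas_result, pandas_seconds = time_groupby(pd)
print(f"pandas: {pandas_seconds:.3f} s")

if cudf is not None:
    cudf_result, cudf_seconds = time_groupby(cudf)
    print(f"cuDF:   {cudf_seconds:.3f} s")
```

Timings will vary a lot with hardware and data size, and the first GPU call usually includes some one-time startup cost, so a fair comparison would repeat each measurement a few times.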

Image credit: [**NVIDIA**](https://www.nvidia.com/en-us/deep-learning-ai/software/rapids/)

Biodefense Simulation Project

The simulation was broken down into three scenarios. First, I identified geographic clusters, calculated the spread for each cluster, and compared the density of infected vs. uninfected people. Next, I determined the number of individuals closest to each hospital and prepared road directions for ambulances to hospitals based on coordinates. Finally, I calculated the key factors associated with higher rates of infection and visualized all the results with dynamic graphs.
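As a rough illustration of the second scenario (counting the individuals closest to each hospital), here is a small sketch. It is not the workshop's code: the lab used cuML's k-nearest neighbors on real population data, while this stand-in uses plain NumPy, brute-force straight-line distances, and a handful of made-up coordinates. CuPy mirrors much of NumPy's API, so similar array code can often be moved to the GPU, but that is not shown here.

```python
import numpy as np

# Made-up coordinates in arbitrary grid units (not the workshop's data).
people = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [8.0, 1.0], [2.0, 0.5]])
hospitals = np.array([[0.0, 1.0], [10.0, 10.0], [9.0, 0.0]])

# Pairwise coordinate differences via broadcasting: shape (n_people, n_hospitals, 2).
diff = people[:, None, :] - hospitals[None, :, :]

# Straight-line (Euclidean) distance from every person to every hospital.
distances = np.sqrt((diff ** 2).sum(axis=2))

# Index of the closest hospital for each person.
nearest = distances.argmin(axis=1)

# Number of people assigned to each hospital.
counts = np.bincount(nearest, minlength=len(hospitals))

print(nearest)
print(counts)
```

A brute-force distance matrix like this grows with people × hospitals, which is one reason a GPU-accelerated nearest-neighbor search becomes attractive at population scale.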

Tools Used:

  • cuDF, CuPy, cuGraph, cuxfilter
  • cuML (K-means, DBSCAN, logistic regression, k-nearest neighbors, XGBoost)

After successfully completing the simulation analysis, I passed the assessment and received a certificate of competency from NVIDIA’s Deep Learning Institute!

Mariah Norell
Data Scientist & Lecturer

My research interests include pay equity, diversity and inclusion, and women in leadership.