Advanced Data Wrangling and Visualization in R
CSE-41408
Communicate Complex Data Clearly with Advanced R Visualization and AI Tools
In today’s data-driven economy, the ability to manage, transform, and visualize data is a mission-critical skill for data analysts, data scientists, and researchers. This advanced course in R programming equips learners with techniques for data wrangling (data cleaning, reshaping, merging, and feature engineering) as well as advanced data visualization strategies to communicate complex information with clarity and impact. By integrating artificial intelligence–enhanced exercises, the course addresses the growing industry demand for professionals who can turn raw data into actionable insights.
Ideal for working professionals, graduate students, and researchers in fields such as business analytics, public health, social science, and engineering, this course delivers hands-on, practical experience to build confidence in transforming messy datasets and presenting data-driven findings effectively. Participants will gain the skills to support data-informed decision-making, improve reporting, and drive innovation in their organizations.
Course Highlights:
- Data Manipulation with dplyr: Advanced techniques, including complex joins, grouped operations, and window functions, to handle large, real-world datasets efficiently.
- String and Regular Expressions: In-depth use of stringr for text processing, with regular expressions for pattern matching, critical for unstructured data analysis.
- Web Scraping: Practical skills to extract and clean data from websites, addressing the growing demand for online data in industry.
- User-Defined Functions with dplyr: Writing reusable functions incorporating dplyr’s data masking and tidy-selection, enabling modular, maintainable code for dynamic workflows.
- Functional Programming with purrr: Leveraging purrr for map functions, list manipulation, and iteration to streamline complex data tasks and enhance code scalability.
- Data Visualization with ggplot2: Exploring foundational plotting techniques with ggplot2, such as bar charts, line graphs, and scatter plots, to effectively communicate data insights.
- Advanced Data Visualization: Mastering the layered grammar of graphics to create customized, publication-quality plots, including multi-panel layouts and thematic customization.
Course Learning Outcomes:
- Master advanced data manipulation techniques with `dplyr`, including complex joins and grouped operations, to efficiently process large datasets for real-world analysis.
- Apply `stringr` and regular expressions to clean and analyze unstructured text data, enhancing skills for natural language processing projects.
- Develop web scraping capabilities to extract and prepare online data, supporting industry applications like market research.
- Create reusable functions with `dplyr`’s data masking, enabling modular code for dynamic workflows in professional settings.
- Utilize `purrr` for functional programming to scale data tasks, improving efficiency in research and analytics.
- Design effective visualizations with `ggplot2` to communicate insights, applicable in reports, presentations, and data-driven decision-making.
Courses Typically Offered: Online in Winter and Summer quarters
Prerequisite: CSE-41097 Introduction to R Programming or equivalent knowledge and experience
Next Step: After completing this course, consider taking other courses in the R for Data Analytics certificate program.
Contact: For more information about this course, please email unex-techdata@ucsd.edu
Course Information
Course sessions
Class type:
This course is entirely web-based and is completed asynchronously between the published course start and end dates. Synchronous attendance is NOT required.
You will have access to your online course on the published start date, or 1 business day after your enrollment is confirmed if you enroll on or after the published start date.
Textbooks:
All course materials are included unless otherwise stated.
Policies:
- No refunds after: 1/12/2026
Instructor: Arthur Li, MS
Biostatistician, City of Hope; Instructor, Department of Preventive Medicine, USC
Arthur Li holds an M.S. in Biostatistics from the University of Southern California and serves as a biostatistician at City of Hope National Medical Center, where he supports cancer research by analyzing clinical and genomic data. At USC, he developed and taught SAS and R programming courses and occasionally taught a linear regression course, helping students build data analysis skills. At UC San Diego Division of Extended Studies, Li developed and teaches the Biostatistical Methods course series, which has transitioned from SAS to R and helps learners explore biostatistics, as well as other R programming courses. He authored the Handbook of SAS® DATA Step Programming (CRC Press, 2013), a resource for data management in SAS. In his spare time, Li enjoys traveling, cooking, and exploring new cultures.