Advanced Data Wrangling and Visualization in R
CSE-41408
Communicate Complex Data Clearly with Advanced R Visualization and AI Tools
In today’s data-driven economy, the ability to manage, transform, and visualize data is a mission-critical skill for data analysts, data scientists, and researchers. This advanced course in R programming equips learners with cutting-edge techniques for data wrangling — including data cleaning, reshaping, merging, and feature engineering — as well as advanced data visualization strategies to communicate complex information with clarity and impact. By integrating artificial intelligence–enhanced exercises, the course addresses the growing industry demand for professionals who can turn raw data into actionable insights.
Ideal for working professionals, graduate students, and researchers in fields such as business analytics, public health, social science, and engineering, this course delivers hands-on, practical experience to build confidence in transforming messy datasets and presenting data-driven findings effectively. Participants will gain the skills to support data-informed decision-making, improve reporting, and drive innovation in their organizations.
Course Highlights:
- Data Manipulation with dplyr: Advanced techniques, including complex joins, grouped operations, and window functions, to handle large, real-world datasets efficiently.
- Strings and Regular Expressions: In-depth use of stringr for text processing, with regular expressions for pattern matching, critical for unstructured data analysis.
- Web Scraping: Practical skills to extract and clean data from websites, addressing the growing demand for online data in industry.
- User-Defined Functions with dplyr: Writing reusable functions incorporating dplyr’s data masking and tidy-selection, enabling modular, maintainable code for dynamic workflows.
- Functional Programming with purrr: Leveraging purrr for map functions, list manipulation, and iteration to streamline complex data tasks and enhance code scalability.
- Data Visualization with ggplot2: Exploring foundational plotting techniques with ggplot2, such as bar charts, line graphs, and scatter plots, to effectively communicate data insights.
- Advanced Data Visualization: Mastering the layered grammar of graphics to create customized, publication-quality plots, including multi-panel layouts and thematic customization.
Course Learning Outcomes:
- Master advanced data manipulation techniques with `dplyr`, including complex joins and grouped operations, to efficiently process large datasets for real-world analysis.
- Apply `stringr` and regular expressions to clean and analyze unstructured text data, enhancing skills for natural language processing projects.
- Develop web scraping capabilities to extract and prepare online data, supporting industry applications like market research.
- Create reusable functions with `dplyr`’s data masking, enabling modular code for dynamic workflows in professional settings.
- Utilize `purrr` for functional programming to scale data tasks, improving efficiency in research and analytics.
- Design effective visualizations with `ggplot2` to communicate insights, applicable in reports, presentations, and data-driven decision-making.
Courses Typically Offered: Online in Winter and Summer quarters
Prerequisite: CSE-41097 Introduction to R Programming or equivalent knowledge and experience
Next Step: After completing this course, consider taking other courses in the R for Data Analytics certificate program.
Contact: For more information about this course, please email unex-techdata@ucsd.edu