Futures: Audio, Video, Image, and Augmented Reality Processing
EE-90009
Wireless networks have empowered smart cities and homes as well as work, gaming, and entertainment in the cloud. The combination of media streaming and wireless networks further expands the reach of on-the-go entertainment. Audio and video are the most engaging sources of information and entertainment. Many teenagers spend more than three times as much of their spare time watching TV and listening to music as they do on social media. The internet has expanded the global footprint of live and on-demand media services and opened up new ways of discovering, sharing, and consuming media content anywhere, anytime, and on any device. Conversely, it takes nothing more than a smartphone and a YouTube channel to become a global broadcaster and producer of live events. While wireless and internet technologies have brought people closer together by allowing instant connectivity, recent advances in artificial intelligence (AI) will broaden that convergence by unifying computers and content to provide smart and predictive analysis, and even award-winning content far beyond the capabilities of the human brain.
This course will provide a detailed description of audio, video, image, and augmented reality processing. In addition to key audio coding standards, it will discuss the coding of video content in emerging fields, including high dynamic range (HDR), computer-generated screen content, and immersive applications such as omnidirectional (360-degree) video and augmented reality (AR). The course will also examine the role of AI in image and natural language processing. Interesting demos will be provided by the instructor.
Learning Outcomes:
- Learn the architecture of popular and emerging audio codecs (AAC, SBC, LC3, EVS).
- Understand the core modules of video codecs (AVC, HEVC, AV1, VVC) and development platforms (FFmpeg).
- Understand the tradeoffs between coding efficiency and scene complexity for typical scenarios.
- Describe adaptive streaming platforms (Apple HTTP Live Streaming, MPEG-DASH).
- Review the essentials of immersive communications using AR.
- Learn the basics of JPEG AI, natural language processing, speech recognition, and ChatGPT.
Course Information
Course sessions
Class type:
This course is entirely web-based and is to be completed asynchronously between the published course start and end dates. Synchronous attendance is NOT required.
You will have access to your online course on the published start date OR 1 business day after your enrollment is confirmed if you enroll on or after the published start date.
Textbooks:
All course materials are included unless otherwise stated.
Policies:
- Early enrollment advised
- No UCSD parking permit required
- No visitors permitted
- Pre-enrollment required
- Prerequisite required
- No refunds after: 12/30/2024
Instructor: Benny Bing, Electrical Eng MS, Nanyang Tech Univ
Benny Bing has taught technology and test prep classes to many high-school and college students, motivating them to excel in STEM classes. He has helped test-takers from diverse academic backgrounds and received a perfect score on the GRE. As a certified computer-aided instructor, he has also taught many virtual industry-guided courses, training thousands of engineers all over the world. He has authored 20 books and 70 scientific research papers, and produced 8 IEEE tutorials that were sponsored by major corporations. Cisco Systems printed 18,000 copies of his Wi-Fi book to launch its first wireless networking product, a bold move that helped jumpstart Wi-Fi in 1999. He has served as a technology panelist for the National Science Foundation, has 6 U.S. patents licensed to industry, and was featured in the MIT Technology Review. Additionally, he received an award from the Georgia Tech Center for the Enhancement of Teaching and Learning and a technology innovation award from the National Association of Broadcasters.