Futures: Audio, Video, Image, and Augmented Reality Processing
EE-90009
Wireless networks have empowered smart cities and homes, as well as work, gaming, and entertainment in the cloud. The powerful combination of media streaming and wireless networks further expands the reach of on-the-go entertainment. Media content such as audio and video is the most engaging source of information and entertainment; many teenagers spend more than three times as much of their spare time watching TV and listening to music as they spend on social media. The internet has expanded the global footprint of live and on-demand media services, opening up new ways to discover, share, and consume media content anywhere, anytime, and on any device. Today, it takes nothing more than a smartphone and a YouTube channel to become a global broadcaster and producer of live events. While wireless and internet technologies have brought people closer together through instant connectivity, recent advances in artificial intelligence (AI) will broaden that convergence by unifying computers and content to provide smart and predictive analysis, and even award-winning content, far beyond the capabilities of the human brain.
This course will provide a detailed description of audio, video, image, and augmented reality processing. In addition to key audio coding standards, it will discuss the coding of video content in emerging fields, including high dynamic range (HDR), computer-generated screen content, and immersive applications such as omnidirectional (360-degree) video and augmented reality (AR). The course will also examine the role of AI in image and natural language processing. Interesting demos will be provided by the instructor.
Learning Outcomes:
- Learn the architecture of popular and emerging audio codecs (AAC, SBC, LC3, EVS).
- Understand the core modules of video codecs (AVC, HEVC, AV1, VVC) and development platforms (FFmpeg).
- Understand the tradeoffs between coding efficiency and scene complexity in typical scenarios.
- Describe adaptive streaming platforms (Apple HTTP Live Streaming, MPEG-DASH).
- Review the essentials of immersive communications using AR.
- Learn the basics of JPEG AI, natural language processing, speech recognition, and ChatGPT.
Course Information
Course sessions
Class type:
This course is entirely web-based and is completed asynchronously between the published course start and end dates. Synchronous attendance is NOT required.
You will have access to your online course on the published start date OR 1 business day after your enrollment is confirmed if you enroll on or after the published start date.
Textbooks:
All course materials are included unless otherwise stated.
Policies:
- Early enrollment advised
- No UCSD parking permit required
- No visitors permitted
- Pre-enrollment required
- Prerequisite required
- No refunds after: 12/30/2024
Instructor: Benny Bing, MS in Electrical Engineering, Nanyang Technological University
Benny Bing has published over 70 scientific papers and 16 books, and holds 6 U.S. patents licensed to industry. He served as a technical editor of IEEE Wireless Communications Magazine for 10 years, as a guest editor for IEEE Communications Magazine (twice) and IEEE JSAC, and as an IEEE Distinguished Lecturer. His IEEE tutorials have been sponsored by industry 8 times. Cisco Systems launched its first wireless product with 18,000 printed copies of one of his books. He has been invited by Qualcomm and Comcast to conduct on-site courses, and by the NSF to serve as a panelist on residential broadband. His research, teaching, and industry awards include the NAB Technology Innovation Award. He holds undergraduate and graduate degrees in electrical engineering.