Natural Language Processing

The main goal of Natural Language Processing (NLP) is to enable computers to comprehend the meaning (semantics) of text.

This course presents two distinct approaches to NLP. The first approach covers the fundamental mathematical analysis of text. Students will write Python code that uses the NLTK and TextBlob software packages to break text down into tokens. Tokenization will be covered using regular expressions, NLTK, and TextBlob. Tokens will then be converted into vectors, and the vectorization process will be explored, including the count vectorizer, TF-IDF (Term Frequency-Inverse Document Frequency), and cosine similarity computation. The semantics of text will be analyzed using Latent Semantic Analysis (LSA).
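As a rough illustration of the first approach, the sketch below tokenizes text with a regular expression, builds count and TF-IDF vectors, and compares documents by cosine similarity. It is a minimal sketch that assumes only the Python standard library; the function names, the sample documents, and the particular TF-IDF weighting (raw count times log of N over document frequency) are illustrative choices, not the course's own code or the NLTK/TextBlob APIs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in a token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vectors(token_lists, vocabulary):
    """Weight raw counts by log(N / document frequency), one common TF-IDF variant."""
    n_docs = len(token_lists)
    doc_freq = {t: sum(1 for toks in token_lists if t in toks) for t in vocabulary}
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocabulary}
    return [
        [count * idf[term] for count, term in zip(count_vector(toks, vocabulary), vocabulary)]
        for toks in token_lists
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, chosen only for illustration.
documents = [
    "The cat sat on the mat.",
    "The cat lay on the rug.",
    "Stock prices fell sharply today.",
]
token_lists = [tokenize(doc) for doc in documents]
vocabulary = sorted({token for tokens in token_lists for token in tokens})

count_vectors = [count_vector(tokens, vocabulary) for tokens in token_lists]
tfidf = tfidf_vectors(token_lists, vocabulary)

sim_related = cosine_similarity(count_vectors[0], count_vectors[1])
sim_unrelated = cosine_similarity(count_vectors[0], count_vectors[2])
print(f"related: {sim_related:.2f}, unrelated: {sim_unrelated:.2f}")
```

The two sentences about the cat share several tokens and score well above the unrelated sentence, which shares none. Library vectorizers typically apply smoothing and normalization that this sketch leaves out, so their numeric values would differ.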
 
In the second approach, students will explore machine learning and deep learning models for NLP. A Naïve Bayes model will be used for document classification. Deep learning tools (Keras/TensorFlow) will be used to generate word embeddings such as Word2Vec. Transformer models, including GPT-1/2/3/4 and BERT, will be covered for semantic analysis of text, along with an exploration of the ChatGPT language model. The Hugging Face Transformers library will be used to explore NLP applications such as translation and sentiment analysis.
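To make the document classification step concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It assumes only the Python standard library; the helper names and the tiny sentiment dataset are hypothetical and are not taken from the course materials, which may well use a library implementation instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, chosen only for illustration.
training_data = [
    ("great movie with a wonderful cast", "positive"),
    ("i loved this wonderful film", "positive"),
    ("terrible plot and boring acting", "negative"),
    ("i hated this boring movie", "negative"),
]
model = train_naive_bayes(training_data)
prediction = predict("a wonderful film", model)
print(prediction)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together, which matters once documents are longer than a few words.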
 
NLP has become an integral part of Artificial Intelligence (AI), which is expected to drive growth in the new economy. Since the release of ChatGPT in November 2022, AI has consistently dominated the news cycle. ChatGPT has resonated globally, garnering over 1 million users in its first week. News organizations routinely report on how ChatGPT is changing many real-world tasks that were previously done by human workers.
This course is comprehensive, covering state-of-the-art tools and techniques of NLP.

Course Highlights:

  • Fundamental mathematical analysis of NLP
  • Tokenization using Regular Expressions
  • Vectorization of words: Count Vectorizer + TF-IDF
  • Cosine similarity between words
  • Understand the meaning of text using Latent Semantic Analysis
  • Machine Learning: Naïve Bayes for text Classification
  • Deep Learning: Word Embeddings Word2Vec
  • Deep Learning: Generative Neural Networks: GPT-1/2/3 (Generative Pre-trained Transformer)
  • Deep Learning: NLP Analysis for Search Engines: BERT (Bidirectional Encoder Representations from Transformers)
  • ChatGPT

Course Learning Outcomes:

Upon successful completion of this course, students will be able to:
  • Understand Word2Vec and apply this concept in various real-world applications
  • Extract themes from documents using several different approaches
  • Use ChatGPT and understand how it was developed
  • Understand the Hugging Face Transformers library and use it for various NLP applications
  • Discuss the future of language models and their impact on society

Course Typically Offered: Online during the Winter and Summer academic quarters.

Software: Students will use Python and Python-based NLP libraries to complete hands-on assignments. These tools are free and open-source.

Hardware: Students must have access to a web-enabled computer.

Prerequisites: CSE-40028 Introduction to Programming (Python) or equivalent knowledge and experience.

Next Step: After completing this course, consider taking other courses in the Machine Learning Methods or Python Programming certificate programs.

Contact: For more information about this course, please contact unex-techdata@ucsd.edu.

Course Number: CSE-41344
Credit: 3.00 unit(s)
Related Certificate Programs: Machine Learning Methods
