With vast amounts of unstructured data available on the web and stored in databases, and with the promise that this data holds insights unavailable in structured data, text mining has become an indispensable addition to traditional predictive analytics.
In this course, students will learn practical techniques for text extraction and text mining in a data mining context, including document clustering and classification, information retrieval, and the enhancement of structured data. Emphasis will be placed on the practical use of text mining in business. Basic concepts of text processing, such as tokenization, part-of-speech tagging, and disambiguation, will also be covered.
Topics include:
- Structured vs. unstructured learning
- CRISP-DM
- Data sources
- Dictionaries and lexicons
- Text parsing
- Regular expressions
- Structured data from unstructured data
- Document clustering and classification
- Sentiment analysis
Practical experience:
- Working with R
- Working with unstructured text
- Prepping text data for modeling
- Visualizing text data
Software: Students will use R in this course. There is no additional cost for this product.
Course typically offered: Online in Fall and Spring
Prerequisites: Introduction to R Programming or equivalent knowledge required.
Next Steps: Upon completion of this course, consider taking other courses in data science to continue learning.
More Information: For more information about this course, please contact unex-techdata@ucsd.edu.
Course Number: CSE-41151
Credit: 2.00 unit(s)
Related Certificate Programs: Data Mining for Advanced Analytics, R for Data Analytics
4/19/2023 - 5/27/2023
$595
Online
CLASS TYPE:
Online Asynchronous.
This course is entirely web-based and to be completed asynchronously between the published course start and end dates. Synchronous attendance is NOT required.
You will have access to your online course on the published start date OR 1 business day after your enrollment is confirmed if you enroll on or after the published start date.
INSTRUCTOR:
Nemteanu, Ion
Ion Nemteanu is the Senior Director of Data Science for Thermo Fisher Scientific's Life Science Solutions Group. His leadership spans global commercial, marketing, and operational functions, where he builds and deploys data science and advanced analytics capabilities to support business growth and innovation. Ion also teaches Text Mining for University of California, San Diego Extension and is an Adjunct Professor of Data Science at the University of San Diego. Previously, Ion led data science for Becton Dickinson's Technology Solutions team, where his leadership brought several new AI-driven platforms to their healthcare customers. He specialized in building algorithms that were embedded in BD's hosted products featuring clinical medication management, diversion, and ...
TEXTBOOKS:
No information available at this time.
POLICIES:
No refunds after: 4/25/2023.
Course site: extensioncanvas.ucsd.edu
There are no sections of this course currently scheduled. Please contact the Science & Technology department at 858-534-3229 or unex-sciencetech@ucsd.edu for information about when this course will be offered again.