Data mining attempts to identify interesting structural patterns
in large data sets that can be used to make predictions.
For example, in the area of security, one might analyze a database of past
credit card transactions to predict what sequences are indicative of
fraudulent credit card use, and then reject credit card transactions
that match this pattern. In the area of medical diagnosis, one might
analyze patient histories to determine which patients are most likely
to benefit from an expensive procedure. In the life sciences,
molecular biologists might analyze large sets of biological data to
predict protein structure. In the area of consumer marketing, one
might analyze supermarket data to determine what items are typically
purchased with other items, and then display those items together to
encourage more customers to purchase both items. And in the area of
investment and finance, one might analyze economic data to identify
stock market trends. Data mining is becoming increasingly important
in many fields, including bioinformatics,
information retrieval, recommendation systems,
advertising, banking, business, finance, security, medicine, and web
page design, among others.
This course will introduce fundamental
strategies and methodologies for data mining along with the
concepts underlying them, and will provide hands-on
experience with a variety of techniques.
Students will learn to use the Weka
workbench, a set of data mining tools. The undergraduate
version of the course, CISC-483, has been approved as a technical
elective for undergraduate computer science majors.