CISC-683: Data Mining
Public Data Repositories
Links to sites with publicly available datasets. The datasets provided
at the different sites overlap:
- University of California Irvine Data Mining Repository: a large repository of datasets
that serves as a benchmark for comparing data mining techniques
- University of California Irvine Machine Learning Repository: a large repository of datasets
supplied by individuals, with some overlap with the Data Mining Repository
- ACM Data Mining and Knowledge Discovery Cup Center: contains links to
instructions and datasets for the annual KDD contest
- Links to a variety of large datasets: these datasets are very large,
but many of them are not well described
- Links to datasets: many of these are statistical or provided without
descriptions of the attributes, and so may not be of much use.
However, others (such as the baseball dataset) are interesting
- Financial and Economic Datasets (with considerable overlap among them):
First link
Second link
Third link
- Asteroid dataset
- Insurance dataset: This dataset was used in the CoIL (Computational
Intelligence and Learning Cluster) competition.