How to Find a Dataset in the UCI Machine Learning Repository
Open the UCI Machine Learning Repository at the link:
The home page will appear as shown below:
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. It was created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine.
Click “VIEW ALL DataSets” in the upper right corner, as shown below:
Select the dataset you want to analyze. Here I chose the “Car Evaluation Data Set”.
The table in the image below describes the dataset:
Dataset type: Multivariate.
Multivariate statistical analysis is a statistical method that lets us study more than two variables simultaneously. With this technique, we can analyze the effect of several variables on other variables at the same time. The dataset I chose is an example: each record has six input attributes and a class attribute.
Multivariate analysis is used because real-world problems often cannot be solved by linking just two variables or by looking only at the effect of one variable on another.
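To make the idea concrete, here is a small Python sketch (my own illustration, not part of the UCI page) that counts class labels for each combination of two attributes, persons and safety, so several variables are examined together. The rows are made up and follow the column order of car.data; the class values used (unacc, good, vgood) are among those listed in the dataset's .names file. The helper name `joint_counts` is hypothetical.

```python
from collections import Counter

# Hypothetical rows for illustration, in car.data column order:
# buying, maint, doors, persons, lug_boot, safety, class
rows = [
    ("low", "low", "4", "4", "big", "high", "vgood"),
    ("low", "low", "4", "2", "big", "high", "unacc"),
    ("vhigh", "vhigh", "2", "4", "small", "low", "unacc"),
    ("med", "low", "4", "4", "med", "high", "good"),
]


def joint_counts(rows):
    """Count how often each class occurs for each (persons, safety) pair.

    Two input attributes and the class are tallied together in one table,
    which is a small multivariate view of the data.
    """
    return Counter((r[3], r[5], r[6]) for r in rows)


for (persons, safety, label), n in sorted(joint_counts(rows).items()):
    print(f"persons={persons} safety={safety} class={label} count={n}")
```

With real data you would pass all rows from car.data instead of the four invented ones.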
Attribute type: Nominal.
Nominal means “relating to names.” The values of a nominal attribute are symbols or names of things. Each value represents a category, code, or state, so nominal attributes are also referred to as categorical. The values have no meaningful order. In computer science, such values are also called enumerations.
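A nominal attribute maps naturally onto an enumeration. The sketch below uses Python's standard `enum` module with a hypothetical `MaritalStatus` attribute (not from the car dataset) to show that nominal values can be compared for equality but have no built-in order.

```python
from enum import Enum


class MaritalStatus(Enum):
    """Hypothetical nominal attribute: the values are names/codes, not quantities."""
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"


def can_order(a, b):
    """Return True if a < b is defined for these values, False otherwise."""
    try:
        a < b
        return True
    except TypeError:
        return False


# Equality is meaningful for nominal values: look up a member by its value.
print(MaritalStatus("married") is MaritalStatus.MARRIED)  # True
# Ordering is not: plain Enum members do not define <, so Python raises TypeError.
print(can_order(MaritalStatus.SINGLE, MaritalStatus.MARRIED))  # False
```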
Task type: Classification.
Classification is the process of finding a model or function that describes and distinguishes data classes or concepts. The model can describe important characteristics of the data and can be used to predict the class of new data.
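As a minimal illustration of a model that both describes the data and predicts, the sketch below learns a one-attribute rule, similar to the rule the OneR algorithm builds for a single attribute (chosen here only for simplicity, not the author's method): for each value of one attribute it records the most frequent class. The function name `one_rule` and the eight (safety, class) rows are hypothetical.

```python
from collections import Counter, defaultdict


def one_rule(rows, attr_index, label_index=-1):
    """Map each value of one attribute to the class it most often occurs with."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attr_index]][row[label_index]] += 1
    # most_common(1) returns [(label, count)] for the most frequent label
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}


# Hypothetical (safety, class) pairs, made up for illustration.
rows = [
    ("low", "unacc"), ("low", "unacc"),
    ("med", "acc"), ("med", "unacc"), ("med", "acc"),
    ("high", "good"), ("high", "vgood"), ("high", "good"),
]

rule = one_rule(rows, attr_index=0)
print(rule)  # {'low': 'unacc', 'med': 'acc', 'high': 'good'}

# Describing: the rule itself summarizes the data.
# Predicting: apply it to the safety value of a new, unlabeled car.
print(rule.get("med", "unknown"))  # acc
```

A real classifier would use all attributes at once; this rule only shows the describe-then-predict idea.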
Donors:
1. Marko Bohanec (marko.bohanec ‘@’ ijs.si)
2. Blaz Zupan (blaz.zupan ‘@’ ijs.si)
Data Set Information:
The Car Evaluation Database comes from a simple hierarchical decision model originally developed for the demonstration of DEX (M. Bohanec, V. Rajkovic: An expert system for decision making. Sistemica 1 (1), pp. 145-157, 1990). The model evaluates cars according to the following concept structure:
CAR car acceptability
. PRICE overall price
. . buying buying price
. . maint price of the maintenance
. TECH technical characteristics
. . COMFORT comfort
. . . doors number of doors
. . . persons capacity in terms of persons to carry
. . . lug_boot size of the luggage boot
. . safety estimated safety of the car
Input attributes are printed in lowercase. Apart from the target concept (CAR), the model includes three intermediate concepts: PRICE, TECH, and COMFORT. In the original model, every concept is related to its lower-level descendants by a set of examples (for these example sets, see [Web Link]). The Car Evaluation Database contains examples with the structural information removed, i.e., it directly relates CAR to the six input attributes: buying, maint, doors, persons, lug_boot, and safety. Because the underlying concept structure is known, this database may be particularly useful for testing constructive induction and structure discovery methods.
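The concept structure can also be written down as data. The sketch below stores the hierarchy as (name, children) pairs, using the attribute names as they appear in the .names file (buying, maint, doors, persons, lug_boot, safety); the nested-tuple representation and the helper names `leaves` and `intermediate` are my own assumptions. Collecting the leaves mirrors what the published dataset does when it removes the structure and relates CAR directly to the six inputs.

```python
# The concept hierarchy as (name, children) pairs.
# Upper-case names are concepts; lower-case names are input attributes (leaves).
CAR_MODEL = ("CAR", [
    ("PRICE", [("buying", []), ("maint", [])]),
    ("TECH", [
        ("COMFORT", [("doors", []), ("persons", []), ("lug_boot", [])]),
        ("safety", []),
    ]),
])


def leaves(node):
    """Return the input attributes (leaf nodes) in left-to-right order."""
    name, children = node
    if not children:
        return [name]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def intermediate(node, is_root=True):
    """Return the concepts that are neither the root nor a leaf."""
    name, children = node
    found = [] if is_root or not children else [name]
    for child in children:
        found.extend(intermediate(child, is_root=False))
    return found


print(leaves(CAR_MODEL))        # ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']
print(intermediate(CAR_MODEL))  # ['PRICE', 'TECH', 'COMFORT']
```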
To analyze the dataset, click the “Data Folder” link, as shown below:
Then download the files with the .data and .names extensions from that folder.
Open the folder containing the downloaded files, right-click the .data file, choose Rename, and change its extension to .csv.
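Renaming is mainly for Excel's benefit: the .data file is already plain comma-separated text with no header row. A minimal Python sketch, assuming the column names from the .names file with the class column last, reads it directly with the standard `csv` module. `load_car_data` is a hypothetical helper, and the two sample lines (in the same format as car.data) are written to a temporary file only so the example runs without the download.

```python
import csv
import os
import tempfile

# Column names as listed in the .names file; the .data file has no header row.
COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]


def load_car_data(path):
    """Read a comma-separated .data file into a list of dicts keyed by COLUMNS."""
    with open(path, newline="") as f:
        # Skip blank lines, which csv.reader returns as empty lists.
        return [dict(zip(COLUMNS, row)) for row in csv.reader(f) if row]


# Two lines in the car.data format, written to a temporary file for the demo.
sample = "vhigh,vhigh,2,2,small,low,unacc\nlow,low,4,more,big,high,vgood\n"
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write(sample)
try:
    records = load_car_data(tmp.name)
    print(len(records), records[1]["safety"])  # 2 high
finally:
    os.remove(tmp.name)
```

On your own machine, call `load_car_data` with the path of the downloaded file; it works whether or not the extension was changed to .csv.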
Open it with Excel; it will look like the one below.
To see a description of the dataset, look at the file with the .names extension.
Attribute Information:
buying: vhigh, high, med, low.
maint: vhigh, high, med, low.
doors: 2, 3, 4, 5more.
persons: 2, 4, more.
lug_boot: small, med, big.
safety: low, med, high.
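These value lists can be encoded directly to check rows after loading. The sketch below uses the value spellings that appear in car.data itself (for example 5more and big); the names `DOMAINS`, `invalid_fields`, and `count_combinations` are hypothetical. It flags values outside the listed ones and multiplies the domain sizes to get the number of possible attribute combinations.

```python
# Allowed values per attribute, spelled as they appear in car.data.
DOMAINS = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}


def invalid_fields(record):
    """Return attribute names whose value is not allowed (empty list means valid)."""
    return [name for name, allowed in DOMAINS.items() if record.get(name) not in allowed]


def count_combinations():
    """Number of distinct attribute combinations the six domains allow."""
    total = 1
    for allowed in DOMAINS.values():
        total *= len(allowed)
    return total


print(count_combinations())  # 1728
```

The count is 4 × 4 × 4 × 3 × 3 × 3 = 1728 possible combinations of the six input attributes.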