Data: each row of the data matrix describes a house (strictly, the housing of one census tract in the Boston area), characterized by parameters such as median value (in thousands of dollars), average number of rooms, age, per-capita crime rate in town, nitric oxides concentration, accessibility to radial highways, etc. (see the references for a complete description).
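To work with these rows outside Grapheur, one option is to parse the raw UCI file into one record per house. The sketch below assumes the whitespace-separated layout of the UCI `housing.data` file, with 14 numeric fields per line in the column order listed in `COLUMNS`. The function name `parse_housing` is hypothetical, and this is not the Grapheur `.rbi` format.

```python
# Assumed column order of the UCI "Housing" file (14 numeric fields per row).
COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]


def parse_housing(text: str) -> list[dict[str, float]]:
    """Turn whitespace-separated rows into one dict per house (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows
```

With a local copy of the file, `parse_housing(open("housing.data").read())` would yield a list of dicts such as `row["MEDV"]` or `row["CRIM"]`, which the other sketches on this page can consume.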
Objectives of data mining and visualization:
- Visualizing important relationships between house value, house size (number of rooms), neighborhood characteristics, average pollution, age of houses in the same neighborhood, crime rate, etc.
- Understanding the set of housing possibilities through clustering techniques
- Filtering the data according to personal priorities, identifying the optimal choice
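The clustering objective does not name a specific technique. As one possible choice, the sketch below uses plain k-means (Lloyd's algorithm) on z-scored features, so that attributes with large units do not dominate the distance. The function names, the iteration count, and the choice of k-means itself are assumptions for illustration.

```python
import math
import random


def standardize(rows, cols):
    """Z-score each selected column; constant columns get a std of 1 to avoid division by zero."""
    vectors = [[r[c] for c in cols] for r in rows]
    means = [sum(col) / len(col) for col in zip(*vectors)]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(zip(*vectors), means)
    ]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

For example, `kmeans(standardize(rows, ["RM", "CRIM", "NOX", "MEDV"]), 4)` would group houses by size, crime, pollution and value; the feature list and k = 4 are arbitrary choices to tune.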
Grapheur sample visualization(s): Parallel filter and scatterplot
In the figure, a scatterplot shows house value (Y) as a function of the number of rooms (X). The color encodes the age of the neighborhood (the proportion of houses in the neighborhood built before 1940), and the size of each dot is proportional to the crime rate in the neighborhood.
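The mapping from data columns to visual channels described above can be written as a small function. This is a sketch, not Grapheur's implementation: it assumes rows keyed by the UCI column names (`RM`, `MEDV`, `AGE`, `CRIM`), a red-to-blue color ramp (the text only states that old neighborhoods are blue), and hypothetical scaling parameters `crim_max` and `max_size`.

```python
def encode(row, age_range=(0.0, 100.0), crim_max=100.0, max_size=40.0):
    """Map one house to scatterplot channels: x=RM, y=MEDV, color from AGE, size from CRIM."""
    lo, hi = age_range
    # Normalized age in [0, 1]: 0 = newest neighborhood, 1 = oldest.
    t = min(max((row["AGE"] - lo) / (hi - lo), 0.0), 1.0)
    # Linear RGB blend from red (new) to blue (old); the red end is an assumption.
    color = (round(255 * (1 - t)), 0, round(255 * t))
    # Dot size proportional to the crime rate.
    size = max_size * row["CRIM"] / crim_max
    return row["RM"], row["MEDV"], color, size
```

The resulting tuples can be handed to any plotting library; keeping the encoding in one function makes it easy to swap which variable drives color or size.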
For example, one immediately notices that a high crime rate is associated with a lower-than-average house value, as expected. Similarly, houses in old neighborhoods (blue dots) often have lower values.
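A visual impression such as "higher crime, lower value" can be checked numerically. One common measure, swapped in here for illustration, is the Pearson correlation coefficient; a value near -1 indicates a strong negative linear relationship. On Python 3.10+ the standard library's `statistics.correlation` computes the same quantity; the explicit version below just shows the formula.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to the crime-rate and value columns of the housing rows, a negative result would be consistent with what the scatterplot suggests; correlation alone does not establish cause.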
By moving the selectors along the individual axes of the parallel-coordinates view, one can filter the houses and analyze the most interesting cases. For example, one may keep only houses in tracts bordering the Charles River, or only houses in areas with a low crime rate. Interactive visualization is an excellent way to consider different options rapidly.
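The filtering performed by the parallel-coordinates selectors amounts to keeping only rows that fall inside the selected range on every constrained axis. The sketch below assumes rows are dicts keyed by column name; `brush` is a hypothetical name, and axes without a selection are left unconstrained.

```python
def brush(rows, ranges):
    """Keep rows whose value lies inside [lo, hi] on every brushed axis.

    Axes absent from `ranges` are unconstrained, like an untouched
    parallel-coordinates axis.
    """
    return [
        r for r in rows
        if all(lo <= r[axis] <= hi for axis, (lo, hi) in ranges.items())
    ]
```

For instance, `brush(rows, {"CHAS": (1, 1), "CRIM": (0.0, 1.0)})` would keep tracts bordering the Charles River (`CHAS` = 1 in the UCI data) with a low crime rate; the crime threshold is an arbitrary illustration.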
Download the Grapheur-ready data file: houses.rbi

References:
- Data collected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Housing).
- Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.