1 Introduction
2 Methods
2.1 Data basis
| Variable | Value |
|---|---|
| Speed | Free field, default is 20 km/h |
| Street category | No preference |
| | Prefer residential roads [calm] |
| | Use only residential roads [calm*] |
| | Prefer main roads [main] |
| | Use only main roads [main*] |
| | Avoid main roads without cycle paths/bus lanes [infra] |
| | Avoid main roads without cycle paths [infra*] |
| Surface quality | No preference |
| | Avoid cobblestones and bad surfaces [smooth] |
| | Use only very good surfaces (suitable for racing bikes) [smooth*] |
| Avoid traffic lights | No |
| | Yes |
| Avoid unlit streets | No |
| | Yes |
| Green pathways | No preference |
| | Prefer green pathways [green] |
| | Strongly prefer green pathways (may result in long routes if no suitable routes surrounded by greenery are available, so use with caution) [green*] |
| Use unknown streets | Allow routing through “unknown” streets (streets not yet researched for cyclist usage) |
2.2 Cluster analysis
- Sample (a): The dataset described in 2.1 is used as the sample for clustering.
- Data (b): The characteristics on which the clustering is based are the preferences for various route attributes. These routing preferences are recorded as nominal data indicating preferences for various road types, surface quality and green pathways. The data carry ordinal information, since no, weak or strong preference is stated for each street type as well as for surface quality and greenery. To make this information usable, the preference settings are transformed into five ordinal variables with three values each, defining the desired usage of side roads, main roads, main roads without cycle infrastructure, smooth road surfaces and green pathways. Thus, for every request there is information on whether no preference [0], preference [1] or strong preference [2] is stated for each category (residential roads, main roads, no main roads without infrastructure, avoid cobblestones, green pathways).
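A minimal sketch of this recoding step. The raw setting labels and field names below are illustrative assumptions, not taken from the actual request logs:

```python
# Map each routing-preference setting to an ordinal value:
# 0 = no preference, 1 = preference, 2 = strong preference.
# The label strings "none"/"prefer"/"strong" are hypothetical.
ORDINAL = {"none": 0, "prefer": 1, "strong": 2}

def encode_request(request):
    """Recode one routing request into the five ordinal variables
    used for clustering: residential roads, main roads, cycling
    infrastructure, smooth surface, green pathways."""
    keys = ["residential", "main", "infra", "smooth", "green"]
    return [ORDINAL[request.get(k, "none")] for k in keys]

# Example: a request with a weak preference for residential roads
# and a strong preference for greenery.
req = {"residential": "prefer", "green": "strong"}
print(encode_request(req))  # [1, 0, 0, 0, 2]
```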
- Dissimilarities (c): The asymmetric Manhattan method proposed by [35] is used to calculate a distance matrix for the specific case of ordinal data. To do so, the relative distance between every pair of observations in the dataset is calculated and organized in the distance matrix. For this purpose, the scale of the distance measure is treated as interval-scaled. According to [36], most authors do this so as not to lose information, even though the differences between the single values cannot be known in detail and may vary.
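As a simplified stand-in for the asymmetric Manhattan variant of [35], a plain Manhattan (city-block) distance matrix over the five ordinal variables can be computed with SciPy, treating the ordinal scale as interval-scaled:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Each row is one request encoded as five ordinal variables (0/1/2);
# the rows here are illustrative toy data.
X = np.array([
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2],
    [2, 0, 0, 1, 2],
])

# Pairwise Manhattan (city-block) distances, expanded into a
# full symmetric distance matrix.
D = squareform(pdist(X, metric="cityblock"))
print(D)
# [[0. 3. 5.]
#  [3. 0. 2.]
#  [5. 2. 0.]]
```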
- Constraints (d): The hierarchical approach is chosen as the clustering method. In hierarchical cluster analysis, objects are merged into clusters step by step. For each step, similarity matrices are calculated as described in (c) and objects are assigned to the cluster which fits best. Thus, the analysis produces results for a variety of cluster solutions according to the number of resulting clusters. Hierarchical clustering can thereby deliver criteria to specify the optimum number of clusters, while partitioning algorithms need the number of groups as input a priori. With regard to constraints, there is no need for normalization, as the range and relations are identical for all variables integrated in the clustering.
- Criterion (e): Various measures of homogeneity exist for different types of data and approaches. By evaluating such measures, it is possible to determine the optimal number of clusters in the process. The Calinski-Harabasz criterion (CHC) is used [37]. The CHC combines two important measures for evaluating each cluster solution. The total within-cluster covariance shows how compact each cluster is; a low value is preferred. The between-cluster covariance defines how different the clusters are from each other; a high value is preferred. The Calinski-Harabasz criterion is defined as

  CHC(k) = [tr(B_k) / (k − 1)] / [tr(W_k) / (n − k)],

  where B_k and W_k are the between-cluster and within-cluster covariance matrices, n is the number of observations and k the number of clusters.
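The criterion can be evaluated with scikit-learn's implementation; the toy data and labels below are illustrative:

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Two clearly separated toy clusters in the five-dimensional
# ordinal preference space.
X = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [2, 2, 1, 2, 2],
    [2, 2, 2, 2, 2],
])
labels = [0, 0, 1, 1]

# Ratio of between-cluster to within-cluster dispersion,
# normalized by (n - k) / (k - 1); higher is better.
score = calinski_harabasz_score(X, labels)
print(round(score, 1))  # 33.0
```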
- Algorithm (f): The complete-linkage method is applied as the clustering algorithm to identify similar clusters. Complete linkage uses the distance between the farthest pair of points to measure cluster similarity. As an agglomerative hierarchical method, the algorithm starts with each element representing its own cluster. The clusters are successively merged until all elements are united in one cluster. This approach allows the dendrogram to be interpreted as the graphical output of the clustering process (see Fig. 7). The dendrogram illustrates the tree of cluster solutions produced by the algorithm. The algorithm is relatively robust against chaining and builds rather compact clusters.
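Complete-linkage clustering on a Manhattan distance matrix is available in SciPy; the toy profiles below are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Four toy requests: two with low preferences, two with high ones.
X = np.array([
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2],
])

# Condensed Manhattan distance matrix, then agglomerative
# clustering with complete linkage (farthest-pair distance).
Z = linkage(pdist(X, metric="cityblock"), method="complete")

# Cut the resulting tree into two clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2]
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes to draw the tree of cluster solutions.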
- Computation (g): The algorithm (f) is applied to the distance matrix (c).
- Interpretation (h): Interpretation and choosing the best-fitting number of clusters is based on two separate evaluations. First, the CHC described in (e) is evaluated. The best cluster solution is the one with the highest CHC value [37]. The CHC suggests solutions with five or eight clusters (see Fig. 6).
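Selecting the number of clusters by maximizing the CHC over the cuts of the hierarchical tree can be sketched as follows; the synthetic three-group data is an illustrative assumption:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Synthetic data: three well-separated groups of preference profiles.
centers = np.array([[0, 0, 0, 0, 0], [2, 0, 0, 2, 0], [0, 2, 2, 0, 2]])
X = np.vstack([c + rng.normal(0, 0.2, size=(20, 5)) for c in centers])

# Complete-linkage tree on the Manhattan distance matrix.
Z = linkage(pdist(X, metric="cityblock"), method="complete")

# Evaluate the CHC for each candidate number of clusters and
# keep the cut with the highest value.
scores = {}
for k in range(2, 9):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = calinski_harabasz_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # expect 3 for this synthetic data
```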