
Introduction: Unveiling the Map - Visualizing What’s Close and Far

Imagine you have a list of cities and you know the driving time between every pair of cities. You want to draw a map of these cities on a piece of paper, placing them in such a way that the distances on your map roughly match the driving times. Multidimensional Scaling (MDS) is a technique that does exactly this, but for more abstract kinds of “distances” and “items” than just cities and driving times.

MDS is a dimensionality reduction technique, but unlike Principal Component Analysis (PCA) which focuses on variance, MDS focuses on preserving distances between data points. It takes a set of distances (or dissimilarities) between items and tries to find a low-dimensional representation (typically 2D or 3D) of these items so that the distances in this low-dimensional space are as close as possible to the original distances.

Think of it like trying to reconstruct a map when you only know how far apart every pair of locations is, without knowing their actual coordinates. MDS helps you “unfold” the data into a visualizable space.

Real-world Examples:

MDS is particularly useful when you have data where the relationships are naturally expressed as distances or dissimilarities, and you want to visualize these relationships:

  • Social Network Analysis: Imagine you have data on how often people in a social network communicate with each other. You can think of “distance” as how infrequently two people communicate. MDS can be used to create a 2D or 3D map of the network, where people who communicate frequently are placed closer together, and those who rarely communicate are further apart. This visualization can reveal communities or clusters within the network.
  • Product Similarity: E-commerce platforms might calculate a “dissimilarity” score between products based on customer reviews, co-purchases, or feature differences. MDS can then be used to visualize products in a 2D space, where similar products are close together and dissimilar products are far apart. This can help in understanding product categories and building recommendation systems.
  • Sensory Data Analysis (e.g., Taste or Smell): In sensory science, experts might rate the “dissimilarity” between different food samples based on taste or smell. MDS can visualize these samples in a perceptual space, revealing how people perceive the relationships between different sensory experiences. For example, different types of cheese could be mapped based on perceived taste similarity.
  • Document Similarity: In Natural Language Processing (NLP), you can calculate the “distance” between documents based on the difference in their word content (e.g., using cosine distance of word vectors). MDS can then visualize documents in 2D or 3D space, where similar documents cluster together based on their content.

The Mathematics: Preserving Distances in a Lower Dimension

MDS aims to find a configuration of points in a lower-dimensional space that reflects the original distances as closely as possible. Let’s break down the mathematical process:

1. Input: Dissimilarity Matrix

MDS starts with a dissimilarity matrix, often denoted as $D$. This is a square matrix where each entry $D_{ij}$ represents the dissimilarity (or distance) between item $i$ and item $j$.

  • $D_{ij}$ is usually non-negative.
  • $D_{ii} = 0$ (distance from an item to itself is zero).
  • $D_{ij} = D_{ji}$ (distance is symmetric).

These dissimilarities can be calculated from your original data features (e.g., using Euclidean distance, cosine distance, correlation distance) or they can be directly given as input if you only have dissimilarity information (like subjective ratings of dissimilarity).

Example Dissimilarity Matrix (Cities - Hypothetical Driving Times in hours):

         City A   City B   City C   City D
City A      0        2        5        6
City B      2        0        4        5
City C      5        4        0        3
City D      6        5        3        0

2. Goal: Find Low-Dimensional Configuration

MDS seeks a set of points $x_1, x_2, …, x_n$ in a lower-dimensional space (e.g., 2D or 3D), where $x_i$ represents the location of item $i$. Each $x_i$ is a vector in $k$-dimensional space, where $k$ is the desired lower dimensionality (e.g., $k=2$ for 2D visualization).

Let $d_{ij}(x)$ be the Euclidean distance between points $x_i$ and $x_j$ in the $k$-dimensional space:

$d_{ij}(x) = \sqrt{\sum_{a=1}^{k} (x_{ia} - x_{ja})^2}$

where $x_{ia}$ is the $a$-th coordinate of point $x_i$.

The goal is to find the coordinates $x_1, x_2, …, x_n$ such that the distances $d_{ij}(x)$ are as close as possible to the original dissimilarities $D_{ij}$.

3. Stress Function: Measuring Misfit

To quantify how well the low-dimensional distances $d_{ij}(x)$ match the original dissimilarities $D_{ij}$, MDS uses a stress function. The stress function measures the “misfit” between the original and low-dimensional distances. A common stress function (Kruskal’s Stress-I) is:

$Stress(x) = \sqrt{\frac{\sum_{i<j} (D_{ij} - d_{ij}(x))^2}{\sum_{i<j} D_{ij}^2}}$

  • The numerator sums up the squared differences between original dissimilarities and low-dimensional distances.
  • The denominator normalizes the stress by the sum of squared original dissimilarities.
  • Stress values are non-negative. Lower stress means better fit (low-dimensional distances closer to original dissimilarities). Stress = 0 means perfect preservation of distances.
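
To make the formula concrete, here is a minimal NumPy sketch of this computation (the function name stress is ours for illustration; D is assumed symmetric with a zero diagonal, and X holds the embedding coordinates):

import numpy as np
from sklearn.metrics import pairwise_distances

def stress(D, X):
    """Normalized stress between a dissimilarity matrix D (n x n) and a
    low-dimensional configuration X (n x k), per the formula above."""
    d = pairwise_distances(X, metric='euclidean')  # d_ij(x): embedding distances
    iu = np.triu_indices_from(D, k=1)              # each pair (i, j), i < j, once
    misfit = np.sum((D[iu] - d[iu]) ** 2)          # numerator: squared differences
    scale = np.sum(D[iu] ** 2)                     # denominator: normalization
    return np.sqrt(misfit / scale)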

4. Optimization: Minimizing Stress

MDS is essentially an optimization problem. The algorithm tries to find the coordinates $x_1, x_2, …, x_n$ that minimize the stress function $Stress(x)$. This minimization is typically done using iterative optimization algorithms like:

  • Gradient Descent: A general optimization algorithm that iteratively adjusts the coordinates $x_i$ in the direction that reduces the stress.
  • SMACOF (Scaling by MAjorizing a COmplicated Function): An algorithm specifically designed for MDS, which iteratively updates coordinates to reduce stress. SMACOF is often used in practice and is implemented in scikit-learn’s MDS.

Iterative Process (Simplified Idea - Gradient Descent):

  1. Initialize: Start with random initial positions for points $x_1, x_2, …, x_n$ in the $k$-dimensional space.
  2. Calculate Stress: Calculate the current stress value $Stress(x)$ for the current configuration of points.
  3. Calculate Gradients: Calculate the gradients of the stress function with respect to the coordinates of each point $x_i$. Gradients indicate the direction of steepest increase in stress.
  4. Update Positions: Move each point $x_i$ in the opposite direction of its gradient to reduce the stress. The step size is controlled by a learning rate (or step size parameter).
  5. Repeat: Repeat steps 2-4 until the stress function converges to a minimum (or changes very little between iterations) or a maximum number of iterations is reached.
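
Below is a toy NumPy sketch of this loop, minimizing the unnormalized stress $\sum_{i<j} (d_{ij}(x) - D_{ij})^2$ directly (illustrative only; the function name and learning rate are ours, and practical implementations use SMACOF instead):

import numpy as np

def mds_gradient_descent(D, k=2, lr=0.01, n_iter=500, seed=0):
    """Toy gradient descent on the raw stress sum_{i<j} (d_ij - D_ij)^2.
    The learning rate may need adjusting per dataset."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(size=(n, k))                       # step 1: random initial positions
    for _ in range(n_iter):
        diff = X[:, None, :] - X[None, :, :]          # pairwise coordinate differences
        d = np.sqrt((diff ** 2).sum(axis=-1))         # step 2: current embedding distances
        np.fill_diagonal(d, 1.0)                      # avoid division by zero
        coef = (d - D) / d                            # per-pair misfit weights
        np.fill_diagonal(coef, 0.0)                   # no self-pair contribution
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)  # step 3: gradient of raw stress
        X -= lr * grad                                # step 4: step against the gradient
    return X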

Non-metric MDS vs. Metric MDS:

  • Metric MDS: Assumes the original dissimilarities $D_{ij}$ are at least interval- or ratio-scaled and aims to preserve the numerical distances as accurately as possible. Classical MDS (CMDS, or Torgerson scaling) is a special case with a closed-form eigendecomposition solution, while iterative variants like SMACOF minimize stress functions such as Kruskal’s Stress-I directly. The equations above describe metric MDS.

  • Non-metric MDS (Ordinal MDS): Focuses on preserving the rank order of dissimilarities, rather than the exact numerical values. It tries to ensure that if $D_{ij} < D_{kl}$, then ideally $d_{ij}(x) < d_{kl}(x)$ in the low-dimensional space. Non-metric MDS can be more flexible when the original dissimilarities are ordinal or subjective ratings. It often uses variations of stress functions that focus on rank order preservation and iterative algorithms that handle these ordinal constraints.

Scikit-learn’s MDS implementation defaults to metric MDS (SMACOF) but can be configured for non-metric MDS by setting metric=False.
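
Switching between the two in scikit-learn is a single argument; a quick sketch (dissimilarity='precomputed' here assumes you will pass a precomputed dissimilarity matrix to fit_transform):

from sklearn.manifold import MDS

# Metric MDS (default): preserve the numerical distances themselves
mds_metric = MDS(n_components=2, metric=True, dissimilarity='precomputed',
                 random_state=42, normalized_stress='auto')

# Non-metric MDS: preserve only the rank order of the dissimilarities
mds_nonmetric = MDS(n_components=2, metric=False, dissimilarity='precomputed',
                    random_state=42, normalized_stress='auto')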

Example illustrating MDS (Conceptual Cities Map):

Using the city driving time dissimilarity matrix example from above:

The MDS algorithm starts with random positions for City A, City B, City C, and City D on a 2D plane, then iteratively adjusts those positions to minimize the stress. In each iteration, it would:

  1. Calculate current map distances between each pair of cities.
  2. Compare these map distances to the given driving times.
  3. Calculate stress (misfit).
  4. Adjust city positions to reduce stress (e.g., move cities that are too far apart on the map closer, and cities that are too close further apart, to better reflect driving times).

After enough iterations, MDS should arrive at a configuration whose map distances reasonably reflect the given driving times, yielding a 2D “map” of the cities based only on their pairwise driving times.
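
A small sketch of this with scikit-learn, using the driving-time matrix from the table above (the recovered layout is arbitrary up to rotation and reflection):

import numpy as np
from sklearn.manifold import MDS

# Hypothetical driving times (hours) from the table above
D = np.array([[0, 2, 5, 6],
              [2, 0, 4, 5],
              [5, 4, 0, 3],
              [6, 5, 3, 0]], dtype=float)

# dissimilarity='precomputed': D already contains distances, not raw features
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42,
          normalized_stress='auto')
coords = mds.fit_transform(D)  # 2D "map" coordinates for the four cities

for city, (x, y) in zip('ABCD', coords):
    print(f"City {city}: ({x:.2f}, {y:.2f})")
print("Stress:", mds.stress_)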

Prerequisites and Data Considerations

Before using MDS, it’s crucial to understand the type of input data it requires and any assumptions:

1. Input Data: Dissimilarity Matrix

The primary prerequisite for MDS is a dissimilarity matrix (or distance matrix). You need to provide a square matrix where the entry $(i, j)$ represents the dissimilarity or distance between item $i$ and item $j$.

  • Dissimilarity vs. Similarity: MDS works with dissimilarities (or distances). If you have similarity measures (e.g., similarity scores, correlations), convert them to dissimilarities first (e.g., dissimilarity = 1 - similarity, or dissimilarity = maximum possible similarity - similarity); a small conversion sketch follows this list.

  • Symmetric and Non-Negative: The dissimilarity matrix should typically be symmetric ($D_{ij} = D_{ji}$) and have non-negative entries ($D_{ij} \ge 0$). Diagonal entries should be zero ($D_{ii} = 0$).
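
A minimal sketch of that similarity-to-dissimilarity conversion, assuming similarity scores in [0, 1] (the matrix here is hypothetical):

import numpy as np

# Hypothetical similarity matrix with values in [0, 1], where 1 = identical
S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 1.0]])

D = 1.0 - S              # dissimilarity = 1 - similarity
np.fill_diagonal(D, 0.0) # guarantee an exactly zero diagonal
print(D)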

2. Choice of Dissimilarity Measure:

The choice of dissimilarity measure is crucial and depends on the nature of your data and what kind of relationships you want to preserve. Common dissimilarity measures include:

  • Euclidean Distance: Suitable for numerical data when you want to preserve geometric distances. Calculated as the straight-line distance between points.
  • Manhattan Distance (City Block Distance): Another distance measure for numerical data, summing up absolute differences along each dimension.
  • Cosine Distance (or Cosine Dissimilarity): Often used for text data, gene expression data, or other high-dimensional data. Based on the angle between vectors: cosine distance = 1 - cosine similarity.
  • Correlation Distance (1 - Correlation): Useful when you want to measure dissimilarity based on correlation patterns between data points (e.g., in time series or gene expression).
  • Custom Dissimilarity Measures: You can define your own dissimilarity measures based on domain knowledge or specific requirements of your data.
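
Several of these measures are available directly through scikit-learn's pairwise_distances; a short sketch on random data (purely for illustration):

import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 items with 3 numerical features

D_euclidean   = pairwise_distances(X, metric='euclidean')
D_manhattan   = pairwise_distances(X, metric='manhattan')
D_cosine      = pairwise_distances(X, metric='cosine')       # 1 - cosine similarity
D_correlation = pairwise_distances(X, metric='correlation')  # 1 - Pearson correlation

# Each result is a symmetric, zero-diagonal matrix that can be passed to
# MDS(dissimilarity='precomputed').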

3. Dimensionality Reduction Goal:

MDS is used primarily for dimensionality reduction for visualization and exploratory data analysis, typically reducing to 2D or 3D for plotting. Unlike PCA, it is not typically used as a preprocessing step to improve performance in downstream machine learning tasks.

4. Python Libraries:

The primary Python library for MDS is scikit-learn (sklearn). It provides the MDS class in sklearn.manifold.

Install scikit-learn if you don’t have it:

pip install scikit-learn

Data Preprocessing: Dissimilarity Calculation is Key

For MDS, data preprocessing mainly revolves around calculating the dissimilarity matrix. The preprocessing steps for the original data features depend on the chosen dissimilarity measure and the nature of your data.

1. Feature Scaling (Normalization or Standardization): Depends on Dissimilarity Measure

  • If using Euclidean Distance or Manhattan Distance as dissimilarity: Feature scaling is often important. If features are on different scales, features with larger ranges might disproportionately influence the distance calculations. Scaling (standardization or normalization) is often recommended to ensure features contribute more equally to distances.

    • Example: Suppose the features are “Income” (range: $20,000-$200,000) and “Age” (range: 18-100). “Income” has a much larger range, so with unscaled Euclidean distance, income differences will dominate the distances. Scaling the features (e.g., standardization) addresses this.
  • If using Cosine Distance: Feature scaling is less critical, because cosine distance depends on the angle between vectors rather than their magnitudes. For very sparse data, however, scaling can occasionally still help.

  • If using Correlation Distance: Scaling is generally not needed, since correlation is scale-invariant (it implicitly standardizes the data).

  • If using custom dissimilarity measures: Consider whether your custom dissimilarity measure is scale-dependent or scale-invariant and preprocess accordingly.

2. Handling Categorical Features (Converting to Numerical for distance calculation):

If your original data includes categorical features and you want to calculate distances (e.g., Euclidean distance) based on both numerical and categorical features, you need to handle categorical features appropriately. Common approaches include:

  • One-Hot Encoding for Categorical Features: Convert categorical features into numerical binary features using one-hot encoding. Then, you can calculate distances on the combined set of numerical and one-hot encoded features.
  • Specific Dissimilarity Measures for Categorical Data: For categorical features, you can use measures like Hamming distance (for comparing categorical vectors) or define custom dissimilarity measures that are suitable for categorical data.
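
A brief sketch of the one-hot approach on hypothetical mixed data (scaling the numerical part so it does not dominate the combined distances):

import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed data: one numerical and one categorical feature
df = pd.DataFrame({'Age': [25, 40, 33, 52],
                   'Color': ['red', 'blue', 'red', 'green']})

age_scaled = StandardScaler().fit_transform(df[['Age']])  # scale the numerical part
# sparse_output=False requires scikit-learn >= 1.2 (older versions use sparse=False)
color_onehot = OneHotEncoder(sparse_output=False).fit_transform(df[['Color']])

X_combined = np.hstack([age_scaled, color_onehot])  # combined numerical matrix
D = pairwise_distances(X_combined, metric='euclidean')

# For purely categorical vectors, Hamming distance (fraction of differing
# positions) is an alternative:
D_hamming = pairwise_distances(color_onehot, metric='hamming')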

Example: Calculating Euclidean Distance Dissimilarity Matrix with Scaling:

Suppose you have data with “Feature1” and “Feature2”.

  1. Scale Data: Standardize “Feature1” and “Feature2” using StandardScaler from scikit-learn (fit on training data and transform both training and test data if applicable).
  2. Calculate Dissimilarity Matrix: Use pairwise_distances from scikit-learn with metric='euclidean' on the scaled data to compute the Euclidean distance matrix. This matrix will be your input to MDS.

Example - Dissimilarity calculation code snippet (Euclidean distance on scaled data):

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import pairwise_distances
import pandas as pd

# Assume X is your feature matrix (Pandas DataFrame or NumPy array)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X) # Scale data (example: standardization)

dissimilarity_matrix = pairwise_distances(X_scaled, metric='euclidean') # Calculate Euclidean distance matrix

# dissimilarity_matrix is now the input for MDS

3. No Preprocessing of Dissimilarity Matrix itself (typically):

Once you have calculated the dissimilarity matrix, you usually don’t need to preprocess it further before feeding it to the MDS algorithm. MDS is designed to work directly with dissimilarity matrices.

In summary, for MDS preprocessing:

  • Dissimilarity Calculation: Crucial step. Choose a dissimilarity measure appropriate for your data and the relationships you want to preserve.
  • Feature Scaling: Often important if using Euclidean or Manhattan distance (and less so for cosine distance). Scale your features before calculating the dissimilarity matrix if needed.
  • Categorical Feature Handling: Convert categorical features to numerical form or use dissimilarity measures suitable for categorical data if your original data includes categorical features.
  • Dissimilarity Matrix Input: The dissimilarity matrix itself becomes the direct input to the MDS algorithm, with minimal further preprocessing typically needed on the matrix itself.

Implementation Example with Dummy Data (MDS)

Let’s implement Metric MDS using Python and scikit-learn. We’ll create dummy 2D data, calculate a Euclidean distance matrix, and then use MDS to reduce it to 2D again (for visualization to show how MDS reconstructs the configuration).

import pandas as pd
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler # For standardization
import matplotlib.pyplot as plt # For visualization
import joblib # For saving and loading models

# 1. Create Dummy 2D Data (Example: Points in 2D)
data = {'Feature1': [2, 3, 4, 8, 9, 10, 2, 8],
        'Feature2': [3, 4, 5, 2, 3, 4, 8, 9]}
df = pd.DataFrame(data)
X = df[['Feature1', 'Feature2']] # Features

# 2. Data Preprocessing: Standardization (Centering and Scaling - for Euclidean distance)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Calculate Dissimilarity Matrix (Euclidean distance on scaled data)
dissimilarity_matrix = pairwise_distances(X_scaled, metric='euclidean')

# Save scaler for later use
scaler_filename = 'mds_scaler.joblib'
joblib.dump(scaler, scaler_filename)
print(f"Scaler saved to: {scaler_filename}")


# 3. Initialize and Train MDS for Dimensionality Reduction (to 2D for visualization)
mds_model = MDS(n_components=2, dissimilarity='precomputed', random_state=42,
                normalized_stress='auto') # dissimilarity='precomputed': the input is a distance matrix, not raw features
X_mds = mds_model.fit_transform(dissimilarity_matrix) # Fit and embed the precomputed dissimilarity matrix

# Save the trained MDS model
model_filename = 'mds_model.joblib'
joblib.dump(mds_model, model_filename)
print(f"MDS model saved to: {model_filename}")

# 4. Output and Visualization
print("\nOriginal Data (Scaled):\n", X_scaled)
print("\nMDS Reduced Data (2D):\n", X_mds)
print("\nStress Value (for MDS result):", mds_model.stress_) # Stress value - lower is better

# Visualization
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], label='Original Scaled Data', alpha=0.7) # Original 2D data
plt.scatter(X_mds[:, 0], X_mds[:, 1], color='red', label='MDS Reduced Data (2D)', alpha=0.7) # MDS 2D representation
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('MDS Dimensionality Reduction and Visualization')
plt.legend()
plt.grid(True)
plt.show()

# 5. Load the Model and Scaler Later (Example)
loaded_mds_model = joblib.load(model_filename)
loaded_scaler = joblib.load(scaler_filename)

# 6. Example: Embed new data with the loaded model. Note: scikit-learn's MDS has no
# transform() method for out-of-sample points, so new data is embedded by refitting
# on its own dissimilarity matrix. For simplicity, we re-embed the original data
# here to confirm that saving and loading work.
X_new_dissimilarity_matrix = pairwise_distances(loaded_scaler.transform(X), metric='euclidean') # Recalculate dissimilarities, scaled by the LOADED scaler

X_new_mds = loaded_mds_model.fit_transform(X_new_dissimilarity_matrix) # Refit; same matrix + saved random_state reproduces the embedding
print("\nMDS Transformed Data (using loaded MDS model):\n", X_new_mds)

Output Explanation:

Scaler saved to: mds_scaler.joblib
MDS model saved to: mds_model.joblib

Original Data (Scaled):
 [[-1.22  -1.22]
 [-0.73  -0.61]
 [-0.24   0.   ]
 [ 1.69  -1.83]
 [ 2.17  -1.22]
 [ 2.66  -0.61]
 [-1.22   2.17]
 [ 1.69   2.17]]

MDS Reduced Data (2D):
 [[-0.91  1.16]
 [-0.55  0.68]
 [-0.20  0.21]
 [ 0.49 -1.29]
 [ 0.92 -0.82]
 [ 1.36 -0.35]
 [-1.27 -1.80]
 [ 0.19  1.21]]

Stress Value (for MDS result): 0.0016766137249891477

MDS Transformed Data (using loaded MDS model):
 [[-0.91  1.16]
 [-0.55  0.68]
 [-0.20  0.21]
 [ 0.49 -1.29]
 [ 0.92 -0.82]
 [ 1.36 -0.35]
 [-1.27 -1.80]
 [ 0.19  1.21]]
  • Scaler saved to: mds_scaler.joblib, MDS model saved to: mds_model.joblib: Indicates successful saving of the StandardScaler and trained MDS model.

  • Original Data (Scaled): Shows the data after standardization.

  • MDS Reduced Data (2D): This is the 2D representation found by MDS. Each original data point is now mapped to a 2D point.

  • Stress Value (for MDS result): 0.00167…: The stress value is very low (close to 0), indicating a very good fit. This means that the 2D MDS representation preserves the original distances in the dissimilarity matrix very well. Lower stress is better.

  • Visualization (Scatter plot): The plot (if you run the code) will show:
    • Blue points: The original 2D scaled data points.
    • Red points: The 2D MDS-transformed points. Ideally, if MDS is successful in preserving distances well (low stress), the red points will form a configuration that roughly resembles the shape and relative positions of the blue points, but possibly rotated or reflected.
  • MDS Transformed Data (using loaded MDS model): Demonstrates loading the saved MDS model and re-embedding the data (the original dissimilarity matrix, recalculated with the loaded scaler). Because scikit-learn’s MDS has no out-of-sample transform(), the loaded model is refit; with the same matrix and the saved random_state, the result matches the original X_mds, confirming that saving and loading work.

Post-Processing: Stress Value and Visualization Quality

After applying MDS, post-processing is essential to assess the quality of the MDS representation and interpret the results:

1. Stress Value Interpretation (Already in Example):

  • mds_model.stress_ attribute: Provides the final stress value achieved by the MDS algorithm.

  • Stress Value as a Goodness-of-Fit Measure: Stress quantifies how well the low-dimensional distances match the original dissimilarities. Lower stress values indicate a better fit.

  • Rule of Thumb for Stress Values (Kruskal’s Stress): (These are rough guidelines, and interpretation depends on the context)
    • Stress < 0.025: Excellent fit
    • 0.025 < Stress < 0.05: Good fit
    • 0.05 < Stress < 0.10: Fair fit
    • 0.10 < Stress < 0.20: Poor fit
    • Stress > 0.20: Unacceptable fit (little confidence in the MDS representation)
  • normalized_stress='auto' parameter (in scikit-learn MDS): With normalized_stress='auto' (as in the example code), scikit-learn applies normalized stress for non-metric MDS and raw stress for metric MDS (normalized stress is not supported in the metric case). To compare a raw metric-MDS stress_ against the Kruskal-style thresholds above, normalize it manually, as sketched below.
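
A one-line conversion sketch, reusing mds_model and dissimilarity_matrix from the implementation example (formula per the scikit-learn MDS documentation):

import numpy as np

# Convert the raw metric-MDS stress_ into a Kruskal-style normalized stress so
# it can be compared against the rule-of-thumb thresholds above
kruskal_stress = np.sqrt(mds_model.stress_ / (0.5 * np.sum(dissimilarity_matrix ** 2)))
print(f"Kruskal-style normalized stress: {kruskal_stress:.4f}")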

2. Shepard Plot (Goodness-of-Fit Visualization):

  • Purpose: A Shepard plot is a scatter plot that visually checks the relationship between the original dissimilarities ($D_{ij}$) and the low-dimensional distances ($d_{ij}(x)$). It helps assess how well MDS has preserved the distances.

  • Creating a Shepard Plot:

    1. Calculate original dissimilarities $D_{ij}$.
    2. Apply MDS and obtain low-dimensional points $x_i$.
    3. Calculate low-dimensional distances $d_{ij}(x)$ between the $x_i$ points.
    4. Create a scatter plot where the x-axis is $D_{ij}$ and the y-axis is $d_{ij}(x)$ for all pairs of points $(i, j)$.
    5. Ideally, if MDS is successful, the points in the Shepard plot should cluster closely around the diagonal line, indicating that the low-dimensional distances track the original dissimilarities (for non-metric MDS, a monotonically increasing pattern rather than a strict diagonal is expected). Deviations from the line indicate distortions introduced by the dimensionality reduction.

Example: Creating a Shepard Plot:

# ... (After running MDS and obtaining X_mds and dissimilarity_matrix in previous example) ...

low_dimension_distances = pairwise_distances(X_mds, metric='euclidean') # Distances in MDS space

original_dissimilarities_flat = dissimilarity_matrix[np.tril_indices_from(dissimilarity_matrix, k=-1)] # Lower triangle of dissimilarity matrix (avoid duplicates and diagonal)
mds_distances_flat = low_dimension_distances[np.tril_indices_from(low_dimension_distances, k=-1)]

plt.figure(figsize=(8, 6))
plt.scatter(original_dissimilarities_flat, mds_distances_flat, alpha=0.6) # Shepard plot: Original vs. MDS distances
plt.xlabel('Original Dissimilarities')
plt.ylabel('MDS Distances (Low-Dimensional)')
plt.title('Shepard Plot for MDS')
plt.plot([0, max(original_dissimilarities_flat)], [0, max(original_dissimilarities_flat)], 'r--', label='Perfect Fit Line') # Diagonal line for reference
plt.legend()
plt.grid(True)
plt.show()

3. Visualization of MDS Result (Already in Example):

  • The 2D or 3D scatter plot of the MDS-transformed data points is the primary way to visualize the relationships captured by MDS.

  • Examine the plot for:

    • Clusters: Do points that are supposed to be similar (based on original dissimilarities) cluster together in the MDS space?
    • Separation: Are different groups or categories visually separated?
    • Overall structure: Does the MDS plot reveal any meaningful patterns or groupings that were not apparent in the high-dimensional data?

Post-processing Summary:

  • Stress value: Check the numerical stress value to assess the overall goodness-of-fit of the MDS representation.
  • Shepard Plot: Use a Shepard plot to visualize the relationship between original and MDS distances. Look for a close alignment along the diagonal line for good distance preservation.
  • MDS Visualization Plot: Examine the scatter plot of MDS-reduced data for clusters, separation, and meaningful visual structures.

Hyperparameters and Tuning (MDS)

For MDS in scikit-learn, key hyperparameters to tune are:

1. n_components:

  • What it is: The number of dimensions to reduce to. Typically set to 2 for 2D visualization or 3 for 3D visualization.
  • Effect: Determines the dimensionality of the output space. Generally, you choose n_components based on your visualization or analysis goals (e.g., 2D for plots). For visualization, increasing n_components beyond 3 usually does not provide much additional visual insight. For some analysis tasks, you might experiment with different values of n_components and evaluate the results (e.g., clustering performance).
  • Example: MDS(n_components=3) (Reduce to 3 dimensions)

2. metric:

  • What it is: Whether to use metric MDS (metric=True, default) or non-metric MDS (metric=False).
  • Effect:
    • metric=True (Metric MDS): Aims to preserve the numerical distances as accurately as possible. Suitable when original dissimilarities are ratio-scaled or interval-scaled, and you want to minimize stress based on numerical differences in distances. Uses SMACOF algorithm by default in scikit-learn.
    • metric=False (Non-metric MDS): Aims to preserve the rank order of distances. Suitable when original dissimilarities are ordinal (rank-based) or subjective ratings where exact numerical values might not be as meaningful as their relative order. Uses Kruskal’s nonmetric stress minimization.
  • Choosing metric: Choose based on the nature of your dissimilarity data. If you have precise numerical distances, metric MDS is often appropriate. If dissimilarities are more ordinal or subjective ratings, non-metric MDS might be more suitable.
  • Example: MDS(metric=False) (Use non-metric MDS)

3. n_init:

  • What it is: Number of times to run the SMACOF algorithm with different initial configurations of points. MDS optimization can be sensitive to initial configuration, and different random initializations might lead to slightly different stress values and MDS layouts.
  • Effect:
    • Higher n_init: Increases the chances of finding a better solution (lower stress) by trying more starting configurations. It runs MDS n_init times and selects the configuration with the lowest stress. Increases computation time.
    • Lower n_init: Faster computation, but might be more likely to get stuck in a local minimum with a higher stress value if the initial configuration is not good.
  • Tuning n_init: For better results, especially if you want to be sure you’re finding a good MDS embedding, consider increasing n_init (e.g., try n_init=10 or more). For very large datasets, you might use a smaller n_init to balance accuracy and computation time.
  • Example: MDS(n_init=10) (Run SMACOF 10 times with different initializations)
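
As a sketch of this trade-off, one can compare the final stress across restart counts (reusing dissimilarity_matrix from the implementation example):

from sklearn.manifold import MDS

# Compare final stress for different numbers of random restarts. More restarts
# typically find an equal or lower stress, at proportionally higher compute cost.
for n_init in [1, 4, 10]:
    mds = MDS(n_components=2, dissimilarity='precomputed', n_init=n_init,
              max_iter=300, random_state=42, normalized_stress='auto')
    mds.fit(dissimilarity_matrix)
    print(f"n_init={n_init:2d}  stress={mds.stress_:.6f}")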

4. max_iter:

  • What it is: Maximum number of iterations for the SMACOF algorithm (per initialization). Controls the optimization process for stress minimization.
  • Effect:
    • Higher max_iter: Allows SMACOF algorithm more iterations to converge, potentially leading to lower stress values. Increases computation time per initialization.
    • Lower max_iter: Faster computation, but if max_iter is too low, the algorithm might stop before reaching a good minimum stress value, and the MDS representation might be suboptimal.
  • Tuning max_iter: For most cases, the default max_iter is sufficient. If you suspect that the algorithm is not fully converging, you can try increasing max_iter. Monitor the stress value as a function of iterations to check for convergence.
  • Example: MDS(max_iter=500) (Increase maximum iterations to 500)

5. random_state:

  • What it is: Random seed for reproducibility (for initialization).
  • Effect: Setting random_state makes the random initialization and optimization process deterministic, ensuring you get the same MDS result if you run the code multiple times with the same random_state. Useful for reproducibility and experimentation.
  • Example: MDS(random_state=42)

Hyperparameter Tuning for MDS (Less Common):

While you can adjust n_components, metric, n_init, and max_iter, hyperparameter tuning in the traditional sense (like GridSearchCV) is less common for MDS, especially for visualization purposes.

  • n_components is often predetermined (e.g., 2 for 2D, 3 for 3D).
  • Choice between metric and non-metric MDS is often based on the nature of your dissimilarity data.
  • Tuning n_init and max_iter is more about ensuring good convergence rather than optimizing for a specific “accuracy” metric (as MDS itself doesn’t have a direct accuracy metric).

If you were to “tune” MDS, it would be more about:

  • Experimenting with metric=True vs. metric=False and visually assessing which type of MDS representation looks more meaningful or better preserves the relationships in your data.
  • Increasing n_init and max_iter to ensure you are finding a reasonably good MDS embedding with low stress, especially for complex datasets.
  • If MDS is used as preprocessing for another task (less common use case), you could potentially evaluate the performance of that downstream task with different MDS configurations (e.g., different n_components, metric type), but this is not standard MDS usage.

Model Accuracy Metrics (MDS - Stress Value and Visual Inspection)

“Accuracy” metrics in the standard machine learning sense are not directly applicable to MDS. MDS is primarily a dimensionality reduction and visualization technique aimed at preserving distances, not making predictions like classifiers or regressors.

Evaluating MDS Performance:

The “accuracy” of an MDS representation is assessed primarily by:

  1. Stress Value: (As discussed in Post-Processing). Lower stress values indicate a better fit, meaning MDS has successfully preserved distances in the lower dimension. Use the rules of thumb for Kruskal’s stress to interpret the stress value.

  2. Shepard Plot Visual Inspection: (As discussed in Post-Processing). Examine the Shepard plot to check for a linear relationship between original dissimilarities and MDS distances. Points clustering close to the diagonal line indicate good distance preservation.

  3. Visual Inspection of MDS Plot: (Scatter plot of MDS-transformed data). Assess the visualization for:

    • Meaningful Clusters and Structure: Does the MDS plot reveal clusters or groupings that are interpretable and consistent with your domain knowledge or expectations about the data?
    • Preservation of Relationships: Does the MDS plot visually reflect the relationships that you expected to be present based on the original dissimilarities? For example, are items that you know should be similar (based on original data or domain knowledge) located close to each other in the MDS plot?
    • Absence of Artifacts: Does the MDS plot look reasonable and free of obvious artifacts or distortions that might indicate a poor solution or convergence issues (e.g., highly compressed clusters, unnatural shapes)?

No Single “MDS Accuracy Score”:

There isn’t a single numerical “MDS accuracy score” like accuracy for classification or R² for regression. Evaluation is more qualitative and focuses on assessing the visual and structural fidelity of the MDS representation in preserving the original distance relationships, as indicated by stress and visual inspection of the MDS plot and Shepard plot.

Model Productionizing (MDS - Mostly Visualization in Production)

Productionizing MDS is different from productionizing predictive models (like classifiers or regressors). MDS is primarily a data analysis and visualization tool. “Productionizing” MDS often means integrating it into a data analysis or visualization pipeline, rather than deploying it to make real-time predictions.

Common Production Scenarios for MDS:

  1. Interactive Data Exploration Tools: MDS can be integrated into interactive dashboards or data exploration tools that allow users to visualize high-dimensional data in 2D or 3D. Users can interact with the MDS plot, zoom, pan, and explore clusters or patterns.

  2. Data Preprocessing Step for Visualization Pipelines: MDS can be used as a preprocessing step in automated data visualization pipelines. For example, to automatically generate 2D layouts for web interfaces or reports, where high-dimensional data needs to be summarized visually.

  3. Offline Data Analysis and Report Generation: MDS can be used for offline data analysis to uncover hidden structures in complex datasets and generate static visualizations for reports, presentations, or scientific publications.

Steps for Production Integration:

  1. Train MDS Model (Offline or Pre-calculated): Typically, you would train (fit) your MDS model offline on your training or entire dataset. MDS is often used for exploratory analysis, so retraining in real-time is less common compared to predictive models. Save the trained MDS model (using joblib.dump()).

  2. Load and Transform Data for Visualization: In your production application or tool:
    • Load the saved MDS model (using joblib.load()).
    • If you are using original data features as input (and calculating dissimilarities), implement the same preprocessing steps (feature scaling, dissimilarity calculation) that you used during training.
    • Embed your data by refitting the loaded MDS model with fit_transform() on the new dissimilarity matrix to get the low-dimensional MDS coordinates (scikit-learn’s MDS does not provide an out-of-sample transform()).
  3. Visualization Component: Use a suitable plotting library (like Matplotlib, Seaborn, Plotly, D3.js for web applications, etc.) to create the 2D or 3D scatter plot of the MDS-transformed data points.

  4. Interactive Elements (Optional): For interactive dashboards, incorporate features like:
    • Zoom and pan controls for the MDS plot.
    • Tooltips to display information about individual data points on hover.
    • Filtering or selection capabilities to highlight subsets of data points in the MDS plot.
    • Linking MDS plot with other views or data tables for deeper exploration.

Code Example (Illustrative - Python Flask API - Serving MDS Coordinates for a Web Visualization):

# app.py (Flask API - serving MDS coordinates for web visualization)
import joblib
from flask import Flask, request, jsonify
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler

app = Flask(__name__)

# Load saved MDS model and scaler
mds_model = joblib.load('mds_model.joblib')
scaler = joblib.load('mds_scaler.joblib')

@app.route('/mds_coordinates', methods=['POST'])
def get_mds_coordinates():
    try:
        data_json = request.get_json()
        new_data_df = pd.DataFrame(data_json) # Input data as JSON, converted to DataFrame

        # Preprocess new data using LOADED scaler
        new_data_scaled = scaler.transform(new_data_df)

        # Calculate dissimilarity matrix for new data (using Euclidean distance on scaled data)
        new_dissimilarity_matrix = pairwise_distances(new_data_scaled, metric='euclidean')

        # Embed with the LOADED MDS model. scikit-learn's MDS has no transform()
        # for out-of-sample points, so we refit on the new dissimilarity matrix.
        mds_coords = mds_model.fit_transform(new_dissimilarity_matrix)

        # Convert MDS coordinates to JSON format for sending to frontend
        mds_coords_list = mds_coords.tolist() # Convert NumPy array to list for JSON serialization
        return jsonify({'mds_coordinates': mds_coords_list}) # Return MDS coordinates as JSON

    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(debug=False, host='0.0.0.0', port=8080)

In this example, the Flask API endpoint /mds_coordinates receives data, preprocesses it, embeds it by refitting the loaded MDS model on the new dissimilarity matrix, and returns the MDS coordinates as JSON. A frontend web application (e.g., using JavaScript and a plotting library like D3.js) could then consume this endpoint and render an interactive 2D scatter plot from the coordinates.

Productionization - Key Points for MDS Visualization Pipelines:

  • Pre-train and Save MDS: Train your MDS model offline and save it. Retraining MDS in real-time is often not necessary for visualization.
  • Consistent Preprocessing: Maintain consistent preprocessing (scaling, dissimilarity calculation) between training and production. Save and load preprocessing components (like StandardScaler).
  • API for Coordinates (Optional): If you need to serve MDS plots in web applications, consider creating an API endpoint to serve the MDS coordinates as JSON, allowing the frontend to handle the visualization rendering.
  • Visualization Library Integration: Integrate MDS output with suitable visualization libraries for creating interactive or static plots for data exploration and reporting.

Conclusion: MDS - A Powerful Tool for Unveiling Data Structure through Visualization

Multidimensional Scaling (MDS) is a valuable dimensionality reduction technique specifically designed for visualizing and understanding data relationships based on dissimilarities or distances. Its strengths and applications include:

  • Visualization of Dissimilarity Data: Excellent for visualizing data where pairwise distances or dissimilarities are the primary information, and direct feature values might be less important or not readily available.
  • Exploratory Data Analysis: Helps uncover hidden structures, clusters, and patterns in complex datasets by representing them in a lower-dimensional, visualizable space.
  • Perceptual Mapping: Useful in sensory science, marketing, and social sciences for mapping perceptions, preferences, or relationships based on subjective similarity/dissimilarity judgments.
  • Non-linear Dimensionality Reduction (Non-metric MDS): Non-metric MDS provides flexibility for handling ordinal dissimilarities and capturing non-linear relationships in distance data.

Real-World Applications Today (MDS’s Niche):

MDS remains relevant and is used in various domains for visualization and exploratory analysis:

  • Social Sciences: Visualizing social networks, cultural distances, survey data.
  • Marketing and Consumer Research: Product positioning, brand perception mapping, customer segmentation based on preference data.
  • Bioinformatics: Visualizing relationships between genes, proteins, or biological samples based on distance metrics.
  • Information Retrieval and NLP: Document visualization, topic modeling visualization, semantic space mapping.
  • Geographic and Spatial Data Analysis: Mapping locations based on travel times or distances.

Optimized and Newer Algorithms (Contextual Positioning):

While MDS is effective for visualizing distance-based relationships, other dimensionality reduction and visualization techniques exist:

  • t-SNE (t-distributed Stochastic Neighbor Embedding): https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html - t-SNE is also a non-linear dimensionality reduction technique excellent for visualization, particularly for high-dimensional data and preserving local neighborhood structure (cluster separation). t-SNE often produces visually compelling 2D embeddings, especially for cluster visualization, but can be computationally more intensive and might not preserve global distances as well as metric MDS.

  • UMAP (Uniform Manifold Approximation and Projection): https://umap-learn.readthedocs.io/en/latest/ - UMAP is another modern non-linear dimensionality reduction technique that is often faster than t-SNE and can be better at preserving both local and global structure in data. UMAP is increasingly popular for visualization and dimensionality reduction in machine learning and bioinformatics.

  • PCA (Principal Component Analysis): https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html - PCA is a linear dimensionality reduction method focusing on variance, while MDS focuses on distances. PCA is simpler and often computationally faster than MDS, t-SNE, or UMAP, and is suitable if linear dimensionality reduction is sufficient and you want to capture principal directions of variance rather than preserve pairwise distances specifically.

When to Choose MDS:

  • When your primary data input is a dissimilarity matrix (or you naturally have dissimilarity/distance information).
  • For visualizing data relationships based on distances.
  • When you want to create low-dimensional maps that reflect how close or far apart items are based on their dissimilarity.
  • For exploratory data analysis and revealing underlying structure through visualization.
  • As a complementary technique to PCA, t-SNE, and UMAP for different dimensionality reduction and visualization goals.

In conclusion, Multidimensional Scaling is a powerful tool for understanding data through visualization, especially when relationships are best expressed as pairwise dissimilarities. While newer techniques like t-SNE and UMAP have emerged, MDS remains a valuable and interpretable method for uncovering and presenting hidden structures in complex datasets, particularly when distance preservation is key.

References

  1. Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1-27. https://link.springer.com/article/10.1007/BF02289694 - (One of the foundational papers on Non-metric MDS and stress function minimization.)
  2. Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29(2), 115-129. https://link.springer.com/article/10.1007/BF02289701 - (Further details on non-metric MDS.)
  3. Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4), 401-419. https://link.springer.com/article/10.1007/BF02288892 - (Early work on Metric (Classical) MDS.)
  4. Scikit-learn Documentation on MDS: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html - (Official scikit-learn documentation for the MDS class.)
  5. “Multidimensional scaling” - Wikipedia: https://en.wikipedia.org/wiki/Multidimensional_scaling - (Wikipedia entry providing a comprehensive overview of MDS.)
  6. Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media. - (A comprehensive textbook on MDS.)
  7. “How to Use and Interpret Multidimensional Scaling (MDS)” - Displayr blog post. (A practical guide; tutorials like this offer worked MDS examples, interpretations, and Shepard plot illustrations.)