
t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections (2002.06910v5)

Published 17 Feb 2020 in cs.LG, cs.HC, and stat.ML

Abstract: t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this work, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study in which the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and to make its results easier to understand.
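The abstract lists neighborhood preservation among the aspects of a t-SNE projection that t-viSNE lets analysts inspect. The sketch below illustrates one common way to score it per point: it compares each point's k nearest neighbors in the original space with its k nearest neighbors in the 2D projection and reports the fraction they share. This is an assumed, generic k-NN-overlap measure, not necessarily the exact metric t-viSNE implements. The function names `knn_indices` and `neighborhood_preservation` are hypothetical.

```python
import math
from typing import List, Sequence, Set


def knn_indices(points: Sequence[Sequence[float]], i: int, k: int) -> Set[int]:
    """Indices of the k nearest neighbors of point i (Euclidean, excluding i).

    Ties are broken by index so the result is deterministic.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return set(others[:k])


def neighborhood_preservation(
    high: Sequence[Sequence[float]],
    low: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Per-point fraction of high-dimensional k-NN that remain k-NN in the projection.

    Assumed, illustrative metric: 1.0 means the neighborhood is fully kept,
    0.0 means none of the original neighbors are among the projected neighbors.
    """
    if len(high) != len(low):
        raise ValueError("high and low must describe the same points")
    if not 0 < k < len(high):
        raise ValueError("k must be between 1 and n - 1")
    scores = []
    for i in range(len(high)):
        shared = knn_indices(high, i, k) & knn_indices(low, i, k)
        scores.append(len(shared) / k)
    return scores


if __name__ == "__main__":
    # Two well-separated pairs in 3D, projected to 2D with the pairs intact.
    high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0], [11.0, 0.0, 0.0]]
    low = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
    print(neighborhood_preservation(high, low, k=1))  # [1.0, 1.0, 1.0, 1.0]
```

In a tool like the one described, such per-point scores could be mapped to color or size in the scatterplot so that points whose projected neighborhoods are unreliable stand out; how t-viSNE actually encodes this is described in the paper itself, not here.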
