Skip to main content

09.05.2024 | Original Research

Accelerated Sequential Data Clustering

verfasst von: Reza Mortazavi, Elham Enayati, Abdolali Basiri

Erschienen in: Journal of Classification

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Data clustering is an important task in the field of data mining. In many real applications, clustering algorithms must consider the order of data, resulting in the problem of clustering sequential data. For instance, analyzing the moving pattern of an object and detecting community structure in a complex network are related to sequential data clustering. The constraint of the continuous region prevents previous clustering algorithms from being directly applied to the problem. A dynamic programming algorithm was proposed to address the issue, which returns the optimal sequential data clustering. However, it is not scalable and hence the practicality is limited. This paper revisits the solution and enhances it by introducing a greedy stopping condition. This condition halts the algorithm’s search process when it is likely that the optimal solution has been found. Experimental results on multiple datasets show that the algorithm is much faster than its original solution while the optimality gap is negligible.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Abbasi, M., Bhaskara, A., & Venkatasubramanian, S. (2021). Fair clustering via equitable group representations. In: Proceedings of the ACM conference on fairness, accountability, and transparency (pp. 504–514) Abbasi, M., Bhaskara, A., & Venkatasubramanian, S. (2021). Fair clustering via equitable group representations. In: Proceedings of the ACM conference on fairness, accountability, and transparency (pp. 504–514)
Zurück zum Zitat Abbasimehr, H., & Baghery, F. S. (2022). A novel time series clustering method with fine-tuned support vector regression for customer behavior analysis. Expert Systems with Applications (p. 117584) Abbasimehr, H., & Baghery, F. S. (2022). A novel time series clustering method with fine-tuned support vector regression for customer behavior analysis. Expert Systems with Applications (p. 117584)
Zurück zum Zitat Aloise, D., Deshpande, A., Hansen, P., et al. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2), 245–248.CrossRef Aloise, D., Deshpande, A., Hansen, P., et al. (2009). NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2), 245–248.CrossRef
Zurück zum Zitat Arthur, D. (2007). K-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027–1035). New Orleans, Louisiana, Society for Industrial and Applied Mathematics. Arthur, D. (2007). K-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027–1035). New Orleans, Louisiana, Society for Industrial and Applied Mathematics.
Zurück zum Zitat Bigdeli, A., Maghsoudi, A., & Ghezelbash, R. (2022). Application of self-organizing map (SOM) and K-means clustering algorithms for portraying geochemical anomaly patterns in Moalleman District, NE Iran. Journal of Geochemical Exploration, 233(106), 923. Bigdeli, A., Maghsoudi, A., & Ghezelbash, R. (2022). Application of self-organizing map (SOM) and K-means clustering algorithms for portraying geochemical anomaly patterns in Moalleman District, NE Iran. Journal of Geochemical Exploration, 233(106), 923.
Zurück zum Zitat Cerqueti, R., D’Urso, P., De Giovanni, L., et al. (2022). Weighted score-driven fuzzy clustering of time series with a financial application. Expert Systems with Applications, 198(116), 752. Cerqueti, R., D’Urso, P., De Giovanni, L., et al. (2022). Weighted score-driven fuzzy clustering of time series with a financial application. Expert Systems with Applications, 198(116), 752.
Zurück zum Zitat Chan, Z. S., Collins, L., & Kasabov, N. (2006). An efficient greedy k-means algorithm for global gene trajectory clustering. Expert Systems with Applications, 30(1), 137–141.CrossRef Chan, Z. S., Collins, L., & Kasabov, N. (2006). An efficient greedy k-means algorithm for global gene trajectory clustering. Expert Systems with Applications, 30(1), 137–141.CrossRef
Zurück zum Zitat Ding, C., Sun, S., & Zhao, J. (2022). MST-GAT: A multimodal spatial-temporal graph attention network for time series anomaly detection. Information Fusion,. Ding, C., Sun, S., & Zhao, J. (2022). MST-GAT: A multimodal spatial-temporal graph attention network for time series anomaly detection. Information Fusion,.
Zurück zum Zitat Dogan, A., & Birant, D. (2022). K-centroid link: A novel hierarchical clustering linkage method. Applied Intelligence, 52(5), 5537–5560.CrossRef Dogan, A., & Birant, D. (2022). K-centroid link: A novel hierarchical clustering linkage method. Applied Intelligence, 52(5), 5537–5560.CrossRef
Zurück zum Zitat Dupin, N., Nielsen, F., & Talbi, E. (2018). Dynamic programming heuristic for K-means clustering among a 2-dimensional Pareto frontier. In: 7th International conference on metaheuristics and nature inspired computing (pp. 1–8) Dupin, N., Nielsen, F., & Talbi, E. (2018). Dynamic programming heuristic for K-means clustering among a 2-dimensional Pareto frontier. In: 7th International conference on metaheuristics and nature inspired computing (pp. 1–8)
Zurück zum Zitat Enayati, E., Mortazavi, R., Basiri, A., et al. (2023). Time series anomaly detection via clustering-based representation. Evolving Systems. In press Enayati, E., Mortazavi, R., Basiri, A., et al. (2023). Time series anomaly detection via clustering-based representation. Evolving Systems. In press
Zurück zum Zitat Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976.MathSciNetCrossRef Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976.MathSciNetCrossRef
Zurück zum Zitat Houssein, E. H., Ibrahim, I. E., Neggaz, N., et al. (2021). An efficient ECG arrhythmia classification method based on Manta ray foraging optimization. Expert Systems with Applications, 181(115), 131. Houssein, E. H., Ibrahim, I. E., Neggaz, N., et al. (2021). An efficient ECG arrhythmia classification method based on Manta ray foraging optimization. Expert Systems with Applications, 181(115), 131.
Zurück zum Zitat Jezewski, J., Matonia, A., Kupka, T., et al. (2012). Determination of fetal heart rate from abdominal signals: Evaluation of beat-to-beat accuracy in relation to the direct fetal electrocardiogram. Biomedizinische Technik/Biomedical Engineering, 57(5), 383–394.CrossRef Jezewski, J., Matonia, A., Kupka, T., et al. (2012). Determination of fetal heart rate from abdominal signals: Evaluation of beat-to-beat accuracy in relation to the direct fetal electrocardiogram. Biomedizinische Technik/Biomedical Engineering, 57(5), 383–394.CrossRef
Zurück zum Zitat Kalti, K., & Touil, A. (2023). A robust contextual fuzzy C-means clustering algorithm for noisy image segmentation. Journal of Classification. In press Kalti, K., & Touil, A. (2023). A robust contextual fuzzy C-means clustering algorithm for noisy image segmentation. Journal of Classification. In press
Zurück zum Zitat Kaya, M. F., & Schoop, M. (2022). Analytical comparison of clustering techniques for the recognition of communication patterns. Group Decision and Negotiation, 31(3), 555–589.CrossRef Kaya, M. F., & Schoop, M. (2022). Analytical comparison of clustering techniques for the recognition of communication patterns. Group Decision and Negotiation, 31(3), 555–589.CrossRef
Zurück zum Zitat Laguna, P., Mark, R. G., Goldberg, A., et al. (1997). A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In: Computers in cardiology 1997 (pp. 673–676). IEEE Laguna, P., Mark, R. G., Goldberg, A., et al. (1997). A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In: Computers in cardiology 1997 (pp. 673–676). IEEE
Zurück zum Zitat Lei, T., Jia, X., Zhang, Y., et al. (2018). Significantly fast and robust fuzzy C-means clustering algorithm based on morphological reconstruction and membership filtering. IEEE Transactions on Fuzzy Systems, 26(5), 3027–3041.CrossRef Lei, T., Jia, X., Zhang, Y., et al. (2018). Significantly fast and robust fuzzy C-means clustering algorithm based on morphological reconstruction and membership filtering. IEEE Transactions on Fuzzy Systems, 26(5), 3027–3041.CrossRef
Zurück zum Zitat Li, A., Xiong, S., Li, J., et al. (2022). AngClust: Angle feature-based clustering for short time series gene expression profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics,. Li, A., Xiong, S., Li, J., et al. (2022). AngClust: Angle feature-based clustering for short time series gene expression profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics,.
Zurück zum Zitat Li, H. (2019). Multivariate time series clustering based on common principal component analysis. Neurocomputing, 349, 239–247.CrossRef Li, H. (2019). Multivariate time series clustering based on common principal component analysis. Neurocomputing, 349, 239–247.CrossRef
Zurück zum Zitat Li, X., & Liu, H. (2018). Greedy optimization for K-means-based consensus clustering. Tsinghua Science and Technology, 23(2), 184–194.CrossRef Li, X., & Liu, H. (2018). Greedy optimization for K-means-based consensus clustering. Tsinghua Science and Technology, 23(2), 184–194.CrossRef
Zurück zum Zitat Li, Y., Ma, J., Miao, Y., et al. (2020). Similarity search for encrypted images in secure cloud computing. IEEE Transactions on Cloud Computing,. Li, Y., Ma, J., Miao, Y., et al. (2020). Similarity search for encrypted images in secure cloud computing. IEEE Transactions on Cloud Computing,.
Zurück zum Zitat Lin, C. R., & Chen, M. S. (2002). On the optimal clustering of sequential data. In: Proceedings of the 2002 SIAM international conference on data mining (pp. 141–157). SIAM Lin, C. R., & Chen, M. S. (2002). On the optimal clustering of sequential data. In: Proceedings of the 2002 SIAM international conference on data mining (pp. 141–157). SIAM
Zurück zum Zitat Maršánová, L., Smisek, R., Němcová, A., et al. (2021). Brno University of Technology ECG signal database with annotations of P wave (BUT PDB) Maršánová, L., Smisek, R., Němcová, A., et al. (2021). Brno University of Technology ECG signal database with annotations of P wave (BUT PDB)
Zurück zum Zitat Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3), 45–50.CrossRef Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3), 45–50.CrossRef
Zurück zum Zitat Mortazavi, R., & Erfani, S. H. (2018). An effective method for utility preserving social network graph anonymization based on mathematical modeling. International Journal of Engineering, 31(10), 1624–1632. Mortazavi, R., & Erfani, S. H. (2018). An effective method for utility preserving social network graph anonymization based on mathematical modeling. International Journal of Engineering, 31(10), 1624–1632.
Zurück zum Zitat Mortazavi, R., & Jalili, S. (2014). Fast data-oriented microaggregation algorithm for large numerical datasets. Knowledge-Based Systems, 67, 195–205.CrossRef Mortazavi, R., & Jalili, S. (2014). Fast data-oriented microaggregation algorithm for large numerical datasets. Knowledge-Based Systems, 67, 195–205.CrossRef
Zurück zum Zitat Mortazavi, R., & Jalili, S. (2017). Fine granular proximity breach prevention during numerical data anonymization. Transactions on Data Privacy, 10(2), 117–144. Mortazavi, R., & Jalili, S. (2017). Fine granular proximity breach prevention during numerical data anonymization. Transactions on Data Privacy, 10(2), 117–144.
Zurück zum Zitat Moshkovitz, M., Dasgupta, S., Rashtchian, C., et al. (2020). Explainable K-means and K-medians clustering. In: International Conference on Machine Learning (pp. 7055–7065). PMLR Moshkovitz, M., Dasgupta, S., Rashtchian, C., et al. (2020). Explainable K-means and K-medians clustering. In: International Conference on Machine Learning (pp. 7055–7065). PMLR
Zurück zum Zitat Nielsen, F. (2016). Hierarchical clustering. In: Introduction to HPC with MPI for data science (pp. 195–211). Springer, chap 8 Nielsen, F. (2016). Hierarchical clustering. In: Introduction to HPC with MPI for data science (pp. 195–211). Springer, chap 8
Zurück zum Zitat Pakhira, M. K. (2014). A linear time-complexity k-means algorithm using cluster shifting. In: International conference on computational intelligence and communication networks (pp. 1047–1051). IEEE Pakhira, M. K. (2014). A linear time-complexity k-means algorithm using cluster shifting. In: International conference on computational intelligence and communication networks (pp. 1047–1051). IEEE
Zurück zum Zitat Pasupathi, S., Shanmuganathan, V., Madasamy, K., et al. (2021). Trend analysis using agglomerative hierarchical clustering approach for time series big data. The Journal of Supercomputing, 77(7), 6505–6524.CrossRef Pasupathi, S., Shanmuganathan, V., Madasamy, K., et al. (2021). Trend analysis using agglomerative hierarchical clustering approach for time series big data. The Journal of Supercomputing, 77(7), 6505–6524.CrossRef
Zurück zum Zitat Sun, L., Qin, X., Ding, W., et al. (2022). Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy. Neurocomputing, 473, 159–181.CrossRef Sun, L., Qin, X., Ding, W., et al. (2022). Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy. Neurocomputing, 473, 159–181.CrossRef
Zurück zum Zitat Suo, Y., Ji, Y., Zhang, Z., et al. (2022). A formal and visual data-mining model for complex ship behaviors and patterns. Sensors, 22(14), 5281.CrossRef Suo, Y., Ji, Y., Zhang, Z., et al. (2022). A formal and visual data-mining model for complex ship behaviors and patterns. Sensors, 22(14), 5281.CrossRef
Zurück zum Zitat Wang, H., & Song, M. (2011). Ckmeans. 1d. dp: Optimal K-means clustering in one dimension by dynamic programming. The R journal, 3(2), 29.CrossRef Wang, H., & Song, M. (2011). Ckmeans. 1d. dp: Optimal K-means clustering in one dimension by dynamic programming. The R journal, 3(2), 29.CrossRef
Zurück zum Zitat Wang, Q., Zhang, F., & Li, X. (2018). Optimal clustering framework for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 56(10), 5910–5922. Wang, Q., Zhang, F., & Li, X. (2018). Optimal clustering framework for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 56(10), 5910–5922.
Metadaten
Titel
Accelerated Sequential Data Clustering
verfasst von
Reza Mortazavi
Elham Enayati
Abdolali Basiri
Publikationsdatum
09.05.2024
Verlag
Springer US
Erschienen in
Journal of Classification
Print ISSN: 0176-4268
Elektronische ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-024-09472-4

Premium Partner