Emergent Mind

A Novel Sampled Clustering Algorithm for Rice Phenotypic Data

(2312.14920)
Published Dec 22, 2023 in cs.LG and cs.AI

Abstract

Phenotypic (or Physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways. First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon the spectral graph theory and the involved Cheeger's inequality, we propose the use a base "a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Also, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better. Second, to compute the inclusion probability of a specie in the pivotal sampling algorithm, we had earlier used the notion of deviation that captured how far specie's characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis. Third, with experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.