
Distance-based mutual congestion feature selection with genetic algorithm for high-dimensional medical datasets (2407.15611v1)

Published 22 Jul 2024 in cs.LG and cs.NE

Abstract: Feature selection poses a challenge in small-sample high-dimensional datasets, where the number of features exceeds the number of observations, as seen in microarray, gene expression, and medical datasets. There isn't a universally optimal feature selection method applicable to any data distribution, and as a result, the literature consistently endeavors to address this issue. One recent approach in feature selection is termed frequency-based feature selection. However, existing methods in this domain tend to overlook feature values, focusing solely on the distribution in the response variable. In response, this paper introduces the Distance-based Mutual Congestion (DMC) as a filter method that considers both the feature values and the distribution of observations in the response variable. DMC sorts the features of datasets, and the top 5% are retained and clustered by KMeans to mitigate multicollinearity. This is achieved by randomly selecting one feature from each cluster. The selected features form the feature space, and the search space for the Genetic Algorithm with Adaptive Rates (GAwAR) will be approximated using this feature space. GAwAR approximates the combination of the top 10 features that maximizes prediction accuracy within a wrapper scheme. To prevent premature convergence, GAwAR adaptively updates the crossover and mutation rates. The hybrid DMC-GAwAR is applicable to binary classification datasets, and experimental results demonstrate its superiority over some recent works. The implementation and corresponding data are available at https://github.com/hnematzadeh/DMC-GAwAR
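The filter stage described in the abstract (rank features, keep the top 5%, cluster them, draw one feature per cluster at random) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the DMC formula, so the per-feature scores are taken as an input here, and the helper names (`top_fraction`, `kmeans`, `select_representatives`), the cluster count, and the plain Lloyd's k-means standing in for KMeans are all hypothetical choices made for the example.

```python
import random


def top_fraction(scores, fraction=0.05):
    """Return indices of the highest-scoring features (at least one)."""
    keep = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:keep]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_representatives(columns, scores, n_clusters, fraction=0.05, seed=0):
    """Keep the top-scoring features, cluster them, pick one per cluster.

    columns[j] holds the values of feature j across all observations.
    scores[j] is that feature's filter score (assumed precomputed; the
    DMC score itself is NOT implemented in this sketch).
    """
    rng = random.Random(seed)
    kept = top_fraction(scores, fraction)
    k = min(n_clusters, len(kept))
    labels = kmeans([columns[j] for j in kept], k, rng)
    chosen = []
    for c in range(k):
        members = [j for j, lab in zip(kept, labels) if lab == c]
        if members:
            # Features in one cluster are treated as near-redundant,
            # so a single random draw represents the whole cluster.
            chosen.append(rng.choice(members))
    return sorted(chosen)
```

The returned indices would then define the reduced feature space that a wrapper search such as GAwAR explores. The adaptive crossover and mutation rates are not sketched here because the abstract does not state how they are updated.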
