
A New Clustering neural network for Chinese word segmentation

(2002.07458)
Published Feb 18, 2020 in cs.CL

Abstract

In this article I propose a new model for Chinese word segmentation (CWS), which may have the potential to be applied in other domains in the future. Compared with previous work, it takes a new approach to CWS: it treats segmentation as a clustering problem instead of a labeling problem. In this model, LSTM and self-attention structures are used in every layer to collect context and sentence-level features, and after several layers a clustering model is applied to split the characters into groups, which are the final segmentation results. I call this model CLNN. The algorithm reaches an F score of 98 percent (without out-of-vocabulary (OOV) words) and 85 to 95 percent (with OOV words) on the training data sets. Error analysis shows that OOV words greatly reduce performance, which calls for deeper research in the future.
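To make the "clustering instead of labeling" idea concrete, here is a minimal sketch in Python. It is not the CLNN model described in the abstract: there is no LSTM or self-attention, and the grouping rule (merge adjacent characters whose vectors have cosine similarity above a threshold) is an assumption chosen only for illustration. The function name `cluster_segment`, the `threshold` parameter, and the hand-made character vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_segment(chars, vectors, threshold=0.5):
    """Group adjacent characters into words.

    Hypothetical stand-in for a clustering step: a character joins the
    current group when its vector is similar enough to the previous
    character's vector; otherwise it starts a new group.
    """
    if not chars:
        return []
    words = [chars[0]]
    for i in range(1, len(chars)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            words[-1] += chars[i]   # same cluster: extend the current word
        else:
            words.append(chars[i])  # new cluster: start a new word
    return words


# Toy vectors made up for this example; in a real model they would be
# produced by the network's feature layers.
chars = list("我爱北京")
vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.1, 0, 0.9]]
print(cluster_segment(chars, vectors))
```

Note that the output is a list of character groups rather than a per-character tag sequence (such as B/M/E/S tags), which is the contrast with labeling approaches that the abstract draws.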
