1. What is named entity recognition used for?

In natural language processing, named entity recognition (NER) is a basic task that underlies applications such as information retrieval, knowledge graphs, machine translation, sentiment analysis and question answering systems. For example, a knowledge-graph application uses NER to automatically identify the entities in a user query and then link each entity to its corresponding node in the knowledge graph. The recognition accuracy therefore directly affects every downstream step.

2. What makes named entity recognition difficult?

* Named entities look different in different domains and scenarios. Existing labeled corpora usually cover only a few domains, and models trained on them transfer poorly to other corpora. For example, a model trained on a news corpus and then tested on a social-media corpus often falls well short of expectations, because social-media text contains many nonstandard words.
* Annotating named entities is expensive, so few labeled NER corpora exist. Learning a good model from a small corpus, or with the help of corpora from related tasks and large amounts of unlabeled text, is a new challenge for NER.
* In Chinese, character boundaries are well defined but word boundaries are fuzzy, which often causes semantic ambiguity. For example, the sentence "让人大吃一惊" ("it is astonishing") has two possible segmentations, "让 / 人大 / 吃一惊" ("let / the National People's Congress / be startled") and "让人 / 大吃一惊" ("make people / be greatly astonished"), and the two readings mean completely different things. Chinese NER is therefore usually combined with Chinese word segmentation, shallow parsing and other processing steps, so the accuracy of segmentation and parsing directly affects NER performance.
* The text to be recognized contains many unknown (out-of-vocabulary) words, i.e. new entity names. New ones keep appearing over time, which makes them hard to maintain in a lexicon.

3. Existing research

Summary of named entity recognition methods based on statistical models

4. CRF (Conditional Random Fields)

4.1 Introduction to conditional random fields

Comparison of four models

Given an observation sequence X, the probability of a particular tag sequence Y can be defined as:
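In the standard linear-chain formulation (assumed here; the symbols $f_k$, $\lambda_k$, $n$ and the start tag $y_0$ are notation introduced for this sketch), the definition reads:

```latex
P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, X, i) \right),
\qquad
Z(X) = \sum_{Y'} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, X, i) \right)
```

Here each feature function $f_k$ looks at the previous tag, the current tag and the input at position $i$, each $\lambda_k$ is a learned weight, and the normalizer $Z(X)$ sums over every candidate tag sequence $Y'$ so that $P(Y \mid X)$ is a proper distribution. In practice $Z(X)$ is computed with the forward algorithm rather than by enumerating all sequences.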

4.2 Parameter estimation for CRFs
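CRF weights are commonly estimated by maximizing the conditional log-likelihood of the labeled training sequences. For each weight, the gradient is the feature's count in the gold tag sequence minus its expected count under the current model, and the expected counts come from the forward-backward algorithm. The sketch below assumes a deliberately small, hypothetical model: only word/tag emission indicators (`emit`) and tag/tag transition indicators (`trans`), stored as dictionaries and trained with plain per-sentence gradient ascent without regularization. Practical CRF trainers use much richer feature templates, add regularization, and typically rely on quasi-Newton optimizers such as L-BFGS.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(words, tags, emit, trans):
    """Unnormalized log-score of one tag sequence: emission plus transition weights."""
    s = sum(emit.get((w, t), 0.0) for w, t in zip(words, tags))
    return s + sum(trans.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))


def forward(words, labels, emit, trans):
    """alpha[i][y]: log of summed exp-scores of all prefixes ending with tag y at i."""
    alpha = [{y: emit.get((words[0], y), 0.0) for y in labels}]
    for w in words[1:]:
        prev, cur = alpha[-1], {}
        for y in labels:
            cur[y] = emit.get((w, y), 0.0) + logsumexp(
                [prev[a] + trans.get((a, y), 0.0) for a in labels])
        alpha.append(cur)
    return alpha


def backward(words, labels, emit, trans):
    """beta[i][y]: log of summed exp-scores of all suffixes after i, given tag y at i."""
    n = len(words)
    beta = [dict.fromkeys(labels, 0.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in labels:
            beta[i][a] = logsumexp(
                [trans.get((a, b), 0.0) + emit.get((words[i + 1], b), 0.0) + beta[i + 1][b]
                 for b in labels])
    return beta


def log_likelihood_and_grad(words, tags, labels, emit, trans):
    """Return log P(tags | words) and its gradient (empirical minus expected counts)."""
    alpha = forward(words, labels, emit, trans)
    beta = backward(words, labels, emit, trans)
    log_z = logsumexp(list(alpha[-1].values()))   # log Z(X) via the forward algorithm
    ll = score(words, tags, emit, trans) - log_z
    grad = defaultdict(float)
    for w, t in zip(words, tags):                  # empirical emission counts
        grad[("E", w, t)] += 1.0
    for a, b in zip(tags, tags[1:]):               # empirical transition counts
        grad[("T", a, b)] += 1.0
    for i, w in enumerate(words):                  # minus expected emission counts
        for y in labels:
            grad[("E", w, y)] -= math.exp(alpha[i][y] + beta[i][y] - log_z)
    for i in range(1, len(words)):                 # minus expected transition counts
        for a in labels:
            for b in labels:
                grad[("T", a, b)] -= math.exp(
                    alpha[i - 1][a] + trans.get((a, b), 0.0)
                    + emit.get((words[i], b), 0.0) + beta[i][b] - log_z)
    return ll, grad


def train(data, labels, epochs=30, lr=0.1):
    """Per-sentence gradient ascent on the conditional log-likelihood (no regularization)."""
    emit, trans = {}, {}
    for _ in range(epochs):
        for words, tags in data:
            _, grad = log_likelihood_and_grad(words, tags, labels, emit, trans)
            for (kind, u, v), g in grad.items():
                table = emit if kind == "E" else trans
                table[(u, v)] = table.get((u, v), 0.0) + lr * g
    return emit, trans
```

With all weights at zero and three labels, a three-word sentence has log-likelihood -log 27; calling `train([(["Li", "visits", "Beijing"], ["PER", "O", "LOC"])], ["PER", "LOC", "O"])` pushes the gold sequence's log-likelihood above that starting value. The same forward pass yields both the objective and the normalizer used by the gradient.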

4.3 Prediction
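Because the normalizer Z(X) does not depend on Y, prediction only needs the tag sequence with the highest unnormalized score, which the Viterbi algorithm finds by dynamic programming in time linear in sentence length and quadratic in the number of tags. A minimal sketch, assuming a hypothetical toy parameterization with emission weights `emit[(word, tag)]`, transition weights `trans[(prev_tag, tag)]`, BIO-style labels, and hand-set weights chosen only for illustration:

```python
def viterbi(words, labels, emit, trans):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emit[(word, tag)] and trans[(prev_tag, tag)] are log-linear weights;
    missing entries count as 0.0.
    """
    # delta[y]: best score of any prefix that ends with tag y at the current position
    delta = {y: emit.get((words[0], y), 0.0) for y in labels}
    backpointers = []  # backpointers[i][y]: best previous tag for tag y at position i + 1
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for y in labels:
            best = max(labels, key=lambda a: delta[a] + trans.get((a, y), 0.0))
            ptr[y] = best
            new_delta[y] = delta[best] + trans.get((best, y), 0.0) + emit.get((w, y), 0.0)
        backpointers.append(ptr)
        delta = new_delta
    # Start from the best final tag and follow backpointers to recover the path
    path = [max(labels, key=lambda y: delta[y])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


labels = ["B-PER", "I-PER", "B-LOC", "O"]
emit = {("Zhang", "B-PER"): 2.0, ("San", "B-PER"): 1.0, ("San", "I-PER"): 0.5,
        ("visits", "O"): 1.0, ("Beijing", "B-LOC"): 2.0}
trans = {("B-PER", "I-PER"): 1.0}
print(viterbi(["Zhang", "San", "visits", "Beijing"], labels, emit, trans))
# ['B-PER', 'I-PER', 'O', 'B-LOC']
```

On its own, "San" prefers B-PER (weight 1.0 vs 0.5), but the B-PER → I-PER transition bonus makes the decoder label "Zhang San" as a single person entity. This is the kind of tag-to-tag dependency a CRF captures and a per-token classifier does not.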

 

5. Experiments

Corpus: 1998 People's Daily (number of sentences and of PER, LOC and ORG entities)

         #sentence   #PER    #LOC    #ORG
train    46364       17615   36517   20571
test     4365        1973    2877    1331
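The #PER, #LOC and #ORG columns count entity mentions rather than tagged tokens. As a sketch of how such counts can be derived, assuming the corpus is stored as BIO tag sequences (B-PER, I-PER, ..., O), which is a common NER format but an assumption here, each sentence's tags can be turned into entity spans and counted by type. `bio_spans` and `count_entities` are hypothetical helper names.

```python
from collections import Counter


def bio_spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag list; end is exclusive.

    An I- tag that does not continue an entity of the same type is treated as
    starting a new entity (a common lenient convention).
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.append((etype, start, i))
            etype = None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    return spans


def count_entities(corpus):
    """Count entity mentions per type over a list of BIO-tagged sentences."""
    return Counter(etype for tags in corpus for etype, _, _ in bio_spans(tags))


corpus = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "I-ORG", "I-ORG", "O", "B-PER"]]
counts = count_entities(corpus)
print(counts["PER"], counts["LOC"], counts["ORG"])  # 2 1 1
```

The same span extraction is what entity-level evaluation usually builds on: a predicted entity counts as correct only if its type and both boundaries match a gold span.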

©2019-2020 Toolsou All rights reserved,