Research News
Paper by doctoral student Zhang Chen and Guo Renzhong published in COMPUTERS ENVIRONMENT AND URBAN SYSTEMS
Published: 2022-06-24 10:27:55     Posted by: Yi Zhen

Title: W-TextCNN: A TextCNN model with weighted word embeddings for Chinese address pattern classification

Authors: Zhang, C (Zhang, Chen); Guo, RZ (Guo, Renzhong); Ma, XY (Ma, Xiangyuan); Kuai, X (Kuai, Xi); He, B (He, Biao)

Source: COMPUTERS ENVIRONMENT AND URBAN SYSTEMS  Volume: 95  Article number: 101819  DOI: 10.1016/j.compenvurbsys.2022.101819  Published: JUL 2022

Abstract: Geocoding is crucial to support location-based services and has become a widely accessible technique in geographic information systems (GIS). In a geocoding system, addresses are one of the main geographical reference texts as input. Address patterns refer to the organizational rules of combining address components into an address. In China, intricate rules and backwards address planning make address patterns not systematic and difficult to recognize, which creates significant challenges for database construction and address standardization. Inspired by deep learning methods, this paper provides a convolutional neural network for text with weighted word embeddings (W-TextCNN) for Chinese address pattern classification. Specifically, we define eight address patterns to represent the structures of addresses considering the characteristics of address components. For calculating addresses in the neural network, word embeddings with a weighted strategy are implemented for transforming address texts into real-valued vectors. The vectors are fed into a convolutional neural network for text (TextCNN) to train for classifying address patterns automatically. Furthermore, we apply W-TextCNN in the address corpus after fine-tuning the hyperparameters and compare it with several methods commonly used in text classification. We also design two tasks address segmentation and address matching to explore the effect of address pattern classification. The accuracy and F1 score of the model on classification achieve 97.45% and 96%, respectively, and W-TextCNN outperforms TextCNN because of the employment of the weighted word embeddings. Additionally, the results reveal the positive impact of address pattern classification on improving segmentation precision and address quality. The proposed model is expected to expand the toolkit of computational address study with deep learning methods.
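
The abstract does not spell out the weighting scheme or the network configuration, so the following is only a minimal sketch of the general idea it describes: token embeddings are scaled by per-token weights, passed through convolution filters with max-over-time pooling, and mapped to eight pattern classes. The function names, the weight table, the example tokens, and the random parameters are all assumptions made for illustration. This is not the authors' implementation, and nothing here is trained.

```python
import math
import random

NUM_PATTERNS = 8  # the abstract defines eight address patterns


def weighted_embeddings(tokens, embeddings, weights):
    """Scale each token's embedding by a per-token weight (assumed scheme)."""
    return [[weights.get(t, 1.0) * x for x in embeddings[t]] for t in tokens]


def conv1d_max_pool(matrix, kernel, bias=0.0):
    """Slide one kernel (window x dim) along the token axis, apply ReLU,
    then keep the maximum response (max-over-time pooling)."""
    window = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(matrix) - window + 1):
        s = bias
        for i in range(window):
            s += sum(a * b for a, b in zip(matrix[start + i], kernel[i]))
        best = max(best, max(0.0, s))
    return best


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, embeddings, weights, kernels, dense):
    """Return a probability for each address pattern."""
    matrix = weighted_embeddings(tokens, embeddings, weights)
    features = [conv1d_max_pool(matrix, k) for k in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return softmax(logits)


# Hypothetical, untrained demo: random parameters stand in for learned ones.
rng = random.Random(0)
DIM = 4
tokens = ["city", "district", "road", "house_number"]  # assumed address components
embeddings = {t: [rng.uniform(-1, 1) for _ in range(DIM)] for t in tokens}
weights = {"road": 1.5, "house_number": 2.0}  # assumed per-token weights
kernels = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(window)]
    for window in (2, 3)  # two filter widths, as is typical for TextCNN
]
dense = [[rng.uniform(-1, 1) for _ in kernels] for _ in range(NUM_PATTERNS)]

probs = classify(tokens, embeddings, weights, kernels, dense)
print("predicted pattern index:", probs.index(max(probs)))
```

In a real system the embeddings, weights, filters, and dense layer would be learned from a labeled address corpus, typically with a deep learning framework rather than plain Python lists.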

Author keywords: Address patterns; Address components; Address structure; Geocoding; Weighted word embeddings; Convolutional neural network

Addresses: [Zhang, Chen; Guo, Renzhong; Ma, Xiangyuan] Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China.

[Guo, Renzhong; Kuai, Xi; He, Biao] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen 518060, Peoples R China.

Corresponding author address: Guo, RZ (corresponding author), Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430079, Peoples R China.

Email addresses: czhang0315@whu.edu.cn; guorz@szu.edu.cn; maxiangyuan@whu.edu.cn; kuaixi@szu.edu.cn

Impact factor: 5.324


