
geocoding's Introduction

Welcome to bitlap 👋

License: Apache License 2.0

Demo

username:root, no password

Author

👤 IceMimosa

👤 jxnu-liguobin

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!

📝 License

Copyright © 2023 bitlap.
This project is licensed under the Apache License 2.0.


This README was generated with ❤️ by readme-md-generator

geocoding's People

Contributors

anqihui, blvyoucan, cheese8, dependabot-preview[bot], icemimosa, jxnu-liguobin, overcat

geocoding's Issues

buildingNum is parsed incorrectly; how can this be fixed? Thanks.

Address(
provinceId=110000000000, province=北京,
cityId=110100000000, city=北京市,
districtId=110102000000, district=西城区,
streetId=null, street=null,
townId=null, town=null,
villageId=null, village=null,
road=新康街,
roadNum=2号院,
buildingNum=null,
text=1号楼北侧楼房
)

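Cannot resolve addresses down to level five

Hi, following the instructions I imported the five-level address database into MySQL and regenerated the dat file, but the normalized result still does not reach the fifth level. How can this be handled?

Province / municipality
City / prefecture
County / district
Township / town
Village / community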

With the national standard database, normalization cannot reach level 5

For example, take this address: 东莞市莞城街道罗沙社区东兴路(道路)东兴门诊部1层

Result:

  Address(
	  provinceId=440000000000, province=广东省, 
	  cityId=441900000000, city=东莞市, 
	  districtId=441900006000, district=莞城街道, 
	  streetId=null, street=null, 
	  townId=null, town=null, 
	  villageId=null, village=null, 
	  road=罗沙社区东兴路, 
	  roadNum=, 
	  buildingNum=null, 
	  text=东兴门诊部1层道路
  )

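Can a custom address map a wrong address onto the correct one?

It seems that custom addresses add new entries to the dictionary rather than correcting a wrong address before it is parsed? @IceMimosa

For example: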

Geocoding.addRegionEntry(510000000000L, 100000000000L, "四州省", RegionType.Province, "四川")
Geocoding.normalizing("四州省广安市广安区")

Can the address then be correctly parsed as 四川省广安市广安区?
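
Whatever the alias argument of `addRegionEntry` does, one workaround is to rewrite known misspellings before the address reaches the normalizer. The following is a minimal Python sketch of that idea, not part of the library: the `ALIASES` table and the `apply_aliases` helper are hypothetical names introduced here for illustration.

```python
# Hypothetical pre-processing step: map known misspellings to canonical
# region names before the address is handed to the normalizer.
ALIASES = {
    "四州省": "四川省",  # the typo from the example in this issue
}


def apply_aliases(address, aliases=None):
    """Replace each known wrong name with its canonical form."""
    aliases = ALIASES if aliases is None else aliases
    # Longest keys first, so a short alias never rewrites part of a longer one.
    for wrong in sorted(aliases, key=len, reverse=True):
        address = address.replace(wrong, aliases[wrong])
    return address


corrected = apply_aliases("四州省广安市广安区")
print(corrected)
```

The corrected string can then be passed to `Geocoding.normalizing` as usual.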

Similarity is 0

Addr1:江苏省南京市建邺区庐山路98-1号
Addr2:江苏省南京市庐山路98-1号

But the result I get is 0.0?

I am not sure whether I broke something while tinkering with the code or whether this is an existing bug.

How to locate an address from a unique town name

灵山镇海榆大道4号绿地城.润园11#楼2单元203

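Using the data of only one province for matching, the street and district ids are matched in the end, but matching does not continue up to the province and city. How should the code be changed?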

In “天津市静海区”, 静海区 is parsed as a county

The address “天津市静海区大丰堆镇齐小王村村委会东100米” is parsed as:
provinceId=120000000000, province=天津,
cityId=120100000000, city=天津市,
districtId=120223000000, district=静海县,
streetId=120223113000, street=大丰堆镇,
townId=120223113000, town=大丰堆镇,
villageId=null, village=null,
road=null,
roadNum=null,
buildingNum=null,
text=齐小王村村委会东100米

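Adding “静海区” through custom data still does not solve it.
Is this because the address database has not been updated?
Does it have to be fixed by modifying the address database?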

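Write a utility class that generates the dat file from the national address database

Implementation approach

Utility inputs

1. URL of the address data

For example: http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2023/, or a class parameter fixed to a year such as 2022 or 2023.
If an API is available, calling it directly is better; otherwise the pages can be scraped with jsoup.

2. Address level

The deeper the level, the larger the generated file, so the depth needs to be capped, for example 1: province, 2: city, 3: district, 4: street/town, 5: residents' committee.

3. File format

json/pb...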
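
Inputs 2 and 3 can be sketched as follows, assuming the records have already been fetched as `(code, name)` pairs (fetching or scraping is left out). This is an illustrative Python sketch, not the project's tool: `level_of`, `parent_of`, and `build_tree` are hypothetical helpers, and the level is derived from the 2+2+2+3+3 digit layout of the 12-digit division codes seen in the outputs on this page.

```python
import json

# Prefix lengths for the five levels of a 12-digit division code:
# province (2), city (4), district (6), street/town (9), committee (12).
PREFIX_LENGTHS = [2, 4, 6, 9, 12]


def level_of(code):
    """Return 1..5: the shallowest level whose prefix holds every non-zero digit."""
    for level, n in enumerate(PREFIX_LENGTHS, start=1):
        if set(code[n:]) <= {"0"}:
            return level
    return len(PREFIX_LENGTHS)


def parent_of(code):
    """Code of the enclosing region one level up, or None for a province."""
    level = level_of(code)
    if level == 1:
        return None
    n = PREFIX_LENGTHS[level - 2]
    return code[:n].ljust(12, "0")


def build_tree(records, max_level):
    """Nest (code, name) pairs into a tree, dropping levels deeper than max_level."""
    nodes = {
        code: {"id": code, "name": name, "level": level_of(code), "children": []}
        for code, name in records
        if level_of(code) <= max_level
    }
    roots = []
    for code, node in nodes.items():
        parent = nodes.get(parent_of(code))
        if parent is not None:
            parent["children"].append(node)
        elif node["level"] == 1:
            roots.append(node)
        # Non-province nodes whose parent is missing are skipped in this sketch.
    return roots


records = [
    ("110000000000", "北京"),
    ("110100000000", "北京市"),
    ("110102000000", "西城区"),
    ("460108101000", "灵山镇"),  # level 4, dropped below because max_level=3
]
tree = build_tree(records, max_level=3)
print(json.dumps(tree, ensure_ascii=False, indent=2))
```

Capping the depth before serialization keeps the output size proportional to the levels actually needed; a protobuf writer could consume the same tree.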

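RegionEntity's children property is not initialized, so adding lower-level regions fails

With an empty dat dictionary file, GeocodingX.addRegionEntry does not initialize the children property of RegionEntity, so lower-level administrative regions are not added. In the following code from DefaultRegionCache, when children is uninitialized (null) the last line does not add the child RegionEntity to its parent: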

override fun addRegionEntity(entity: RegionEntity) {
    this.loadChildrenInCache(entity)
    this.REGION_CACHE[entity.id] = entity
    this.REGION_CACHE[entity.parentId]?.children?.add(entity)
}
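
The behaviour and one possible fix can be modelled outside the library. The sketch below is a simplified Python model, not the project's code: the `RegionEntity` and `RegionCache` classes only mirror the names in this issue, and the fix shown (creating the list on first use) is an assumption about what would be acceptable upstream.

```python
class RegionEntity:
    """Simplified stand-in for the entity described in this issue."""

    def __init__(self, id, parent_id, name, children=None):
        self.id = id
        self.parent_id = parent_id
        self.name = name
        self.children = children  # None models the uninitialized property


class RegionCache:
    """Simplified stand-in for the region cache."""

    def __init__(self):
        self.regions = {}

    def add_region_entity(self, entity):
        self.regions[entity.id] = entity
        parent = self.regions.get(entity.parent_id)
        if parent is not None:
            # Create the list on first use instead of silently skipping the add.
            if parent.children is None:
                parent.children = []
            parent.children.append(entity)


cache = RegionCache()
province = RegionEntity(110000000000, 100000000000, "北京")
city = RegionEntity(110100000000, 110000000000, "北京市")
cache.add_region_entity(province)
cache.add_region_entity(city)
print([child.name for child in province.children])
```

In the Kotlin original, the `?.` chain evaluates to null and does nothing when `children` is null, which is why the child is dropped without an error.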

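Effect of higher-level information appearing later in an address on normalization

Removing higher-level information that appears later in the address greatly improves the similarity. Could the author optimize for this case?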

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=null, 
	roadNum=null, 
	buildingNum=A-32, 
	text=西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(A), Term(32), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=null, roadNum=null, roadNumValue=0), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@2cfb4a64], 
	similarity=0.4886777774252209
)

After removing the second 海口市:

String t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)";
String t2 = "海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203";

Result:

海南省海口市灵山镇海榆大道4号绿地城.润园灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)
addr1 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=A-32, 
	text=绿地城润园灵山西片去旧改项目地块11#楼22203栋单元层号
)
>>>>>>>>>>>>>>>>>
海南省海口市灵山镇海榆大道4号绿地城.润园11#楼2单元203
addr2 >>>> Address(
	provinceId=460000000000, province=海南省, 
	cityId=460100000000, city=海口市, 
	districtId=460108000000, district=美兰区, 
	streetId=460108101000, street=灵山镇, 
	townId=460108101000, town=灵山镇, 
	villageId=null, village=null, 
	road=海榆大道, 
	roadNum=4号, 
	buildingNum=11#楼2单元203, 
	text=绿地城润园
)
加载扩展词典:dic/region.dic
加载扩展词典:dic/community.dic
加载扩展停止词典:dic/stop.dic
相似度结果分析 >>>>>>>>> MatchedResult(
	doc1=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(A), Term(32), Term(绿地城), Term(润园), Term(灵山), Term(西片), Term(去), Term(旧), Term(改), Term(项目), Term(地块), Term(11#), Term(楼), Term(22203), Term(栋), Term(单元), Term(层), Term(号)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	doc2=Document(terms=[Term(灵山镇), Term(海榆大道), Term(4号), Term(11), Term(2), Term(203), Term(绿地城), Term(润园)], town=Term(灵山镇), village=null, road=Term(海榆大道), roadNum=Term(4号), roadNumValue=4), 
	terms=[io.patamon.geocoding.similarity.MatchedTerm@4b6995df, io.patamon.geocoding.similarity.MatchedTerm@2fc14f68, io.patamon.geocoding.similarity.MatchedTerm@591f989e, io.patamon.geocoding.similarity.MatchedTerm@66048bfd, io.patamon.geocoding.similarity.MatchedTerm@61443d8f], 
	similarity=0.7152705001057788
)
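
The manual edit in this comparison can be expressed as a pre-processing step. The Python sketch below is illustrative only: `drop_repeated_regions` is a hypothetical helper, and it assumes the caller already knows the province and city names (for instance from a first normalization pass). It would also delete a legitimate POI that happens to contain one of those names, so it is a trade-off rather than a general fix.

```python
def drop_repeated_regions(address, region_names):
    """Keep the first occurrence of each region name and delete later repeats."""
    for name in region_names:
        first = address.find(name)
        if first == -1:
            continue
        head_end = first + len(name)
        # Only the text after the first occurrence is cleaned.
        address = address[:head_end] + address[head_end:].replace(name, "")
    return address


t1 = "海南省海口市灵山镇海榆大道4号绿地城.润园海口市灵山西片去旧改项目A-32地块11#楼(栋)2(单元)2(层)203(号)"
cleaned = drop_repeated_regions(t1, ["海南省", "海口市"])
print(cleaned)
```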

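Returning multiple match results

For example, the input “南山区” has two matches, one in Heilongjiang Province and one in Guangdong Province, but currently only the first is returned.
Also, if only a town is given as input, the result is just null. How should this be changed?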
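
Returning every candidate instead of the first amounts to indexing regions by name with a list per name. The Python sketch below illustrates that data structure only; it does not reflect how the library stores regions, `build_name_index` and `find_all` are hypothetical helpers, and the two sample rows (ids and paths) are illustrative data.

```python
from collections import defaultdict

# Illustrative sample rows: (id, name, full path). Not taken from region.dat.
SAMPLE_REGIONS = [
    (230404000000, "南山区", "黑龙江省鹤岗市南山区"),
    (440305000000, "南山区", "广东省深圳市南山区"),
]


def build_name_index(regions):
    """Map each region name to every (id, path) that carries it."""
    index = defaultdict(list)
    for region_id, name, path in regions:
        index[name].append((region_id, path))
    return index


def find_all(index, name):
    """Return all candidates for a name, or an empty list when there is none."""
    return list(index.get(name, []))


index = build_name_index(SAMPLE_REGIONS)
for region_id, path in find_all(index, "南山区"):
    print(region_id, path)
```

The caller can then pick a candidate using other context, such as a province mentioned elsewhere in the input.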

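removeRedundancy mistakenly deletes useful POI strings

Hello,
Thanks for open-sourcing such a useful tool.
In the Geocoding.normalizing API, after the four administrative levels have been matched, removeRedundancy() goes on to remove any province, city, district/county, or township/street it can still parse, together with the text before it. This handles addresses where the province, city, and district are written twice and leaves only the POI string to process. But when the POI string itself contains an ordinary place name (e.g. 浙江省杭州市西湖区**建设银河西湖支行), removeRedundancy() wrongly deletes the POI information, leaving only “支行”.

An example: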

print(Geocoding.normalizing("浙江省杭州市西湖区**建设银河西湖支行"))

[Out]
Address(
	provinceId=330000000000, province=浙江省, 
	cityId=330100000000, city=杭州市, 
	districtId=330106000000, district=西湖区, 
	streetId=null, street=null, 
	townId=null, town=null, 
	villageId=null, village=null, 
	road=null, 
	roadNum=null, 
	buildingNum=null, 
	text=支行
)
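
One more conservative policy is to strip already-matched region names only while they sit at the very start of the remaining text, so a place name inside a POI is left alone. The Python sketch below illustrates that policy under stated assumptions; it is not the library's removeRedundancy, `strip_leading_repeats` is a hypothetical helper, and 文三路 is used only as sample text. The cost is that repeats appearing in the middle of the remaining text are no longer removed.

```python
def strip_leading_repeats(rest, matched_names):
    """Remove already-matched region names only while they start the text."""
    # Longest names first, so "西湖区" is tried before its short form "西湖".
    names = sorted((n for n in matched_names if n), key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for name in names:
            if rest.startswith(name):
                rest = rest[len(name):]
                changed = True
    return rest


matched = ["浙江省", "杭州市", "西湖区", "西湖"]
print(strip_leading_repeats("浙江省杭州市西湖区文三路", matched))
print(strip_leading_repeats("**建设银河西湖支行", matched))
```

The first call removes the duplicated prefix, while the second returns the POI text unchanged because it does not begin with a matched name.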

Why does it fail after being packaged as an .exe, and how can this be fixed?

Code:
from GeocodingCHN import Geocoding

geocoding = Geocoding()

text = '山东青岛李沧区延川路116号绿城城园东区7号楼2单元802户'

address_nor = geocoding.normalizing(text)

print(address_nor)

Error:
Traceback (most recent call last):
File "main.py", line 3, in <module>
File "GeocodingCHN\Geocoding.py", line 61, in __init__
File "jpype\_jclass.py", line 99, in __new__
TypeError: Class org.bitlap.geocoding.GeocodingX is not found
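
One plausible cause, which is an assumption and not confirmed in this report, is that the packaged executable does not contain the jar that JPype loads the class from, since packagers such as PyInstaller do not bundle non-Python files unless told to. The Python sketch below is a small diagnostic for that hypothesis: `bundle_root` and `find_jars` are hypothetical helpers, and `sys._MEIPASS` is the attribute PyInstaller sets at run time to the directory holding bundled files.

```python
import os
import sys
from pathlib import Path


def bundle_root():
    """Bundled-files directory when frozen by PyInstaller, else the working dir."""
    return Path(getattr(sys, "_MEIPASS", os.getcwd()))


def find_jars(root):
    """Return the relative paths of all .jar files under `root`, sorted."""
    root = Path(root)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.jar"))
```

Printing `find_jars(bundle_root())` at the start of the packaged program shows whether any jar was bundled. If the list is empty, the jar probably has to be added to the build explicitly, for example with PyInstaller's `--add-data` option.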

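region.dat is out of date

Hi, when parsing “重庆市开州区南门镇” I found that the address data in the current region.dat is quite old and does not contain 开州区.
Is region.dat something we maintain ourselves, or can it be obtained online? If it can, could you share a link? Thanks.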

The segment method splits 【郫都区】 incorrectly

Input: 四川省成都市郫都区西源大道1311号3栋4单元1楼102号
segment method, seg_type = 'ik'
Actual token list: ['四川省', '成都市', '郫', '都', '西源大道', '1311号', '3栋', '4', '单元', '1楼', '102号']
Expected token list: ['四川省', '成都市', '郫都区', '西源大道', '1311号', '3栋', '4', '单元', '1楼', '102号']
Is there any way to correct the result? Thanks!
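
Until the segmenter's dictionary knows the name, one workaround is to cut the text around a list of protected terms and segment only the pieces in between. The Python sketch below illustrates this under stated assumptions: `segment_protecting` is a hypothetical helper, and `segmenter` is any callable that takes a string and returns a token list (for example a thin wrapper around the library's segment method).

```python
def segment_protecting(text, protected, segmenter):
    """Emit protected terms whole; pass the text between them to `segmenter`."""
    # Longest terms first; empty strings are ignored to keep the scan advancing.
    terms = sorted((t for t in protected if t), key=len, reverse=True)
    tokens, buffer, i = [], "", 0
    while i < len(text):
        hit = next((t for t in terms if text.startswith(t, i)), None)
        if hit is None:
            buffer += text[i]
            i += 1
            continue
        if buffer:
            tokens.extend(segmenter(buffer))
            buffer = ""
        tokens.append(hit)
        i += len(hit)
    if buffer:
        tokens.extend(segmenter(buffer))
    return tokens


# A trivial stand-in segmenter that returns each piece unsplit.
pieces = segment_protecting("四川省成都市郫都区西源大道1311号", ["郫都区"], lambda s: [s])
print(pieces)
```

With a real segmenter in place of the stand-in, the pieces before and after 郫都区 would be split further while the protected term stays intact.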

Dependency download fails

Downloading the dependency from the GitHub repo as described in the README fails:

Failed to execute goal on project customer-experience-data-factory: Could not resolve dependencies for project com.treeyee.cloud:customer-experience-data-factory:jar:0.0.1-SNAPSHOT: io.patamon.geocoding:geocoding:jar:1.1.6 was not found in https://raw.github.com/icemimosa/maven/release/ during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of patamon.release.repository has elapsed or updates are forced -> [Help 1]

Cloning the project, building the jar locally, and adding it to my project (a Java project) also fails at run time:
java.lang.NoClassDefFoundError: kotlin/jvm/internal/Intrinsics
at io.patamon.geocoding.Geocoding.similarity(Geocoding.kt)

Do you know what causes this?

normalizing: digits are lost during normalization

Using the normalizing method with the input address 北京市海淀区西北旺东路10号院东区323102, the number 323102 is missing from the result.
Please take a look.
