OSM-POI: Include brand property [DRAFT] #69

Draft: wants to merge 43 commits into base: master
Showing changes from 5 commits
86ef2c5
add name-suggestion-index download
IritaSee Jan 3, 2022
1c62f71
add repo checking
IritaSee Jan 3, 2022
b74d373
[undone] interating over json
IritaSee Jan 3, 2022
d6fb766
add brand_name_downloader
IritaSee Jan 5, 2022
ac14e90
add name downloader to main, add operator naming
IritaSee Jan 5, 2022
f5d401f
rename to more suitable function names
IritaSee Jan 6, 2022
600c3ef
add staticmethod for download_names
IritaSee Jan 6, 2022
017b99d
add staticmethod to download_names
IritaSee Jan 6, 2022
96ae427
fix variable names to add context
IritaSee Jan 6, 2022
69ac660
fix variable typo
IritaSee Jan 7, 2022
d2c2460
add debug venv folder to ignore
IritaSee Jan 7, 2022
f173ec5
change - to None as default value
IritaSee Jan 7, 2022
2ff61be
add operator:wikidata
IritaSee Jan 7, 2022
3c93476
add func to match brands and operators, then add to spark
IritaSee Jan 7, 2022
2f3896f
fix algorithm
IritaSee Jan 8, 2022
a572b30
remove unused code
IritaSee Jan 8, 2022
d2467a7
update fuzzywuzzy to thefuzz in osm-poi related
IritaSee Jan 8, 2022
e74f64b
fix nan processing
IritaSee Jan 8, 2022
1290708
fix: change search to brand and operator
IritaSee Jan 10, 2022
6230a6b
recreate matching function
IritaSee Jan 11, 2022
6519b76
apply withcolumn in main matching function
IritaSee Jan 11, 2022
e78b8f7
fix typo
IritaSee Jan 12, 2022
3e16563
remove is_operator
IritaSee Jan 13, 2022
027f2d6
join operator and brand name matching function
IritaSee Jan 13, 2022
73f719e
remove duplicate name/operator
IritaSee Jan 13, 2022
310e06c
rework function to simply match names and input
IritaSee Jan 13, 2022
a91e266
fix error
IritaSee Jan 13, 2022
32e767f
Merge branch 'master' into feature/include-brand-property
IritaSee Jan 15, 2022
848f984
add brand_matched operator_matched name_matched
IritaSee Jan 20, 2022
d101e2d
fix run_cli convert add extra cd
IritaSee Jan 22, 2022
724ac03
readjust dowloader to new temp folder
IritaSee Jan 22, 2022
c2bb65c
add default statement
IritaSee Jan 24, 2022
898945b
update temp dir
IritaSee Jan 24, 2022
f08ea40
add downloading message
IritaSee Jan 24, 2022
1c153fb
add empty as return
IritaSee Jan 27, 2022
1c4b9dc
fix missleadnig var name
IritaSee Jan 27, 2022
2489e94
revert irrelevant change to this branch
IritaSee Jan 27, 2022
c01806f
code cleanup
IritaSee Jan 27, 2022
a149f1b
delete reference repo, rename reference file
IritaSee Jan 27, 2022
f4e651b
add id sorting for reference file, change print to log
IritaSee Jan 27, 2022
fae5761
remove reference repo, ignore reference file
IritaSee Jan 27, 2022
70f7255
Merge branch 'master' into feature/include-brand-property
Feb 2, 2022
0661a62
Resolve formatting and linting errors; Remove name matching UDF from …
Feb 3, 2022
40 changes: 39 additions & 1 deletion kuwala/pipelines/osm-poi/src/Downloader.py
@@ -1,7 +1,10 @@
import os
from python_utils.src.FileDownloader import download_file
from python_utils.src.FileSelector import select_osm_file

import urllib.request as req
import zipfile
import json
import pandas as pd

class Downloader:
    @staticmethod
@@ -23,3 +26,38 @@ def start(args):
        file_path += '.osm.pbf'

        download_file(url=args.url or file['url'], path=file_path)

    def names_downloader():
        # Instead of cloning the repository, which would require an extra library,
        # download the whole repo as a zip archive and extract it.
        if not os.path.exists('../tmp/name-suggestion-index-main'):
            download_link='https://github.com/osmlab/name-suggestion-index/archive/refs/heads/main.zip'
            req.urlretrieve(download_link, "../tmp/main.zip")
            with zipfile.ZipFile('../tmp/main.zip', 'r') as zip_ref:
                zip_ref.extractall('../tmp/')
            os.remove('../tmp/main.zip')

        file_paths=['../tmp/name-suggestion-index-main/data/brands','../tmp/name-suggestion-index-main/data/operators']
        data = {'id': [], 'display_name': [], 'wiki_data': []}
        for file_path in file_paths:
            for folders in os.listdir(file_path):
                if os.path.isdir(os.path.join(file_path,folders)):
                    for files in os.listdir(os.path.join(file_path,folders)):
                        with open(os.path.join(file_path,folders,files)) as f:
                            file_content=json.load(f)
                            for a in file_content['items'] :
                                wiki_data=id=display_name='-'
                                if ('id' in a.keys()):
                                    id=(dict(a)['id'])
                                if ('displayName' in a.keys()):
                                    display_name=(dict(a)['displayName'])
                                if ("tags" in a.keys()):
                                    if ('brand:wikidata' in list(a['tags'].keys())):
                                        wiki_data=(dict(a["tags"].items())['brand:wikidata'])

                                data['id'].append(id)
                                data['display_name'].append(display_name)
                                data['wiki_data'].append(wiki_data)

        df=pd.DataFrame(data)
        df.to_csv('../tmp/names.csv',index=False)
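
Review note: the nested key checks in the hunk above could probably be flattened. The following is only a sketch of one way to do it, not the code in this PR. The names `extract_entry`, `collect_entries`, and `DEFAULT` are hypothetical, and the sketch assumes the extracted data is laid out as `<category>/<subfolder>/<file>.json`, with each file holding an `items` list. Unlike the diff, it reads only `*.json` files and treats a missing `items` key as an empty list.

```python
import json
from pathlib import Path

DEFAULT = '-'  # same placeholder the diff uses for missing values


def extract_entry(item):
    """Return (id, display_name, wiki_data) for one entry of an 'items' list."""
    tags = item.get('tags', {})
    return (
        item.get('id', DEFAULT),
        item.get('displayName', DEFAULT),
        tags.get('brand:wikidata', DEFAULT),
    )


def collect_entries(category_dirs):
    """Walk <category>/<subfolder>/<file>.json and gather one row per item."""
    rows = []
    for category_dir in category_dirs:
        # sorted() keeps the row order stable across runs and platforms
        for json_path in sorted(Path(category_dir).glob('*/*.json')):
            with open(json_path, encoding='utf-8') as f:
                content = json.load(f)
            # Assumption: a file without an 'items' key contributes no rows
            rows.extend(extract_entry(item) for item in content.get('items', []))
    return rows
```

The resulting rows could then be passed to `pd.DataFrame(rows, columns=['id', 'display_name', 'wiki_data'])` before writing the CSV, which would also avoid shadowing the built-in `id`.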
1 change: 1 addition & 0 deletions kuwala/pipelines/osm-poi/src/main.py
@@ -25,5 +25,6 @@

if action == 'download':
    Downloader.start(args)
    Downloader.names_downloader()
else:
    Processor.start(args)