Created for a research project at the Research Center for Humanities and Social Sciences. We use elementary schools in Taiwan (about 354) as central locations and scrape the food menus of restaurants located near each school.
|-- foodpanda_crawler
    |-- README.md
    |-- foodpanda.py
    |-- requirements.txt
    |-- inputCentral
    |   |-- school_most.csv
    |   |-- school_nearest.csv
    |-- meau_Foodpanda
    |   |-- foodpandaMenu_{date}.csv
    |-- shopLst
        |-- {date}
            |-- shopLst_{City}_{School}_{date}
            |-- shopLst_most_{date}.csv
We recommend creating a dedicated environment for the crawler. Create an environment named spider and set the Python version to 3.8.
Enter the following command on the command line:
conda create --name spider python=3.8
Activate the environment (conda activate spider), then install the required packages:
pip3 install -r requirements.txt
Input for foodpanda.py
The input is a list of central locations. The file should contain 4 columns:
- latitude
  - float
  - latitude of the central location
- longitude
  - float
  - longitude of the central location
- City
  - city name, like 台北市 (Taipei City)
- School
  - central location name; it must be a unique string
  - in our project it is the name of an elementary school, like 市立天母國小
Our input is school_most.csv in inputCentral.
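The input format above can be checked before a crawl starts. The following is only a sketch, not code taken from foodpanda.py: the helper name load_central_locations is hypothetical and the coordinates in the sample row are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("latitude", "longitude", "City", "School")


def load_central_locations(csv_file):
    """Read central locations and check the four expected columns.

    Hypothetical helper; foodpanda.py may load its input differently.
    """
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    locations, seen = [], set()
    for row in reader:
        school = row["School"]
        if school in seen:
            # School must be unique because it identifies the central location.
            raise ValueError(f"duplicate School value: {school}")
        seen.add(school)
        locations.append({
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "City": row["City"],
            "School": school,
        })
    return locations


# Made-up coordinates, for illustration only.
sample = io.StringIO(
    "latitude,longitude,City,School\n"
    "25.1,121.5,台北市,市立天母國小\n"
)
locations = load_central_locations(sample)
print(locations[0]["School"])
```

With a real file, the same function could be called on open("inputCentral/school_most.csv", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.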
Execute foodpanda.py:
python foodpanda.py
It scans every central location listed in school_most.csv.
|-- foodpanda_crawler
    |-- inputCentral
    |   |-- school_most.csv
foodpanda.py creates a folder named after the crawl date.
For each central location, foodpanda.py writes a CSV file into that {date} folder:
|-- foodpanda_crawler
    |-- README.md
    |-- foodpanda.py
    |-- requirements.txt
    |-- inputCentral
    |   |-- school_most.csv
    |-- shopLst
        |-- {date}
            |-- shopLst_{City}_{School}_{date}
            |-- shopLst_most_{date}.csv
Each file lists all the restaurants near the central location.
The file name looks like shopLst_{City}_{School}_{date}.
date is the date the data was crawled.
School is the School column in school_most.csv.
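The naming pattern can be sketched as a small function. This is an illustration only: the helper name shop_list_name is hypothetical, and the ISO date text (YYYY-MM-DD) is an assumption, since the date format used by foodpanda.py is not stated here.

```python
from datetime import date


def shop_list_name(city, school, crawl_date):
    """Build a per-school file name following shopLst_{City}_{School}_{date}."""
    # Assumption: ISO date text; the crawler's real date format may differ.
    return f"shopLst_{city}_{school}_{crawl_date.isoformat()}"


# The date below is arbitrary and used only for the example.
name = shop_list_name("台北市", "市立天母國小", date(2021, 1, 1))
print(name)
```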
After that, we concatenate all the restaurant lists into shopLst_most_{date}.csv.
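The concatenation step could look roughly like the sketch below. It uses only the standard csv module and assumes every per-school file shares the same header row; the function name concat_shop_lists is hypothetical, and the sketch keeps duplicate restaurants, which may or may not match what foodpanda.py does.

```python
import csv
import io


def concat_shop_lists(csv_files):
    """Merge several restaurant lists that share one header row."""
    header, rows = None, []
    for f in csv_files:
        reader = csv.reader(f)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("header mismatch between files")
        rows.extend(reader)  # duplicates are kept in this sketch
    return header, rows


# Two tiny in-memory lists with made-up rows, for illustration only.
first = io.StringIO("shopName,shopCode\nShop A,a1b2\n")
second = io.StringIO("shopName,shopCode\nShop B,c3d4\n")
header, rows = concat_shop_lists([first, second])

merged = io.StringIO()
writer = csv.writer(merged)
writer.writerow(header)
writer.writerows(rows)
print(merged.getvalue())
```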
The columns in shopLst_most_{date}.csv are:
- shopName
  - string, restaurant name
- shopCode
  - 4-character string, a unique value
  - example: y0yc
- budget
  - int
- category
  - list-like string of restaurant categories
  - example: ['歐美', '炸雞'] (Western, fried chicken)
- pandaOnly
  - bool
- minFee
  - int
- minOrder
  - int
- minDelTime
  - int
- minPickTime
  - int
- distance
  - float
- rateNum
  - int
  - number of people who rated the restaurant
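When reading shopLst_most_{date}.csv back, every value arrives as text, so the columns listed above need converting. The sketch below shows one way to do it. The function name parse_shop_row is hypothetical, the sample values are made up except for the shopCode and category examples above, and it assumes booleans are stored as the text True or False.

```python
import ast

INT_COLUMNS = ("budget", "minFee", "minOrder", "minDelTime", "minPickTime", "rateNum")


def parse_shop_row(row):
    """Convert one CSV row (all strings) to the documented column types."""
    parsed = dict(row)
    # category is a list-like string such as "['歐美', '炸雞']";
    # literal_eval turns it into a real list without running arbitrary code.
    parsed["category"] = ast.literal_eval(row["category"])
    # Assumption: booleans are written as the text "True" / "False".
    parsed["pandaOnly"] = row["pandaOnly"] == "True"
    for name in INT_COLUMNS:
        parsed[name] = int(row[name])
    parsed["distance"] = float(row["distance"])
    return parsed


# Made-up row, for illustration only.
raw = {
    "shopName": "Example Shop",
    "shopCode": "y0yc",
    "budget": "2",
    "category": "['歐美', '炸雞']",
    "pandaOnly": "False",
    "minFee": "0",
    "minOrder": "100",
    "minDelTime": "20",
    "minPickTime": "10",
    "distance": "1.5",
    "rateNum": "37",
}
shop = parse_shop_row(raw)
print(shop["category"], shop["distance"])
```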
Finally, we create the menu file from the restaurant list. It is saved in meau_Foodpanda as foodpandaMenu_{date}.csv.
|-- foodpanda_crawler
    |-- README.md
    |-- foodpanda.py
    |-- requirements.txt
    |-- inputCentral
    |   |-- school_most.csv
    |-- meau_Foodpanda
    |   |-- foodpandaMenu_{date}.csv
    |-- shopLst
        |-- {date}
            |-- shopLst_{City}_{School}_{date}
            |-- shopLst_most_{date}.csv