Automatically Updating Apartment Map
August 19th, 2013
boston, housing, map, tech
There are two scripts involved:
    $ crontab -l
    ...
    # fetch the data at 2:02am on the 18th of the month; update the maps
    # at 2:02 on the 19th.
    02 02 18 * * python /home/jefftk/query_padmapper.py
    02 02 19 * * bash /home/jefftk/draw_heatmaps.sh

The first, query_padmapper.py, pulls apartment data from PadMapper and saves it in a timestamped file like apts-1376823721.txt. Unless Eric changes something, this just does its job. If something goes wrong, cron sends me an email with the error message.
(I really like this method of doing background tasks. Unless they're critical, don't try to recover from errors. Crash, print something informative, and have it show up in my email.)
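Here is a minimal sketch of that crash-and-email pattern, in Python since that is what the fetch script is written in. The real query_padmapper.py isn't shown in this post, so fetch_listings, save_dump, dump_name, and the placeholder payload are all hypothetical. The only point is that nothing catches exceptions: an uncaught error prints a traceback to stderr, and cron mails whatever the job prints.

```python
import time
from pathlib import Path


def dump_name(now=None):
    """Build a name like apts-1376823721.txt from a Unix timestamp."""
    epoch = int(time.time() if now is None else now)
    return f"apts-{epoch}.txt"


def fetch_listings():
    """Hypothetical stand-in for the real PadMapper query."""
    return "placeholder listing data\n"


def save_dump(out_dir, now=None):
    # No try/except anywhere: if the fetch or the write fails, the
    # traceback goes to stderr and cron emails it.
    path = Path(out_dir) / dump_name(now)
    path.write_text(fetch_listings())
    return path
```

Calling save_dump("/some/dir") would write a new timestamped file there on each run, so earlier dumps are never overwritten.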
The second script is a wrapper around draw_heatmap.py. It's safe to run any time because it looks for apartment data dumps that haven't been processed yet, but I intentionally only run it when I know it needs to do something. It is:
    set -e  # exit on error

    WORKING_DIR="/home/jefftk/jtk/apartment_prices/"

    function compute_dates_available() {
      echo '<script type="text/javascript">'
      echo 'var dates_available = ['
      ls $WORKING_DIR/*.boston.bedroom*.png \
        | awk -F. '{print $(NF-1)}' \
        | sort \
        | while read line ; do
        echo '  "'$line'",'
      done
      echo ']'
      echo '</script>'
    }

    function update_index() {
      INDEX=$WORKING_DIR/index.html
      cp $INDEX $INDEX.pre_$(date +%F)
      cat $INDEX \
        | grep '<!-- begin list of date files -->' -B 10000 \
        > $INDEX.pre
      cat $INDEX \
        | grep '<!-- end list of date files -->' -A 10000 \
        > $INDEX.post
      compute_dates_available > $INDEX.middle
      cat $INDEX.pre $INDEX.middle $INDEX.post > $INDEX
    }

    for x in $WORKING_DIR/apts-1*.txt ; do
      if [ ! -e $x.started ] ; then
        YYYYMMDD=$(date --date=@$(echo $x | awk -F/ '{print $NF}' | sed \
                   s/apts-// | sed s/.txt//) +%F)
        touch $x.started
        for style in room bedroom ; do
          python /home/jefftk/code/apartment_prices/draw_heatmap.py $x $style
          mv $x.$style.1000.png $WORKING_DIR/apts.boston.$style.$YYYYMMDD.png
        done
        update_index
        touch $x.finished
      fi
    done

What's this all doing? The for loop at the bottom goes through every apartment data file, but only acts on the ones where it hasn't started work, which it tracks by creating a file like apts-1376823721.txt.started. It extracts the timestamp from the filename and converts it to a date like 2013-08-18. Then it runs the real code, draw_heatmap.py, which produces an output file like apts-1376823721.txt.room.1000.png. It renames that to the format the UI expects, then calls update_index.
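The bookkeeping in that loop can be sketched in Python to make the steps explicit. This is an illustration, not code from the actual scripts: dump_date, pending_dumps, and output_name are hypothetical names, and the sketch assumes UTC, whereas the shell version converts the timestamp in the machine's local time zone.

```python
from datetime import datetime, timezone
from pathlib import Path


def dump_date(path):
    """Turn .../apts-1376823721.txt into a date string like 2013-08-18."""
    name = Path(path).name
    epoch = int(name[len("apts-"):-len(".txt")])
    # Assumption: UTC. The shell version uses the machine's local time zone.
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


def pending_dumps(working_dir):
    """Return dumps that have no matching .started marker file yet."""
    dumps = sorted(Path(working_dir).glob("apts-1*.txt"))
    return [p for p in dumps if not Path(f"{p}.started").exists()]


def output_name(style, date):
    """Name the UI expects, e.g. apts.boston.room.2013-08-18.png."""
    return f"apts.boston.{style}.{date}.png"
```

Because the .started marker is created before the slow work begins, a run that crashes partway leaves the dump marked as started, and the missing .finished file shows that it never completed.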
The update_index function changes a small piece of index.html which has a list of which dates have data available:

    ...
    <!-- begin list of date files -->
    <script type="text/javascript">
    var dates_available = [
      "2011-06-16",
      "2013-01-29",
      "2013-02-18",
      "2013-03-18",
      "2013-04-18",
      "2013-05-18",
      "2013-06-18",
      "2013-07-18",
      "2013-08-18",
    ]
    </script>
    <!-- end list of date files -->
    ...

It figures out the available dates with compute_dates_available, formats that into a javascript array, then uses some grep to replace the portion of the file between the marker comments with the newly calculated dates.
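The same marker-comment replacement can be sketched in Python, working on the page as a string instead of with grep and temporary files. The function names here (date_from_png, render_dates, replace_between_markers) are hypothetical; only the two marker comments and the output format come from the script above.

```python
BEGIN = "<!-- begin list of date files -->"
END = "<!-- end list of date files -->"


def date_from_png(filename):
    """apts.boston.bedroom.2013-08-18.png -> 2013-08-18 (second-to-last dot field)."""
    return filename.split(".")[-2]


def render_dates(dates):
    """Format dates as the script block that sits between the markers."""
    lines = ['<script type="text/javascript">', "var dates_available = ["]
    lines += [f'  "{d}",' for d in sorted(dates)]
    lines += ["]", "</script>"]
    return "\n".join(lines)


def replace_between_markers(html, middle):
    """Swap whatever sits between the marker comments for `middle`."""
    head, found_begin, rest = html.partition(BEGIN)
    _, found_end, tail = rest.partition(END)
    if not (found_begin and found_end):
        # Fail loudly rather than write a damaged index.html.
        raise ValueError("marker comments not found")
    return f"{head}{BEGIN}\n{middle}\n{END}{tail}"
```

Keeping both markers in the output is what makes the update repeatable: the next run finds the same two comments and replaces only what lies between them.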
So now the page stays up to date without me doing anything. Or else I wake up to an error in my email and figure out why.