In this post I will build on the Python code we wrote in Part 2: Query the API and move our data into a database.
Note: If I were to do this project again, I'd break the functions up into smaller functions to make testing easier. A common recommendation is that each function should do only one thing: if you describe a function's behavior and have to use the word 'and', you should break the function up. A hypothetical decomposition of the upload step we're about to write is sketched below.
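Here's what that split might look like (the names are illustrative, not functions from this project):

```python
import requests
import pandas as pd

# Illustrative only: each function does exactly one thing,
# which makes each step independently testable.
def fetch_counters_xml(url: str, params: dict) -> str:
    '''Requests the counter data and returns the raw response text.'''
    return requests.get(url, params=params).text

def parse_counter_details(xml_string: str) -> list:
    '''Parses the XML string into a list of tuples (body omitted).'''
    ...

def write_to_database(df: pd.DataFrame, table_name: str, engine) -> None:
    '''Writes a dataframe to the given table.'''
    df.to_sql(table_name, con=engine, index=False, if_exists='append')
```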
## Pandas .to_sql Method
We will use the Pandas to_sql method to move our data into MySQL. This method operates on our dataframe, and we pass in a few parameters. First is the name of the table to create. Second is the engine we created using SQLAlchemy. Third, index=False tells pandas not to write the dataframe index as a column. Last, if_exists='append' says that if the table already exists in the database, our rows should be added without touching the existing data.
```python
df.to_sql('counts_daily', con=engine, index=False, if_exists='append')
```
Let’s work on adding the to_sql method to our code that pulls the Bikeometer details.
First, let’s define a helper function that will create a SQLAlchemy engine object. Let’s call the function ‘create_new_engine’. All we have to do is pass in the name of the database as an argument (bikeometers_db).
Check out Part 3 to learn how I connect to my database.
Below I define a function that will connect to my database.
```python
import os
from sqlalchemy import create_engine

# Workaround to get this document to Knit:
# created a temp user name that will be deleted once the post is uploaded.
def create_new_engine(db_name):
    return create_engine(
        f'mysql+mysqldb://{os.environ["arlington_user"]}:{os.environ["arlington_password"]}@localhost/{db_name}',
        echo=False)
```
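As a quick sanity check (my addition, not part of the original script), you can force a connection right away; create_engine itself is lazy and won't touch the database until first use:

```python
engine = create_new_engine('bikeometers_db')
with engine.connect() as con:
    # Prints (1,) if the credentials and database name are valid
    print(con.execute('SELECT 1').fetchone())
engine.dispose()
```

Note that passing a raw SQL string to execute() works on the SQLAlchemy 1.x API used throughout this post; SQLAlchemy 2.0 requires wrapping it in sqlalchemy.text().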
I’ve taken the get_bikeometer_details function from Part 2 as well as the GET request steps and made them into a single function bikeometer_to_sql.
```python
import requests
import re
import xml.etree.ElementTree as ET
import pandas as pd

def bikeometer_to_sql():
    '''
    Makes a GET request to the Bike Arlington API using the GetAllCounters
    method as a parameter. The API returns a response object that is first
    converted to a string, cleaned, and converted to an XML object. The XML
    object is then parsed for the relevant information, which is added to a
    list. Each list, representing a single Bikeometer, is converted to a
    tuple and added to a final list, which is converted to a dataframe and
    uploaded to the database.
    '''
    # Calls our helper function to create an engine
    engine = create_new_engine('bikeometers_db')
    url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'
    # Defines the method in a dictionary used to make the request
    counter_request_methods = {'method': 'GetAllCounters'}
    # Assigns the response object to the variable 'response'
    response = requests.get(url, params=counter_request_methods)
    # Save the content of that request (a string) to memory
    string_data = response.text
    # Clean the string
    clean_string_data = re.sub(r'[\n\t]', '', string_data)
    # Convert the string to an XML object
    root = ET.fromstring(clean_string_data)
    # Create the empty list that will hold [(bikeometer_id, name, latitude, longitude, region, region_id), (...)]
    bikeometer_details = []
    # Iterate through the children, grandchildren, and great-grandchildren and grab the data
    for child in root:
        # From child 'counter', get the attribute 'id' of the counter and add it to single_list
        single_list = [child.get('id')]
        # Loop through the grandchildren of root
        for grandchild in list(child):
            if grandchild.text is not None and grandchild.tag not in ('description', 'trail_id', 'trail_name'):
                # If the grandchild is region, loop through region and grab the great-grandchildren's data
                if grandchild.tag == 'region':
                    for great_grandchild in list(grandchild):
                        single_list.append(great_grandchild.text)
                # If the grandchild is not region, the data is available in grandchild.text
                else:
                    single_list.append(grandchild.text)
        # Cast the list into a tuple, making it easier to migrate the data to the database
        single_tuple = tuple(single_list)
        # Append the tuple to the list
        bikeometer_details.append(single_tuple)
    # Establish the names of the columns
    columns = ('bikeometer_id', 'name', 'latitude', 'longitude', 'region', 'region_id')
    # Create a pandas dataframe
    df = pd.DataFrame(data=bikeometer_details, columns=columns)
    # Assign a type to each column
    df[["name", "latitude", "longitude", "region", "region_id"]] = \
        df[["name", "latitude", "longitude", "region", "region_id"]].astype('str')
    df[["bikeometer_id"]] = df[["bikeometer_id"]].astype('int')
    # Move the data to MySQL
    df.to_sql('bikeometer_details', con=engine, index=False, if_exists='replace')
```
Now we run the function.
```python
bikeometer_to_sql()
```
Let’s check to see what our new table looks like by executing a MySQL statement.
I'm going to run the MySQL commands from R, so first I'll establish a connection to the database through R. You can also run the MySQL commands in MySQL Workbench or the MySQL command line.
```r
library(DBI)
con <- dbConnect(odbc::odbc(),
                 .connection_string = "Driver={MySQL ODBC 8.0 Unicode Driver};",
                 server = "localhost", db = "bikeometers_db",
                 user = Sys.getenv("arlington_user"),
                 password = Sys.getenv("arlington_password"))
```
```sql
SELECT * FROM bikeometer_details;
```
bikeometer_id | name | latitude | longitude | region | region_id |
---|---|---|---|---|---|
33 | 110 Trail | 38.885315 | -77.065022 | Arlington | 1 |
30 | 14th Street Bridge | 38.874260 | -77.044610 | Arlington | 1 |
43 | 15th Street NW | 38.907470 | -77.034610 | DC | 3 |
24 | Ballston Connector | 38.882950 | -77.121235 | Arlington | 1 |
59 | Bluemont Connector | 38.880440 | -77.119290 | Arlington | 1 |
56 | BWP Counter | 38.933140 | -76.938320 | Prince George’s County | 5 |
47 | Capital Crescent Trail #1 | 38.979500 | -77.096760 | Montgomery County | 4 |
48 | Capital Crescent Trail #2 | 38.943070 | -77.115660 | Montgomery County | 4 |
10 | CC Connector | 38.857702 | -77.047373 | Arlington | 1 |
20 | Clarendon EB bike lane | 38.888780 | -77.090421 | Arlington | 1 |
Now that we have our bikeometer_details table in our database, we won’t need to run this code again unless a new Bikeometer is added.
Next, we will request the daily bike counts for every day since a counter was installed and save it in our database.
Again, we will build on the code from Part 2.
In Part 2, we created a function called api_counts_to_list which pulled the bicycle counts using an arbitrary hard-coded start date, end date, and a single Bikeometer. Now, however, we need to pull the data for all Bikeometers and all available dates.
To make our lives easier, let's select the oldest Bikeometer and use its start date as the start date for all Bikeometers. Our function is written so that if there is no data for a requested date, it simply skips that date and moves on without raising an error.
We will call the built-in Bike Arlington API method 'getMinDates' on all the Bikeometer IDs in the bikeometer_id_list below.
```python
bikeometer_id_list = ['33','30','43','24','59','56','47','48','10','20',
                      '35','57','18','6','3','58','61','62','38','44','14',
                      '60','5','42','37','27','26','8','7','51','52',
                      '45','22','21','36','34','41','9','39','16','15',
                      '54','55','31','28','11','2','25','19']
```
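As an aside, if you would rather not hard-code the IDs, one alternative (a sketch, assuming the create_new_engine helper from earlier) is to pull them from the bikeometer_details table we just created:

```python
engine = create_new_engine('bikeometers_db')
# The IDs come back as ints; cast them to str to match the hard-coded list
bikeometer_id_list = [str(i) for i in
                      pd.read_sql('SELECT bikeometer_id FROM bikeometer_details',
                                  con=engine)['bikeometer_id']]
engine.dispose()
```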
Now we will create a simple for loop to call the getMinDates method on all the Bikeometers.
```python
# Assign the url of the API
url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'

for bikeometer_id in bikeometer_id_list:
    # Defines the method in a dictionary used to make the request
    counter_request_methods = {'method': 'getMinDates', 'counterID': bikeometer_id}
    # Makes the getMinDates request and prints the raw response
    response = requests.get(url, params=counter_request_methods)
    print(response.text)
```
```
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>07/22/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/10/2013</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/22/2014</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/06/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>01/01/2019</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>03/19/2017</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>04/25/2016</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>04/25/2016</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>09/02/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/27/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/11/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>07/05/2017</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/28/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>03/06/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/08/2010</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>04/16/2020</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>01/01/2019</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>01/01/2019</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/18/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/22/2014</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/04/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>01/01/2019</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>04/01/2010</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>03/30/2016</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/18/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/17/2013</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>06/19/2013</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>08/12/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>08/11/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>06/03/2017</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>06/01/2021</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/22/2014</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/21/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>03/31/2013</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/04/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>05/15/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/04/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>09/01/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/04/2015</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/01/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/21/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>06/03/2017</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>06/03/2017</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/10/2013</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>04/01/2014</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>10/14/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>02/01/2011</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>12/10/2012</string></var></struct></data></wddxPacket>"
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>11/21/2012</string></var></struct></data></wddxPacket>"
Skimming through the returned responses, we can see that the earliest start date is 4/1/2010. This will be the first start date used in our function.
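Rather than skimming by eye, a short sketch like the following would parse each response for its STARTDATE and keep the earliest one:

```python
from datetime import datetime

min_dates = []
for bikeometer_id in bikeometer_id_list:
    params = {'method': 'getMinDates', 'counterID': bikeometer_id}
    text = requests.get(url, params=params).text
    # Pull the MM/DD/YYYY string out of the wddxPacket response
    match = re.search(r'<string>(\d{2}/\d{2}/\d{4})</string>', text)
    if match:
        min_dates.append(datetime.strptime(match.group(1), '%m/%d/%Y').date())

print(min(min_dates))  # 2010-04-01
```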
Ultimately, we will have two functions when we are finished: one that pulls all the counts starting from 4/1/2010 (all_counts_by_date_to_sql) and one that pulls only the new dates (new_counts_by_date_to_sql).
To help us out, we will create a helper function based on the api_counts_to_list function we made in Part 2.
The goal is to have the main functions all_counts_by_date_to_sql and new_counts_by_date_to_sql pass in the bikeometer_id, start date and end date to our helper function api_counts_to_list. If you have no data in your database, you would call the all_counts_by_date_to_sql function to make requests starting on 4/1/2010. If you already have data in your MySQL database, the new_counts_by_date_to_sql function will query your database and only request data for dates not in your database.
Note: As I write this up, I'm realizing you could write one function that queries your MySQL database and, depending on the response, calls the appropriate function with an if/else statement. But, as they say, that's not MVP.
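If you wanted to go that route, a minimal sketch might look like this; it leans on the last_sql_date_counts_daily helper we'll define further down:

```python
def counts_to_sql():
    '''Hypothetical dispatcher: loads everything if the database is empty,
    otherwise pulls only the new dates.'''
    engine = create_new_engine('bikeometers_db')
    try:
        if last_sql_date_counts_daily(engine) is None:
            all_counts_by_date_to_sql()
        else:
            new_counts_by_date_to_sql()
    finally:
        engine.dispose()
```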
Here is the helper function that will request data from the Bike Arlington API using the 'GetCountInDateRange' method. It was modified from Part 2 to accept arguments for a bikeometer_id, start_date, end_date, and count_in_date_range_list (a list to hold all the count data). We hard-code mode='B' to request bike rather than pedestrian data, interval='d' to request counts by day, start_time and end_time to cover the entire day, and direction='' to pull both the inbound and outbound directions that the Bikeometers can sense.
Note: The Bike Arlington documentation does not mention this, but some Bikeometers can't distinguish between inbound and outbound, and their counts are instead labeled 'A'. By leaving the direction field blank, our function will include the 'A' direction too.
We will be using a few functions from the datetime module so we start by importing them.
```python
from datetime import date, datetime, timedelta

def api_counts_to_list(bikeometer_id,
                       start_date,
                       end_date,
                       count_in_date_range_list,
                       mode='B',
                       interval='d',
                       start_time='0:00',
                       end_time='23:59',
                       direction='') -> list:
    request_parameters = {'method': 'GetCountInDateRange',
                          'counterID': str(bikeometer_id),
                          'startDate': start_date,
                          'endDate': end_date,
                          'mode': str(mode),
                          'interval': str(interval),
                          'direction': direction}
    response = requests.get(url, params=request_parameters)
    #TODO: Add except for non-200 response
    # Convert the response to a string
    xml_data = response.text
    # Clean the data
    clean_xml_data = re.sub(r'[\n\t]', '', xml_data)
    # Cast the string to an XML element
    tree = ET.fromstring(clean_xml_data)
    # Parse the GetCountInDateRange XML for each day's count and date
    for type_tag in tree.findall('count'):
        count = type_tag.get('count')
        # Convert the counter date to a date object
        # (named count_date so it doesn't shadow datetime.date)
        count_date = datetime.strptime(type_tag.get('date'), '%m/%d/%Y').date()
        year = count_date.year
        month = count_date.month
        day = count_date.day
        month_day = f'{month}_{day}'
        direction = type_tag.get('direction')
        # weekday() returns 0-4 for Monday-Friday
        if count_date.weekday() <= 4:
            is_weekend = 0
        else:
            is_weekend = 1
        single_tuple = (bikeometer_id, count_date, direction, count,
                        is_weekend, year, month, day, month_day)
        count_in_date_range_list.append(single_tuple)
    return count_in_date_range_list
```
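One way to handle the TODO above (a sketch, not the post's final error handling) is to fail fast on non-200 responses right after the GET:

```python
response = requests.get(url, params=request_parameters)
# Raises requests.exceptions.HTTPError on any 4xx/5xx response
response.raise_for_status()
```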
Now that we have our helper function complete, let's create all_counts_by_date_to_sql, which will pull all the data starting on 4/1/2010.
When designing our function, it's important to remember that the Bike Arlington API only allows requests that span one year or less. There are many ways to get the data into your database: I chose to pull all of the data first and then save it to my database in one chunk, but you could instead pull and save the data one year at a time. Both approaches have their pros and cons, and it's up to you to decide and justify your choice.
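For reference, here is a rough sketch of the incremental approach, appending each year's pull to MySQL as soon as it comes back instead of holding everything in memory (it reuses the helpers defined above):

```python
def counts_to_sql_incremental(bikeometer_id_list):
    '''Hypothetical alternative: pull and save one year at a time.'''
    engine = create_new_engine('bikeometers_db')
    start_date = date(year=2010, month=4, day=1)
    end_date = date(year=2011, month=4, day=1)
    columns = ('bikeometer_id', 'date', 'direction', 'count',
               'is_weekend', 'year', 'month', 'day', 'month_day')
    while True:
        yearly_counts = []
        for bikeometer_id in bikeometer_id_list:
            clean_start = f'{start_date.month}/{start_date.day}/{start_date.year}'
            clean_end = f'{end_date.month}/{end_date.day}/{end_date.year}'
            api_counts_to_list(bikeometer_id, clean_start, clean_end,
                               yearly_counts, interval='d')
        # Save this year's chunk immediately instead of accumulating it
        df = pd.DataFrame(yearly_counts, columns=columns)
        df[["bikeometer_id"]] = df[["bikeometer_id"]].astype('int')
        df.to_sql('counts_daily', con=engine, index=False, if_exists='append')
        if end_date == date.today() - timedelta(days=1):
            break
        start_date = start_date.replace(year=start_date.year + 1)
        end_date = min(end_date.replace(year=end_date.year + 1),
                       date.today() - timedelta(days=1))
    engine.dispose()
```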
```python
def all_counts_by_date_to_sql():
    # Creates the engine that drives the connection
    engine = create_new_engine('bikeometers_db')
    # Tests the connection and throws an error if it can't be established
    engine.connect()
    # Bike Arlington API only accepts query ranges of 1 year or less
    start_date = date(year=2010, month=4, day=1)
    end_date = date(year=2011, month=4, day=1)
    # Create a list to hold count data
    count_in_date_range_list = []
    # Hard-coded the bikeometer_id list to not stress the server with unnecessary requests
    bikeometer_id_list = ['33','30','43','24','59','56','47','48','10','20',
                          '35','57','18','6','3','58','61','62','38','44','14',
                          '60','5','42','37','27','26','8','7','51','52',
                          '45','22','21','36','34','41','9','39','16','15',
                          '54','55','31','28','11','2','25','19']
    # We will pull data up to yesterday's date
    while end_date <= date.today() - timedelta(days=1):
        for bikeometer_id in bikeometer_id_list:
            # Dates in YYYY-MM-DD format converted to MM/DD/YYYY strings for the counter API
            clean_start_date = f'{start_date.month}/{start_date.day}/{start_date.year}'
            clean_end_date = f'{end_date.month}/{end_date.day}/{end_date.year}'
            # Calls the helper function with the start_date hard-coded above, which is
            # the first datapoint on the Bike Arlington server. Optimally, these would
            # not be hard-coded, to make the program more future-proof.
            api_counts_to_list(bikeometer_id, clean_start_date, clean_end_date,
                               count_in_date_range_list, interval='d')
        # If the end date is yesterday's date, congratulations, you pulled all available dates.
        # We won't pull today's date in case the data has not been uploaded to their servers yet.
        if end_date == date.today() - timedelta(days=1):
            break
        # Add a year to the start date and end date
        start_date = start_date.replace(year=start_date.year + 1)
        end_date = end_date.replace(year=end_date.year + 1)
        # If the end date is after yesterday's date, set it to yesterday and pull the data one last time
        if end_date >= date.today():
            end_date = date.today() - timedelta(days=1)
    columns = ('bikeometer_id', 'date', 'direction', 'count', 'is_weekend',
               'year', 'month', 'day', 'month_day')
    df = pd.DataFrame(count_in_date_range_list, columns=columns)
    # Makes the bikeometer_id column type int
    df[["bikeometer_id"]] = df[["bikeometer_id"]].astype('int')
    # Replaces the counts_daily table; use if_exists='append' to insert new rows instead
    df.to_sql('counts_daily', con=engine, index=False, if_exists='replace')
    # Queries your database and returns a message indicating the most recent date stored
    with engine.connect() as con:
        date_list = con.execute('SELECT MAX(Date) FROM counts_daily')
        for day in date_list:
            last_day = day[0]
    print(f'The newest date in counts_daily is {last_day}')
    engine.dispose()
```
If you look at the last bit of code in the function above, after the data is saved to my database I query the database for the most recent date. If the query returns yesterday's date, it's a good sign that the data was uploaded successfully.
Now that the function is defined, let’s run it.
```python
all_counts_by_date_to_sql()
```
Let’s take a look at our table!
```sql
SELECT * FROM counts_daily GROUP BY bikeometer_id ORDER BY bikeometer_id ASC;
```
bikeometer_id | date | direction | count | is_weekend | year | month | day | month_day |
---|---|---|---|---|---|---|---|---|
2 | 2011-02-01 | I | 6 | 0 | 2011 | 2 | 1 | 2_1 |
3 | 2010-12-08 | I | 16 | 0 | 2010 | 12 | 8 | 12_8 |
5 | 2010-04-01 | I | 344 | 0 | 2010 | 4 | 1 | 4_1 |
7 | 2011-08-11 | I | 527 | 0 | 2011 | 8 | 11 | 8_11 |
8 | 2011-08-12 | I | 447 | 0 | 2011 | 8 | 12 | 8_12 |
9 | 2011-09-01 | I | 1243 | 0 | 2011 | 9 | 1 | 9_1 |
10 | 2011-09-02 | I | 343 | 0 | 2011 | 9 | 2 | 9_2 |
11 | 2011-10-14 | I | 155 | 0 | 2011 | 10 | 14 | 10_14 |
14 | 2012-12-04 | A | 73 | 0 | 2012 | 12 | 4 | 12_4 |
15 | 2012-11-21 | A | 19 | 0 | 2012 | 11 | 21 | 11_21 |
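As a quick follow-up to the earlier note about the 'A' direction label, a small check (my addition, reusing the helpers above) shows which direction values actually occur in the table:

```python
engine = create_new_engine('bikeometers_db')
# Expect rows for 'I' (inbound), 'O' (outbound), and 'A' (no distinction)
print(pd.read_sql('SELECT direction, COUNT(*) AS n '
                  'FROM counts_daily GROUP BY direction', con=engine))
engine.dispose()
```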
Now that we have our all_counts_by_date_to_sql function, we need our new_counts_by_date_to_sql function, which will only pull the new dates.
To only pull the new dates, let’s make a helper function that will return the most recent date in our MySQL database.
```python
def last_sql_date_counts_daily(engine):
    '''Returns the last date in the counts_daily table as a date object
    (or None if the table is empty).'''
    with engine.connect() as con:
        result = con.execute('SELECT MAX(Date) FROM counts_daily')
        for row in result:
            last_date = row[0]
    return last_date
```
With our helper function defined, I'll use the previous function as a template and only change how the start date and end date are first defined.
I also added two checks at the beginning. First, if last_sql_date_counts_daily returns None, it most likely means there is no data in the database, so the function stops. Second, if yesterday's date is already in the database, the function prints 'No new data available' without making any API requests. Everything else is the same.
```python
def new_counts_by_date_to_sql():
    engine = create_new_engine('bikeometers_db')
    # Establish the connection to the MySQL database before making all the API calls
    #TODO: raise a custom exception if no data is in the MySQL database
    last_date = last_sql_date_counts_daily(engine)
    if last_date is None:
        print('No data found in MySQL database.')
        print('Please use the all_counts_by_date_to_sql() function')
        return None
    # Start the day after the last date in the database
    start_date = last_date + timedelta(days=1)
    # This check prevents the program from pulling today's date
    if start_date >= date.today():
        #TODO: throw an exception instead of printing
        print('No new data available')
        return None
    end_date = start_date.replace(year=start_date.year + 1)
    if end_date >= date.today():
        end_date = date.today() - timedelta(days=1)
    # Bike Arlington API only accepts query ranges of 1 year or less
    count_in_date_range_list = []
    # Hard-coded the counterID list to not stress the server
    bikeometer_id_list = ['33','30','43','24','59','56','47','48','10','20',
                          '35','57','18','6','3','58','61','62','38','44','14',
                          '60','5','42','37','27','26','8','7','51','52',
                          '45','22','21','36','34','41','9','39','16','15',
                          '54','55','31','28','11','2','25','19']
    while end_date <= date.today() - timedelta(days=1):
        for bikeometer_id in bikeometer_id_list:
            # Dates in YYYY-MM-DD format converted to MM/DD/YYYY strings for the counter API
            clean_start_date = f'{start_date.month}/{start_date.day}/{start_date.year}'
            clean_end_date = f'{end_date.month}/{end_date.day}/{end_date.year}'
            api_counts_to_list(bikeometer_id, clean_start_date, clean_end_date,
                               count_in_date_range_list, interval='d')
        # If the end date is yesterday's date, congratulations, you pulled all available dates.
        # We won't pull today's date in case the data has not been uploaded to their servers yet.
        if end_date == date.today() - timedelta(days=1):
            break
        # Add a year to the start date and end date
        start_date = start_date.replace(year=start_date.year + 1)
        end_date = end_date.replace(year=end_date.year + 1)
        # If the end date is after yesterday's date, set it to yesterday and pull the data one last time
        if end_date >= date.today():
            end_date = date.today() - timedelta(days=1)
    columns = ('bikeometer_id', 'date', 'direction', 'count', 'is_weekend',
               'year', 'month', 'day', 'month_day')
    df = pd.DataFrame(count_in_date_range_list, columns=columns)
    df[["bikeometer_id"]] = df[["bikeometer_id"]].astype('int')
    # Appends the new rows to the existing counts_daily table
    df.to_sql('counts_daily', con=engine, index=False, if_exists='append')
    # Queries the database and reports the most recent date stored
    with engine.connect() as con:
        date_list = con.execute('SELECT MAX(Date) FROM counts_daily')
        for day in date_list:
            last_day = day[0]
    print(f'The newest date in counts_daily is {last_day}')
    engine.dispose()
```
```python
new_counts_by_date_to_sql()
```
```
## No new data available
```
I ran the previous function today, so this one returns 'No new data available'. Try it for yourself tomorrow and compare how long it takes to pull one day's worth of data versus more than ten years' worth.
In the next post I’ll start creating visualizations of the data.