The official Bike Arlington API is accessed through this URL and offers multiple endpoints/methods that return everything from the number of bikers or pedestrians passing a Bikeometer on a given day, to the latitude/longitude of each Bikeometer, to the weather that day.
We can read here about the different methods available when making requests to the Bike Arlington API.
First, let’s get some details on the Bikeometers in the database. There is a ‘Bike Arlington’ API method GetAllCounters that will return the details of the physical Bikeometers. I’ll use the Python ‘requests’ library to make a ‘GET’ request to the base Bike Arlington API URL and pass in the ‘GetAllCounters’ method as a parameter.
import requests

# Assign the url to the variable 'url'
url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'
# Defines the method in a dictionary used to make the request
counter_reqest_methods = {'method': 'GetAllCounters'}
# Save the GetAllCounters request to memory
response = requests.get(url, params=counter_reqest_methods)
response
## <Response [200]>
Great! Response [200] means we got an OK response back. When making a GET request, a ‘response object’ is returned. Let’s take a look at what’s inside our response object. Use the ‘.text’ attribute as shown below to look at the data inside. The output is a messy wall of XML, so I won’t print the results here; rather, I’ll leave it to you to take a look at it yourself.
response.text
We will use the ‘re’ module to remove some of the extraneous formatting characters. From the official Python documentation: “This module provides regular expression matching operations…”. A great tutorial on Regular Expressions (RegEx) can be found here.
The code below will substitute any newline (‘\n’) or tab (‘\t’) character with a blank string (’’), essentially deleting them. Again, take a look inside your clean_string_data object to see how the cleaned data looks.
import re

string_data = response.text
clean_string_data = re.sub(r'[\n|\t]', '', string_data)
Our data is now easier to read! The first part of the data tells us that it is in XML format. We can use this to our advantage by converting the string to XML, which lets us easily and efficiently pull out the data we are looking for.
We will use the Python module ‘xml.etree.ElementTree’ to convert the data from a string to XML format. By converting to XML, finding the data we need is faster because we don’t have to iterate over the entire string looking for specific values; instead, we can use the hierarchical structure of an XML document to drill down to exactly the data we are looking for.
To make it easier to call, we will import the module as ‘ET’. We will then call the function ‘fromstring’, pass in our ‘clean_string_data’ as an argument, and assign the new XML object to the variable root.
import xml.etree.ElementTree as ET

root = ET.fromstring(clean_string_data)
type(root)
## <class 'xml.etree.ElementTree.Element'>
Our object ‘root’ is now an ‘xml.etree.ElementTree.Element’ object that we can iterate over with a for loop or we can use the list function.
Using the list function allows us to take a look inside root to see its children.
list(root)
## [<Element 'counter' at 0x0000000061516810>, <Element 'counter' at 0x0000000061516C20>, <Element 'counter' at 0x0000000061518040>, <Element 'counter' at 0x00000000615183B0>, <Element 'counter' at 0x0000000061518770>, <Element 'counter' at 0x0000000061518B30>, <Element 'counter' at 0x0000000061518EF0>, <Element 'counter' at 0x0000000061519360>, <Element 'counter' at 0x0000000061519770>, <Element 'counter' at 0x0000000061519AE0>, <Element 'counter' at 0x0000000061519EA0>, <Element 'counter' at 0x000000006151B2C0>, <Element 'counter' at 0x000000006151B720>, <Element 'counter' at 0x000000006151BAE0>, <Element 'counter' at 0x000000006151BEA0>, <Element 'counter' at 0x000000006151D270>, <Element 'counter' at 0x000000006151D5E0>, <Element 'counter' at 0x000000006151D950>, <Element 'counter' at 0x000000006151DD10>, <Element 'counter' at 0x000000006151E0E0>, <Element 'counter' at 0x000000006151E4A0>, <Element 'counter' at 0x000000006151E810>, <Element 'counter' at 0x000000006151EBD0>, <Element 'counter' at 0x000000006151EF90>, <Element 'counter' at 0x0000000061521360>, <Element 'counter' at 0x0000000061521720>, <Element 'counter' at 0x0000000061521A90>, <Element 'counter' at 0x0000000061521E00>, <Element 'counter' at 0x00000000615231D0>, <Element 'counter' at 0x0000000061523540>, <Element 'counter' at 0x0000000061523950>, <Element 'counter' at 0x0000000061523D60>, <Element 'counter' at 0x0000000061525180>, <Element 'counter' at 0x0000000061525540>, <Element 'counter' at 0x0000000061525900>, <Element 'counter' at 0x0000000061525CC0>, <Element 'counter' at 0x00000000615260E0>, <Element 'counter' at 0x00000000615264A0>, <Element 'counter' at 0x0000000061526860>, <Element 'counter' at 0x0000000061526C20>, <Element 'counter' at 0x0000000061528040>, <Element 'counter' at 0x0000000061528400>, <Element 'counter' at 0x0000000061528810>, <Element 'counter' at 0x0000000061528C20>, <Element 'counter' at 0x000000006152A040>, <Element 'counter' at 0x000000006152A400>, <Element 'counter' at 0x000000006152A7C0>, <Element 'counter' at 0x000000006152AB30>, <Element 'counter' at 0x000000006152AEA0>]
We can see there are a lot of “Element ‘counter’…” objects, each representing a different Bikeometer with details of its own.
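If you’d rather iterate than call list, a quick sketch (not part of the original walkthrough) that loops over root and prints each counter’s tag and attributes might look like this:
# Loop over root's children and print each element's tag and attribute dictionary
for counter in root:
    print(counter.tag, counter.attrib)  # e.g. counter {'id': '33'}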
Now, we could go in blind and do lots of slicing to see the hierarchy of the file, but realistically we have access to a web browser and can use the URL below to make the API call and see where the data we need is located.
http://webservices.commuterpage.com/counters.cfc?wsdl&method=GetAllCounters
Let’s slice into the list of counters and assign the first counter to the variable ‘child’ then take a look inside.
child = root[0]
list(child)
## [<Element 'name' at 0x0000000061516900>, <Element 'description' at 0x0000000061516950>, <Element 'trail_id' at 0x00000000615169A0>, <Element 'trail_name' at 0x00000000615169F0>, <Element 'latitude' at 0x0000000061516A40>, <Element 'longitude' at 0x0000000061516A90>, <Element 'region' at 0x0000000061516AE0>]
We can see that ‘child’ contains many elements that we can access, such as name, description, trail_id, etc.
We also know that the ‘counter’ element itself carries an ‘id’ attribute. Let’s access that ‘id’ information.
We can use the items method, which returns the element’s attributes as a list of tuples; here it is a list containing a single tuple.
child.items()
## [('id', '33')]
However, we really want to isolate just the value itself: ‘33’.
We can actually use the keys method even though this is not a dictionary.
child.keys()
## ['id']
We know that the key to our value is ‘id’, so we can use the get method and pass in ‘id’ to get the value.
child.get('id')
## '33'
Now we know that the Bikeometer ID for the first Bikeometer in our request is 33.
To access the ‘name’ element of the Bikeometer we will slice into the list and access the first object. We will assign that object to the variable ‘grandchild’.
grandchild = child[0]
grandchild
## <Element 'name' at 0x0000000061516900>
Now that we have the ‘name’ object assigned, let’s see what the name of the first Bikeometer in our XML file is.
grandchild.text
## '110 Trail'
‘110 Trail’ is the name! If you want, try pulling some data yourself. Below, I describe how I automated pulling the data.
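As a starting point for your own exploration, here is a small sketch (not part of the original walkthrough) that uses the find method and the element tags we saw above to pull this Bikeometer’s coordinates:
# 'latitude' and 'longitude' are tags listed in the child elements above
print(child.find('latitude').text)   # e.g. '38.885315' for Bikeometer 33
print(child.find('longitude').text)  # e.g. '-77.065022'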
In order to automate pulling all this data, I’ll create a for loop to iterate through the XML data. Each Bikeometer’s data will be saved in a tuple and those tuples will be saved all together in a list.
First, I’ll create the empty list that will house the Bikeometer tuples.
bikeometer_details = []
Next, I’ll create a function with a Russian nesting doll of a for loop to dive into the appropriate sections to pull the information I want. You’ll notice I don’t pull a field if it is ‘None’, ‘description’, ‘trail_id’, or ‘trail_name’ but you can pull this data if you want.
# Iterate through the children, grandchildren, and great-grandchildren and grab required data
def get_bikeometer_details():
    for child in root:
        # From child 'counter', get the 'id' attribute of the counter and add it to single_list
        single_list = [child.get('id')]
        # Loop through the grandchildren of root
        for grandchild in list(child):
            # Backslash is a line break to keep the line short
            if grandchild.text != None and grandchild.tag != 'description' and \
               grandchild.tag != 'trail_id' and grandchild.tag != 'trail_name':
                # If the grandchild is region, loop through region and grab the great-grandchildren data
                if grandchild.tag == 'region':
                    for great_grandchild in list(grandchild):
                        single_list.append(great_grandchild.text)
                # If the grandchild is not region, the data is available in grandchild.text
                else:
                    single_list.append(grandchild.text)
        # Cast the list into a tuple, making it easier to migrate data to the database
        single_tuple = tuple(single_list)
        # Append the tuple to the list
        bikeometer_details.append(single_tuple)

get_bikeometer_details()
Taking a look inside our bikeometer_details object, we see a list of tuples, where each tuple is a separate Bikeometer.
Each tuple contains: (‘bikeometer_id’, ‘bikeometer_name’, ‘latitude’, ‘longitude’, ‘region’, ‘region_id’)
bikeometer_details
## [('33', '110 Trail', '38.885315', '-77.065022', 'Arlington', '1'), ('30', '14th Street Bridge', '38.874260', '-77.044610', 'Arlington', '1'), ('43', '15th Street NW', '38.907470', '-77.034610', 'DC', '3'), ('24', 'Ballston Connector', '38.882950', '-77.121235', 'Arlington', '1'), ('59', 'Bluemont Connector', '38.880440', '-77.119290', 'Arlington', '1'), ('56', 'BWP Counter', '38.933140', '-76.938320', "Prince George's County", '5'), ('47', 'Capital Crescent Trail #1', '38.979500', '-77.096760', 'Montgomery County', '4'), ('48', 'Capital Crescent Trail #2', '38.943070', '-77.115660', 'Montgomery County', '4'), ('10', 'CC Connector', '38.857702', '-77.047373', 'Arlington', '1'), ('20', 'Clarendon EB bike lane', '38.888780', '-77.090421', 'Arlington', '1'), ('35', 'Commonwealth Bike Lanes', '38.831730', '-77.059280', 'Alexandria', '2'), ('57', 'Cottage City Counter', '38.939700', '-76.944710', "Prince George's County", '5'), ('18', 'Crystal NB bike lane', '38.857315', '-77.049152', 'Arlington', '1'), ('3', 'Custis Bon Air Park', '38.879199', '-77.138420', 'Arlington', '1'), ('58', 'Custis Rosslyn', '38.897110', '-77.083410', 'Arlington', '1'), ('61', 'Eads NB', '38.857970', '-77.053040', 'Arlington', '1'), ('62', 'Eads SB', '38.859450', '-77.053500', 'Arlington', '1'), ('38', 'Eisenhower Trail', '38.802950', '-77.088350', 'Alexandria', '2'), ('44', 'Eye Street SW', '38.879260', '-77.015580', 'DC', '3'), ('14', 'Fairfax EB bike lane', '38.882767', '-77.104615', 'Arlington', '1'), ('60', 'Fairfax WB', '38.883330', '-77.103780', 'Arlington', '1'), ('5', 'Four Mile Run (piezo)', '38.843262', '-77.080860', 'Arlington', '1'), ('6', 'Four Mile Run (pyro)', '38.843353', '-77.080581', 'Arlington', '1'), ('42', 'Four Mile Trail', '38.840070', '-77.053290', 'Alexandria', '2'), ('37', 'Holmes Run Trail', '38.817830', '-77.123570', 'Alexandria', '2'), ('27', 'Joyce St NB', '38.867264', '-77.062829', 'Arlington', '1'), ('26', 'Joyce St SB', '38.867271', '-77.063054', 'Arlington', '1'), ('8', 'Key Bridge East', '38.900468', '-77.070656', 'Arlington', '1'), ('7', 'Key Bridge West', '38.900539', '-77.070900', 'Arlington', '1'), ('51', 'Matthew Henson Trail 1', '39.081660', '-77.047360', 'Montgomery County', '4'), ('52', 'Matthew Henson Trail 2', '39.058320', '-77.091600', 'Montgomery County', '4'), ('45', 'Metropolitan Branch Trail North', '38.923040', '-76.995700', 'DC', '3'), ('22', 'Military NB bike lane', '38.905509', '-77.109349', 'Arlington', '1'), ('21', 'Military SB bike lane', '38.905732', '-77.109698', 'Arlington', '1'), ('36', 'Mount Vernon Trail #1', '38.839570', '-77.046630', 'Alexandria', '2'), ('34', 'Mount Vernon Trail #2', '38.827310', '-77.042830', 'Alexandria', '2'), ('41', 'Mount Vernon Trail #3', '38.790290', '-77.050940', 'Alexandria', '2'), ('9', 'MVT Airport South', '38.844471', '-77.048923', 'Arlington', '1'), ('39', 'Potomac Yard Trail #1', '38.829850', '-77.047850', 'Alexandria', '2'), ('16', 'Quincy NB bike lane', '38.885457', '-77.108055', 'Arlington', '1'), ('15', 'Quincy SB bike lane', '38.884877', '-77.108075', 'Arlington', '1'), ('54', 'Rock Creek Trail 1', '39.016130', '-77.094200', 'Montgomery County', '4'), ('55', 'Rock Creek Trail 2', '39.090260', '-77.113820', 'Montgomery County', '4'), ('31', 'Roosevelt Bridge', '38.893822', '-77.065737', 'Arlington', '1'), ('28', 'Rosslyn Bikeometer', '38.899080', '-77.070980', 'Arlington', '1'), ('11', 'TR Island Bridge', '38.897960', '-77.067721', 'Arlington', '1'), ('2', 'W&OD Bon Air Park', '38.878949', '-77.138546', 'Arlington', 
'1'), ('25', 'W&OD Bon Air West', '38.879330', '-77.139250', 'Arlington', '1'), ('19', 'Wilson WB bike lane', '38.889855', '-77.090175', 'Arlington', '1')]
First, we will load the pandas module so we can turn our list of tuples into a data frame.
import pandas as pd
Next, we will define the columns of the data frame according to the data in our tuples.
columns = ('bikeometer_id', 'name', 'latitude', 'longitude', 'region', 'region_id')
With our columns defined, we will create a data frame using the pandas DataFrame function, passing our bikeometer_details list to the data parameter and our columns variable to the columns parameter.
df = pd.DataFrame(data=bikeometer_details, columns=columns)
Finally, we will assign a type to each column by using the .astype() method.
"name", "latitude", "longitude", "region", 'region_id']] = df[["name", "latitude", "longitude", "region", 'region_id']].astype('str')
df[["bikeometer_id"]] = df[["bikeometer_id"]].astype('int') df[[
df
## bikeometer_id ... region_id
## 0 33 ... 1
## 1 30 ... 1
## 2 43 ... 3
## 3 24 ... 1
## 4 59 ... 1
## 5 56 ... 5
## 6 47 ... 4
## 7 48 ... 4
## 8 10 ... 1
## 9 20 ... 1
## 10 35 ... 2
## 11 57 ... 5
## 12 18 ... 1
## 13 3 ... 1
## 14 58 ... 1
## 15 61 ... 1
## 16 62 ... 1
## 17 38 ... 2
## 18 44 ... 3
## 19 14 ... 1
## 20 60 ... 1
## 21 5 ... 1
## 22 6 ... 1
## 23 42 ... 2
## 24 37 ... 2
## 25 27 ... 1
## 26 26 ... 1
## 27 8 ... 1
## 28 7 ... 1
## 29 51 ... 4
## 30 52 ... 4
## 31 45 ... 3
## 32 22 ... 1
## 33 21 ... 1
## 34 36 ... 2
## 35 34 ... 2
## 36 41 ... 2
## 37 9 ... 1
## 38 39 ... 2
## 39 16 ... 1
## 40 15 ... 1
## 41 54 ... 4
## 42 55 ... 4
## 43 31 ... 1
## 44 28 ... 1
## 45 11 ... 1
## 46 2 ... 1
## 47 25 ... 1
## 48 19 ... 1
##
## [49 rows x 6 columns]
Now that our data is in a data frame, we have many options for what to do with it. We can pickle it for later, manipulate it with pandas, or do what I’m going to do in the next post: save it in my database.
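For instance, a minimal sketch of the pickling option (the file name here is arbitrary) could be:
# Save the data frame to disk for later use
df.to_pickle('bikeometer_details.pkl')
# ...and read it back when needed
df = pd.read_pickle('bikeometer_details.pkl')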
Now that we can easily pull the Bikeometer Details data into a usable format, we will pull the data for the actual daily bike counts.
Step 1: Make a request
Again, let’s look to the API documentation to find the method we need to request the Bike counts.
We will use the method ‘GetCountInDateRange’, pass in the dates that we are looking for, and pull out the data that we want: Bikeometer ID, Date, Direction (Inbound or Outbound), and Count.
Note: The Bike Arlington API allows for queries of 1 year or less. This limitation can be managed by making requests in 1 year increments and adding them to the database or concatenating each 1 year request into a list.
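As a rough sketch of that first approach (this helper is hypothetical, not part of the Bike Arlington API), you could generate start/end pairs that never span more than one year and loop over them:
from datetime import date, timedelta

def year_long_ranges(first_date, last_date):
    # Yield (start, end) pairs covering at most ~1 year each, as mm/dd/yyyy strings
    start = first_date
    while start <= last_date:
        end = min(start + timedelta(days=364), last_date)
        yield start.strftime('%m/%d/%Y'), end.strftime('%m/%d/%Y')
        start = end + timedelta(days=1)

# Example: list(year_long_ranges(date(2015, 7, 22), date(2016, 12, 31)))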
Again, we’ll use the ‘requests’ library to make a ‘GET’ request to the base Bike Arlington URL and this time pass in the ‘GetCountInDateRange’ method as a parameter.
# Assign the url of the Bike Arlington API to the variable 'url'
url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'
# Defines the method in a dictionary used to make the request
counter_reqest_methods = {'method': 'GetCountInDateRange'}
# Save the GetCountInDateRange request to memory
response = requests.get(url, params=counter_reqest_methods)
response
## <Response [200]>
Perfect, response [200]! Let’s take a look inside our ‘response object’ by using the .text attribute.
response.text
## '\r\n\t\t\t<error>\r\n\t \t\t\t<message>The COUNTERID parameter to the GetCountInDateRange function is required but was not passed in.</message>\r\n\t\t\t\t<detail></detail>\r\n\t\t\t</error>\r\n\t\t'
It looks like we’re missing some parameters… which makes sense. If we take a look at the documentation, it says:
"Request Fields:
CounterID – Number Required Field
startDate – mm/dd/yyyy -
endDate – mm/dd/yyyy -
direction – I, O (I: Inbound, O: outbound), Empty for both.
mode – B, P (B: bike, P: pesdestrian), empty for both.
startTime – HH:MM format
endtime – HH:MM format
interval – h (by the hourly), m (by the minutes), d (by the day)
Example with interval as d: http://webservices.commuterpage.com/counters.cfc?wsdl&method=GetCountInDateRange&counterid=1&startDate=12/1/2011&endDate=12/04/2011&direction=I&mode=B&interval=d"
Like with many parts of coding, there are multiple ways to achieve the same goal. We could create a function that concatenates a URL according to our desired ‘request fields’, or we can do what I do below: pass in each of the request fields to the GET function in the Requests module.
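For completeness, here is a sketch of the URL-concatenation alternative, built from the example URL in the documentation above; the parameter values are just the documentation’s sample values:
# Build the full request URL by concatenating the documented parameters
base_url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'
request_url = (base_url + '&method=GetCountInDateRange'
               '&counterid=1&startDate=12/1/2011&endDate=12/04/2011'
               '&direction=I&mode=B&interval=d')
response = requests.get(request_url)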
I’ll start with the first ‘bikeometer_id’ in the above Bikeometer Details data frame, so counterID = 33. To get the ‘start date’, let’s use one of the built-in Bike Arlington API methods: ‘getMinDates’
# Assign the url of the Bike Arlington API to the variable 'url'
url = 'http://webservices.commuterpage.com/counters.cfc?wsdl'
# Defines the method and counterID in a dictionary used to make the request
counter_reqest_methods = {'method': 'getMinDates', 'counterID': '33'}
# Save the getMinDates request to memory
response = requests.get(url, params=counter_reqest_methods)
response.text
## "<wddxPacket version='1.0'><header/><data><struct><var name='STARTDATE'><string>07/22/2015</string></var></struct></data></wddxPacket>"
Now that we know the first date in the database for Bikeometer 33 is 07/22/2015, we can use the GetCountInDateRange Bike Arlington API method.
To test, let’s make the end date the same as the start date, 07/22/2015. For ‘mode’, we have the option to request data for Bikers, Pedestrians, or Both. I’ll choose Bikers so the mode will be ‘B’. For ‘interval’ we can choose ‘minute’, ‘hourly’, or ‘daily’. I’ll choose ‘daily’. I’ll leave ‘direction’ blank to pull both the Inbound and Outbound direction.
We will organize all these parameters in a dictionary and pass that dictionary to the GET method.
request_parameters = {'method': 'GetCountInDateRange',
                      'counterID': '33',
                      'startDate': '07/22/2015',
                      'endDate': '07/22/2015',
                      'mode': 'B',
                      'interval': 'D',
                      'direction': ''}
response = requests.get(url, params=request_parameters)
response.text
## '<?xml version="1.0" encoding="UTF-8"?>\n<counts counter="33" endDate="7/22/2015" startDate="7/22/2015"> \n\t\t<count count="384" date="07/22/2015" direction="I" mode="B"/>\n\t\t\n\t\t<count count="399" date="07/22/2015" direction="O" mode="B"/>\n\t\t\n\t</counts>'
It looks like on 7/22/15, there were 384 bikers in the Inbound direction and 399 bikers in the Outbound direction. Before we extract this data, let’s clean it up a little just to make our lives easier.
Again, we will use the ‘re’ module to remove some of the extraneous formatting characters.
The code below will substitute any newline (‘\n’) or tab (‘\t’) character with a blank string (’’), essentially deleting them.
import re

string_data = response.text
clean_string_data = re.sub(r'[\n|\t]', '', string_data)
clean_string_data
## '<?xml version="1.0" encoding="UTF-8"?><counts counter="33" endDate="7/22/2015" startDate="7/22/2015"> <count count="384" date="07/22/2015" direction="I" mode="B"/><count count="399" date="07/22/2015" direction="O" mode="B"/></counts>'
As when we requested the Bikeometer Details, the first part of this data tells us that it is in XML format. We can use this to our advantage by converting the string to XML to easily pull out the data we are looking for.
This step is the same as above but I’ll repeat it again for those that skipped right to this step.
We will use the module ‘xml.etree.ElementTree’ to convert the data from a string to XML format. By converting to XML, finding the data we need is faster because we don’t have to iterate over the entire string looking for specific values; instead, we can use the hierarchical structure of an XML document to drill down to exactly the data we are looking for.
To make it easier to call, we will import the module as ‘ET’. We will then call the method ‘fromstring’ and pass in our ‘clean_string_data’ as an argument.
import xml.etree.ElementTree as ET

root = ET.fromstring(clean_string_data)
type(root)
## <class 'xml.etree.ElementTree.Element'>
Our object ‘root’ is now an ‘xml.etree.ElementTree.Element’ object that we can iterate over with a for loop or we can use the list function.
Using the list function allows us to take a look inside root to see its children.
list(root)
## [<Element 'count' at 0x000000006237E360>, <Element 'count' at 0x000000006237E3B0>]
We can see there are two Element ‘count’ objects, one representing the Inbound direction and the other the Outbound direction for our requested date.
Let’s slice into the list of counts and assign the first count to the variable ‘child’, then take a look inside.
child = root[0]
child.items()
## [('count', '384'), ('date', '07/22/2015'), ('direction', 'I'), ('mode', 'B')]
We can see that ‘child’ contains the ‘count’, ‘date’, ‘direction’, and ‘mode’.
To isolate the important information we can use the get method and pass in ‘count’, ‘date’, ‘direction’, or ‘mode’ to get the value.
child.get('count')
## '384'
child.get('date')
## '07/22/2015'
child.get('direction')
## 'I'
child.get('mode')
## 'B'
Now that we know how to pull the data that we want, let’s automate it.
I’ll show you the function that I created to iterate through the XML data. Every ‘count’ and its details will be saved in a tuple and those tuples will be saved all together in a list.
First, I’ll create the empty list that will house the ‘count’ tuples.
count_in_date_range_list = []
Next, I’ll create a list of the Bikeometer ID’s that I’d like to pull data for. My function will iterate over this list using a for loop.
bikeometer_id_list = ['33','30','43','24','59','56','47','48','10','20',
                      '35','57','18','3','58','61','62','38','44','14',
                      '60','5','6','42','37','27','26','8','7','51','52',
                      '45','22','21','36','34','41','9','39','16','15',
                      '54','55','31','28','11','2','25','19']
Next, I’ll create a Russian nesting doll of a for loop to dive into the appropriate sections to pull the information I want. I’ll use the datetime module to separate out the year, month, and day so they can be separate columns in my database.
from datetime import date, datetime, timedelta

def api_counts_to_list():
    for bikeometer_id in bikeometer_id_list:
        request_parameters = {'method': 'GetCountInDateRange',
                              'counterID': bikeometer_id,
                              'startDate': '07/22/2015',
                              'endDate': '07/22/2015',
                              'mode': 'B',
                              'interval': 'D',
                              'direction': ''}
        response = requests.get(url, params=request_parameters)
        string_data = response.text
        clean_string_data = re.sub(r'[\n|\t]', '', string_data)
        root = ET.fromstring(clean_string_data)
        for type_tag in root.findall('count'):
            count = type_tag.get('count')
            date = type_tag.get('date')
            # Converts counter date to a date object
            date = datetime.strptime(date, '%m/%d/%Y').date()
            year = date.year
            month = date.month
            day = date.day
            month_day = f'{month}_{day}'
            direction = type_tag.get('direction')
            if date.weekday() <= 4:
                is_weekend = 0
            else:
                is_weekend = 1
            single_tuple = (bikeometer_id, date, direction, count, is_weekend, year, month, day, month_day)
            count_in_date_range_list.append(single_tuple)
    return count_in_date_range_list

all_id_list = api_counts_to_list()
all_id_list
## [('33', datetime.date(2015, 7, 22), 'I', '384', 0, 2015, 7, 22, '7_22'), ('33', datetime.date(2015, 7, 22), 'O', '399', 0, 2015, 7, 22, '7_22'), ('30', datetime.date(2015, 7, 22), 'I', '1395', 0, 2015, 7, 22, '7_22'), ('30', datetime.date(2015, 7, 22), 'O', '1330', 0, 2015, 7, 22, '7_22'), ('43', datetime.date(2015, 7, 22), 'I', '1125', 0, 2015, 7, 22, '7_22'), ('43', datetime.date(2015, 7, 22), 'O', '1359', 0, 2015, 7, 22, '7_22'), ('24', datetime.date(2015, 7, 22), 'I', '210', 0, 2015, 7, 22, '7_22'), ('24', datetime.date(2015, 7, 22), 'O', '238', 0, 2015, 7, 22, '7_22'), ('10', datetime.date(2015, 7, 22), 'I', '467', 0, 2015, 7, 22, '7_22'), ('10', datetime.date(2015, 7, 22), 'O', '490', 0, 2015, 7, 22, '7_22'), ('20', datetime.date(2015, 7, 22), 'A', '256', 0, 2015, 7, 22, '7_22'), ('18', datetime.date(2015, 7, 22), 'A', '9', 0, 2015, 7, 22, '7_22'), ('3', datetime.date(2015, 7, 22), 'I', '758', 0, 2015, 7, 22, '7_22'), ('3', datetime.date(2015, 7, 22), 'O', '771', 0, 2015, 7, 22, '7_22'), ('44', datetime.date(2015, 7, 22), 'I', '275', 0, 2015, 7, 22, '7_22'), ('44', datetime.date(2015, 7, 22), 'O', '0', 0, 2015, 7, 22, '7_22'), ('14', datetime.date(2015, 7, 22), 'A', '213', 0, 2015, 7, 22, '7_22'), ('27', datetime.date(2015, 7, 22), 'I', '31', 0, 2015, 7, 22, '7_22'), ('27', datetime.date(2015, 7, 22), 'O', '6', 0, 2015, 7, 22, '7_22'), ('26', datetime.date(2015, 7, 22), 'I', '15', 0, 2015, 7, 22, '7_22'), ('26', datetime.date(2015, 7, 22), 'O', '24', 0, 2015, 7, 22, '7_22'), ('8', datetime.date(2015, 7, 22), 'I', '505', 0, 2015, 7, 22, '7_22'), ('8', datetime.date(2015, 7, 22), 'O', '993', 0, 2015, 7, 22, '7_22'), ('7', datetime.date(2015, 7, 22), 'I', '718', 0, 2015, 7, 22, '7_22'), ('7', datetime.date(2015, 7, 22), 'O', '204', 0, 2015, 7, 22, '7_22'), ('45', datetime.date(2015, 7, 22), 'I', '1551', 0, 2015, 7, 22, '7_22'), ('45', datetime.date(2015, 7, 22), 'O', '1397', 0, 2015, 7, 22, '7_22'), ('22', datetime.date(2015, 7, 22), 'A', '64', 0, 2015, 7, 22, '7_22'), ('21', datetime.date(2015, 7, 22), 'A', '55', 0, 2015, 7, 22, '7_22'), ('34', datetime.date(2015, 7, 22), 'I', '975', 0, 2015, 7, 22, '7_22'), ('34', datetime.date(2015, 7, 22), 'O', '937', 0, 2015, 7, 22, '7_22'), ('9', datetime.date(2015, 7, 22), 'I', '1053', 0, 2015, 7, 22, '7_22'), ('9', datetime.date(2015, 7, 22), 'O', '1012', 0, 2015, 7, 22, '7_22'), ('16', datetime.date(2015, 7, 22), 'A', '125', 0, 2015, 7, 22, '7_22'), ('15', datetime.date(2015, 7, 22), 'A', '110', 0, 2015, 7, 22, '7_22'), ('31', datetime.date(2015, 7, 22), 'I', '297', 0, 2015, 7, 22, '7_22'), ('31', datetime.date(2015, 7, 22), 'O', '328', 0, 2015, 7, 22, '7_22'), ('28', datetime.date(2015, 7, 22), 'I', '759', 0, 2015, 7, 22, '7_22'), ('28', datetime.date(2015, 7, 22), 'O', '1134', 0, 2015, 7, 22, '7_22'), ('11', datetime.date(2015, 7, 22), 'I', '925', 0, 2015, 7, 22, '7_22'), ('11', datetime.date(2015, 7, 22), 'O', '967', 0, 2015, 7, 22, '7_22'), ('2', datetime.date(2015, 7, 22), 'I', '463', 0, 2015, 7, 22, '7_22'), ('2', datetime.date(2015, 7, 22), 'O', '919', 0, 2015, 7, 22, '7_22'), ('25', datetime.date(2015, 7, 22), 'I', '1068', 0, 2015, 7, 22, '7_22'), ('25', datetime.date(2015, 7, 22), 'O', '1065', 0, 2015, 7, 22, '7_22'), ('19', datetime.date(2015, 7, 22), 'A', '328', 0, 2015, 7, 22, '7_22')]
Each tuple contains: (bikeometer_id, date, direction, count, is_weekend, year, month, day, month_day).
First, we will load the pandas module so we can turn our list of tuples into a data frame.
import pandas as pd
Next, we will define the columns of the data frame according to the data in our tuples.
columns = ('bikeometer_id', 'date', 'direction', 'count', 'is_weekend', 'year', 'month', 'day', 'month_day')
With our columns defined, we will create a data frame using the pandas DataFrame function, passing our all_id_list into the data parameter and our columns variable to the columns parameter.
df = pd.DataFrame(all_id_list, columns=columns)
df
## bikeometer_id date direction count ... year month day month_day
## 0 33 2015-07-22 I 384 ... 2015 7 22 7_22
## 1 33 2015-07-22 O 399 ... 2015 7 22 7_22
## 2 30 2015-07-22 I 1395 ... 2015 7 22 7_22
## 3 30 2015-07-22 O 1330 ... 2015 7 22 7_22
## 4 43 2015-07-22 I 1125 ... 2015 7 22 7_22
## 5 43 2015-07-22 O 1359 ... 2015 7 22 7_22
## 6 24 2015-07-22 I 210 ... 2015 7 22 7_22
## 7 24 2015-07-22 O 238 ... 2015 7 22 7_22
## 8 10 2015-07-22 I 467 ... 2015 7 22 7_22
## 9 10 2015-07-22 O 490 ... 2015 7 22 7_22
## 10 20 2015-07-22 A 256 ... 2015 7 22 7_22
## 11 18 2015-07-22 A 9 ... 2015 7 22 7_22
## 12 3 2015-07-22 I 758 ... 2015 7 22 7_22
## 13 3 2015-07-22 O 771 ... 2015 7 22 7_22
## 14 44 2015-07-22 I 275 ... 2015 7 22 7_22
## 15 44 2015-07-22 O 0 ... 2015 7 22 7_22
## 16 14 2015-07-22 A 213 ... 2015 7 22 7_22
## 17 27 2015-07-22 I 31 ... 2015 7 22 7_22
## 18 27 2015-07-22 O 6 ... 2015 7 22 7_22
## 19 26 2015-07-22 I 15 ... 2015 7 22 7_22
## 20 26 2015-07-22 O 24 ... 2015 7 22 7_22
## 21 8 2015-07-22 I 505 ... 2015 7 22 7_22
## 22 8 2015-07-22 O 993 ... 2015 7 22 7_22
## 23 7 2015-07-22 I 718 ... 2015 7 22 7_22
## 24 7 2015-07-22 O 204 ... 2015 7 22 7_22
## 25 45 2015-07-22 I 1551 ... 2015 7 22 7_22
## 26 45 2015-07-22 O 1397 ... 2015 7 22 7_22
## 27 22 2015-07-22 A 64 ... 2015 7 22 7_22
## 28 21 2015-07-22 A 55 ... 2015 7 22 7_22
## 29 34 2015-07-22 I 975 ... 2015 7 22 7_22
## 30 34 2015-07-22 O 937 ... 2015 7 22 7_22
## 31 9 2015-07-22 I 1053 ... 2015 7 22 7_22
## 32 9 2015-07-22 O 1012 ... 2015 7 22 7_22
## 33 16 2015-07-22 A 125 ... 2015 7 22 7_22
## 34 15 2015-07-22 A 110 ... 2015 7 22 7_22
## 35 31 2015-07-22 I 297 ... 2015 7 22 7_22
## 36 31 2015-07-22 O 328 ... 2015 7 22 7_22
## 37 28 2015-07-22 I 759 ... 2015 7 22 7_22
## 38 28 2015-07-22 O 1134 ... 2015 7 22 7_22
## 39 11 2015-07-22 I 925 ... 2015 7 22 7_22
## 40 11 2015-07-22 O 967 ... 2015 7 22 7_22
## 41 2 2015-07-22 I 463 ... 2015 7 22 7_22
## 42 2 2015-07-22 O 919 ... 2015 7 22 7_22
## 43 25 2015-07-22 I 1068 ... 2015 7 22 7_22
## 44 25 2015-07-22 O 1065 ... 2015 7 22 7_22
## 45 19 2015-07-22 A 328 ... 2015 7 22 7_22
##
## [46 rows x 9 columns]
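Before moving on, as a quick sanity check (a sketch, not part of the original post), you could total the inbound and outbound counts per Bikeometer with pandas:
# The counts came back as strings, so cast them to int before aggregating
df['count'] = df['count'].astype('int')
# Total bikers per Bikeometer for the day (inbound + outbound)
daily_totals = df.groupby('bikeometer_id')['count'].sum()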
Now that we are able to automate pulling the data into a data frame, in the next post we will set up the database that we will use to store the data for analysis.