enumerate() returns an iterator that yields a running index together with each element of the iterable you pass in. The optional start parameter sets the number the index starts at (0 by default).
list(enumerate([1, 2, 3]))
## [(0, 1), (1, 2), (2, 3)]
list(enumerate(range(3), start=10))
## [(10, 0), (11, 1), (12, 2)]
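In practice, enumerate() is most often used directly in a for loop, unpacking each (index, element) pair as it goes. A minimal sketch with a hypothetical list of fruits:

```python
fruits = ["apple", "banana", "cherry"]

# Each iteration unpacks one (index, element) tuple; start=1 numbers from 1.
lines = []
for position, fruit in enumerate(fruits, start=1):
    lines.append(f"{position}. {fruit}")

print(lines)  # ['1. apple', '2. banana', '3. cherry']
```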
Fizz Buzz
Write a program that prints the numbers from 1 to 100, but for multiples of 3 prints "Fizz" instead of the number, for multiples of 5 prints "Buzz", and for multiples of both 3 and 5 prints "Fizz Buzz".
for i in range(1, 101):
    if i % 3 == 0:
        if i % 5 == 0:  # divisible by both 3 and 5
            print('Fizz Buzz')
            continue
        print('Fizz')
        continue
    if i % 5 == 0:  # i is not a multiple of 3 here, so no need to re-check
        print('Buzz')
        continue
    print(i)
## 1
## 2
## Fizz
## 4
## Buzz
## Fizz
## 7
## 8
## Fizz
## Buzz
## 11
## Fizz
## 13
## 14
## Fizz Buzz
## 16
## 17
## Fizz
## 19
## Buzz
## Fizz
## 22
## 23
## Fizz
## Buzz
## 26
## Fizz
## 28
## 29
## Fizz Buzz
## 31
## 32
## Fizz
## 34
## Buzz
## Fizz
## 37
## 38
## Fizz
## Buzz
## 41
## Fizz
## 43
## 44
## Fizz Buzz
## 46
## 47
## Fizz
## 49
## Buzz
## Fizz
## 52
## 53
## Fizz
## Buzz
## 56
## Fizz
## 58
## 59
## Fizz Buzz
## 61
## 62
## Fizz
## 64
## Buzz
## Fizz
## 67
## 68
## Fizz
## Buzz
## 71
## Fizz
## 73
## 74
## Fizz Buzz
## 76
## 77
## Fizz
## 79
## Buzz
## Fizz
## 82
## 83
## Fizz
## Buzz
## 86
## Fizz
## 88
## 89
## Fizz Buzz
## 91
## 92
## Fizz
## 94
## Buzz
## Fizz
## 97
## 98
## Fizz
## Buzz
Write a program with the same requirements as above, but mutate a list in place instead of printing. Use if and elif so that each element is reassigned at most once.
def fizz_buzz(numbers):
    '''
    Given a list of integers:
    1. replace all integers that are evenly divisible by 3 with 'fizz'
    2. replace all integers divisible by 5 with 'buzz'
    3. replace all integers divisible by both 3 and 5 with 'fizzbuzz'
    >>> numbers = [45, 22, 14, 65, 97, 72]
    >>> fizz_buzz(numbers)
    >>> numbers
    ['fizzbuzz', 22, 14, 'buzz', 97, 'fizz']
    '''
    for i in range(len(numbers)):
        num = numbers[i]
        # Check "both" first; otherwise a multiple of 15 would stop at 'fizz'.
        if num % 3 == 0 and num % 5 == 0:
            numbers[i] = 'fizzbuzz'
        elif num % 3 == 0:
            numbers[i] = 'fizz'
        elif num % 5 == 0:
            numbers[i] = 'buzz'
numbers = [1,2,3,4,5,6,7,8,9,10,15,45, 22, 14, 65, 97, 72]
fizz_buzz(numbers)
numbers
## [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 'fizzbuzz', 'fizzbuzz', 22, 14, 'buzz', 97, 'fizz']
You can also use enumerate() instead of range(len()). The difference is that enumerate() takes the iterable itself and yields (index, element) pairs, while range(len()) takes a number and yields only the indices.
def fizz_buzz(numbers):
    '''
    Given a list of integers:
    1. replace all integers that are evenly divisible by 3 with 'fizz'
    2. replace all integers divisible by 5 with 'buzz'
    3. replace all integers divisible by both 3 and 5 with 'fizzbuzz'
    >>> numbers = [45, 22, 14, 65, 97, 72]
    >>> fizz_buzz(numbers)
    >>> numbers
    ['fizzbuzz', 22, 14, 'buzz', 97, 'fizz']
    '''
    # enumerate() already supplies num, so no numbers[i] lookup is needed.
    for i, num in enumerate(numbers):
        if num % 3 == 0 and num % 5 == 0:
            numbers[i] = 'fizzbuzz'
        elif num % 3 == 0:
            numbers[i] = 'fizz'
        elif num % 5 == 0:
            numbers[i] = 'buzz'
numbers_1 = list(range(1,101))
fizz_buzz(numbers_1)
numbers_1
## [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz', 16, 17, 'fizz', 19, 'buzz', 'fizz', 22, 23, 'fizz', 'buzz', 26, 'fizz', 28, 29, 'fizzbuzz', 31, 32, 'fizz', 34, 'buzz', 'fizz', 37, 38, 'fizz', 'buzz', 41, 'fizz', 43, 44, 'fizzbuzz', 46, 47, 'fizz', 49, 'buzz', 'fizz', 52, 53, 'fizz', 'buzz', 56, 'fizz', 58, 59, 'fizzbuzz', 61, 62, 'fizz', 64, 'buzz', 'fizz', 67, 68, 'fizz', 'buzz', 71, 'fizz', 73, 74, 'fizzbuzz', 76, 77, 'fizz', 79, 'buzz', 'fizz', 82, 83, 'fizz', 'buzz', 86, 'fizz', 88, 89, 'fizzbuzz', 91, 92, 'fizz', 94, 'buzz', 'fizz', 97, 98, 'fizz', 'buzz']
Doctest
Doctest is a standard-library module that scans the docstrings of functions for text that looks like an interactive Python session (>>> input lines followed by their expected output), runs those examples, and checks that the actual output matches. You can either call doctest from inside the script or run it alongside the script from the command line.
Adding to the script
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Running from the command line
python -m doctest -v temp.py
The -v switch prints each test that doctest runs along with its result. Without the switch, doctest runs silently and prints nothing to the console unless a test fails.
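Besides testmod(), the doctest module exposes the pieces it is built from. The sketch below, using a hypothetical add() function, finds the examples in one docstring with DocTestFinder, runs them with DocTestRunner, and reads back how many examples failed and how many were attempted. This can be handy when you want the result as data rather than as console output:

```python
import doctest

def add(a, b):
    """Return the sum of a and b (hypothetical example function).

    >>> add(2, 3)
    5
    >>> add('a', 'b')
    'ab'
    """
    return a + b

finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)

# find() returns a list of DocTest objects extracted from the docstring.
for test in finder.find(add, "add", globs={"add": add}):
    runner.run(test)  # prints details only for failing examples

# summarize() returns a named tuple with failed and attempted counts.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)  # 0 2
```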
Examples
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    >>> factorial(30.1)
    Traceback (most recent call last):
        ...
    ValueError: n must be exact integer
    >>> factorial(30.0)
    265252859812191058636308480000000

    It must also not be ridiculously large:
    >>> factorial(1e100)
    Traceback (most recent call last):
        ...
    OverflowError: n too large
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Differences between two lists
When comparing two lists, you can convert them into sets to eliminate any duplicate values. Also, lists cannot be subtracted with ‘-’, but sets can. Lists do support ‘+’, but it simply concatenates the two lists.
list1 = [1,2,3,4]
list2 = [3,3,4,5,6,7,7]
set(list1) - set(list2)
## {1, 2}
list1 = [1,2,3,4]
list2 = [3,3,4,5,6,7,7]
set(list2) - set(list1)
## {5, 6, 7}
def diff(list1, list2):
    return list(set(list1) - set(list2)) + list(set(list2) - set(list1))
list1 = [1,2,3,4]
list2 = [3,3,4,5,6,7,7]
diff(list1, list2)
## [1, 2, 5, 6, 7]
Without using set(), we can use a list comprehension:
def diff2(list1, list2):
    # Keep every element of the combined list that is missing from either list
    list_dif = [i for i in list1 + list2 if i not in list1 or i not in list2]
    return list_dif
list1 = [1,2,3,4]
list2 = [3,3,4,5,6,7,7]
list3 = diff2(list1, list2)
print(list3)
## [1, 2, 5, 6, 7, 7]
The difference between diff() and diff2() is that diff2() does not convert the lists to sets, so both 7's from list2 are kept.
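Sets also have a symmetric-difference operator, ^ (or the .symmetric_difference() method), which returns the elements that are in exactly one of the two sets: the same result diff() builds by hand. A short sketch; sorted() is used only to get a predictable order for display:

```python
list1 = [1, 2, 3, 4]
list2 = [3, 3, 4, 5, 6, 7, 7]

# Elements that appear in exactly one of the two sets
sym_diff = set(list1) ^ set(list2)
# The method form also accepts any iterable, not just a set
same_thing = set(list1).symmetric_difference(list2)

print(sorted(sym_diff))  # [1, 2, 5, 6, 7]
```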
Underscore ’_’
The underscore can be used to ignore a single value
a, _, b = (1, 2, 3) # a = 1, b = 3
print(a, b)
## 1 3
It can also be used to ignore multiple values by combining it with a starred target (*_). Collecting several values into a single starred variable as a list while unpacking is called “Extended Unpacking”, and it is only available in Python 3.
a, *_, b = (7, 6, 5, 4, 3, 2, 1)
print(a, b)
## 7 1
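The starred name does not have to be _: any name works, and it always receives a list (possibly empty) of the values in between. Using _ is just a convention signalling that those values are unused. For example:

```python
first, *middle, last = (7, 6, 5, 4, 3, 2, 1)
print(first, middle, last)  # 7 [6, 5, 4, 3, 2] 1

# With only two values, the starred target still gets a list, just an empty one.
head, *rest, tail = (1, 2)
print(head, rest, tail)  # 1 [] 2
```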
Separating Digits Of Numbers
If you have a number with many digits, you can separate groups of digits with underscores (Python 3.6+) to make it easier to read:
million = 1_000_000
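The underscores do not change the value; they only affect how the literal reads in source code. The same idea works for other numeric literals, and the ',' and '_' format specifiers can add separators back when displaying a number. A quick check:

```python
million = 1_000_000
print(million == 1000000)  # True

# Underscores also work in float and hexadecimal literals.
rate = 0.000_001
mask = 0xFF_FF

# Format specifiers insert separators when printing.
print(f"{million:,}")  # 1,000,000
print(f"{million:_}")  # 1_000_000
print(mask)            # 65535
```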
List comprehension
lst = [1, 2, -5, 4]
def square(x):
    return x * x
Instead of applying square() with map() and converting the result to a list:
list(map(square, lst))
## [1, 4, 25, 16]
You can use list comprehension:
[square(num) for num in lst]
## [1, 4, 25, 16]
def is_odd(x):
    return x % 2 == 1
We can use the filter() function to keep only the elements of our list that meet a certain criterion (here, being odd).
list(filter(is_odd, lst))
## [1, -5]
[x for x in lst if is_odd(x)]
## [1, -5]
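A comprehension can also transform and filter in one expression, which with built-ins would take map() and filter() together. Using the lst, square() and is_odd() defined above (repeated so the snippet runs on its own):

```python
lst = [1, 2, -5, 4]

def square(x):
    return x * x

def is_odd(x):
    return x % 2 == 1

# Square only the odd elements
print([square(x) for x in lst if is_odd(x)])   # [1, 25]
print(list(map(square, filter(is_odd, lst))))  # [1, 25]
```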
Let’s create a 2x3 matrix filled with zeros and define it using lists.
Using a for loop
num_rows = 2
num_columns = 3
grid = []
for _ in range(num_rows):
    curr_row = []
    for _ in range(num_columns):
        curr_row.append(0)
    grid.append(curr_row)
grid
## [[0, 0, 0], [0, 0, 0]]
Using list comprehension
num_rows = 2
num_columns = 3
grid = []
grid = [[0 for _ in range(num_columns)] for _ in range(num_rows)]
grid
## [[0, 0, 0], [0, 0, 0]]
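Compare the above with the following version, where the brackets are in different places. Both for clauses now sit inside the same inner brackets, so together they produce one flat run of 2 * 3 = 6 zeros, which is then wrapped in a single outer list.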
num_rows = 2
num_columns = 3
grid = []
grid = [[0 for _ in range(num_columns) for _ in range(num_rows)]]
grid
## [[0, 0, 0, 0, 0, 0]]
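A related pitfall: multiplying a list, as in [[0] * num_columns] * num_rows, looks like a shortcut for the same grid, but the outer * repeats references to one inner list rather than building independent rows, so changing one row changes them all. The nested comprehension creates a fresh inner list on every pass:

```python
num_rows = 2
num_columns = 3

# The outer * copies the reference to a single row num_rows times.
aliased = [[0] * num_columns] * num_rows
aliased[0][0] = 1
print(aliased)      # [[1, 0, 0], [1, 0, 0]]

# The comprehension builds a new row on each iteration.
independent = [[0 for _ in range(num_columns)] for _ in range(num_rows)]
independent[0][0] = 1
print(independent)  # [[1, 0, 0], [0, 0, 0]]
```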
- [] = list
- () = tuple
- {} = dictionary/set (an empty {} is a dictionary; an empty set is written set())
The function max() takes in an iterable, or several separate values, and returns the largest one:
L = [1,2,3, -4]
t = (1,2,3, -4)
s = {1,2,3, -4}
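max(L)
## 3
max(t)
## 3
max(s)
## 3
max(1, 2, 3, -4)
## 3
With the key argument, max() compares key(x) for each item instead of the item itself. Here the key squares each value, and (-4)**2 = 16 is the largest square, so -4 is returned: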
max(L, key=lambda x: x*x)
## -4
min() works the same way as max() but returns the smallest item.
any() takes in an iterable and returns True if any of the values in the iterable are truthy, and False if none of them are (or if the iterable is empty).
## True
## False
We can’t use any() with a key to check whether any number in our list is odd, because any() does not accept a key argument. The following raises a TypeError:
any(L, key=lambda x: x % 2 == 1)  # raises TypeError
Instead, we can use a list comprehension. The lambda is wrapped in parentheses and then called immediately on each element num:
[(lambda x:x % 2 == 1)(num) for num in L]
## [True, False, True, False]
any([(lambda x:x % 2 == 1)(num) for num in L])
## True
all() is like any(), but only returns True if all of the items are truthy.
all([(lambda x:x % 2 == 1)(num) for num in L])
## False
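The lambda is not actually needed: a generator expression passed straight to any() or all() expresses the same test more directly, and lets them stop early (short-circuit) as soon as the answer is known instead of building the whole list first. A sketch using the same L:

```python
L = [1, 2, 3, -4]

# No intermediate list: any() stops at the first odd number it sees.
print(any(x % 2 == 1 for x in L))  # True
# all() stops at the first even number it sees.
print(all(x % 2 == 1 for x in L))  # False

# Edge case: any() of an empty iterable is False, all() of one is True.
print(any([]), all([]))            # False True
```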
F-strings when creating a new class
class A(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f"""
My name is {self.name}.
I am {self.age + 5} years old
"""
name = 'Bob'
age = 15
print(A(name, age))
##
## My name is Bob.
## I am 20 years old
##
print(A('Nathan', 31))
##
## My name is Nathan.
## I am 36 years old
##
Sorting
You can sort a list of strings alphabetically with sorted():
animals = ["cat", "dog", "cheetah", "rhino"]
sorted(animals)
## ['cat', 'cheetah', 'dog', 'rhino']
sorted(animals, reverse=True)
## ['rhino', 'dog', 'cheetah', 'cat']
Here, we have a list of animals, each described by a dictionary.
animals = [
{'type': 'cat', 'name': 'Stephanie', 'age': 8},
{'type': 'dog', 'name': 'Devon', 'age': 3},
{'type': 'rhino', 'name': 'Moe', 'age': 5},
]
Dictionaries can’t be compared with one another, so sorted() can’t order them on its own, but you can pass a lambda as the key to sort by:
sorted(animals, key = lambda animal: animal['age'])
## [{'type': 'dog', 'name': 'Devon', 'age': 3}, {'type': 'rhino', 'name': 'Moe', 'age': 5}, {'type': 'cat', 'name': 'Stephanie', 'age': 8}]
If you want to get the oldest animal, pass in the reverse=True parameter and index the first item of the sorted list.
sorted(animals, key = lambda animal: animal['age'], reverse=True)[0]
## {'type': 'cat', 'name': 'Stephanie', 'age': 8}
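Sorting the whole list just to take its first element does more work than needed; max() with the same key returns the oldest animal directly. operator.itemgetter('age') from the standard library is an equivalent alternative to the lambda. A sketch:

```python
from operator import itemgetter

animals = [
    {'type': 'cat', 'name': 'Stephanie', 'age': 8},
    {'type': 'dog', 'name': 'Devon', 'age': 3},
    {'type': 'rhino', 'name': 'Moe', 'age': 5},
]

# max() scans the list once instead of sorting all of it.
oldest = max(animals, key=lambda animal: animal['age'])
print(oldest['name'])    # Stephanie

# itemgetter('age') builds a callable equivalent to the lambda above.
youngest = min(animals, key=itemgetter('age'))
print(youngest['name'])  # Devon
```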
You can also use the list’s .sort() method, which does the same thing but sorts the list in place (mutating it) instead of returning a new list:
animals.sort(key = lambda animal: animal['age'], reverse=True)
animals
## [{'type': 'cat', 'name': 'Stephanie', 'age': 8}, {'type': 'rhino', 'name': 'Moe', 'age': 5}, {'type': 'dog', 'name': 'Devon', 'age': 3}]
Set()
- Sets are unordered
- Set elements are unique (no duplicates)
- A set may be modified, but the elements contained in the set must be of an immutable (hashable) type
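Converting a string shows the difference: a list keeps the duplicate 'u', a set does not.
s = 'quux'
list(s)
## ['q', 'u', 'u', 'x']
set(s)
## {'q', 'u', 'x'}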
x = {'foo', 'bar', 'baz', 'foo', 'qux'}
x
## {'foo', 'bar', 'qux', 'baz'}
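set('foo')
## {'o', 'f'}
An empty pair of braces creates a dictionary, so an empty set has to be created with set():
x = {}
type(x)
## <class 'dict'>
x = set()
type(x)
## <class 'set'>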
Don’t forget that set elements must be immutable. For example, a tuple may be included in a set:
x = {42, 'foo', (1, 2, 3), 3.14159}
x
## {'foo', 42, (1, 2, 3), 3.14159}
But lists and dictionaries are mutable (unhashable), so they can’t be set elements:
a = [1, 2, 3]
{a}  # TypeError: unhashable type: 'list'
d = {'a': 1, 'b': 2}
{d}  # TypeError: unhashable type: 'dict'
x = {'foo', 'bar', 'baz'}
len(x)
## 3
'bar' in x
## True
'qux' in x
## False
In Python, set union can be performed with the | operator:
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}
x1 | x2
## {'foo', 'baz', 'bar', 'quux', 'qux'}
Set union can also be obtained with the .union() method. The method is invoked on one of the sets, and the other is passed as an argument:
x1.union(x2)
## {'foo', 'baz', 'bar', 'quux', 'qux'}
When you use the | operator, both operands must be sets. The .union() method, on the other hand, will take any iterable as an argument, convert it to a set, and then perform the union.
x1.union(('baz', 'qux', 'quux'))
## {'baz', 'bar', 'qux', 'foo', 'quux'}
More than two sets may be specified with either the operator or the method:
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}
d = {4, 5, 6, 7}
a.union(b, c, d)
## {1, 2, 3, 4, 5, 6, 7}
a | b | c | d
## {1, 2, 3, 4, 5, 6, 7}
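Set intersection works the same way: x1.intersection(x2) and x1 & x2 return the elements common to both sets, and with more than two sets, the resulting set contains only elements that are present in all of the specified sets.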
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}
d = {4, 5, 6, 7}
a.intersection(b, c, d)
## {4}
a & b & c & d
## {4}
x1.difference(x2) and x1 - x2 return the set of all elements that are in x1 but not in x2:
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}
x1.difference(x2)
## {'foo', 'bar'}
x1 - x2
## {'foo', 'bar'}
Frozen Sets
Python also provides another built-in type, frozenset, which is in all respects exactly like a set, except that a frozenset is immutable. You can perform non-modifying operations on a frozenset, but methods that attempt to modify a frozenset fail.
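Because a frozenset is immutable (and therefore hashable), it can be used where a regular set cannot, such as an element of another set or a dictionary key. A short sketch:

```python
fs = frozenset(['foo', 'bar', 'baz'])

# Non-modifying operations work and return new frozensets.
print(fs & {'baz', 'qux'})  # frozenset({'baz'})

# Methods that would modify it simply don't exist on frozenset.
try:
    fs.add('qux')
except AttributeError as err:
    print("AttributeError:", err)

# Being hashable, a frozenset can be a set element or a dictionary key.
nested = {fs, frozenset({1, 2})}
lookup = {fs: "three words"}
print(len(nested), lookup[frozenset({'bar', 'baz', 'foo'})])  # 2 three words
```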
How to Use Generators and yield in Python
Have you ever had to work with a dataset so large that it overwhelmed your machine’s memory? Or maybe you have a complex function that needs to maintain an internal state every time it’s called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.
Using Generators
Introduced with PEP 255, generator functions are a special kind of function that return a lazy iterator. These are objects that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory.
To see why this matters, suppose you want to count the rows of a CSV file with a function csv_reader(). Would this design still work if the file is very large? What if the file is larger than the memory you have available? To answer this question, let’s assume that csv_reader() just opens the file and reads it into a list:
def csv_reader(file_name):
    file = open(file_name)
    result = file.read().split("\n")
    return result
This function opens a given file and uses file.read() along with .split() to add each line as a separate element to a list. If you used this version of csv_reader() to count the rows of a file larger than your available memory, you’d get a MemoryError.
Here, open() returns a file object that you can lazily iterate through line by line. However, file.read().split() loads everything into memory at once, causing the MemoryError.
Instead, you can turn csv_reader() into a generator function:
def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row
This version opens a file, loops through each line, and yields each row instead of returning it.
You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to list comprehensions. In this way, you can use the generator without calling a function:
csv_gen = (row for row in open(file_name))
- Using yield results in a generator object.
- Using return instead (return row inside the loop) would return only the first line of the file.
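With the generator version, counting rows only ever holds one line in memory at a time. The sketch below writes a small temporary file so it runs on its own; the file contents are made up for illustration. Unlike the version above, it wraps open() in a with block so the file is closed once the generator finishes:

```python
import os
import tempfile

def csv_reader(file_name):
    # Generator: yields one line at a time instead of reading the whole file.
    with open(file_name, "r") as file:
        for row in file:
            yield row

# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,score\nada,90\ngrace,85\n")
    path = tmp.name

row_count = 0
for row in csv_reader(path):
    row_count += 1
print(f"Row count is {row_count}")  # Row count is 3

os.remove(path)
```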
Generating an Infinite Sequence
In Python, to get a finite sequence, you call range() and evaluate it in a list context:
list(range(5))
## [0, 1, 2, 3, 4]
Generating an infinite sequence, however, will require the use of a generator, since your computer memory is finite:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
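Calling infinite_sequence() does not run anything yet; values are produced one per next() call. To take just the first few, itertools.islice() from the standard library stops a (possibly infinite) iterator after a given number of items:

```python
from itertools import islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()
print(next(gen), next(gen), next(gen))  # 0 1 2

# islice() yields at most 5 items, so list() terminates.
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```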
Detecting Palindromes
def is_palindrome(num):
    # Skip single-digit inputs.
    # Floor division (//) returns the whole-number part of the quotient, not the remainder like %.
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0
    while temp != 0:
        # Append the last digit of temp to reversed_num, then drop it from temp.
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    if num == reversed_num:
        return num
    else:
        return False
Don’t worry too much about understanding the underlying math in this code. Just note that the function takes an input number, reverses it, and checks to see if the reversed number is the same as the original. Now you can use your infinite sequence generator to get a running list of all numeric palindromes:
# This loop never ends on its own; interrupt it (e.g. with Ctrl+C) to stop it.
for i in infinite_sequence():
    pal = is_palindrome(i)
    if pal:
        print(pal)
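Because the loop above never terminates, a version you can test needs a cut-off. One option, sketched here, chains a generator expression onto infinite_sequence() and uses itertools.islice() to take only the first few palindromes:

```python
from itertools import islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

def is_palindrome(num):
    # Same logic as above: single digits are skipped.
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0
    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    return num if num == reversed_num else False

# Lazily filter the infinite sequence, then stop after 10 matches.
palindromes = (i for i in infinite_sequence() if is_palindrome(i))
first_ten = list(islice(palindromes, 10))
print(first_ten)  # [11, 22, 33, 44, 55, 66, 77, 88, 99, 101]
```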
Understanding Generators
yield indicates where a value is sent back to the caller, but unlike return, you don’t exit the function afterward.
Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), the previously yielded variable num is incremented, and then yielded again.
Generator expressions let you produce values without building and holding the entire sequence in memory before iteration. In other words, a generator expression’s memory use stays small and roughly constant no matter how many values it will produce. Take this example of squaring some numbers.
nums_squared_lc = [num**2 for num in range(5)]
nums_squared_lc
## [0, 1, 4, 9, 16]
nums_squared_gc = (num**2 for num in range(5))
nums_squared_gc
## <generator object <genexpr> at 0x0000000060651AC0>
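One way to see the difference is sys.getsizeof(), which reports the size of an object itself (not of everything it refers to). The exact byte counts vary by Python version and platform, so this sketch only compares them rather than printing fixed numbers:

```python
import sys

nums_squared_lc = [num**2 for num in range(10_000)]
nums_squared_gc = (num**2 for num in range(10_000))

list_size = sys.getsizeof(nums_squared_lc)  # grows with the number of items
gen_size = sys.getsizeof(nums_squared_gc)   # small, independent of range size
print(list_size > gen_size)  # True

# Both produce the same values; the generator computes them on demand.
print(sum(nums_squared_lc) == sum(nums_squared_gc))  # True
```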
Understanding the Python Yield Statement
When you call a generator function or use a generator expression, you get back a special iterator called a generator. You can assign this generator to a variable in order to use it. When you call special methods on the generator, such as next(), the code within the function is executed up to yield.
When the Python yield statement is hit, the program suspends function execution and returns the yielded value to the caller. (In contrast, return stops function execution completely.) When a function is suspended, the state of that function is saved. This includes any variable bindings local to the generator, the instruction pointer, the internal stack, and any exception handling.
This allows you to resume function execution whenever you call one of the generator’s methods. In this way, all function evaluation picks back up right after yield. You can see this in action by using multiple Python yield statements:
def multi_yield():
    yield_str = "This will print the first string"
    yield yield_str
    yield_str = "This will print the second string"
    yield yield_str
multi_obj = multi_yield()
print(next(multi_obj))
print(next(multi_obj))
print(next(multi_obj))
The following is the output, which ends in a traceback:
This will print the first string
This will print the second string
Traceback (most recent call last):
  ...
StopIteration
Take a closer look at that last call to next(). You can see that execution has blown up with a traceback. This is because generators, like all iterators, can be exhausted. Unless your generator is infinite, you can iterate through it one time only. Once all values have been evaluated, iteration will stop and the for loop will exit. If you used next(), then instead you’ll get an explicit StopIteration exception.
Using Advanced Generator Methods
- .send()
- .throw()
- .close()
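Briefly, .send() resumes the generator and makes the sent value the result of the paused yield expression, .throw() raises an exception at the paused yield, and .close() stops the generator by raising GeneratorExit there. A minimal sketch with a hypothetical running-total generator:

```python
def running_total():
    """Hypothetical example: keeps a running total of values sent in."""
    total = 0
    while True:
        try:
            # Whatever is passed to .send() becomes the value of this yield.
            value = yield total
        except ValueError:
            # .throw(ValueError(...)) lands here; reset and keep going.
            total = 0
            continue
        if value is not None:
            total += value

gen = running_total()
print(next(gen))                       # 0 (runs up to the first yield)
print(gen.send(5))                     # 5
print(gen.send(10))                    # 15
print(gen.throw(ValueError("reset")))  # 0
print(gen.send(3))                     # 3

gen.close()  # raises GeneratorExit inside the generator, which then finishes
try:
    next(gen)
except StopIteration:
    print("generator is closed")
```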
Creating Data Pipelines With Generators
Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine’s memory. Imagine that you have a large CSV file of company funding rounds, with columns that include round and raisedAmt.
Let’s think of a strategy:
- Read every line of the file.
- Split each line into a list of values.
- Extract the column names.
- Use the column names and lists to create a dictionary.
- Filter out the rounds you aren’t interested in.
- Calculate the total and average values for the rounds you are interested in.
Normally, you can do this with a package like pandas, but you can also achieve this functionality with just a few generators. You’ll start by reading each line from the file with a generator expression:
file_name = "data/TechCrunchcontinentalUSA.csv"
lines = (line for line in open(file_name))
Then, you’ll use another generator expression in concert with the previous one to split each line into a list:
list_line = (s.rstrip().split(",") for s in lines)
Here, you created the generator list_line, which iterates through the first generator lines. This is a common pattern to use when designing generator pipelines. Next, you’ll pull the column names out of techcrunch.csv. Since the column names tend to make up the first line in a CSV file, you can grab that with a short next() call:
cols = next(list_line)
To sum this up, you first create a generator expression lines to yield each line in a file. Next, you iterate through that generator within the definition of another generator expression called list_line, which turns each line into a list of values. Then, you advance the iteration of list_line just once with next() to get a list of the column names from your CSV file.
To help you filter and perform operations on the data, you’ll create dictionaries where the keys are the column names from the CSV:
company_dicts = (dict(zip(cols, data)) for data in list_line)
This generator expression iterates through the lists produced by list_line. Then, it uses zip() and dict() to create the dictionary as specified above. Now, you’ll use a fourth generator to filter the funding round you want and pull raisedAmt as well:
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)
In this code snippet, your generator expression iterates through the results of company_dicts and takes the raisedAmt for any company_dict where the round key is “a”.
Remember, you aren’t iterating through all these at once in the generator expression. In fact, you aren’t iterating through anything until you actually use a for loop or a function that works on iterables, like sum(). Call sum() now to iterate through the generators:
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")
## Total series A fundraising: $4376015000
Putting this all together, you’ll produce the following script:
file_name = "techcrunch.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols, data)) for data in list_line)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")
This script pulls together every generator you’ve built, and they all function as one big data pipeline. Here’s a line by line breakdown:
- Line 2 reads in each line of the file.
- Line 3 splits each line into values and puts the values into a list.
- Line 4 uses next() to store the column names in a list.
- Line 5 creates dictionaries and unites them with a zip() call:
  - The keys are the column names cols from line 4.
  - The values are the rows in list form, created in line 3.
- Line 6 gets each company’s series A funding amounts. It also filters out any other raised amount.
- Line 11 begins the iteration process by calling sum() to get the total amount of series A funding found in the CSV.
When you run this code on techcrunch.csv, you should find a total of $4,376,015,000 raised in series A funding rounds.
Note: The methods for handling CSV files developed in this tutorial are important for understanding how to use generators and the Python yield statement. However, when you work with CSV files in Python, you should instead use the csv module included in Python’s standard library. This module has optimized methods for handling CSV files efficiently.
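As a sketch of that advice, here is the same pipeline rebuilt on csv.DictReader, which reads the header row and yields each following row as a dictionary. Unlike split(","), it also handles quoted fields that contain commas. To keep it self-contained, it reads a few made-up rows from an in-memory io.StringIO instead of techcrunch.csv; the data are hypothetical. With a real file, the csv documentation recommends opening it with newline="".

```python
import csv
import io

# Hypothetical sample data standing in for techcrunch.csv.
sample = io.StringIO(
    "company,raisedAmt,round\n"
    "Acme,1000000,a\n"
    '"Widgets, Inc.",2500000,b\n'
    "Gadgetly,500000,a\n"
)

# DictReader uses the header row as keys and is itself a lazy iterator.
company_dicts = csv.DictReader(sample)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")  # ... $1500000
```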
Using zip() in Python
Python’s zip() function is defined as zip(*iterables). The function takes in iterables as arguments and returns an iterator. This iterator generates a series of tuples containing elements from each iterable. zip() can accept any type of iterable, such as files, lists, tuples, dictionaries, sets, and so on.
Passing n Arguments
If you use zip() with n arguments, then the function will return an iterator that generates tuples of length n. To see this in action, take a look at the following code block:
numbers = [1, 2, 3]
letters = ['a', 'b', 'c']
zipped = zip(numbers, letters)
zipped # Holds an iterator object
## <zip object at 0x0000000060792E00>
type(zipped)
## <class 'zip'>
list(zipped)
## [(1, 'a'), (2, 'b'), (3, 'c')]
Here, you use zip(numbers, letters) to create an iterator that produces tuples of the form (x, y). In this case, the x values are taken from numbers and the y values are taken from letters. Notice how the Python zip() function returns an iterator. To retrieve the final list object, you need to use list() to consume the iterator.
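Two consequences are worth seeing in code: because zip() returns an iterator, it can be consumed only once, and when the iterables have different lengths, zip() stops as soon as the shortest one is exhausted. Passing the pairs back through zip() with * unpacking splits them apart again:

```python
numbers = [1, 2, 3]
letters = ['a', 'b', 'c']

zipped = zip(numbers, letters)
print(list(zipped))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(zipped))  # [] because the iterator is already exhausted

# Unequal lengths: the extra element 4 is silently dropped.
print(list(zip([1, 2, 3, 4], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]

# "Unzipping": * passes each pair as a separate argument to zip().
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
nums, chars = zip(*pairs)
print(nums, chars)  # (1, 2, 3) ('a', 'b', 'c')
```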
If you’re working with sequences like lists, tuples, or strings, then your iterables are guaranteed to be evaluated from left to right. This means that the resulting list of tuples will take the form [(numbers[0], letters[0]), (numbers[1], letters[1]),…, (numbers[-1], letters[-1])]. However, for other types of iterables (like sets), you might see some weird results:
s1 = {2, 3, 1}
s2 = {'b', 'a', 'c'}
list(zip(s1, s2))
## [(1, 'b'), (2, 'c'), (3, 'a')]
