Pandas Vectorization#
Before solving these exercises you should have read High-Level Data Management with Pandas.
import pandas as pd
For these exercises we use the dataset obtained in the Cafeteria project.
data = pd.read_csv('meals.csv', names=['date', 'category', 'name', 'students', 'staff', 'guests'])
data
Preprocessing#
Data Types#
Convert the 'date' and 'category' columns to Timestamp and category data types, respectively (see Advanced Pandas exercises).
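As a hint, the conversion could look roughly like the following sketch. It uses a small made-up data frame (`toy`) instead of the cafeteria data, and the date format and the category name 'Essen 1' are assumptions, not taken from meals.csv.

```python
import pandas as pd

# Made-up sample rows; the real data comes from meals.csv
toy = pd.DataFrame({
    'date': ['2023-01-09', '2023-01-09', '2023-01-10'],
    'category': ['Essen 1', 'Salat Bar', 'Essen 1'],
})

# Strings to Timestamps (datetime64 column)
toy['date'] = pd.to_datetime(toy['date'])

# Strings to the memory-efficient category dtype
toy['category'] = toy['category'].astype('category')

print(toy.dtypes)
```

If the real file stores dates in another layout, passing an explicit `format` argument to `pd.to_datetime` may be necessary.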
Solution:
# your solution
Count Categories#
For each category, give the number of meals.
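One possible approach is `Series.value_counts`, sketched here on made-up data (the category names other than 'Salat Bar' are hypothetical):

```python
import pandas as pd

# Made-up sample; each row stands for one meal
toy = pd.DataFrame({
    'category': ['Essen 1', 'Salat Bar', 'Essen 1', 'Essen 2'],
})

# Number of rows (meals) per distinct category value
counts = toy['category'].value_counts()
print(counts)
```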
Solution:
# your solution
Remove Categories#
Remove categories that do not contain full-fledged meal descriptions (e.g., 'Salat Bar').
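A minimal sketch of such a filter, assuming the unwanted categories are collected by hand in a list (`unwanted` is a hypothetical name, and the sample rows are made up):

```python
import pandas as pd

# Made-up sample with a categorical 'category' column
toy = pd.DataFrame({
    'category': pd.Categorical(['Essen 1', 'Salat Bar', 'Essen 2', 'Salat Bar']),
    'name': ['Kartoffelsuppe', 'Salat', 'Nudeln', 'Salat'],
})

# Categories to drop; which ones qualify is a judgment call on the real data
unwanted = ['Salat Bar']

# Keep only rows whose category is not in the list
toy = toy[~toy['category'].isin(unwanted)].copy()

# Filtering rows does not shrink the list of known categories, so drop unused ones
toy['category'] = toy['category'].cat.remove_unused_categories()

print(toy)
```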
Solution:
# your solution
Remove Allergens and Additives#
Most meals contain information on allergens and additives (numbers in parentheses). Remove this information to get more readable meal descriptions. Implement the removal procedure twice: without and with vectorized string operations. Measure and compare the execution times.
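The two variants might be structured as in the following sketch. It runs on made-up meal names and assumes that the codes always appear as digits, commas and spaces inside parentheses, which may not hold for every row of the real data. Outside a notebook the `timeit` module stands in for the `%%timeit` magic.

```python
import re
import timeit

import pandas as pd

# Made-up meal names; the numbers in parentheses mimic allergen/additive codes
names = pd.Series([
    'Kartoffelsuppe (1,3,9) mit Brot (2)',
    'Nudeln mit Tomatensauce (1, 5)',
])

# Assumed format: parentheses holding only digits, commas and spaces
PATTERN = r'\s*\([\d,\s]+\)'
compiled = re.compile(PATTERN)


def clean_loop(series):
    # Plain Python loop over the elements, no .str accessor
    return pd.Series([compiled.sub('', s) for s in series], index=series.index)


def clean_vectorized(series):
    # Same regular expression, applied through pandas' string accessor
    return series.str.replace(PATTERN, '', regex=True)


looped = clean_loop(names)
vectorized = clean_vectorized(names)
print(vectorized.tolist())

# Rough timing; the numbers depend on machine and data size
t_loop = timeit.timeit(lambda: clean_loop(names), number=100)
t_vec = timeit.timeit(lambda: clean_vectorized(names), number=100)
print(f'loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s')
```

On such a tiny sample the timings say little; the comparison becomes meaningful only on the full data set.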
Solution:
data['name_backup'] = data['name']
%%timeit
# your solution without vectorized string operations
data['name'] = data['name_backup']
data = data.drop(columns=['name_backup'])
%%timeit
# your solution with vectorized string operations
Simplify Meal Descriptions#
Create a new column 'simple' from the 'name' column by removing all lower-case words, punctuation marks, and other non-word characters. Only words starting with an upper-case letter should remain.
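One way to approach this is to extract the wanted words instead of deleting the unwanted ones. The sketch below uses made-up names and assumes that the upper-case letters of interest are A to Z plus the German umlauts Ä, Ö, Ü.

```python
import pandas as pd

# Made-up meal descriptions
toy = pd.DataFrame({
    'name': [
        'Kartoffelsuppe mit Brot, dazu Salat',
        'Nudeln mit Tomatensauce',
    ],
})

# Find every word that starts with an upper-case letter ...
words = toy['name'].str.findall(r'\b[A-ZÄÖÜ]\w*')

# ... and glue the words of each row back together with single spaces
toy['simple'] = words.str.join(' ')

print(toy['simple'].tolist())
```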
Solution:
# your solution
All Meals with…#
Given a keyword (e.g., 'Kartoffel'), get the number of meals containing the keyword and print all matching meal descriptions.
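A possible starting point is `Series.str.contains`, which yields a boolean mask. The sketch uses made-up meal names and treats the keyword as a literal string (`regex=False`), which is an assumption about the intended matching.

```python
import pandas as pd

# Made-up meal descriptions
toy = pd.DataFrame({
    'name': [
        'Kartoffelsuppe mit Brot',
        'Nudeln mit Tomatensauce',
        'Schnitzel mit Kartoffelsalat',
    ],
})

keyword = 'Kartoffel'

# True for every row whose name contains the keyword as a substring
mask = toy['name'].str.contains(keyword, regex=False)

# Summing booleans counts the True values
count = int(mask.sum())
print(count)

for name in toy.loc[mask, 'name']:
    print(name)
```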
Solution:
# your solution
Meal Plot#
For each day, get the number of meals containing a given keyword (e.g., 'Kartoffel'). Call Series.plot to visualize the result.
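The per-day counts can be obtained by grouping a boolean mask by date, as in this sketch on made-up data. The plotting call is left as a comment because it needs matplotlib and a display.

```python
import pandas as pd

# Made-up sample: several meals per day
toy = pd.DataFrame({
    'date': pd.to_datetime([
        '2023-01-09', '2023-01-09',
        '2023-01-10', '2023-01-10',
        '2023-01-11',
    ]),
    'name': [
        'Kartoffelsuppe', 'Nudeln',
        'Kartoffelsalat', 'Bratkartoffeln mit Kartoffelpuffer',
        'Reis',
    ],
})

keyword = 'Kartoffel'

# Boolean mask: does the meal name contain the keyword?
has_keyword = toy['name'].str.contains(keyword, regex=False)

# Group the mask by day and count the True values per group
per_day = has_keyword.groupby(toy['date']).sum()
print(per_day)

# In a notebook with matplotlib installed:
# per_day.plot()
```

Note that 'Bratkartoffeln' is not matched because the search is case-sensitive; that row counts only through 'Kartoffelpuffer'.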
Solution:
# your solution