Basic examples of datetime operations and date-string parsing
In [4]:
from datetime import datetime
now = datetime.now()
print(now)
In [5]:
from dateutil.parser import parse
parse('December 23rd 1994')
Out[5]:
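The two parsing routes can be compared side by side. This is a minimal sketch: `datetime.strptime` needs an explicit format, while `dateutil.parser.parse` infers the layout. The date strings other than 'December 23rd 1994' are illustrative assumptions, not part of the original notebook.

```python
from datetime import datetime
from dateutil.parser import parse

# Explicit format: strptime raises ValueError if the string does not match.
explicit = datetime.strptime('1994-12-23', '%Y-%m-%d')

# Inferred format: dateutil guesses the layout, including ordinal suffixes.
inferred = parse('December 23rd 1994')

# Ambiguous numeric dates are read month-first unless dayfirst=True is given.
month_first = parse('06/12/2011')
day_first = parse('06/12/2011', dayfirst=True)

# Subtracting two datetimes yields a timedelta.
gap = day_first - month_first

print(explicit == inferred)
print(month_first.month, day_first.month)
print(gap.days)
```

The `dayfirst` flag matters for data written in day/month/year order, which is the layout assumed for the power consumption file used later.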
Using a small sample series to check plotting with a datetime index.
Axis labelling is also checked and a sample graph is plotted.
In [6]:
from pandas import DataFrame, Series
import pandas as pd
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
sample_series = Series(np.random.randn(len(dates)), index=dates)
In [12]:
%matplotlib inline
import matplotlib.pyplot as plt
print(sample_series)
sample_series.plot(kind="line")
Out[12]:
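Beyond plotting, a datetime-indexed series can be sliced and iterated by date. The sketch below reuses the same six dates but swaps the random values for fixed ones (an assumption made here so the result is reproducible).

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
# Fixed values instead of np.random.randn, so the output is repeatable.
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Label-based slicing with date strings includes both endpoints.
window = series.loc['2011-01-05':'2011-01-08']

# Iterating yields (Timestamp, value) pairs; here the stamps become labels.
labels = [stamp.strftime('%Y-%m-%d') for stamp, _ in series.items()]

print(window)
print(labels)
```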
Using the household power consumption dataset from the UCI Machine Learning Repository for a range of time series operations.
Because the dataset is large (more than 2,000,000 rows), we consider only the first 1000 rows.
In [29]:
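column_names = ['Date', 'Time', 'Global Active Power', 'Global Reactive Power', 'Voltage',
                'Global Intensity', 'Sub-metering 1', 'Sub-metering 2', 'Sub-metering 3']
# header=0 replaces the file's own header row with column_names;
# nrows=1000 reads only the first 1000 data rows instead of loading the whole file and slicing it
house_data = pd.read_csv("/home/mridul/nilmtk/household_power_consumption.txt", sep=';',
                         header=0, names=column_names, nrows=1000)
print(house_data[:5])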
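The loading step can be tried without the full file. The sketch below feeds `pd.read_csv` three made-up rows in the same semicolon-separated layout; the values are illustrative only, and treating '?' as the missing-value marker is an assumption that should be checked against the dataset notes.

```python
import io
import pandas as pd

# Three made-up rows in the layout of the UCI file (illustrative values only).
raw = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.2;0.4;234.8;18.4;0.0;1.0;17.0\n"
    "16/12/2006;17:25:00;?;?;?;?;?;?;?\n"
    "16/12/2006;17:26:00;5.4;0.5;233.3;23.0;0.0;2.0;17.0\n"
)

columns = ['Date', 'Time', 'Global Active Power', 'Global Reactive Power',
           'Voltage', 'Global Intensity', 'Sub-metering 1', 'Sub-metering 2',
           'Sub-metering 3']

sample = pd.read_csv(
    raw,
    sep=';',
    header=0,        # replace the header row in the data with `columns`
    names=columns,
    na_values='?',   # assumed marker for missing readings
    nrows=2,         # stop after two data rows instead of slicing afterwards
)
print(sample.dtypes)
```

With the missing-value marker declared, the measurement columns load as floats, so a later `astype('float64')` becomes a no-op rather than a required conversion.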
In [54]:
# Dates in the file are day/month/year, so the format is stated explicitly
house_data['Timestamp'] = pd.to_datetime(house_data['Date'] + ' ' + house_data['Time'], format='%d/%m/%Y %H:%M:%S')
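A small sketch of why the day/month order matters when the Date and Time columns are combined. The two rows are illustrative; the second one is chosen because its day and month could be swapped without raising an error.

```python
import pandas as pd

frame = pd.DataFrame({
    'Date': ['16/12/2006', '01/02/2007'],   # day/month/year strings
    'Time': ['17:24:00', '00:00:00'],
})

# An explicit format removes any guessing about day/month order.
frame['Timestamp'] = pd.to_datetime(
    frame['Date'] + ' ' + frame['Time'],
    format='%d/%m/%Y %H:%M:%S',
)

# Using the timestamps as the index enables date-based selection later on.
indexed = frame.set_index('Timestamp')
print(indexed.index)
```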
In [73]:
# global_ap is the active power series built in the active vs reactive power cell further down
len(global_ap['2006-12-16'])
Out[73]:
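Selecting by a date string works because the series has a datetime index. The sketch below shows the idea on synthetic data, assuming 1000 one-minute samples and a hypothetical starting timestamp.

```python
import pandas as pd

# 1000 one-minute samples starting at an assumed first timestamp.
index = pd.date_range('2006-12-16 17:24:00', periods=1000, freq='min')
readings = pd.Series(range(1000), index=index)

# A date string selects every sample that falls on that day.
first_day = readings.loc['2006-12-16']
second_day = readings.loc['2006-12-17']
# A more specific string narrows the selection to a single hour.
one_hour = readings.loc['2006-12-16 18']

print(len(first_day), len(second_day), len(one_hour))
```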
Comparing the overall active and reactive power graphically.
In [103]:
timestamps = pd.DatetimeIndex(house_data['Timestamp'])
global_ap = Series(house_data['Global Active Power'].astype('float64').tolist(), index=timestamps)
global_ap.name = 'Global Active Power'
ax1 = global_ap.plot(kind="area", color="red")
global_rp = Series(house_data['Global Reactive Power'].astype('float64').tolist(), index=timestamps)
global_rp.name = 'Global Reactive Power'
print('%s vs %s' % (global_ap.name, global_rp.name))
global_rp.plot(kind="area", color="green")
# Both area plots are drawn on the same axes, so the legend labels are set explicitly
handles, labels = ax1.get_legend_handles_labels()
ax1.legend(handles, [global_ap.name, global_rp.name])
Out[103]:
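An alternative to building the legend by hand is to put both series into one DataFrame, so that each column carries its own label. This is only a sketch with illustrative values; the plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

index = pd.date_range('2006-12-16 17:24:00', periods=4, freq='min')
# Illustrative values only, not taken from the dataset.
active = pd.Series([4.2, 5.4, 5.3, 3.7], index=index, name='Global Active Power')
reactive = pd.Series([0.4, 0.5, 0.5, 0.5], index=index, name='Global Reactive Power')

# Aligning both series on the shared index gives one column per series.
power = pd.concat([active, reactive], axis=1)
print(power.columns.tolist())

# With matplotlib installed, one call would draw both areas and take the
# legend entries from the column names:
# power.plot(kind='area', stacked=False)
```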
Resampling at different frequencies (hourly, half-hourly, daily, etc.) to derive aggregated datasets.
In [113]:
# resample() only groups the samples; the mean of each hourly bin is taken explicitly
temp_ap = global_ap.resample('H').mean()
temp_ap.plot(kind="bar")
Out[113]:
In [114]:
# daily bins, again aggregated with the mean
temp2_ap = global_ap.resample('D').mean()
temp2_ap.plot(kind="bar",color='green')
Out[114]:
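In current pandas versions `resample` returns a grouping object, and the aggregation has to be chosen explicitly. The sketch below uses synthetic one-minute values (an assumption, so the bin results are easy to verify by hand) and three different frequencies.

```python
import pandas as pd

# Three hours of one-minute samples with values 0..179.
index = pd.date_range('2006-12-16 17:00:00', periods=180, freq='min')
readings = pd.Series(range(180), index=index, dtype='float64')

# The frequency string sets the bin width; the method sets the aggregation.
hourly_mean = readings.resample('60min').mean()
half_hourly_max = readings.resample('30min').max()
daily_sum = readings.resample('D').sum()

print(hourly_mean)
print(half_hourly_max)
print(daily_sum)
```

Coarser bins give fewer, smoother points, which is why the daily bar chart has far fewer bars than the hourly one.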
Some basic operations on the dataset to understand filtering and splitting based on various Timestamp and Datetime inputs.
In [116]:
from pandas.tseries.offsets import Hour, Minute
minute = Minute(30)
minute
Out[116]:
In [129]:
time_temp = pd.date_range('2006-12-16','2006-12-17',freq = minute)
time_local = pd.date_range('2006-12-16','2006-12-17',freq = 'D', tz='Asia/Kolkata')
print(time_temp)
print(time_local)
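The offset objects and `date_range` calls above can be checked on their own. A minimal sketch, using the same dates and the same 30-minute offset; the variable names are chosen here for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import Minute

half_hour = Minute(30)

# An offset object can be passed wherever a frequency string is accepted.
slots = pd.date_range('2006-12-16', '2006-12-17', freq=half_hour)

# Offsets scale with integers and add to timestamps.
shifted = pd.Timestamp('2006-12-16') + 3 * half_hour

# tz= makes the generated index timezone-aware from the start.
local_days = pd.date_range('2006-12-16', '2006-12-17', freq='D', tz='Asia/Kolkata')

print(len(slots), shifted, local_days.tz)
```

Both endpoints are included, so a full day at 30-minute spacing yields 49 timestamps rather than 48.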
In [132]:
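# time_temp was created without a time zone, so it is localized first
# (UTC is an assumption here) before converting to another zone
time_temp = time_temp.tz_localize('UTC').tz_convert('US/Eastern')
print(time_temp)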
Time zone operations: localize, convert, perform arithmetic, concatenate, etc.
In [138]:
sample_timestamp = time_local[0] + 3*minute
time_local[0], sample_timestamp
Out[138]:
In [143]:
time_result = time_temp[0].tz_convert('Asia/Kolkata') - time_local[0]
time_result
Out[143]:
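The difference between localizing and converting is easy to mix up, so here is a small sketch. It assumes the naive timestamps are meant as UTC, which is an assumption made for illustration and not something the notebook states.

```python
import pandas as pd

naive = pd.date_range('2006-12-16', periods=3, freq='30min')

# tz_localize attaches a zone to naive timestamps without shifting the clock time.
utc = naive.tz_localize('UTC')
# tz_convert re-expresses the same instants in another zone.
kolkata = utc.tz_convert('Asia/Kolkata')

# Converting a naive index is rejected, because its zone is unknown.
try:
    naive.tz_convert('Asia/Kolkata')
    convert_failed = False
except TypeError:
    convert_failed = True

# Bring both timestamps into one zone before subtracting, as in the cell above.
midnight_ist = pd.Timestamp('2006-12-16', tz='Asia/Kolkata')
offset = utc[0].tz_convert('Asia/Kolkata') - midnight_ist

print(kolkata[0], convert_failed, offset)
```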
Calculating the mean over different periods for the dataset and drawing conclusions.
We calculate the mean active power consumption over hourly periods. The sudden power surge at the end of the plot may indicate excessive load drawn by a faulty device.
Also, the plot is denser in the middle region, which suggests larger fluctuations in power consumption there than at other times.
In [186]:
#plot_data[['Global Active Power','Global Reactive Power', 'Voltage']] = plot_data[['Global Active Power','Global Reactive Power', 'Voltage']].astype('float64')
#print plot_data['Global Active Power'].dtype
#print plot_data['Global Reactive Power'].dtype
#print plot_data['Voltage'].dtype
#print plot_data['Datetime'].dtype
i='Global Active Power'
temp_data = Series(house_data[i].astype('float64').tolist(), index=pd.DatetimeIndex(house_data['Timestamp']))
temp_mean = temp_data.resample('H').mean()  # hourly mean
temp_data.plot(kind='line')
temp_mean.plot(kind='line', linewidth=4,color='red', title=i+": Actual and mean values")
#global_ap_mean = global_ap.resample('H')
#print global_ap_mean
#plot_data['Global Active Power'].plot(kind='line')
#plot_data_hour_mean['Global Active Power'].plot(kind='line', color='red')
Out[186]:
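Visual impressions such as "surge at the end" or "more fluctuation in the middle" can be cross-checked with per-hour statistics. The sketch below uses a synthetic load curve invented for illustration (flat, then oscillating, then flat with a late spike); it is not the notebook's data.

```python
import pandas as pd

index = pd.date_range('2006-12-16 17:00:00', periods=180, freq='min')
# Synthetic load: flat hour, oscillating hour, flat hour with a late spike.
values = [1.0] * 60 + [0.5, 1.5] * 30 + [1.0] * 55 + [5.0] * 5
load = pd.Series(values, index=index)

# One row per hour, one column per statistic.
summary = load.resample('60min').agg(['mean', 'std', 'max'])
print(summary)
```

A high `max` relative to the `mean` points at a short spike, while a high `std` with a modest `max` points at sustained fluctuation.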
A similar approach for the reactive power.
In [187]:
i='Global Reactive Power'
temp_data = Series(house_data[i].astype('float64').tolist(), index=pd.DatetimeIndex(house_data['Timestamp']))
temp_mean = temp_data.resample('H').mean()  # hourly mean
temp_data.plot(kind='line',color='green')
temp_mean.plot(kind='line', linewidth=4,color='red', title=i+": Actual and mean values")
Out[187]:
The rise in voltage over the period is clearly visible in the graph.
In [189]:
i='Voltage'
temp_data = Series(house_data[i].astype('float64').tolist(), index=pd.DatetimeIndex(house_data['Timestamp']))
temp_mean = temp_data.resample('H').mean()  # hourly mean
temp_data.plot(kind='line',color='purple')
temp_mean.plot(kind='line', linewidth=4,color='red', title=i+": Actual and mean values")
Out[189]:
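The three cells above differ only in the column name and the line colour, so the shared steps can be wrapped in a helper. This is a sketch: `raw_and_hourly_mean` and the `demo` frame are hypothetical names introduced here, and the values are illustrative.

```python
import pandas as pd

def raw_and_hourly_mean(frame, column):
    """Return the column as floats and its hourly mean, both indexed by Timestamp."""
    raw = pd.Series(frame[column].astype('float64').tolist(),
                    index=pd.DatetimeIndex(frame['Timestamp']),
                    name=column)
    return raw, raw.resample('60min').mean()

# Tiny stand-in for house_data with illustrative string-valued readings.
demo = pd.DataFrame({
    'Timestamp': pd.date_range('2006-12-16 17:00:00', periods=4, freq='30min'),
    'Global Active Power': ['4.0', '6.0', '2.0', '4.0'],
    'Voltage': ['234.0', '236.0', '240.0', '242.0'],
})

for column in ['Global Active Power', 'Voltage']:
    raw, mean = raw_and_hourly_mean(demo, column)
    print(column, mean.tolist())
    # With matplotlib installed, the two lines could be overlaid as before:
    # raw.plot(kind='line'); mean.plot(kind='line', linewidth=4, color='red')
```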