Big Data Applications in Energy and Power Systems

Big data, smart grids, and smart meters
Ke Wang
Simon Fraser University
www.cs.sfu.ca/~wangk
Smart meters
• From one meter reading a month to a smart meter reading every 5 minutes.
• Better understanding of customer
segmentation, behavior and how pricing
influences usage
– Time-of-use pricing saves money and reduces energy generation.
• Improve efficiency of electrical generation and
scheduling
Power signatures of three different residential appliance categories
• Kettles and light bulbs are mostly resistive loads
• Motors (e.g., fans, heaters) are inductive loads
• Devices containing a power supply (e.g., laptops) are capacitive loads
• The problem: break down the total power demand p(t) measured by a smart meter into components p_i(t) that are attributed to specific appliances i:
p(t) = p_1(t) + p_2(t) + ... + p_n(t).
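A minimal sketch of the disaggregation problem, assuming each appliance is either off or drawing a known, constant power. The appliance names and wattages are hypothetical, and exhaustive search over on/off combinations is only one simple way to attack the problem, not necessarily what is used in practice.

```python
from itertools import product

# Hypothetical appliances and their steady-state power draw in watts.
APPLIANCES = {"kettle": 2000, "fan": 60, "laptop": 45}


def disaggregate(total_watts, appliances=APPLIANCES):
    """Return the on/off assignment whose summed power is closest to total_watts."""
    names = list(appliances)
    best_states, best_error = None, float("inf")
    for states in product([0, 1], repeat=len(names)):
        estimate = sum(s * appliances[n] for s, n in zip(states, names))
        error = abs(total_watts - estimate)
        if error < best_error:
            best_states, best_error = states, error
    # Per-appliance components p_i(t); they sum to the best estimate of p(t).
    return {n: s * appliances[n] for n, s in zip(names, best_states)}


# Total demand p(t) at three time steps (illustrative values).
for p_t in [2060, 105, 45]:
    parts = disaggregate(p_t)
    print(p_t, parts, "sum =", sum(parts.values()))
```

The search grows as 2^n in the number of appliances, so this only illustrates the decomposition itself.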
From big data to big value
• Increased profitability, reduced carbon
footprint, increased safety, enhanced
regulatory interaction and improved customer
satisfaction.
A wide range of forecasts using smart meter data
• When and where equipment downtime and power
failures are most likely to occur
• Which customers are most likely to feed energy back to
the grid, and under what circumstances
• Which customers are most likely to respond to energy
conservation and demand reduction incentives
• How much excess energy will be available, and when to sell it
KDD Process
Social Media Mining: Data Mining Essentials; Measures and Metrics
Data Mining
The process of discovering hidden patterns in large
data sets
It utilizes methods at the intersection of artificial intelligence, machine learning,
statistics, and database systems
• Extracting or “mining” knowledge from large
amounts of data, or big data
• Data-driven discovery and modeling of hidden
patterns in big data
• Extracting implicit, previously unknown,
unexpected, and potentially useful
information/knowledge from data
Supervised Learning – Classification/prediction
F(x): true class function (usually not known)
• Input: D, a set of training examples (x, F(x))
57,M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 → 0
78,M,160,1,130,100,37,40,1,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0 → 1
69,F,180,0,115,85,40,22,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0 → 0
18,M,165,0,110,80,41,30,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 → 0
54,F,135,0,115,95,39,35,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0 → 1
• Output: G(x), a class model learned from D
71,M,160,1,130,105,38,20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0 → ?
• Goal: minimize the error E[(F(x) - G(x))^2] for future examples x drawn from the same distribution as D.
Classification
Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Figure: Training Set → Learning algorithm → Learn Model (induction) → Model → Apply Model (deduction) → Test Set
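The induction and deduction steps for the training and test sets above can be sketched as follows. The slide does not name a learning algorithm, so a 3-nearest-neighbour vote is swapped in purely for illustration; the distance function (0/1 mismatch on the categorical attributes, range-scaled difference on Attrib3) is an assumption.

```python
from collections import Counter

# Training set from the slide: (Attrib1, Attrib2, Attrib3 in thousands, Class).
TRAIN = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"), ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"), ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"), ("No", "Small", 90, "Yes"),
]

# Test set from the slide, keyed by Tid; the class is unknown.
TEST = {
    11: ("No", "Small", 55), 12: ("Yes", "Medium", 80),
    13: ("Yes", "Large", 110), 14: ("No", "Small", 95),
    15: ("No", "Large", 67),
}


def learn_model(train, k=3):
    """Induction: build a k-nearest-neighbour model from labeled records."""
    values = [row[2] for row in train]
    span = max(values) - min(values)  # used to scale the numeric attribute

    def distance(a, b):
        # Categorical attributes contribute 0 or 1; the numeric one is range-scaled.
        return (a[0] != b[0]) + (a[1] != b[1]) + abs(a[2] - b[2]) / span

    def model(x):
        nearest = sorted(train, key=lambda row: distance(x, row))[:k]
        votes = Counter(row[3] for row in nearest)
        return votes.most_common(1)[0][0]

    return model


# Deduction: apply the learned model to the unlabeled test records.
model = learn_model(TRAIN)
predictions = {tid: model(x) for tid, x in TEST.items()}
print(predictions)
```

With only ten training records the predictions are not meaningful; the point is the split between learning a model and applying it.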
Linear Regression
In linear regression, the class attribute y is a continuous label. We use a linear function to model the relation between y and the feature set x:
y = w · x + e
where w represents the vector of regression coefficients
• We search for w and e using the provided dataset and the labels y
– Least squares is often used to solve the problem
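A minimal least-squares sketch for the single-feature case, assuming the model y = w1 * x + w0 plus a residual term. The function name, the intercept w0, and the toy data are illustrative assumptions, not taken from the slides.

```python
def least_squares_fit(xs, ys):
    """Fit y ≈ w1 * x + w0 by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    # Residuals are what the fitted line fails to explain.
    residuals = [y - (w1 * x + w0) for x, y in zip(xs, ys)]
    return w1, w0, residuals


# Toy data lying exactly on y = 2x + 1, so the residuals are zero.
w1, w0, residuals = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(w1, w0, residuals)
```

With several features the same idea is usually expressed through the normal equations or a linear algebra library.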
Unsupervised Learning - clustering
• Clustering is a form of unsupervised learning
– Unlike supervised learning, examples do not have labeled classes, i.e., the data are unlabeled
• The goal is to group together similar examples and keep dissimilar examples apart
– Need a similarity measure for a pair of examples
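As one concrete instance of clustering, here is a small k-means sketch. The slides do not name a clustering algorithm, so k-means with Euclidean distance as the (dis)similarity measure is an assumption, and the data points (imagine morning and evening consumption per household) are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate cluster assignment and centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (morning kWh, evening kWh) readings for six households.
POINTS = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

# Initial centroids are fixed here so the run is reproducible.
centroids, clusters = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(centroids)
print(clusters)
```

The result depends on the initial centroids and on the choice of distance, which is why a similarity measure has to be chosen first.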
Four potential values of data analytics
1. Managing smart meter data
2. Monitoring the distribution grid
3. Optimizing unit commitment
4. Forecasting and scheduling loads
1. Managing smart meter data
• Challenges
– Data storage costs can explode due to increased
data volumes
– Deal with corrupted and noisy data
– Report generation and analytics can be slow
– Ensure privacy
2. Monitoring the distribution grid
• Identify abnormal conditions and take action
to prevent power delivery disruptions and
optimize overall grid reliability.
• Challenges:
– Real time monitoring involves large volumes of
high-velocity data.
– Finding correlations between network events and network failures
– Pinpoint fault locations and identify solutions
3. Optimizing unit commitment
• Optimize the scheduling of generation assets.
– Wind and solar energy sources are heavily weather-dependent and intermittent, requiring analysis of large weather data sets to forecast output
• Challenges
– Predict which units need to be operational to meet but not exceed demand.
– Optimize the energy source mix and avoid both unanticipated excess capacity and costly market purchasing.
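A deliberately simplified sketch of the "meet but not exceed demand" idea: given a demand forecast, pick the set of units whose combined capacity covers it with the least excess. The unit names and capacities are hypothetical, and real unit commitment also has to handle fuel costs, ramp rates, and minimum up and down times, all of which are ignored here.

```python
from itertools import combinations

# Hypothetical generating units: name -> capacity in MW.
UNITS = {"hydro_a": 300, "hydro_b": 200, "gas": 150, "wind": 80}


def commit_units(demand_mw, units=UNITS):
    """Return (subset, excess) covering demand with the least excess capacity."""
    best, best_excess = None, float("inf")
    names = list(units)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            capacity = sum(units[n] for n in subset)
            excess = capacity - demand_mw
            # Only subsets that meet demand qualify; keep the tightest fit.
            if 0 <= excess < best_excess:
                best, best_excess = subset, excess
    return best, best_excess


print(commit_units(500))
print(commit_units(430))
print(commit_units(1000))  # demand above total capacity: nothing qualifies
```

When no subset can cover the demand the function returns `(None, inf)`, which corresponds to the case where market purchasing would be needed.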
4. Forecasting and scheduling loads
• Accurate demand forecasting is essential to
energy planning and trading.
• Challenges
– Understand which parameters (weather, day of the week or month, holidays, prior usage, price incentives and others) actually drive demand.
– Large volumes of historical information must be analyzed and correlations identified
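One simple way to screen candidate drivers of demand is to compute the Pearson correlation between each parameter and the observed load. The sketch below does this on a tiny hypothetical data set; the numbers are invented for illustration, and correlation alone does not establish that a parameter drives demand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical daily records: candidate parameters and the observed demand.
candidates = {
    "temperature_c": [-5, 0, 5, 10, 15],
    "is_holiday": [0, 1, 0, 0, 1],
}
demand_mwh = [120, 100, 90, 80, 60]

# Rank parameters by the strength (absolute value) of their correlation.
scores = {name: pearson(values, demand_mwh) for name, values in candidates.items()}
for name, r in sorted(scores.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: r = {r:.2f}")
```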
Case Studies at BC Hydro
• Load curve data cleansing
• Identify contaminated equipment
• Outage prediction
Case Study 1: Load curve data cleansing
• Power consumption data are collected at high frequency; they are the heartbeat of power utilities and a valuable asset for smart meter applications
• The data are often missing, corrupted, or noisy due to transmission errors, device faults, random events, unknown factors, etc.
• Corrupted data do not repeat in the future; they therefore do not represent “patterns” and should be repaired.
Smoothing curve techniques
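A minimal sketch of smoothing-based cleansing, using a rolling median as a stand-in smoother. This is not necessarily the smoothing technique used in the referenced work; the window size, the tolerance, and the sample load curve are all assumptions chosen for illustration.

```python
from statistics import median


def cleanse(load, window=5, tolerance=0.5):
    """Replace missing or outlying readings with the median of their neighbours.

    A reading is treated as corrupted if it is None or deviates from the
    local median by more than `tolerance` times that median (assumes
    positive loads).
    """
    half = window // 2
    cleaned = list(load)
    for i, value in enumerate(load):
        start = max(0, i - half)
        neighbours = [
            v for j, v in enumerate(load[start:i + half + 1], start=start)
            if j != i and v is not None
        ]
        if not neighbours:
            continue  # nothing to compare against; leave the reading as-is
        smooth = median(neighbours)
        if value is None or abs(value - smooth) > tolerance * smooth:
            cleaned[i] = smooth
    return cleaned


# Hypothetical load curve (kW) with one spike and one missing reading.
raw = [10.0, 10.5, 11.0, 55.0, 11.5, None, 12.0, 12.5]
print(cleanse(raw))
```

The median is used rather than the mean so that a single spike inside the window does not distort the smoothed value.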
References
• [1] Jiyi Chen, Wenyuan Li, Adriel Lau, Jiguo Cao, and Ke Wang. Automated Load Curve Data Cleansing in Power Systems. IEEE Transactions on Smart Grid, Vol. 1, No. 2, September 2010, pp. 213-221.
• [2] Zhihui Guo, Wenyuan Li, Adriel Lau, Tito Inga-Rojas, and Ke Wang. Detecting X-Outliers in Load Curve Data in Power Systems. IEEE Transactions on Power Systems, Vol. 27, No. 2, May 2012, pp. 875-884.
Case Study 2: Identify contaminated transformers
• Polychlorinated biphenyl (PCB) based dielectric fluids were used for heat insulation in old transformers.
• PCBs were later found to be harmful to humans and the environment
• The problem: identify PCB-contaminated transformers and replace their oil.
• Oil sampling is very expensive
– The bushing structure is hermetically sealed, with no drainage valve.
– Power transmission to the affected areas must be shut down
Solution 1
• Given a set of transformers, sample a few
transformers and build a classifier to predict
the PCB status of remaining transformers.
– Active learning: interactively request sampling of carefully chosen transformers.
• Must minimize the sum of false positive cost
(i.e., sampling cost) and false negative cost
(i.e., leaving contaminated objects
unidentified).
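The objective above can be written down directly as a cost function. In this sketch the two unit costs are hypothetical placeholders, and `True` stands for "PCB-contaminated".

```python
def total_cost(predicted, actual, sampling_cost, miss_cost):
    """Sum of false positive and false negative costs for a set of predictions.

    A false positive is a clean transformer predicted as contaminated (it gets
    sampled unnecessarily); a false negative is a contaminated transformer
    predicted as clean (it stays unidentified).
    """
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    return false_pos * sampling_cost + false_neg * miss_cost


# Hypothetical example: four transformers, one false positive, one false negative.
predicted = [True, True, False, False]
actual = [True, False, True, False]
print(total_cost(predicted, actual, sampling_cost=1, miss_cost=20))
```

The result is only as good as the two unit costs supplied, and those are assumptions in this example.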
Figure: sealed bushing structure without drainage valve
Active learning
• Step 1: Get initial labeled data L and unlabeled
data U
• Step 2: build a classifier M using L
• Step 3: If M is satisfactory, stop
• Step 4: Choose the most uncertain examples from U, label them, update L and U, and go to Step 2.
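The four steps can be sketched as a pool-based loop with uncertainty sampling. Everything concrete here is an assumption made for illustration: the single feature (manufacturing year), the toy nearest-class-mean classifier, the labeling oracle that stands in for oil sampling, and the query budget used in place of a real "M is satisfactory" test.

```python
def fit(labeled):
    """Step 2: a toy classifier M, the mean feature value of each class."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in labeled if y == label]
        means[label] = sum(values) / len(values)
    return means


def margin(model, x):
    """Small margin: both class means are about equally close, so M is unsure."""
    return abs(abs(x - model[0]) - abs(x - model[1]))


def predict(model, x):
    return min(model, key=lambda label: abs(x - model[label]))


def active_learning(labeled, unlabeled, oracle, budget):
    labeled, unlabeled = list(labeled), list(unlabeled)  # Step 1: L and U
    model = fit(labeled)                                  # Step 2
    while budget > 0 and unlabeled:                       # Step 3 (budget as proxy)
        x = min(unlabeled, key=lambda u: margin(model, u))  # Step 4: most uncertain
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))                    # request the label
        budget -= 1
        model = fit(labeled)                              # back to Step 2
    return model, labeled


# Hypothetical setup: label 1 = PCB, and the hidden truth is "built before 1980".
initial = [(1960, 1), (2000, 0)]
pool = [1965, 1975, 1979, 1981, 1985, 1995]
model, labeled = active_learning(initial, pool, lambda year: int(year < 1980), budget=2)
print(labeled)
print(predict(model, 1975), predict(model, 1995))
```

Querying the examples nearest the current decision boundary is what keeps the number of expensive labels small.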
Solution 1
• Drawback: it is hard to specify the false
positive cost and false negative cost.
Solution 2
• Clearance threshold
– Instead of exact costs, specify a maximum allowed probability that a transformer contains PCB when a method clears it as non-PCB
– E.g., at most 1 hazard in 100 transformers.
• Given a collection of transformers with known PCB status and a clearance threshold t, we want to clear many transformers by dividing them into groups.
– A group of transformers is cleared if a random case chosen from the group has less than probability t of having PCB.
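One way to make the clearance test concrete is to clear a group only when a conservative estimate of its PCB rate falls below t. The sketch below uses a one-sided Wilson score upper bound for that estimate; this choice, the confidence level, and the function names are assumptions for illustration and are not taken from the referenced method.

```python
import math


def upper_bound(positives, total, z=1.645):
    """One-sided Wilson score upper bound on a group's true PCB rate.

    z = 1.645 corresponds to roughly 95% one-sided confidence.
    """
    if total == 0:
        return 1.0  # no evidence at all: assume the worst
    p = positives / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    spread = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre + spread) / denom


def can_clear(positives, total, t):
    """Clear the group only if the conservative PCB rate estimate is below t."""
    return upper_bound(positives, total) < t


# Same observed rate (10%), different amounts of evidence.
print(upper_bound(1, 10), upper_bound(10, 100))
# No PCB found in either group, but only the larger one supports clearance at t = 0.01.
print(can_clear(0, 10, 0.01), can_clear(0, 1000, 0.01))
```

Two groups with the same observed rate can therefore receive different decisions, because the smaller group carries less evidence.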
Recursive grouping
Do (N=0,n=1) and (N=100,n=10) give the same estimate?
References
• [3] Yin Chu Yeh, Wenyuan Li, Adriel Lau, and Ke Wang. Identifying PCB Contaminated Transformers through Active Learning. IEEE Transactions on Power Systems, Vol. 28, No. 4, Nov. 2013, pp. 3999-4006.
• [4] Ryan McBride, Ke Wang, and Wenyuan Li. Classification by CUT: Clearance Under Threshold. IEEE ICDM 2014 conference, December 2014.
Case Study 3: Outage Prediction
• Every winter, the Greater Vancouver area experiences some power outages caused by storms.
• Outage Prediction
– Build software to predict whether feeders/equipment are at high risk of outage during a forecasted storm, by
• Accessing historical data about outages and feeders.
• Accessing relevant external data (e.g., weather conditions during previous outages).
General Approach
• Use BC Hydro data and external data to derive records of tree/storm-related outages.
Entity, Length, and Result come from BC Hydro data; Vegetation, Wind Speed, and In Storm come from external data.

Entity     Length  Result     Vegetation  Wind Speed  In Storm
Feeder F1  5 km    Outage     0.2 NDVI    25 m/s N    Yes
Feeder F2  15 km   NonOutage  0.4 NDVI    20 m/s W    Yes
Feeder F2  24 km   NonOutage  0.5 NDVI    10 m/s E    No

• Then build a model to examine a new case such as:

Entity     Length  Result     Vegetation  Wind Speed  In Storm
Feeder F3  4 km    ?          0.4 NDVI    50 m/s N    Yes

• The model infers the chance of an outage (the "?" entry), to better distribute resources and mitigate risk.
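The record-derivation step can be sketched as a join between utility data and external data. The values come from the table above, but the join key (feeder name plus an event id), the field names, and the dictionary layout are hypothetical; the slides do not say how the two sources are linked.

```python
from dataclasses import dataclass


@dataclass
class OutageRecord:
    entity: str
    length_km: float
    result: str
    ndvi: float
    wind_speed_ms: float
    wind_dir: str
    in_storm: bool


# Hypothetical key: (feeder, event id). Utility-side attributes.
utility_rows = {
    ("Feeder F1", 1): {"length_km": 5, "result": "Outage"},
    ("Feeder F2", 2): {"length_km": 15, "result": "NonOutage"},
    ("Feeder F2", 3): {"length_km": 24, "result": "NonOutage"},
}

# External attributes (vegetation index and weather) under the same key.
external_rows = {
    ("Feeder F1", 1): {"ndvi": 0.2, "wind_speed_ms": 25, "wind_dir": "N", "in_storm": True},
    ("Feeder F2", 2): {"ndvi": 0.4, "wind_speed_ms": 20, "wind_dir": "W", "in_storm": True},
    ("Feeder F2", 3): {"ndvi": 0.5, "wind_speed_ms": 10, "wind_dir": "E", "in_storm": False},
}


def derive_records(utility, external):
    """Join the two sources on their shared key; skip rows with no match."""
    records = []
    for key, internal in utility.items():
        weather = external.get(key)
        if weather is None:
            continue  # no external observation for this feeder and event
        records.append(OutageRecord(entity=key[0], **internal, **weather))
    return records


records = derive_records(utility_rows, external_rows)
for record in records:
    print(record)
```

Records in this shape could then be fed to any classifier; which model is appropriate is left open here.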
Conclusion
• Many opportunities for big data analytics to
help in power and energy industries.
• The keys: historical data, “know how” in
domain applications, and techniques in data
analytics.
• Collaboration between data scientists and
engineers in power industries is crucial.
Acknowledgements
• 3 CRD grants supported by NSERC and BC Hydro,
Canada.
• Collaboration between BC Hydro and Simon Fraser
University from 2009 to present.
• 3 MSc students graduated, and 1 PhD student. Employed by Shanghai Stock Exchange, Microsoft (Seattle), and Google.
• Publications in IEEE Transactions on Smart Grid, IEEE Transactions on Power Systems, and the International Conference on Data Mining, plus several other papers.
• Software being used by BC Hydro.