
Commit 6330c50

Author: Marta Vanin
Commit message: more docs and new version tag
1 parent bde5427 commit 6330c50

File tree: 2 files changed (+88 −13 lines)


Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
name = "PowerModelsDistributionStateEstimation"
uuid = "d0713e65-ce0c-4a8e-a1da-2ed737d217c5"
authors = ["Marta Vanin", "Tom Van Acker"]
- version = "0.4.1"
+ version = "0.4.2"

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"

docs/src/bad_data.md

Lines changed: 87 additions & 12 deletions
@@ -11,21 +11,96 @@ The LAV residual analysis can be done with all previous versions of the package
needs to pass `wlav` or `rwlav` as a state estimation criterion, and assign a unitary standard deviation for all weights or all measurements. Now it is sufficient to pass `lav` as a state estimation criterion.

All three techniques are standard, and a thorough theoretical discussion can be found in the well-known book "Power System State Estimation: Theory and Implementation" by A. Abur and A. G. Exposito. Furthermore, numerous papers address the circumstances in which the different techniques are more or less effective.

Below is a brief functional introduction.

First of all:
- Bad data *detection* consists of answering the yes/no question: "is there bad data?"
- Bad data *identification* consists of locating which data points are bad (to subsequently correct or delete them).

All the presented techniques require the user to first run a state estimation algorithm, as they are based on the analysis of its residuals.

## Chi-square Analysis

Chi-square ($\chi^2$) analysis is a bad data *detection* method: if bad data are detected, they still need to be identified.

The method is based on the following assumption: if all measurement errors follow a Normal distribution and there are no bad data, then the sum of the weighted squared residuals follows a Chi-square distribution with *m-n* degrees of freedom, where *m* is the number of measurements and *n* the number of system variables.

The function `exceeds_chi_squares_threshold` takes as input the solution of a state estimation calculation and the data dictionary. It calculates the degrees of freedom and the sum of the weighted squared residuals (where the weights are the inverse of each measurement's variance). If the state estimation that was run was a `wls` estimation with no weight rescaler, this sum corresponds to the objective value. However, the function always calculates the sum, to allow the user to combine Chi-square calculations with measurement rescalers or other state estimation criteria.
```@docs
PowerModelsDistributionStateEstimation.exceeds_chi_squares_threshold(sol_dict::Dict, data::Dict; prob_false::Float64=0.05, suppress_display::Bool=false)
```
The function returns a boolean that states whether bad data are suspected, the value of the sum of the residuals, and the threshold value above which bad data are suspected.
The threshold value depends on the degrees of freedom and on the detection confidence probability, which can be set by the user. The default value of the latter is 0.05, as this is often the choice in the literature.
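For illustration, a minimal usage sketch is given below (not from the original docs). It assumes `data` is a data dictionary with measurements and `solver` a previously configured optimization solver, as in the package's standard workflow; the three-element return layout and passing the result's "solution" entry as `sol_dict` are assumptions based on the description above, to be checked against the docstring.

```julia
# Minimal sketch: run a WLS state estimation, then apply the Chi-square test.
using PowerModelsDistributionStateEstimation
const _PMDSE = PowerModelsDistributionStateEstimation

data["se_settings"] = Dict{String,Any}("criterion" => "wls", "rescaler" => 1)
se_result = _PMDSE.solve_acp_red_mc_se(data, solver)

# Returns whether bad data are suspected, the weighted residual sum, and
# the threshold value; `prob_false` is the detection confidence probability.
suspected, residual_sum, threshold =
    _PMDSE.exceeds_chi_squares_threshold(se_result["solution"], data; prob_false = 0.05)
```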

## Largest Normalized Residuals

Normalized residuals can be used for both bad data *detection* and *identification*. Let the residuals be $r_i = z_i - h_i(\mathbf{x})$, where $h$ are the measurement functions, $\mathbf{x}$ are the system variables and $\mathbf{z}$ is the measurement vector. This is the standard notation used, e.g., in the book by Abur and Exposito.
The normalized residuals $r^N_i$ are:
```math
\begin{align}
&r_i^N = \frac{|r_i|}{\sqrt{\Omega_{ii}}} = \frac{|r_i|}{\sqrt{R_{ii}S_{ii}}}
\end{align}
```
The largest $r^N$ is compared to a threshold, typically 3.0 in the literature. If its value exceeds the threshold, bad data are suspected, and the bad data point is identified as the measurement that corresponds to the largest $r^N$ itself.
This package contains different functions that allow the user to build the measurement matrix (H), the measurement error covariance matrix (R), the gain matrix (G), the hat matrix (K), the sensitivity matrix (S) and the residual covariance matrix ($\Omega$):
```@docs
PowerModelsDistributionStateEstimation.build_H_matrix(functions::Vector, state::Array)::Matrix{Float64}
```
```@docs
build_G_matrix(H::Matrix, R::Matrix)::Matrix{Float64}
```
```@docs
build_R_matrix(data::Dict)::Matrix{Float64}
```
```@docs
build_omega_matrix(R::Matrix{Float64}, H::Matrix{Float64}, G::Matrix{Float64})
```
```@docs
build_omega_matrix(S::Matrix{Float64}, R::Matrix{Float64})
```
```@docs
build_S_matrix(K::Matrix{Float64})
```
```@docs
build_K_matrix(H::Matrix{Float64}, G::Matrix{Float64}, R::Matrix{Float64})
```
$\Omega$ can then be used in the function `normalized_residuals`, which calculates all $r_i$, returns the highest $r^N$ and indicates whether its value exceeds the threshold or not.
Again, the $r_i$ calculation is independent of the chosen state estimation criterion or weight rescaler.
```@docs
PowerModelsDistributionStateEstimation.normalized_residuals(data::Dict, se_sol::Dict, Ω::Matrix; t::Float64=3.0)
```
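As a sketch of how these building blocks chain together (an illustration, not from the original docs: `functions` — the vector of measurement functions — and `state` — the estimated state vector — are assumed to be available from the preceding state estimation run, and `se_sol` is its solution dictionary):

```julia
# Sketch of the largest normalized residual pipeline; `functions`, `state`,
# `data` and `se_sol` are assumed to come from a completed SE run.
H = _PMDSE.build_H_matrix(functions, state) # measurement matrix
R = _PMDSE.build_R_matrix(data)             # measurement error covariance matrix
G = _PMDSE.build_G_matrix(H, R)             # gain matrix
Ω = _PMDSE.build_omega_matrix(R, H, G)      # residual covariance matrix

# Alternatively, via the hat and sensitivity matrices:
# K = _PMDSE.build_K_matrix(H, G, R); S = _PMDSE.build_S_matrix(K)
# Ω = _PMDSE.build_omega_matrix(S, R)

# Reports the highest normalized residual and whether it exceeds t = 3.0.
_PMDSE.normalized_residuals(data, se_sol, Ω; t = 3.0)
```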
Finally, a simplified version of the largest normalized residuals is available: `simple_normalized_residuals`. Instead of calculating the $\Omega$ matrix, it calculates the normalized residuals as:
```math
\begin{align}
&r_i^N = \frac{|r_i|}{\sqrt{\Omega_{ii}}} = \frac{|r_i|}{\sqrt{R_{ii}^2}}
\end{align}
```
```@docs
PowerModelsDistributionStateEstimation.simple_normalized_residuals(data::Dict, se_sol::Dict, R::Matrix)
```
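A corresponding sketch for the simplified variant, under the same assumptions on `data` and `se_sol` as above:

```julia
# Simplified variant: only the measurement error covariance matrix is needed.
R = _PMDSE.build_R_matrix(data)
_PMDSE.simple_normalized_residuals(data, se_sol, R)
```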

## LAV Estimator Residual Analysis

The LAV estimator is known to be inherently robust to bad data: it minimizes the sum of the absolute values of the residuals (an $\ell_1$ criterion), which is much less sensitive to outliers than least squares.
Thus, it is sufficient to run it and then check its residuals, as in the piece of code below. The residuals do not even need to be calculated separately, because in a `lav` estimation they are reported by default as `res` in the solution dictionary. As such, the user only needs to sort the residuals in descending order and check their magnitudes: residuals that are much higher than the others generally point to the bad data.

```julia
# Run a LAV state estimation on the data set that contains bad data.
bad_data["se_settings"] = Dict{String,Any}("criterion" => "lav",
                                           "rescaler" => 1)

se_result_bd_lav = _PMDSE.solve_acp_red_mc_se(bad_data, solver)

# Collect each measurement's largest residual and sort in descending order.
residual_tuples = [(m, maximum(meas["res"])[1]) for (m, meas) in se_result_bd_lav["solution"]["meas"]]
sorted_tuples = sort(residual_tuples, by = last, rev = true)

measurement_index_of_largest_residual = first(sorted_tuples[1])
magnitude_of_largest_residual = last(sorted_tuples[1])
ratio12 = last(sorted_tuples[1]) / last(sorted_tuples[2]) # ratio between the largest and second largest residuals
```

## Other Notes

Virtually all bad data detection and identification methods from the literature work "a posteriori", i.e., after running a state estimation, by performing statistical analysis of the measurement residuals, or "a priori", by pre-processing the measurement data (the latter are not discussed here, but could consist of, e.g., removing missing data or absurd measurements, such as negative or zero voltage magnitudes). Thus, it is easy for the user to run the state estimation calculations with this framework and then add customized bad data handling methods that take as input the measurement dictionary and/or the output solution dictionary, as sketched below.
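A purely hypothetical sketch of such a-priori pre-processing follows; the measurement dictionary layout assumed here (a `"var"` symbol and a `"dst"` vector of distributions per entry) is an assumption based on this package's measurement format and should be adapted to the actual data at hand.

```julia
# Hypothetical a-priori filter (not part of the package): drop voltage
# magnitude measurements whose recorded mean value is zero or negative.
using Distributions

for (m, meas) in collect(data["meas"])
    if meas["var"] == :vm && any(mean.(meas["dst"]) .<= 0.0)
        delete!(data["meas"], m) # absurd measurement: remove before running SE
    end
end
```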

An example of how to use this package to perform bad data detection and identification can be found at this [link](https://github.com/MartaVanin/SE_framework_paper_results): see both its readme and the file `src/scripts/case_study_E.jl`.
