
---
format: latex
toc: yes
title:  GOLEM breakdown report 
categories: Reports
...



The learning machines and other algorithms described in the previous sections were applied to a data set based on the database of shots from the tokamak GOLEM, but some of the feature-elimination algorithms were used only with the SVM algorithm. %because they are not implemented for MatLab.

The main aim of this section is to test the methods introduced in the previous sections on a relatively small data set with easily comparable results and a reasonably well-known physical background. This simple model exhibits problems such as many unimportant dimensions, hidden (unknown) dimensions such as the amount of plasma impurities, and outliers -- machine failures.

 \includegraphics[scale = 0.8]{./images/graf.gif}


The GOLEM database was used to determine the probability of plasma breakdown based on parameters that were set before the shot. The available parameters are:

\begin{center}
\begin{tabular}{lll}
Symbol & Description &  Range \\\hline 
H2 &  H$_2$ filling enabled & 0,1\\
$U_{b}$& charging voltage of the magnetic-field capacitors  & 0--800\,V\\
$U_{cd}$ &  charging voltage of the current-drive capacitors & 0--1200\,V\\
$U_{bd}$ & charging voltage of the breakdown-field capacitors & 0--500\,V\\
$U_{st}$& charging voltage of the vertical-stabilisation-field capacitors & 0--500\,V\\
$P$ & pressure of H$_2$ & 0--250\,mPa\\
$T_{cd}$& delay of the current-drive trigger with respect to the main trigger& 0--10\,ms\\
$T_{bd}$&  delay of the breakdown trigger with respect to the main trigger&  0--10\,ms\\
$T_{st}$ &  delay of the stabilisation-field trigger with respect to the main trigger &  0--10\,ms\\
PreIonisation & preionisation electrode &  0,1\\
\end{tabular}
\end{center}


% The main reason, why this model was used, is that it shows many problems described in previous chapters while the problem can be quite easily analysed because it is less complex. 

\section{Preprocessing}

% 
% normalisation  -  eq:normalisation
% logarithmic transform
% improvement of the magnetic field
The raw data had to be slightly preprocessed in order to improve the prediction and to add a priori knowledge. The first problem is the high number of dimensions and their correlations. It is important to remove the less important dimensions to avoid overfitting; here, this could partly be done from a priori knowledge. The first property, the $H_2$ filling, is a necessary condition for breakdown, so this dimension can be removed, together with the few shots that reached breakdown without the filling gas, because they form only a minor, unimportant group -- outliers.

The next step was to combine some dimensions to improve the independence of the variables. In this case, $U_{b}$ together with $T_{cd}$ or $U_{b}$ together with $T_{bd}$ gives the magnetic field at the time of the maximum current drive or of the breakdown field. Furthermore, instead of the absolute values of $T_{cd}$ and $T_{bd}$, only their relative time shift is important, so only the difference should be used.


Finally, the gas pressure was transformed using a logarithmic substitution in order to resolve the faster changes near the ``vacuum'' pressure of $\approx5$\,mPa.
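A minimal sketch of these preprocessing steps is given below. It is only one possible reading of the procedure: the field names, the ramp model of the magnetic field, and the offset in the logarithm are assumptions, not the exact implementation used for this report.

\begin{verbatim}
import numpy as np

def field_at(Ub, t, tau=5e-3):
    """Placeholder for the magnetic field reached a time t after the trigger
    for charging voltage Ub -- NOT the real GOLEM circuit response."""
    return Ub * np.exp(-t / tau)

def preprocess(shots):
    """shots: NumPy structured array with fields named as in the table above."""
    # H2 filling is a necessary condition: drop the dimension and keep only
    # shots with the filling enabled (the rare breakdowns without gas are
    # treated as outliers and removed with them)
    s = shots[shots['H2'] == 1]

    # combine U_b with the trigger delays: field at the current-drive and
    # breakdown trigger times
    B_cd = field_at(s['Ub'], s['Tcd'])
    B_bd = field_at(s['Ub'], s['Tbd'])

    dT   = s['Tbd'] - s['Tcd']        # only the relative delay matters
    logP = np.log(s['P'] + 5.0)       # faster variation near the ~5 mPa "vacuum"

    X = np.column_stack([B_cd, B_bd, logP, dT,
                         s['Ucd'], s['Ubd'], s['PreIonisation']])
    y = s['breakdown']
    return X, y
\end{verbatim}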



The results of the SVM algorithm with the RBF kernel before this a priori feature elimination are:

\begin{tabular}{r|rrrrrr}
Group	& Precision	& Recall	& F1-score	&CV score&	N$_{SV}$&	N$_\text{VEC}$ \\ \hline   
0 &	 0.834 &	 0.627 &	 0.648 	& 0.338 	& 265 &	 367 \\  
1 &	 0.885 	& 0.990 &	 0.467 &	 0.786 &	 230 	& 1064 \\   \hline  
 Total: & 0.858 &  &	&	&	 	&  \\  
Total SV: &34.591\%   &  &	&	&	 	&  \\  
Best parameters: &C=1000   &       G=0.005   &	&	&	 	&  \\  
\end{tabular}


and after these changes the results were


\begin{tabular}{r|rrrr}
 Group     &       Precision   & Recall&  F1-score  & N$_{SV}$ \\\hline 
          0  &     0.93   &   0.67 &     0.78  &     302\\ 
          1   &    0.91 &     0.99 &     0.95  &    1034\\ \hline   
avg / total   &    0.92   &   0.91  &    0.91  &    1336\\ 
\end{tabular}



It is important to note two things. Firstly, the removal of dimensions can lead to a worse total precision. This was expected, and it is in fact quite a successful result that the dimensionality of the problem was decreased by 40\% while the prediction success stayed almost the same. The second issue is a common property of the described learning algorithms: they usually give worse results for the smaller class. In this case it was group 0 ({\it no breakdown}), because breakdowns were usually requested by the operators. %Consequently, the data sets have significantly different number of the data points.
The number of shots with breakdown is 1034, while the number of shots without breakdown is only 302. This disproportion should be compensated for. Two approaches are possible: changing the threshold (bias) or changing the weights of the classes. Here, a class-weighted loss function was used.
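A possible implementation of this compensation, sketched with scikit-learn (an assumption -- the report's own scripts may differ), where \texttt{X} and \texttt{y} denote the preprocessed features and the breakdown labels:

\begin{verbatim}
from sklearn.svm import SVC

# weight each class inversely to its frequency, i.e. a class-weighted loss
clf = SVC(kernel='rbf', C=1000, gamma=0.005, class_weight='balanced',
          probability=True)
clf.fit(X, y)

# the alternative: keep equal weights and shift the decision threshold instead
p_breakdown = clf.predict_proba(X)[:, 1]
y_pred = (p_breakdown > 0.3).astype(int)   # threshold chosen ad hoc here
\end{verbatim}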

Results for the same data set using the SVM method with the linear kernel are much worse.
The total precision decreased to a mere 80\%; it is important to note that 50\% corresponds to a random guess and that 77\% ($1034/1336$) would be obtained if all points were predicted as breakdown. This problem is caused by the linear non-separability of the data, as shown in Fig. \ref{fig:main_result}.



\section{Random probes}

The method of random probes introduced in Section \ref{sec:validation} was used. Three probes were created: the first probe, {\it test\_1}, was a random permutation of the numbers from 1 to the number of samples; the second, {\it test\_2}, was a random permutation of U$_{b}$; and the last, {\it test\_3}, was a random permutation of U$_{cd}$. These probes should receive much lower weights than the valid variables.
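The probes can be generated, for example, as follows (NumPy sketch; \texttt{X} is the preprocessed feature matrix and \texttt{Ub}, \texttt{Ucd} are the corresponding original columns restricted to the kept shots):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = X.shape[0]

test_1 = rng.permutation(np.arange(1, n + 1))   # permutation of 1..n
test_2 = rng.permutation(Ub)                    # shuffled copy of U_b
test_3 = rng.permutation(Ucd)                   # shuffled copy of U_cd

X_probe = np.column_stack([X, test_1, test_2, test_3])
\end{verbatim}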

% The results with all variables and also random probes included is for SVM with RBF kernel
%  
% =============REPORT======= SVM kernel: 2
% group	 precis	 recall	 F1scor	CVscore	nSV	nVEC
% 0 	 0.913 	 0.636 	 0.375 	 0.317 	 233 	 346 
% 1 	 0.891 	 0.980 	 0.467 	 0.784 	 225 	 1056 
% Total: 0.808
% Total SV: 32.668 \%  N\_SV: 458
% 



\section{Univariate feature elimination}

The first tested method was univariate feature elimination based on the statistical ANOVA test (see Section \ref{sec:dim_reduc}). The ANOVA filter returns a weight corresponding to the statistical difference between the groups for a given variable. Its great advantage is the almost immediate speed of the filtering; the disadvantage is a quite poor performance compared to the other methods. The resulting weights are shown in Fig. \ref{fig:ANOVA}. According to ANOVA, the random probes are among the most important variables while the pressure -- theoretically the crucial variable -- is negligible. This differs significantly from the expected results. It is caused by the fact that the classes in the database are not linearly separable in the individual dimensions. For example, breakdown fails both for too high and for too low pressure.

% The weights for random probes could be caused by different number of members in each class. 

The unexpected weights of the random probes could be caused by a violation of the ANOVA model assumptions, i.e.\ independence of the cases, normality of the residuals, and homogeneity of variance -- the variance of the groups should be the same.
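The filter itself can be sketched with scikit-learn's univariate ANOVA F-test (an assumption about the toolkit; \texttt{X\_probe} and \texttt{y} come from the sketches above and the labels only illustrate the column order):

\begin{verbatim}
from sklearn.feature_selection import f_classif

feature_names = ['B_cd', 'B_bd', 'logP', 'dT', 'Ucd', 'Ubd', 'PreIon',
                 'test_1', 'test_2', 'test_3']

F, p = f_classif(X_probe, y)          # one-way ANOVA F-value per column
weights = F / F.sum()                 # normalised univariate weights

for name, w in sorted(zip(feature_names, weights), key=lambda t: -t[1]):
    print(f'{name:8s} {w:.3f}')
\end{verbatim}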

\begin{figure}
 \centering
 \includegraphics[scale = 0.8]{./images/UFS-univariate.pdf.png}
 % UFS-univariate.pdf.png: 347x295 pixel, 72dpi, 12.24x10.41 cm, bb=0 0 347 295
\caption{Results of univariate feature elimination with the ANOVA method applied to data from the GOLEM tokamak.}
\label{fig:ANOVA}
\end{figure}





\section{Recursive feature elimination with linear SVM}
 
The next tested method is based on Recursive Feature Elimination (RFE) with a linear SVM, but the results are similar for all the described methods with the linear kernel. The solution was slower than the ANOVA filter but still took only a few minutes. The weights are the normalised absolute values of the normal vector of the SVM separating hyperplane. It is possible either to make the prediction with all variables and keep only the $n$ most important ones, or to iteratively remove the least important dimension and repeat the training and prediction/pruning. The first way is faster, but the second gives results that should be less affected by the variables removed in the previous steps.
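A compact version of the iterative variant can be written with scikit-learn's \texttt{RFE} wrapper (again an assumption about the toolkit; standardising the features first makes the hyperplane coefficients comparable between dimensions):

\begin{verbatim}
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(X_probe)

# drop one feature per step until the seven most important remain
rfe = RFE(estimator=LinearSVC(C=1.0, dual=False),
          n_features_to_select=7, step=1)
rfe.fit(Xs, y)

for name, rank in zip(feature_names, rfe.ranking_):
    print(f'{name:8s} rank {rank}')    # rank 1 = kept to the end
\end{verbatim}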

The results are shown in Fig. \ref{fig:RFElinearSVM}. They are significantly different from the ANOVA weights; the linear weighting successfully ignored the random probes. It is also important to note that the weights stayed very similar during the elimination; the only exception is the $\Delta T$ variable.

% !! check the order of the variables !!!



\begin{figure}
 \centering
 \includegraphics[scale = 0.8]{./images/FE-linear-SVM.pdf.png}
 % FE-linear-SVM.pdf.png: 347x295 pixel, 72dpi, 12.24x10.41 cm, bb=0 0 347 295
 \caption{Weights of the input dimensions of the GOLEM tokamak when recursive feature elimination with the linear SVM was applied. The black bars are the weights for all variables. The red bars are the weights for the seven most important variables.}
 \label{fig:RFElinearSVM}
\end{figure}

Basic statistical characteristics of the fitted model are shown in the following table. The recall for class 0 is very low, which means that the linear model almost ignored the smaller class. However, this was expected since the data are not linearly separable.


\begin{center}
\begin{tabular}{r|rrrr}
Group	& Precision&	 Recall	& F1-score&	CV-score\\\hline 
0 	 & 0.77 &	  0.29 	&  0.21 &	  0.39\\ 
1 	 & 0.80 &	  0.97 	&  0.44 &	  0.78\\ 
\hline Total: & 0.80 & & &
\end{tabular}
\end{center}


The next step is to use the weights of the linear hyperplane to estimate the influence of each dimension.
\begin{center}
\begin{tabular}{r|rrrrrrrr}

 Variable: & Bfield &	 PreIon 	 &$P$ 	& $\Delta T$ 	& $T_{bd}$ 	& $T_{cd}$ &	 $U_{bd}$ &	 $U_{cd}$ 	 \\  \hline       
Weight: &  0.34 &	  0.34 &	  0.33 &	  0.12 	&  0.08 &	 -0.13 	& -0.20 &	  0.53  
\label{tab:lin_weights1}
\end{tabular}
\end{center}
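One way in which such signed weights can be obtained (a scikit-learn sketch, not necessarily the script used for the table) is to fit the linear SVM on standardised features, without the random probes, and normalise the hyperplane normal vector:

\begin{verbatim}
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(X)   # features without the random probes
w = LinearSVC(C=1.0, dual=False, class_weight='balanced').fit(Xs, y).coef_[0]
w = w / np.abs(w).sum()                  # signed, normalised influence
\end{verbatim}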
 Almost all variables increase the breakdown probability with increasing value; the only interesting exception is the voltage on the breakdown coils -- U$_{bd}$. However, the breakdown field should always improve the chance of reaching breakdown. This is probably a non-physical effect caused by the fact that the highest fields were tested in the case of a machine failure. % I could not find any explanation when it should be negative.



\section{Recursive feature elimination with RBF kernel and SVM}

This method was described in Section \ref{sec:wrappers}. The weights correspond to the increase of the loss function when the corresponding variable is removed. A full cross-validation was performed in order to find the best parameters for each variable combination -- each parameter combination was solved 5 times. The results of the cross-validation were used to estimate the error bars of the weights.

The advantage of this method is that it is fully nonlinear and the weights are therefore not biased by the assumption of linear separability; the disadvantage is its high computational demands.
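A simplified sketch of this wrapper is shown below (scikit-learn assumed; the per-combination grid search over $C$ and $\gamma$ described above is omitted and the hyper-parameters are kept fixed for brevity):

\begin{verbatim}
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def rbf_svm():
    return SVC(kernel='rbf', C=1000, gamma=0.005, class_weight='balanced')

def nonlinear_rfe(X, y, names, cv=5):
    """Backward elimination: drop the variable whose removal hurts the
    cross-validated score the least; the CV spread gives the error bars."""
    cols = list(range(X.shape[1]))
    order = []
    while len(cols) > 1:
        base = cross_val_score(rbf_svm(), X[:, cols], y, cv=cv).mean()
        trials = []
        for j in cols:
            reduced = [c for c in cols if c != j]
            scores = cross_val_score(rbf_svm(), X[:, reduced], y, cv=cv)
            # weight of j = increase of the loss when j is removed
            trials.append((base - scores.mean(), scores.std(), j))
        weight, err, j = min(trials)        # least useful variable
        order.append((names[j], weight, err))
        cols.remove(j)
    return order
\end{verbatim}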

The results are shown in Fig. \ref{fig:nonlinRFE-weights} and fit the expected behaviour very well. The weights of the random probes are not zero, but they are consistent with zero within the error bars. Also, in contrast to the other method, the pressure was determined to be a very important variable, together with the magnetic field and the current drive. It is also quite interesting that $\Delta T$, the delay between breakdown and current drive, has no importance, although the linear SVM and the ANOVA filter assigned quite a high importance to this variable.




The order of the eliminated variables is plotted in Fig. \ref{fig:nonlinRFE-order}. The variables eliminated at the beginning have very similar weights, so their order is not reliable. It is interesting that the probe variable {\it test\_3} stayed in for quite a long time.

Finally, the evolution of the classification score for each step of the RFE algorithm is shown in Fig. \ref{fig:nonlinRFE-rate}. During the first four steps the score is almost constant; then the classification rate decreases even though the {\it test\_3} variable was still in the training set, so the model was still overfitting.

The optimal number of variables is 5--6.%, it depends on whether T$_{st}$, U$_{st}$ are allowed although they carry no real information about the probability.







\begin{figure}
 \centering
 \includegraphics[scale = 0.8]{./images/nonlinRFE-weights.pdf.png}
 % nonlinRFE-class_rate.pdf.png: 482x297 pixel, 72dpi, 17.00x10.48 cm, bb=0 0 482 297
\caption{Recursive Feature Elimination (RFE) using the nonlinear SVM predictor. The error bars are estimated from the cross-validation.}
 \label{fig:nonlinRFE-weights}

\end{figure}


% !!!! replace this graph with the correct one ?? !! according to which one the elimination was done

\begin{figure}
 \centering
 \includegraphics[scale = 0.8]{./images/RFE-nonlinearSVM-order.pdf.png}
 % nonlinRFE-class_rate.pdf.png: 482x297 pixel, 72dpi, 17.00x10.48 cm, bb=0 0 482 297
\caption{Elimination order of the variables using Recursive Feature Elimination (RFE) with the nonlinear SVM. The lower the number, the less important the variable.}
 \label{fig:nonlinRFE-order}

\end{figure}



\begin{figure}
 \centering
 \includegraphics[scale = 0.8]{./images/RFE-nonlinearSVM-rate.pdf.png}
 % nonlinRFE-class_rate.pdf.png: 482x297 pixel, 72dpi, 17.00x10.48 cm, bb=0 0 482 297
\caption{The classification score for different numbers of inputs selected by the nonlinear RFE.}
 \label{fig:nonlinRFE-rate}

\end{figure}



\begin{table}



\begin{tabular}{ll}
% \multicolumn{2}{c}{} \\

 \includegraphics[scale=0.5]{./images/graf_Ub10001750.pdf.png} &  \includegraphics[scale=0.5]{./images/graf_Ub10000850.pdf.png} \\

 \includegraphics[scale=0.5]{./images/graf_Ub10001400.pdf.png}  &  \includegraphics[scale=0.5]{./images/graf_Ub10000500.pdf.png}\\

\end{tabular}

\caption{Cuts through the predicted probability of breakdown. The probability prediction is the output of the LibSVM implementation of the SVM algorithm. Black points are shots without breakdown and white points are shots with breakdown. The contour lines denote the 30\% and 80\% decision boundaries. It should be noted that the boundary shape is very similar to Paschen's curve.}

 \label{fig:main_result}

\end{table}

Results for the different learning machines when only four dimensions were used are shown in the following table; a rough comparison sketch follows the table. The differences in misclassification between the algorithms are not significant, because random changes between individual runs can produce different results. One important property is the sparsity of the model. The sparsest, and thus usually the least complex, model is the RVM. It is interesting that the linear SVM has a less sparse model, but this is caused by the linear non-separability of the data and thus by the high misclassification rate; the hyperplane can still be described by 5 parameters.

\begin{center}
\begin{tabular}{r|rr}
Machine		&	Total score		&	Support/relevance vectors	\\\hline	
{\bf RBF kernel - 4 dimensions} &				&	\\
SVM		&	0.941 			&	34\%	\\
LogReg L1	&	0.926 			&	100\%	\\
LogReg L2	&	0.959			&	100\%	\\
RVM		&	0.912 			&	5\%	\\
{\bf linear kernel - all dimensions} &				&	\\
SVM		&	0.67 			&	46\%	\\
\end{tabular}
\end{center}
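A rough scikit-learn version of this comparison (the RVM has no scikit-learn implementation and is omitted; \texttt{X4} stands for the four kept dimensions, and the column indices and hyper-parameters are only illustrative):

\begin{verbatim}
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X4 = X[:, [0, 2, 3, 5]]      # hypothetical choice of the four kept columns

models = {
    'SVM (RBF)': SVC(kernel='rbf', C=1000, gamma=0.005,
                     class_weight='balanced'),
    'LogReg L1': LogisticRegression(penalty='l1', solver='liblinear',
                                    class_weight='balanced'),
    'LogReg L2': LogisticRegression(penalty='l2', class_weight='balanced'),
}
for name, model in models.items():
    score = cross_val_score(model, X4, y, cv=5).mean()
    print(f'{name:10s} mean CV score {score:.3f}')
\end{verbatim}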

\section{Code}

Example of use

\begin{verbatim}
./predict_breakdown.py --train --plot
./predict_breakdown.py --test --Ub=400 --Ucd=500  --pressure=27.84  --Tcd=0.01
./predict_breakdown.py --outliers
\end{verbatim}

Links to files 

\begin{itemize}
\item \href{predict_breakdown.py}{predict\_breakdown.py} - the algorithm itself
\item \href{data_object.py}{data\_object.py} - class Data 
\item \href{data_breakdown.npz}{data\_breakdown.npz} - data from shot 5000 to 10700
\end{itemize}