GOLEM breakdown report

The learning machines and the other algorithms described in the previous sections were applied to a data set built from the database of shots of the GOLEM tokamak; however, some of the feature-elimination algorithms were used only with the SVM.

The main aim of this section is to test the methods introduced in the previous sections on a relatively small data set with easily comparable results and a well-known physical background. This simple model exhibits typical problems such as many unimportant dimensions, hidden (unknown) dimensions such as the amount of plasma impurities, and outliers caused by machine failures.


The GOLEM database was used to determine the probability of plasma breakdown based on parameters that were set before the shot. The available parameters are:

Shortcut        Description                                                  Range
H2              H2 filling enabled                                           0, 1
\(U_{b}\)       charge of capacitors of the magnetic field                   0-800 V
\(U_{cd}\)      charge of capacitors of the current drive                    0-1200 V
\(U_{bd}\)      charge of capacitors of the breakdown field                  0-500 V
\(U_{st}\)      charge of capacitors of the vertical stabilisation field     0-500 V
\(P\)           pressure of H2                                               0-250 mPa
\(T_{cd}\)      delay of the current-drive trigger to the main trigger       0-10 ms
\(T_{bd}\)      delay of the breakdown trigger to the main trigger           0-10 ms
\(T_{st}\)      delay of the stabilisation-field trigger to the main trigger 0-10 ms
PreIonisation   preionisation electrode                                      0, 1

Preprocessing

The raw data had to be slightly preprocessed in order to improve the prediction and to add a priori knowledge. The first problem is the high number of dimensions and their correlations. It is important to remove the less important dimensions to avoid overfitting, and here this could be done from a priori knowledge. The first variable, the \(H2\) filling, is a necessary condition for breakdown, so it can be removed, together with all shots in which breakdown was reached without the filling gas, because they form only a minor, unimportant group of outliers.

The next step was to merge some dimensions to ensure the independence of the variables. In this case, combining \(U_{b}\) with \(T_{cd}\) or with \(T_{bd}\) gives the magnetic field at the time of the maximal current drive or of the breakdown field, respectively. Furthermore, of \(T_{cd}\) and \(T_{bd}\) only the relative time shift is important, not the absolute values, so only the difference \(\Delta T\) should be used.

Finally, the gas pressure was transformed by a logarithmic substitution in order to resolve the faster changes near the “vacuum” pressure of \(\approx5\) mPa.
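A minimal sketch of this preprocessing, assuming the shots are loaded into a pandas DataFrame with columns named as in the table above; the helper field_at() is hypothetical and stands in for the calibration of the GOLEM field decay:

    import numpy as np
    import pandas as pd

    def preprocess(shots: pd.DataFrame) -> pd.DataFrame:
        """A priori feature elimination and substitutions for the GOLEM shots."""
        # Keep only shots with H2 filling; the few breakdowns reached without
        # the filling gas are treated as outliers and dropped with the column.
        shots = shots[shots['H2'] == 1].drop(columns=['H2'])

        # Merge U_b with the trigger delay to get the magnetic field at the
        # time of the maximal current drive; field_at() is a hypothetical
        # calibration function of the decaying GOLEM field circuit.
        shots['Bfield'] = field_at(shots['U_b'], shots['T_cd'])

        # Only the relative shift of the two triggers matters.
        shots['dT'] = shots['T_bd'] - shots['T_cd']

        # Logarithmic substitution: resolves changes near the ~5 mPa "vacuum".
        shots['logP'] = np.log(shots['P'])

        return shots.drop(columns=['U_b', 'T_cd', 'T_bd', 'P'])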

The results of the SVM algorithm with the RBF kernel before this a priori feature elimination are:

Group   Precision   Recall   F1-score   CV score   N\(_{SV}\)   N\(_\text{VEC}\)
0       0.834       0.627    0.648      0.338      265          367
1       0.885       0.990    0.467      0.786      230          1064

Total precision: 0.858
Fraction of support vectors: 34.591%
Best parameters: C = 1000, \(\gamma\) = 0.005

and after these changes the results were:

Group         Precision   Recall   F1-score   N\(_\text{shots}\)
0             0.93        0.67     0.78       302
1             0.91        0.99     0.95       1034
avg / total   0.92        0.91     0.91       1336
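Statistics of this kind can be reproduced with a standard SVM implementation; below is a minimal sketch using scikit-learn (which wraps LibSVM), with stand-in random data in place of the real shot database:

    import numpy as np
    from sklearn.metrics import classification_report
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Stand-in data: 1336 "shots" with 6 preprocessed inputs and an
    # imbalanced binary label, mimicking the shape of the real data set.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1336, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] > -0.6).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    # Cross-validated grid search over the RBF-kernel parameters C and gamma.
    grid = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel='rbf')),
        param_grid={'svc__C': [10, 100, 1000],
                    'svc__gamma': [5e-4, 5e-3, 5e-2]},
        cv=5)
    grid.fit(X_tr, y_tr)

    print(grid.best_params_)                       # e.g. C = 1000, gamma = 0.005
    print(classification_report(y_te, grid.predict(X_te)))
    print(grid.best_estimator_['svc'].n_support_)  # support vectors per class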

It is important to notice two things. Firstly, the dimension cut-off can worsen the total precision. This was expected, and the result is in fact quite successful: the dimensionality of the problem was decreased by 40% while the prediction success stayed almost the same. The second issue is a common property of the described learning algorithms: they usually give worse results for the smaller class. In this case it was group 0 (no breakdown), because breakdowns were usually requested by the operators; there are 1034 shots with breakdown but only 302 shots without it. This disproportion should be compensated, and there are two ways to do so: changing the threshold (bias) or changing the weights of the classes. Here, a weighted probability loss function was used.
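In scikit-learn both compensations are a one-line change; a sketch (the explicit weight ratio in the comment is only illustrative):

    from sklearn.svm import SVC

    # Weigh the classes inversely to their frequency (302 vs. 1034 shots),
    # so the optimizer does not sacrifice the small "no breakdown" class.
    clf = SVC(kernel='rbf', C=1000, gamma=0.005,
              class_weight='balanced',   # or explicitly {0: 1034 / 302, 1: 1.0}
              probability=True)          # enable the probabilistic output

    # The alternative is to keep equal weights and move the decision
    # threshold (bias) of the probabilistic output away from 0.5, e.g.:
    # y_pred = (clf.predict_proba(X_te)[:, 1] > 0.8).astype(int)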

Results for the same data set using the SVM method with the linear kernel are much worse. The total precision decreased to a mere 80%; note that 50% corresponds to a random guess and 77% would be reached by predicting breakdown for all points. This problem is caused by the linear non-separability of the data, as shown in Fig. [fig:main_result].

Random probes

The method of random probes introduced in Section [sec:validation] was used. Three probes were created: the first, test_1, was a random permutation of the integers from 1 to the number of samples; the second, test_2, was a random permutation of \(U_{b}\); and the last, test_3, was a random permutation of \(U_{cd}\). These probes should receive much lower weights than the valid variables.
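A sketch of how such probes can be appended to the shot table (the DataFrame and column names are the same assumptions as in the preprocessing sketch, before the columns are merged):

    import numpy as np

    rng = np.random.default_rng(42)
    n = len(shots)

    # test_1: a random permutation of the sample indices 1..n.
    shots['test_1'] = rng.permutation(np.arange(1, n + 1))
    # test_2 and test_3: permuted copies of real inputs; they keep the marginal
    # distributions of U_b and U_cd but carry no information about breakdown.
    shots['test_2'] = rng.permutation(shots['U_b'].to_numpy())
    shots['test_3'] = rng.permutation(shots['U_cd'].to_numpy())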

Univariate feature elimination

The first tested method was univariate feature elimination based on the statistical ANOVA test (see Section [sec:dim_reduc]). The ANOVA filter returns a weight corresponding to the statistical difference between the groups in the given variable. The great advantage is the almost immediate speed of the filtering; the disadvantage is the quite poor performance compared to the other methods. The resulting weights are shown in Fig. [fig:ANOVA]. According to the ANOVA, the random probes are among the most important variables while the pressure, theoretically the crucial variable, is negligible. This is significantly different from the expected result. The cause is that the classes in the database are not linearly separable in the individual dimensions; for example, the breakdown fails for too high as well as for too low pressure.
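A sketch of such a filter, using scikit-learn's ANOVA F-test on a feature matrix X and labels y (NumPy arrays, e.g. the stand-in data from the earlier sketch):

    from sklearn.feature_selection import SelectKBest, f_classif

    # ANOVA F-statistic of each input against the breakdown label;
    # a large F means the class means differ strongly in that dimension.
    F, p = f_classif(X, y)
    weights = F / F.sum()          # normalized univariate weights

    # Keep only the k highest-scoring inputs.
    X_reduced = SelectKBest(f_classif, k=5).fit_transform(X, y)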

The unexpected weights of the random probes could also be caused by a violation of the ANOVA model assumptions, i.e. independence of the cases, normality of the residuals, and homogeneity (the variance of the groups should be the same).

Results of the univariate feature elimination with the ANOVA method applied to data from the GOLEM tokamak.

Recursive feature elimination with linear SVM

The next tested method is based on Recursive Feature Elimination (RFE) with a linear SVM; the results are similar for all the described methods with a linear kernel. The solution was slower than the ANOVA filter, but it still took only a few minutes. The weights are the normalized absolute values of the components of the normal vector of the SVM separating hyperplane. It is possible either to train the predictor once on all variables and keep only the \(n\) most important ones, or to iteratively remove the least important dimension and repeat the training and pruning. The first way is faster, but the second gives results that should be less affected by the variables removed in the previous steps.
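The iterative variant is available directly in scikit-learn; a sketch with one input removed per step and seven inputs kept, with X and y as before:

    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.svm import LinearSVC

    # Iteratively retrain a linear SVM and drop the input with the smallest
    # |w_i| of the hyperplane normal vector w, one input per step.
    rfe = RFE(LinearSVC(C=1.0, dual=False), n_features_to_select=7, step=1)
    rfe.fit(X, y)

    print(rfe.support_)   # mask of the seven surviving inputs
    print(rfe.ranking_)   # 1 = kept; larger numbers were eliminated earlier
    weights = np.abs(rfe.estimator_.coef_).ravel()
    weights /= weights.sum()            # normalized weights of the survivors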

The results are shown in Fig. [fig:RFElinearSVM]. They are significantly different from the ANOVA weights: the linear weighting successfully ignored the random probes. It is also important to note that the weights stayed very similar during the elimination; the only exception was the \(\Delta T\) variable.

Weights of the input dimensions of the GOLEM tokamak data when the recursive feature elimination with the linear SVM was applied. The black bars are the weights for all variables; the red bars are the weights for the seven most important variables.

Basic statistical characteristics of the fitted model are shown in the following table. The recall for class 0 is very low; it means that the linear model almost ignored the smaller class. However, this was expected, since the data are not linearly separable.

Group   Precision   Recall   F1-score   CV score
0       0.77        0.29     0.21       0.39
1       0.80        0.97     0.44       0.78

Total precision: 0.80

The next step is to use the weights of the linear hyperplane to estimate the influence of each dimension.

Variable   Bfield   PreIon   \(P\)   \(\Delta T\)   \(T_{bd}\)   \(T_{cd}\)   \(U_{bd}\)   \(U_{cd}\)
Weight     0.34     0.34     0.33    0.12           0.08         -0.13        -0.20        0.53

[tab:lin_weights1]

For almost all variables the breakdown probability increases with the value of the variable; the only interesting exception is the voltage on the breakdown coils, \(U_{bd}\). However, the breakdown field should always improve the chance of reaching breakdown, so this is probably a non-physical effect caused by the fact that the highest fields were tested in cases of machine failure.

Recursive feature elimination with RBF kernel and SVM

This method was described in Section [sec:wrappers]. The weights correspond to the increase of the loss function when the corresponding variable is removed. A full cross-validation was performed in order to find the best parameters for each combination of variables; each parameter combination was solved 5 times, and the results of the cross-validation were used to estimate the error bars of the weights.

The advantage of this method is that it is fully nonlinear and thus the weights are not biased by the assumption of linear separability; the disadvantage is its high computational demands.
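A sketch of this wrapper follows; for brevity the kernel parameters are kept fixed here, whereas in the procedure described above the full grid search is repeated at every elimination step:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def nonlinear_rfe(X, y, names, cv=5):
        """Backward elimination wrapped around a nonlinear (RBF) SVM."""
        clf = SVC(kernel='rbf', C=1000, gamma=0.005)
        cols, order = list(range(X.shape[1])), []
        while len(cols) > 1:
            base = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
            # Weight of a variable = drop of the CV score when it is left out.
            drops = [base - cross_val_score(
                         clf, X[:, [c for c in cols if c != j]], y, cv=cv).mean()
                     for j in cols]
            worst = cols[int(np.argmin(drops))]   # smallest drop = least important
            order.append(names[worst])
            cols.remove(worst)
        order.append(names[cols[0]])
        return order        # elimination order, least important variable first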

The results are shown in Fig. [fig:nonlinRFE-weights] and fit the expected behaviour very well. The weights of the random probes are not zero, but they are consistent with zero within the error bars. Moreover, in contrast to the other methods, the pressure was identified as a very important variable, together with the magnetic field and the current drive. It is also quite interesting that \(\Delta T\), the delay between the breakdown and the current drive, has no importance, although the linear SVM and the ANOVA filter attributed quite high importance to this variable.

The order of the eliminated variables is plotted in Fig. [fig:nonlinRFE-order]. The variables eliminated at the beginning have very similar weights, so their order is not reliable. It is interesting that the probe variable test_3 stayed in quite long.

Finally, the evolution of the predictive probability for each step of the RFE algorithm is shown in Fig. [fig:nonlinRFE-rate]. During the first four steps the prediction is almost constant; afterwards the classification rate decreases even though the test_3 variable was still in the training set, i.e. the model was still overfitting.

The optimal number of variables is 5-6.

Recursive Feature Elimination (RFE) using the nonlinear SVM predictor. Error bars are estimated from the cross-validation.
Elimination order of the variables using Recursive Feature Elimination (RFE) with the nonlinear SVM. The lower the number, the less important the variable.
The classification score for different numbers of inputs selected by the nonlinear RFE.
Cuts through the predicted probability of breakdown. The probability prediction is the output of the SVM algorithm (LibSVM). Black points are shots without breakdown; white points are shots with breakdown. Contour lines denote the 30% and 80% decision boundaries. It should be noted that the boundary shape is very similar to Paschen's curve.

[fig:main_result]

Results for the different learning machines when only four dimensions were used are given in the following table. The differences in misclassification between the algorithms are not significant, because random changes between individual runs can produce different results. One important property is the sparsity of the model: the sparsest, and thus usually the least complex, model is the RVM. It is interesting that the linear SVM gives a less sparse model, but this is caused by the linear non-separability of the data and thus by the high misclassification rate; the hyperplane itself can still be described by 5 parameters.

Machine     Total score   Main vectors

RBF kernel, 4 dimensions
SVM         0.941         34%
LogReg L1   0.926         100%
LogReg L2   0.959         100%
RVM         0.912         5%

linear kernel, all dimensions
SVM         0.67          46%
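For reference, sparsity figures of this kind can be read directly from the fitted models; a sketch for the SVM and the L1-regularized logistic regression (the RVM is not part of scikit-learn), with X and y as before:

    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    svm = SVC(kernel='rbf', C=1000, gamma=0.005).fit(X, y)
    l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)

    # SVM sparsity: fraction of training shots kept as support vectors.
    print(svm.n_support_.sum() / len(X))
    # L1 logistic regression: fraction of nonzero coefficients.
    print((l1.coef_ != 0).mean())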

Code

Example of use

./predict_breakdown.py --train --plot
./predict_breakdown.py --test --Ub=400 --Ucd=500  --pressure=27.84  --Tcd=0.01
./predict_breakdown.py --outliers

Links to files