# GOLEM breakdown report

The learning machines and other algorithms described in the previous sections were applied to a data set built from the shot database of the tokamak GOLEM; some of the feature-elimination algorithms, however, were used only with the SVM.

The main aim of this section is to test the methods introduced in the previous sections on a relatively small data set whose results are easy to compare and whose physical background is well known. Even this simple model shows typical problems: many unimportant dimensions, hidden (unknown) dimensions such as the amount of plasma impurities, and outliers caused by machine failures.

The GOLEM database was used to estimate the probability of plasma breakdown from the parameters that were set before each shot. The available parameters are:

Symbol | Description | Range |
---|---|---|
H2 | H2 filling enabled | 0, 1 |
\(U_{b}\) | charging voltage of the magnetic field capacitors | 0-800 V |
\(U_{cd}\) | charging voltage of the current drive capacitors | 0-1200 V |
\(U_{bd}\) | charging voltage of the breakdown field capacitors | 0-500 V |
\(U_{st}\) | charging voltage of the vertical stabilisation field capacitors | 0-500 V |
\(P\) | H2 pressure | 0-250 mPa |
\(T_{cd}\) | delay of the current drive trigger relative to the main trigger | 0-10 ms |
\(T_{bd}\) | delay of the breakdown field trigger relative to the main trigger | 0-10 ms |
\(T_{st}\) | delay of the stabilisation field trigger relative to the main trigger | 0-10 ms |
PreIonisation | preionisation electrode enabled | 0, 1 |
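The ranges in the table can be kept as a small configuration. Settings that fall outside the tested domain can then be rejected before prediction. The following is a minimal sketch. The dictionary keys and the function `out_of_range` are hypothetical names and are not taken from `predict_breakdown.py`.

```python
# Hypothetical configuration mirroring the parameter table:
# name -> (minimum, maximum, unit). Binary flags use the range 0..1.
PARAM_RANGES = {
    "H2": (0, 1, ""),
    "U_b": (0, 800, "V"),
    "U_cd": (0, 1200, "V"),
    "U_bd": (0, 500, "V"),
    "U_st": (0, 500, "V"),
    "P": (0, 250, "mPa"),
    "T_cd": (0, 10, "ms"),
    "T_bd": (0, 10, "ms"),
    "T_st": (0, 10, "ms"),
    "PreIonisation": (0, 1, ""),
}


def out_of_range(shot):
    """Return the names of parameters in `shot` that lie outside the table ranges.

    Unknown names are reported too, so that typos are not silently ignored.
    """
    bad = []
    for name, value in shot.items():
        if name not in PARAM_RANGES:
            bad.append(name)
            continue
        low, high, _unit = PARAM_RANGES[name]
        if not low <= value <= high:
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(out_of_range({"U_b": 400, "U_cd": 1500, "P": 27.84}))  # ['U_cd']
```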

# Preprocessing

The raw data had to be slightly preprocessed to improve the prediction and to add a priori knowledge. The first problem is the high number of dimensions and their mutual correlations. Less important dimensions should be removed to avoid overfitting, and here this can be done using a priori knowledge. The first property, \(H2\) filling, is a necessary condition for breakdown, so it can be removed as a variable. All shots in which breakdown was reached without the filling gas were removed as well, because they form only a small, unimportant group of outliers.
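The filtering step just described can be sketched on a list of shot records as follows. Representing a shot as a dict with the keys `"H2"` and `"breakdown"` is an assumption made for illustration, and so is the function name.

```python
def drop_h2_and_outliers(shots):
    """Remove shots that broke down without H2 filling, then drop the H2 column.

    `shots` is a list of dicts with the (assumed) keys "H2" (0/1) and
    "breakdown" (0/1) plus the remaining machine parameters.
    """
    cleaned = []
    for shot in shots:
        if shot["breakdown"] == 1 and shot["H2"] == 0:
            continue  # breakdown without filling gas: treated as an outlier
        row = dict(shot)  # copy, so the caller's records are not modified
        del row["H2"]  # necessary condition, not an informative feature
        cleaned.append(row)
    return cleaned
```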

The next step was to combine some dimensions to make the variables more independent. Combining \(U_{b}\) with \(T_{cd}\) gives the magnetic field at the time of the maximum current drive, and combining \(U_{b}\) with \(T_{bd}\) gives the field at the time of the breakdown field. Furthermore, only the time shift between \(T_{cd}\) and \(T_{bd}\) matters, not their absolute values, so only their difference \(\Delta T\) should be used.

Finally, the gas pressure was transformed by a logarithmic substitution, so that the model can change faster near the "vacuum" pressure of \(\approx 5\) mPa.
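The two simple transformations (the trigger difference and the logarithmic pressure) can be sketched as below, assuming times in ms and pressure in mPa. The combination of \(U_{b}\) with the trigger times is left out because this section does not give the model of the field evolution. Several details are assumptions: the name `derive_features`, the sign convention \(\Delta T = T_{cd} - T_{bd}\), and the floor `p_floor` that guards against taking the logarithm of zero.

```python
import math


def derive_features(T_cd, T_bd, P, p_floor=1.0):
    """Return (delta_T, log_P) for one shot.

    delta_T: only the relative shift of the two triggers is kept;
             the sign convention T_cd - T_bd is an assumption.
    log_P:   logarithmic substitution of the pressure (mPa); p_floor is an
             assumed lower bound so that shots without gas do not give log(0).
    """
    delta_T = T_cd - T_bd
    log_P = math.log(max(P, p_floor))
    return delta_T, log_P
```

On the logarithmic scale, the step from 5 to 10 mPa is as long as the step from 100 to 200 mPa. Changes near the vacuum pressure therefore occupy a larger part of the input space.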

The results of the SVM with an RBF kernel before this feature elimination based on a priori knowledge are:

Group | Precision | Recall | F1-score | CV score | N\(_{SV}\) | N\(_\text{VEC}\) |
---|---|---|---|---|---|---|
0 | 0.834 | 0.627 | 0.648 | 0.338 | 265 | 367 |
1 | 0.885 | 0.990 | 0.467 | 0.786 | 230 | 1064 |

Here N\(_{SV}\) is the number of support vectors and N\(_\text{VEC}\) the number of vectors in each group. Total precision: 0.858. Support vectors: 34.591% of all vectors. Best parameters: C = 1000, G = 0.005.
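The support-vector percentage above equals the ratio of all support vectors to all vectors. It can be checked against the table:

```python
# Per-group counts from the table: (support vectors, all vectors).
counts = {0: (265, 367), 1: (230, 1064)}

n_sv = sum(sv for sv, _ in counts.values())     # 495
n_vec = sum(vec for _, vec in counts.values())  # 1431
print(f"Total SV: {100 * n_sv / n_vec:.3f}%")   # Total SV: 34.591%
```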

After these changes, the results were:

Group | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.93 | 0.67 | 0.78 | 302 |
1 | 0.91 | 0.99 | 0.95 | 1034 |
avg / total | 0.92 | 0.91 | 0.91 | 1336 |
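The layout of this table matches the `classification_report` of older scikit-learn versions. In those versions the "avg / total" row is the average of the per-class values weighted by support. That this tool produced the table is an assumption. The sketch below computes the same quantities from predictions. The function name is hypothetical.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Per-class (precision, recall, F1, support) plus the support-weighted
    average stored under the key "avg / total"."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1, tp + fn)  # support = true members of c
    total = sum(r[3] for r in report.values())
    # Weight each class by its support, as in the "avg / total" row.
    report["avg / total"] = tuple(
        sum(r[i] * r[3] for r in report.values()) / total for i in range(3)
    ) + (total,)
    return report
```

With a support-weighted average, the weighted recall equals the overall accuracy. Rounding of the per-class values explains why the row cannot be recomputed exactly from the two-digit numbers in the table.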

Two things should be noticed. Firstly, cutting off dimensions can worsen the total precision. This was expected. It is quite a successful result that the dimensionality of the problem was reduced by 40% while the prediction success stayed almost the same. The second issue is a common property of the described learning algorithms: they usually give worse results for the smaller class. Here it is group 0 (*no breakdown*), because operators usually requested a breakdown. There are 1034 shots with breakdown and only 302 without. This imbalance should be compensated, either by changing the decision threshold (bias) or by changing the class weights. In addition, a weighted probability loss function was used.
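Both compensation strategies can be sketched in a few lines. The "balanced" weighting \(w_c = N / (K N_c)\) (the heuristic behind scikit-learn's `class_weight="balanced"`) is one common choice. It is an assumption here, because the section does not state which weights were used.

```python
def balanced_class_weights(counts):
    """w_c = N / (K * N_c): the rarer class gets a proportionally larger weight."""
    n_total = sum(counts.values())
    n_classes = len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def predict_with_threshold(probabilities, threshold=0.5):
    """Predict 1 (breakdown) when P(breakdown) >= threshold.

    A threshold above 0.5 shifts decisions towards the smaller class 0.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


weights = balanced_class_weights({0: 302, 1: 1034})
print({c: round(w, 2) for c, w in weights.items()})  # {0: 2.21, 1: 0.65}
```

With these weights, both classes contribute the same total weight (302 · 2.21 ≈ 1034 · 0.65 ≈ 668). The loss therefore no longer favours the breakdown class simply because it is larger.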

Results for the same data set using the SVM with a linear kernel are much worse. The total precision decreased to a mere 80%. For comparison, a random guess gives 50%, and predicting breakdown for all points gives 77%. The poor result is caused by the linear non-separability of the data, as shown in Fig. [fig:main_result].
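The two reference values quoted above follow directly from the class counts (302 shots without and 1034 with breakdown):

```python
def majority_baseline(n_per_class):
    """Accuracy of always predicting the most frequent class."""
    return max(n_per_class.values()) / sum(n_per_class.values())


def uniform_random_baseline(n_per_class):
    """Expected accuracy of guessing each class with equal probability."""
    return 1 / len(n_per_class)


counts = {0: 302, 1: 1034}
print(f"{majority_baseline(counts):.3f}")        # 0.774
print(f"{uniform_random_baseline(counts):.3f}")  # 0.500
```

The linear SVM at about 80% is therefore only a few percentage points better than the trivial "always breakdown" classifier.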

# Random probes

The method of random probes introduced in Section [sec:validation] was used. Three probes were created:

- *test_1* is a random permutation of the numbers from 1 to the number of samples.
- *test_2* is a random permutation of U\(_{b}\).
- *test_3* is a random permutation of U\(_{cd}\).

These probes should receive much lower weights than the valid variables.
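The three probes can be generated as follows. The function name and the fixed seed are arbitrary choices for this sketch. Permuting a real column keeps its distribution of values but removes any relation to the breakdown label.

```python
import random


def make_random_probes(U_b, U_cd, seed=0):
    """Build the probe variables test_1, test_2 and test_3.

    test_1: random permutation of 1..N (N = number of samples)
    test_2: random permutation of the U_b column
    test_3: random permutation of the U_cd column
    Both columns must have the same length N.
    """
    rng = random.Random(seed)
    n = len(U_b)
    return {
        "test_1": rng.sample(range(1, n + 1), n),
        "test_2": rng.sample(list(U_b), n),
        "test_3": rng.sample(list(U_cd), n),
    }
```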

# Univariate feature elimination

The first tested method was univariate feature elimination based on the statistical ANOVA test (see Section [sec:dim_reduc]). For each variable, the ANOVA filter returns a weight corresponding to the statistical difference between the groups. Its great advantage is that the filtering is almost instantaneous. Its disadvantage is rather poor performance compared with the other methods. The resulting weights are in Fig. [fig:ANOVA]. According to the ANOVA, the random probes are among the most important variables, whereas the pressure, theoretically the crucial variable, is negligible. This differs significantly from the expected result.

The reason is that the classes in the database are not linearly separable along each individual dimension. For example, breakdown fails both for too high and for too low pressure. The no-breakdown shots therefore lie on both sides of the breakdown window, and the group means, which ANOVA compares, can be almost equal.
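The failure mode just described can be shown with a pure-Python sketch of the one-way ANOVA F statistic. scikit-learn's `f_classif` and SciPy's `f_oneway` compute the same statistic. The numbers are toy values in arbitrary units, not GOLEM data.

```python
from statistics import fmean


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group / within-group mean square."""
    values = [x for g in groups for x in g]
    grand = fmean(values)
    k, n = len(groups), len(values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# "Pressure window": no breakdown at too low AND too high values.
print(anova_f([1, 2, 8, 9], [4, 5, 6]))  # 0.0, although the variable decides the class
# Monotonic separation: the case ANOVA detects well.
print(anova_f([1, 2, 3], [7, 8, 9]))     # 54.0
```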

The unexpected weights of the random probes could be caused by a violation of the ANOVA model assumptions: independence of cases, normality of the residuals, and homogeneity of variance (all groups should have the same variance).

# Recursive feature elimination with linear SVM

The next tested method is Recursive Feature Elimination (RFE) with a linear SVM. The results are similar for all the described methods with a linear kernel. RFE was slower than the ANOVA filter, but it still took only a few minutes. The weights are the normalised absolute values of the components of the normal vector of the linear SVM hyperplane. There are two options. The first is to train once on all variables and keep only the \(n\) most important ones. The second is to iteratively remove only the least important dimension and repeat the training and pruning. The first way is faster, but the second should give results less affected by the random variables that were removed in previous steps.
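Both variants can be sketched around a hypothetical callable `fit_weights(subset)`. It stands for training a linear model on the given variables and returning its hyperplane normal vector \(w\). Normalising \(|w_j|\) by their sum is an assumption, because the text only says that the weights are normalised. With a real SVM, the weights change after each refit. This is what makes the iterative variant differ from the one-shot variant.

```python
def recursive_feature_elimination(features, fit_weights, n_keep=1):
    """Iteratively drop the least important feature and refit.

    Returns the elimination order (least important first) and the survivors.
    """
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        w = fit_weights(remaining)
        total = sum(abs(x) for x in w) or 1.0  # guard against an all-zero vector
        importance = [abs(x) / total for x in w]
        worst = min(range(len(remaining)), key=importance.__getitem__)
        eliminated.append(remaining.pop(worst))
    return eliminated, remaining


def one_shot_selection(features, fit_weights, n_keep):
    """Fit once on all features and keep the n_keep largest |w_j|."""
    w = fit_weights(list(features))
    ranked = sorted(zip(features, w), key=lambda fw: abs(fw[1]), reverse=True)
    return [f for f, _ in ranked[:n_keep]]
```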

The results are shown in Fig. [fig:RFElinearSVM]. They differ significantly from the ANOVA weights: the linear weighting successfully ignored the random probes. It is also important to note that the weights stayed very similar during the elimination. The only exception is the \(\Delta T\) variable.

Basic statistical characteristics of the fitted model are shown in the following table. The recall for class 0 is very low, which means that the linear model almost ignored the smaller class. This was expected, since the data are not linearly separable.

Group | Precision | Recall | F1-score | CV score |
---|---|---|---|---|
0 | 0.77 | 0.29 | 0.21 | 0.39 |
1 | 0.80 | 0.97 | 0.44 | 0.78 |

Total precision: 0.80.

The next step is to use the weights of the linear hyperplane to estimate the influence of each dimension.

Variable | Bfield | PreIon | \(P\) | \(\Delta T\) | \(T_{bd}\) | \(T_{cd}\) | \(U_{bd}\) | \(U_{cd}\) |
---|---|---|---|---|---|---|---|---|
Weight | 0.34 | 0.34 | 0.33 | 0.12 | 0.08 | -0.13 | -0.20 | 0.53 |

[tab:lin_weights1]
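The weights in this table can be read in two ways. The magnitude \(|w_j|\) ranks the influence of a variable. This comparison is only meaningful if the inputs were scaled to similar ranges, which is assumed here. The sign gives the direction of the effect. A short sketch using the table values:

```python
# Linear-SVM weights from the table (positive = raises breakdown odds).
weights = {
    "Bfield": 0.34, "PreIon": 0.34, "P": 0.33, "dT": 0.12,
    "T_bd": 0.08, "T_cd": -0.13, "U_bd": -0.20, "U_cd": 0.53,
}

# Rank by magnitude; sorted() is stable, so ties keep the table order.
ranking = sorted(weights, key=lambda k: abs(weights[k]), reverse=True)
negative = [k for k in ranking if weights[k] < 0]
print(ranking)   # ['U_cd', 'Bfield', 'PreIon', 'P', 'U_bd', 'T_cd', 'dT', 'T_bd']
print(negative)  # ['U_bd', 'T_cd']
```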

For almost all variables, the breakdown probability increases as the value of the variable grows. The only interesting exception is the voltage of the breakdown coils, U\(_{bd}\). Physically, however, the breakdown field should always improve the chance of reaching breakdown. The negative weight is probably a non-physical effect: the highest fields were tested in cases of machine failure.

# Recursive feature elimination with RBF kernel and SVM

This method was described in Section [sec:wrappers]. The weight of a variable corresponds to the increase of the loss function when that variable is removed. Full cross-validation was performed to find the best parameters for each combination of variables, and each parameter combination was solved 5 times. The cross-validation results were used to estimate the error bars of the weights.
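The weight definition above can be sketched as follows. The callable `cv_loss(subset, seed)` is hypothetical. It stands for the full cross-validated RBF-SVM fit on a subset of variables. Two further choices are assumptions: the five solutions are read as five seeded repeats, and the error bar is taken as the sample standard deviation of those repeats.

```python
from statistics import mean, stdev


def removal_weights(features, cv_loss, n_repeats=5):
    """Weight of each feature = increase of the CV loss when it is removed.

    Returns {feature: (mean increase, standard deviation over repeats)}.
    """
    result = {}
    for f in features:
        reduced = [g for g in features if g != f]
        # Same seed for the full and the reduced set pairs the two runs,
        # so part of the split-to-split noise cancels in the difference.
        deltas = [cv_loss(reduced, s) - cv_loss(features, s) for s in range(n_repeats)]
        result[f] = (mean(deltas), stdev(deltas))
    return result
```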

The advantage of the method is that it is fully nonlinear, so the weights are not biased by an assumption of linear separability. The disadvantage is its high computational cost.

The results are shown in Fig. [fig:nonlinRFE-weights] and agree very well with the expected behaviour. The weights of the random probes are not exactly zero, but they are consistent with zero within the error bars. In contrast to the other methods, pressure was identified as a very important variable, together with the magnetic field and the current drive. It is also interesting that \(\Delta T\), the delay between breakdown and current drive, has no importance. The linear SVM and the ANOVA filter, by contrast, assigned this variable quite a high importance.

Fig. [fig:nonlinRFE-order] shows the order in which the variables were eliminated. The variables eliminated at the beginning have very similar weights, so their order is not reliable. Interestingly, the probe variable *test_3* survived for quite a long time.

Finally, Fig. [fig:nonlinRFE-rate] shows the evolution of the classification rate over the steps of the RFE algorithm. During the first four steps, the rate is almost constant. After that, it decreases even though the *test_3* variable is still in the training set, which indicates that the model was still overfitting.

The optimal number of variables is 5-6.
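One way to turn a curve of classification rate against the number of variables into such a choice is to take the smallest subset whose rate is within a tolerance of the best rate. The sketch below implements this rule. Both the rule and the tolerance are assumptions and not the criterion used in the text. The rates in the usage example are illustrative values, not results of this study.

```python
def smallest_good_subset(rates, tolerance=0.01):
    """Smallest number of variables whose rate is within `tolerance` of the best.

    `rates` maps the number of variables -> classification rate.
    """
    best = max(rates.values())
    return min(n for n, r in rates.items() if r >= best - tolerance)


# Illustrative values only.
rates = {8: 0.93, 7: 0.93, 6: 0.929, 5: 0.925, 4: 0.90, 3: 0.85}
print(smallest_good_subset(rates))  # 5
```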

[fig:main_result]

The following table lists results for different learning machines when only four dimensions were used. The differences in misclassification between the algorithms are not significant, because random variations between runs can produce different results. An important property is the sparsity of the model. The sparsest, and thus usually the least complex, model is the RVM. Interestingly, the linear SVM has a less sparse model. This is caused by the linear non-separability of the data and the resulting high misclassification rate, while the hyperplane itself can still be described by only 5 parameters.

Machine | Total score | Main vectors |
---|---|---|
*RBF kernel, 4 dimensions* | | |
SVM | 0.941 | 34% |
LogReg L1 | 0.926 | 100% |
LogReg L2 | 0.959 | 100% |
RVM | 0.912 | 5% |
*Linear kernel, all dimensions* | | |
SVM | 0.67 | 46% |

# Code

Example of use:

```sh
./predict_breakdown.py --train --plot
./predict_breakdown.py --test --Ub=400 --Ucd=500 --pressure=27.84 --Tcd=0.01
./predict_breakdown.py --outliers
```
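The options in these commands can be mapped onto a parser with `argparse`, as sketched below. This is a hypothetical reconstruction based only on the three commands above. The actual `predict_breakdown.py` may define the options differently, and their units and exact meanings are not stated here.

```python
import argparse


def build_parser():
    """Hypothetical parser for the options used in the example commands."""
    parser = argparse.ArgumentParser(description="Predict plasma breakdown on GOLEM.")
    # Mode switches, as they appear in the examples.
    for flag in ("--train", "--plot", "--test", "--outliers"):
        parser.add_argument(flag, action="store_true")
    # Shot settings passed together with --test.
    for name in ("Ub", "Ucd", "pressure", "Tcd"):
        parser.add_argument(f"--{name}", type=float)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```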

Files:

- `predict_breakdown.py`: the algorithm itself
- `data_object.py`: class `Data`
- `data_breakdown.npz`: data from shots 5000 to 10700