Measuring electron temperature with a swept Langmuir probe

Written by Katerina Hromasova with input from Martina Lauerova, Georgiy Sarancha, Jan Stockel, Vojtech Svoboda, Michael Komm and others.

Theory of swept probe measurements

The following figure shows the ideal Langmuir probe $I$-$V$ characteristic.

The ion branch of the curve (left half of the plot) can be described by a three-parameter exponential function.

$I(V) = I_{sat} \left( 1 - \exp \left( \frac{V - V_f}{T_e} \right)\right)$

The three parameters are the ion saturation current $I_{sat}$, the probe floating potential $V_f$ and the electron temperature $T_e$ [eV]. The shape of the characteristic changes with these parameters, so by fitting an experimental $I$-$V$ characteristic with this exponential function, one may retrieve their values.
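As a minimal sketch, the model can be written as a short Python function. The name ivchar_model and the order of its parameters are illustrative choices rather than the notebook's own. The sign convention assumed here is that the ion current is positive, so the exponential term vanishes for $V \ll V_f$ and the current saturates at $I_{sat}$.

```python
import numpy as np

def ivchar_model(V, I_sat, V_f, T_e):
    """Ion branch of the I-V characteristic (hypothetical helper).

    V and V_f are in volts, T_e in eV, I_sat in amperes.
    """
    return I_sat * (1.0 - np.exp((V - V_f) / T_e))

# The current vanishes at V = V_f and approaches I_sat for strongly negative bias.
print(ivchar_model(np.array([-100.0, 10.0]), I_sat=1e-3, V_f=10.0, T_e=20.0))
```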

To collect the whole $I$-$V$ characteristic in an experiment, the biasing voltage $V$ on the probe is swept (i.e. varied periodically). The exact waveform is irrelevant, though the most common ones are the sawtooth (zig-zag) and the sine. The current $I$ flowing from the probe to the ground is then plotted against the biasing voltage $V$ and the curve is fitted with the exponential.

This notebook performs $I$-$V$ characteristic fitting throughout the current discharge. It documents the process step by step and concludes by plotting the temporal evolution of the ion saturation current $I_{sat}$, the probe floating potential $V_f$ and the electron temperature $T_e$.

Note: All the time variables are given in seconds.

Import the basic libraries

First we import the basic libraries, NumPy and Matplotlib. We will import more libraries throughout the notebook as needed.

Access the diagnostics data

The Langmuir probe we shall be working with is placed on the PetiProbe.

(The Langmuir probe is the small metal pin on the right.)

The data directory of the PetiProbe is http://golem.fjfi.cvut.cz/shots/{shot}/Diagnostics/PetiProbe/. Here, we write the function get_data to download the data.

The biasing voltage $V$ is collected under the name U_bias. The voltage proportional to the probe current is called U_current. The probe current can be calculated as $I = U_{current}/R$, where $R=46.7 \, \Omega$ is the resistance of the measuring resistor.

In the following, we load this data for the current shot, calculate the probe current $I$ and plot the time evolution of $I$ and $V$. Notice that at the beginning of the discharge, the current is not exactly zero. This is the effect of the parasitic current, which we will discuss shortly.
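The loading step could look roughly like the sketch below. Only the directory URL, the signal names U_bias and U_current, and the resistance come from the text above. The ".csv" file suffix, the two-column layout (time, voltage) and the helper names data_url and probe_current are assumptions made for illustration.

```python
import io
from urllib.request import urlopen

import numpy as np

R_MEASURE = 46.7  # measuring resistor [ohm], value from the text
BASE_URL = "http://golem.fjfi.cvut.cz/shots/{shot}/Diagnostics/PetiProbe/"

def data_url(shot, name):
    # ASSUMPTION: each signal is stored as "<name>.csv" in the data directory.
    return BASE_URL.format(shot=shot) + name + ".csv"

def get_data(shot, name):
    # ASSUMPTION: two comma-separated columns, time [s] and voltage [V].
    with urlopen(data_url(shot, name)) as response:
        text = response.read().decode("utf-8")
    t, U = np.loadtxt(io.StringIO(text), delimiter=",", unpack=True)
    return t, U

def probe_current(U_current, R=R_MEASURE):
    # Ohm's law on the measuring resistor: I = U_current / R
    return np.asarray(U_current) / R

# Intended usage (requires network access, shot number is up to the reader):
#   t, V = get_data(shot, "U_bias")
#   _, U_current = get_data(shot, "U_current")
#   I = probe_current(U_current)
```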

Remove the parasitic current

The parasitic current appears due to the capacitance of the data collection system. At high sweeping frequencies, the wires behave like capacitors and cause current oscillations proportional to the time derivative of the biasing voltage. This parasitic current adds to the probe current, distorting it.

$I_{total}(V) = I_{probe}(V) + c \cdot \frac{dV}{dt}$

Since the biasing voltage is largely independent of the plasma parameters, $V(t)$ repeats with the same waveform in every sweep period throughout the discharge, and so does the parasitic current. We use this in the parasitic signal reconstruction and removal.

First, we sample the parasitic current at the beginning of the discharge, where $I_{probe}=0$ and $I_{total}=c \cdot \frac{dV}{dt}$. This is the time period between the opening of the $B_t$ capacitor banks and the opening of the current drive capacitor banks.

We want to "clone" this sample and cover the rest of the discharge with it. To do that, we need to know exactly how long its period is. We load this from the database, where the sweeping frequency f_fg is stored.

Next, we pick a few whole periods of the parasitic signal from the discharge beginning and clone the entire parasitic signal from them. Finally, we subtract the parasitic current from the total current, retrieving the probe current alone.
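The cloning and subtraction can be sketched as follows, here on synthetic data. The helper name clone_parasitic, the coupling constant and all signal values are made up for the demonstration. The sketch leans on the period argument of np.interp to repeat the sampled window, which is one possible way to do the cloning and not necessarily the one used in the notebook.

```python
import numpy as np

def clone_parasitic(t, I_total, f_sweep, t_start, n_periods=3):
    """Extend the parasitic current sampled after t_start over the whole record.

    Hypothetical helper: the n_periods sweep periods following t_start
    must all lie in the plasma-free part of the signal.
    """
    window = n_periods / f_sweep                      # sample length [s]
    sample = (t >= t_start) & (t < t_start + window)  # plasma-free samples
    # period= makes np.interp treat the sampled window as periodic in time
    return np.interp(t, t[sample], I_total[sample], period=window)

# Synthetic demonstration: 1 kHz sine sweep, plasma appears at t = 3 ms.
t = np.arange(10000) * 1e-6
f_sweep = 1e3
c = 1e-8                                              # made-up coupling constant
dVdt = 100.0 * 2 * np.pi * f_sweep * np.cos(2 * np.pi * f_sweep * t)
I_probe_true = np.where(t >= 3e-3, 0.02, 0.0)
I_total = I_probe_true + c * dVdt

I_parasitic = clone_parasitic(t, I_total, f_sweep, t_start=0.5e-3, n_periods=2)
I_probe = I_total - I_parasitic
print("largest residual:", np.max(np.abs(I_probe - I_probe_true)))
```

Using a whole number of sweep periods for the sample matters here: otherwise the cloned signal would jump at every repetition.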

Cut the probe signal into individual $I-V$ characteristics

The probe current $I$ and voltage $V$ are now ready to be plotted as an $I-V$ characteristic. However, we can't mix $I-V$ characteristics from different parts of the discharge - the plasma parameters are different and so are the $I-V$ characteristics. We need to treat them separately, and that means breaking up the signal into individual periods of the sweeping voltage.

In the following, we create a list of the voltage maxima (peaks) and minima (valleys). Specifically, we detect the position of the first peak in $V$ and "predict" the following peaks based on the sweeping period.
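A minimal sketch of this prediction is given below. The helper name sweep_extrema is hypothetical, and placing the minima halfway between the maxima assumes a symmetric waveform such as a sine or a triangle.

```python
import numpy as np

def sweep_extrema(t, V, f_sweep):
    """Times of the voltage maxima and minima, predicted from the first peak."""
    period = 1.0 / f_sweep
    n_first = int(round(period / (t[1] - t[0])))   # samples in one sweep period
    # First maximum: the largest sample within the first sweep period.
    t_peak = t[np.argmax(V[:n_first])]
    n_peaks = int(np.floor((t[-1] - t_peak) / period)) + 1
    maxima = t_peak + period * np.arange(n_peaks)
    # ASSUMPTION: symmetric waveform, minima lie halfway between maxima.
    minima = maxima + 0.5 * period
    return maxima, minima[minima <= t[-1]]

# Synthetic 1 kHz sine sweep sampled at 1 MHz for 5 ms
t = np.arange(5000) * 1e-6
V = 100.0 * np.sin(2 * np.pi * 1e3 * t)
maxima, minima = sweep_extrema(t, V, f_sweep=1e3)
print(maxima * 1e3, minima * 1e3)  # in ms
```

Consecutive extrema then delimit the individual ramps: samples between a minimum and the next maximum form a ramp up, and samples between a maximum and the next minimum form a ramp down.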

Plot a sample $I-V$ characteristic

As an $I-V$ characteristic example, we take the first sweeping voltage period starting after $t = 7$ ms. We plot the $I$-$V$ characteristics separately for the voltage ramp up and ramp down to show any potential hysteresis.

Apply the bin average to the $I-V$ characteristic

$I$-$V$ characteristics often contain a lot of fluctuations. This can mean that the exponential fit will not converge. In the past, when fitting techniques were slow, this was alleviated by applying the bin average to the data.

Bin averaging means breaking the data into individual "bins" and averaging the samples within each bin. Typically, the x axis (here the biasing voltage $V$) is split into even parts and all the samples within a given part (bin) are averaged. Each average is given an errorbar, calculated as the standard deviation of the averaged data. The errorbars can then be used as weights during the characteristic fitting.

Today's fitting techniques are, however, much more powerful than they used to be. Bin averaging no longer provides faster results but, on the contrary, distorts them. This is because its errorbars, pretty as they are, are not very representative of the actual uncertainties in the signal. It is much better to fit the $I-V$ characteristic as we collect it, sample by sample.

We will demonstrate the difference between fitting the full and the bin-averaged $I-V$ characteristic in this notebook. Thereafter, we will use bin averaging to get a good first estimate of the plasma parameters. This can improve the fit quality of the real data.

In the following, we calculate the bin average of the two $I$-$V$ characteristics shown in the figure above.
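A bin average along the voltage axis might be sketched like this. The helper name bin_average, the number of bins and the synthetic test data are illustrative assumptions. Bins with fewer than two samples are skipped because no standard deviation can be computed for them.

```python
import numpy as np

def bin_average(V, I, n_bins=40):
    """Average I within equal-width bins of V; errorbars are standard deviations."""
    edges = np.linspace(V.min(), V.max(), n_bins + 1)
    # Bin index of each sample; the right-most edge is assigned to the last bin.
    idx = np.clip(np.digitize(V, edges) - 1, 0, n_bins - 1)
    V_mean, I_mean, I_std = [], [], []
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.sum() < 2:
            continue  # too few samples for a standard deviation
        V_mean.append(V[in_bin].mean())
        I_mean.append(I[in_bin].mean())
        I_std.append(I[in_bin].std(ddof=1))
    return np.array(V_mean), np.array(I_mean), np.array(I_std)

# Synthetic noisy characteristic (all numbers made up for the demonstration)
rng = np.random.default_rng(0)
V = np.linspace(-100.0, 50.0, 3000)
I = 0.02 * (1.0 - np.exp((V - 5.0) / 15.0)) + rng.normal(0.0, 2e-3, V.size)
V_bin, I_bin, I_err = bin_average(V, I, n_bins=30)
print(V_bin.size, "bins, mean errorbar", I_err.mean())
```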

Fit the bin-averaged $I-V$ characteristic

Next, we fit this binned $I$-$V$ characteristic by the exponential function and print the resulting plasma parameters.

Notice that only a part of the curve is used as fit input, in particular the data points whose probe current value is above $-2 I_{sat}$. This improves the fit stability by disregarding the more volatile datapoints near the electron branch of the $I$-$V$ characteristic.
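The sketch below shows one way to combine the $I > -2 I_{sat}$ cut with a weighted fit. Since $I_{sat}$ is not known before fitting, the cut uses a crude estimate taken from the most negative voltages. That estimate, the starting value of $T_e$, the helper names and the synthetic data are all assumptions of this sketch. The model assumes a positive ion current that saturates for $V \ll V_f$.

```python
import numpy as np
from scipy.optimize import curve_fit

def ivchar_model(V, I_sat, V_f, T_e):
    # Assumed convention: positive ion current, saturation for V << V_f
    return I_sat * (1.0 - np.exp((V - V_f) / T_e))

def fit_ion_branch(V, I, I_err=None, T_e0=10.0):
    """Fit only the samples with I > -2 I_sat (hypothetical helper)."""
    # Crude starting values: I_sat from the most negative voltages,
    # V_f from the sample with the smallest |I|, T_e from a fixed guess.
    order = np.argsort(V)
    I_sat0 = np.mean(I[order[: max(3, len(V) // 10)]])
    V_f0 = V[np.argmin(np.abs(I))]
    keep = I > -2.0 * I_sat0
    sigma = None if I_err is None else I_err[keep]
    popt, pcov = curve_fit(ivchar_model, V[keep], I[keep],
                           p0=(I_sat0, V_f0, T_e0),
                           sigma=sigma, absolute_sigma=sigma is not None)
    return popt, pcov

# Synthetic binned data with I_sat = 20 mA, V_f = 5 V, T_e = 15 eV
rng = np.random.default_rng(0)
V_bin = np.linspace(-100.0, 30.0, 40)
I_err = np.full_like(V_bin, 5e-4)
I_bin = ivchar_model(V_bin, 0.02, 5.0, 15.0) + rng.normal(0.0, 5e-4, V_bin.size)

popt, pcov = fit_ion_branch(V_bin, I_bin, I_err)
print("I_sat = %.4f A, V_f = %.1f V, T_e = %.1f eV" % tuple(popt))
```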

Fit the full $I-V$ characteristic

We use the fit of the bin-averaged data as initial guesses for the fit of the full data.

Investigate the fit result errorbars with the covariance matrix

The Python fitting function scipy.optimize.curve_fit returns, besides the fit result values popt, also the so-called covariance matrix pcov. This is a 2D matrix whose diagonal contains the variances of the fit parameters, i.e. the squares of the "fit errors". Their square roots serve as an estimate of the fit result errorbars.
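As a small illustration on synthetic data, the errorbars follow from pcov by taking the square root of its diagonal. The model function, its sign convention (positive ion current) and the starting values p0, which stand in for the result of a bin-averaged fit, are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

def ivchar_model(V, I_sat, V_f, T_e):
    # Assumed convention: positive ion current, saturation for V << V_f
    return I_sat * (1.0 - np.exp((V - V_f) / T_e))

# Synthetic full-resolution characteristic with known parameters
rng = np.random.default_rng(1)
V = np.linspace(-100.0, 20.0, 2000)
I = ivchar_model(V, 0.02, 5.0, 15.0) + rng.normal(0.0, 2e-3, V.size)

p0 = (0.019, 4.0, 13.0)            # stand-in for the bin-averaged fit result
popt, pcov = curve_fit(ivchar_model, V, I, p0=p0)
perr = np.sqrt(np.diag(pcov))      # one-sigma errorbars of I_sat, V_f, T_e

for name, value, err in zip(("I_sat", "V_f", "T_e"), popt, perr):
    print(f"{name} = {value:.4g} +/- {err:.2g}")
```

The off-diagonal elements of pcov describe how the parameter estimates co-vary, which is ignored when only the diagonal is used.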

Notice that the values obtained by fitting the bin-averaged $I-V$ characteristics may not fall within these errorbars.

These errorbars, however, may not be very representative of the uncertainty of the fit results. To get a real sense of the $I_{sat}$, $V_f$ and $T_e$ uncertainty due to the data fluctuation, we employ so-called bootstrapping.

Investigate the fit result errorbars with bootstrapping

Bootstrapping (Wikipedia article) is a simple and flexible tool for calculating errorbars. The general idea is as follows:

  1. Calculate your quantity (here $I_{sat}$, $V_f$ and $T_e$) from your dataset (here $I-V$ characteristic).
  2. Create a large number of synthetic datasets, based on the original one.
  3. Calculate your quantity for each of the synthetic datasets.
  4. Look how your quantity varies between these datasets.

In other words, the uncertainty of the quantity (the $I_{sat}$, $V_f$ and $T_e$ errorbars) is gauged by how much the "synthetic" $I_{sat}$, $V_f$ and $T_e$ vary across the synthetic datasets.

Bootstrapping has a lot of advantages, some of which will be shown later in the notebook. Among them is that it can be applied to any dataset and any quantity you calculate from it. It makes no assumptions about the distribution function of the data (which, in other methods, is frequently assumed to be Gaussian) and you can adjust its precision easily by changing the number of synthetic datasets.

Its major drawback is that it takes a lot of time, particularly if calculating your quantity is complicated (or, God forbid, cannot be automatised). But in the case of our $I-V$ characteristics, the time needed is not that long. (In the current tests, the entire notebook takes about 30 seconds to execute on a personal computer.) Plus, the synthetic datasets are by nature independent, so the calculation can be easily parallelised.

The following function creates a number of synthetic $I-V$ characteristics (by default 100), fits them and returns the resulting 100 samples of synthetic $I_{sat}$, $V_f$ and $T_e$.
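A bootstrap of this kind can be sketched as below. The text does not spell out how the synthetic datasets are built, so this sketch assumes the common choice of resampling the $(V, I)$ samples with replacement. The function names, the model's sign convention (positive ion current) and the synthetic data are likewise assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def ivchar_model(V, I_sat, V_f, T_e):
    # Assumed convention: positive ion current, saturation for V << V_f
    return I_sat * (1.0 - np.exp((V - V_f) / T_e))

def bootstrap_fit(V, I, p0, n_boot=100, seed=0):
    """Fit n_boot resampled copies of the data; return an (n, 3) array of results."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_boot):
        # ASSUMPTION: synthetic dataset = samples drawn with replacement
        pick = rng.integers(0, len(V), size=len(V))
        try:
            popt, _ = curve_fit(ivchar_model, V[pick], I[pick], p0=p0)
        except RuntimeError:
            continue  # this synthetic fit did not converge, skip it
        results.append(popt)
    return np.array(results)

# Synthetic characteristic with I_sat = 20 mA, V_f = 5 V, T_e = 15 eV
rng = np.random.default_rng(2)
V = np.linspace(-100.0, 20.0, 400)
I = ivchar_model(V, 0.02, 5.0, 15.0) + rng.normal(0.0, 1e-3, V.size)

samples = bootstrap_fit(V, I, p0=(0.02, 5.0, 15.0))
print("T_e = %.2f +/- %.2f eV" % (samples[:, 2].mean(), samples[:, 2].std(ddof=1)))
```

Each row of samples holds one synthetic $I_{sat}$, $V_f$ and $T_e$, so the columns can be fed directly into histograms or scatterplots.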

First we visualise the different fits by plotting them onto the $I-V$ characteristic.

Notice that the original fit (yellow) remains in the middle of all the synthetic data fits (blue). This is a general feature of bootstrapping.

Next, we investigate the fit result variability directly by plotting histograms of the synthetic values $I_{sat}$, $V_f$ and $T_e$.

Notice that the probability distributions of the synthetic fit results are approximately Gaussian and their means are close to the original fit values. This is commonly the case in bootstrapping, though it is not guaranteed for every dataset.

To capture the variability within these probability distributions, one can simply use the standard deviation. The distributions are Gaussian, so their variability is symmetric and a symmetric standard deviation captures it well. However, sometimes it is more informative to use an alternative - the 95% confidence interval.

A confidence interval (Wikipedia page) is a measure of variability which says "I am 95 % certain that the real value is somewhere within this interval". In other words, "ignoring the top 2.5 % and the bottom 2.5 % of values, this is the minimum and maximum value I expect from the variable". Ignoring the top and bottom 2.5 % removes the outliers (the really far-fetched samples) yet still captures most of the variability. We will test confidence intervals shortly.
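For a set of bootstrap results, such an interval can be read off with percentiles. In the sketch below the helper name confidence_interval is hypothetical and the Gaussian samples merely stand in for bootstrapped $T_e$ values.

```python
import numpy as np

def confidence_interval(samples, level=95.0):
    """Central interval containing `level` percent of the samples."""
    tail = (100.0 - level) / 2.0   # 2.5 % on each side for level = 95
    return np.percentile(samples, [tail, 100.0 - tail])

# Stand-in for bootstrapped T_e values (made-up mean and spread)
rng = np.random.default_rng(3)
T_e_samples = rng.normal(15.0, 1.0, size=10000)

low, high = confidence_interval(T_e_samples)
print(f"95 % confidence interval: {low:.2f} to {high:.2f} eV")
```

Unlike the standard deviation, the two bounds need not be symmetric around the original fit value, which is why this measure leads to asymmetric errorbars.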

Compare various ways to calculate $I_{sat}$, $V_f$ and $T_e$ and their errorbars

In the following, we compare several ways of determining the $I_{sat}$, $V_f$ and $T_e$ values and errorbars.

Which method you pick depends, generally, on the kind of data you have. Can you afford to make a lot of synthetic bootstrap datasets? What kind of plots do you want to make with the data? How much do you need to trust the results? In this notebook, we will go with the third option and take the standard deviation of the bootstrapped $I_{sat}$, $V_f$ and $T_e$ as the errorbars. This is, in part, because I'm a bit too lazy to deal with asymmetric Y axis errors at the moment. (Plus the symmetric standard deviation is easier to read.)

Check correlations between the fit results

There is one more thing to consider when judging the fit results uncertainty. Since $I_{sat}$, $V_f$ and $T_e$ come from a single fit, there can be trade-offs between them. For instance, a decrease in $I_{sat}$ may be partially compensated by a decrease in $T_e$. This correlation increases the overall uncertainty. Sadly, at the moment I don't know how to calculate exactly how much. So let's just plot the scatterplots and see if $I_{sat}$, $V_f$ and $T_e$ are correlated within a single $I-V$ characteristic and its synthetic replicates. The red star denotes the results of the original fit of the full data.

We can see that the fit results are, indeed, correlated. In particular, an increase in $T_e$ can be compensated by an increase in $I_{sat}$ or a decrease in $V_f$. The latter two, $I_{sat}$ and $V_f$, are not strongly correlated. This means that the variation in $T_e$ is, in part, not caused by the probe data fluctuation but by the variation of $I_{sat}$ and $V_f$. This finding requires more in-depth analysis to properly acknowledge in the errorbars used in this notebook, so we leave it here as a point of interest and future research.
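One simple way to put numbers on what the scatterplots show is the Pearson correlation matrix of the bootstrap results. This is only a sketch of a possible diagnostic and it does not answer how much the errorbars should grow. The array below is synthetic: its first two columns are correlated by construction and the third is independent.

```python
import numpy as np

# Stand-in for an (n_boot, 3) array of bootstrapped I_sat, V_f, T_e values.
rng = np.random.default_rng(4)
a = rng.normal(size=1000)
samples = np.column_stack([
    a,                                          # "parameter 0"
    2.0 * a + 0.1 * rng.normal(size=1000),      # strongly tied to parameter 0
    rng.normal(size=1000),                      # independent of the others
])

# rowvar=False: each column is one variable, each row one bootstrap replicate
corr = np.corrcoef(samples, rowvar=False)
print(np.round(corr, 2))
```

Off-diagonal values near +1 or -1 indicate a strong trade-off between two parameters, while values near 0 indicate none.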

Fit all $I-V$ characteristics throughout the discharge

Finally, we perform the fitting procedure described above throughout the discharge. The process is fully automatic. Occasionally the fit doesn't converge; this will be treated later.

To improve the data quality, we remove the results where the fit evidently failed.
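What counts as an evidently failed fit is not specified in the text, so the criteria in the following sketch are assumptions: a result is rejected if it is not finite, if $T_e$ is non-positive or above a plausibility limit, or if its errorbar exceeds the value itself. The helper name and the thresholds are hypothetical.

```python
import numpy as np

def good_fit_mask(T_e, T_e_err, T_e_max=100.0, max_rel_err=1.0):
    """Boolean mask of fit results that pass simple sanity checks."""
    T_e = np.asarray(T_e, dtype=float)
    T_e_err = np.asarray(T_e_err, dtype=float)
    ok = np.isfinite(T_e) & np.isfinite(T_e_err)   # fit returned numbers
    ok &= (T_e > 0.0) & (T_e < T_e_max)            # physically plausible range
    ok &= T_e_err < max_rel_err * np.abs(T_e)      # errorbar smaller than value
    return ok

# Made-up results: one good fit followed by four different failures
T_e = [15.0, np.nan, -3.0, 500.0, 20.0]
T_e_err = [1.0, 1.0, 1.0, 1.0, 50.0]
mask = good_fit_mask(T_e, T_e_err)
print(mask)
```

The same mask can then be applied to the time axis and to the $I_{sat}$ and $V_f$ arrays so that all three quantities stay aligned.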

We save the fit results for later use.

Finally, let's plot the time evolution of the fit results - edge plasma parameters $I_{sat}$, $V_f$ and $T_e$ - during the whole discharge.