{ "cells": [ { "cell_type": "markdown", "metadata": { "golem": {} }, "source": [ "# Measuring electron temperature with a swept Langmuir probe\n", "\n", "Written by Katerina Hromasova with input from Martina Lauerova, Georgiy Sarancha, Jan Stockel, Vojtech Svoboda, Michael Komm and others.\n", "\n", "\n", "## Theory of swept probe measurements\n", "\n", "The following figure shows the ideal Langmuir probe $I$-$V$ characteristic.\n", "\n", "\n", "\n", "The ion branch of the curve (left half of the plot) can be described by a three-parameter exponential function.\n", "\n", "$I(V) = I_{sat} \\left( 1 - \\exp \\left( -\\frac{V - V_f}{T_e} \\right)\\right)$\n", "\n", "The three parameters are the ion saturated current $I_{sat}$, the probe floating potential $V_f$ and the electron temperature $T_e$ \\[eV\\]. The shape of the characteristic changes depending on these parameters, and by fitting an experimental $I$-$V$ characteristic with an exponential function, one may retrieve their values.\n", "\n", "To collect the whole $I$-$V$ characteristic in experiment, the biasing voltage $V$ on the probe is swept (i.e. varied periodically). The exact voltage shape is irrelevant, though we most often encounter the sawtooth (zig-zag) shape and the sine shape. The biasing voltage $V$ is then plotted against the current $I$ flowing from the probe to the ground and the curve is fitted with the exponential.\n", "\n", "This notebook performs $I$-$V$ characteristic fitting throughout the current discharge. It documents the process step by step and concludes with drawing the temporal evolution of the ion saturated current $I_{sat}$, the probe floating potential $V_f$ and the electron temperature $T_e$.\n", "\n", "**Note: All the time variables are given in seconds.**\n", "\n", "## Import the basic libraries\n", "\n", "First we import basic libraries: Numpy and Matplotlib. We will import more libraries throughout the notebook as needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "#Pick the discharge to analyse\n", "shot_no = 35902 #35428 #test discharge for which the notebook will definitely work\n", "shot = shot_no" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access the diagnostics data\n", "\n", "The Langmuir probe we shall be working with is placed on the PetiProbe.\n", "\n", "\n", "\n", "(The Langmuir probe is the small metal pin on the right.)\n", "\n", "The data directory of the PetiProbe is `http://golem.fjfi.cvut.cz/shots/{shot}/Diagnostics/PetiProbe/`. Here, we write the function `get_data` to download the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from urllib.error import HTTPError # recognise the error stemming from missing data\n", "import pandas as pd # for reading csv files\n", "\n", "#Define an exception which will be raised if the data is missing and stop the notebook execution\n", "class StopExecution(Exception):\n", " def _render_traceback_(self):\n", " pass\n", "\n", "def get_data(shot, identifier):\n", " URL = \"http://golem.fjfi.cvut.cz/shots/{shot}/Diagnostics/PetiProbe/{identifier}.csv\"\n", " url = URL.format(shot=shot, identifier=identifier)\n", " try:\n", " df = pd.read_csv(url, names=['time', identifier], index_col='time')\n", " except HTTPError:\n", " print('File not found at %s. Aborting notebook execution.' 
"        raise StopExecution\n", "    t = np.array(df.index)\n", "    data = np.transpose(np.array(df))[0]\n", "    return t, data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The biasing voltage $V$ is collected under the name `U_bias`. The voltage proportional to the probe current is called `U_current`. The probe current can be calculated as $I = U_{current}/R$, where $R=46.7 \\, \\Omega$ is the resistance of the measuring resistor.\n", "\n",
"In the following, we load this data for the current shot, calculate the probe current $I$ and plot the time evolution of $I$ and $V$. Notice that at the discharge beginning, the current isn't flat zero. This is the effect of the parasitic current, which we will discuss shortly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Load the probe data\n", "t, U_bias = get_data(shot, 'U_bias')\n", "t, U_current = get_data(shot, 'U_current')\n", "I = U_current/46.7 #probe current\n", "V = U_bias # probe voltage\n", "\n", "#Prepare the figure\n", "fig = plt.figure('Time evolution of the raw probe data', figsize=(12,5))\n", "fig.tight_layout()\n", "\n", "#Plot the probe current\n", "ax1 = plt.subplot(121)\n", "ax1.set_title('Raw probe current')\n", "plt.plot(t*1000, I*1000)\n", "plt.xlabel('$t$ [ms]')\n", "plt.ylabel('$I$ [mA]')\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "\n", "#Plot the probe voltage\n", "ax2 = plt.subplot(122)\n", "ax2.set_title('Raw probe voltage')\n", "plt.plot(t*1000, V)\n", "plt.xlabel('$t$ [ms]')\n", "plt.ylabel('$V$ [V]')\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "\n", "# Cut all the signals so that they start at t=0 (for later calculation purposes)\n", "I = I[t>=0]\n", "V = V[t>=0]\n", "t = t[t>=0]\n", "\n", "# Remove infinities and NaNs from the signals\n", "def remove_bad_numbers(data):\n", "    n_bad = np.sum(np.isnan(data) | np.isinf(data))\n", "    if np.isnan(data[0]) or np.isinf(data[0]):\n", "        data[0] = 0\n", "    for i in range(1, data.size):\n", "        if np.isnan(data[i]) or np.isinf(data[i]):\n", "            data[i] = data[i-1]\n", "    return data, n_bad\n", "\n", "I, n_bad = remove_bad_numbers(I)\n", "print('Probe current: %s invalid samples removed' % n_bad)\n", "V, n_bad = remove_bad_numbers(V)\n", "print('Probe voltage: %s invalid samples removed' % n_bad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Remove the parasitic current\n", "\n",
"The parasitic current appears due to the capacitance of the data collection system. At high sweeping frequencies, the wires behave like capacitors and cause current oscillations proportional to the time derivative of the biasing voltage. This parasitic current adds to the probe current, distorting it.\n", "\n",
"$I_{total}(V) = I_{probe}(V) + c \\cdot \\frac{dV}{dt}$\n", "\n",
"Since the biasing voltage is largely independent of the plasma parameters, $V(t)$ is periodic throughout the discharge, and so is the parasitic current. We use this in the parasitic signal reconstruction and removal.\n", "\n",
"First, we sample the parasitic current at the beginning of the discharge, where $I_{probe}=0$ and $I_{total}=c \\cdot \\frac{dV}{dt}$. This is the time period between the opening of the $B_t$ capacitor banks and the opening of the current drive capacitor banks."
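] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an aside, the model $I_{total} = I_{probe} + c \\cdot \\frac{dV}{dt}$ can also be checked directly. The cell below is a minimal illustrative sketch (not part of the original analysis): it estimates the coupling constant $c$ by least squares over an assumed 2 ms pre-plasma window; the names `pre`, `dVdt` and `I_parasitic` are ours, and the analysis below uses a period-averaging approach instead." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Illustrative sketch: estimate c in I_parasitic = c*dV/dt by least squares\n", "pre = t < 0.002 #assumed pre-plasma window; the proper T0 is derived in the next cell\n", "dVdt = np.gradient(V[pre], t[pre]) #time derivative of the biasing voltage\n", "c = np.sum(dVdt * I[pre]) / np.sum(dVdt**2) #least-squares slope of I versus dV/dt\n", "I_parasitic = c * np.gradient(V, t) #modelled parasitic current over the whole shot\n", "print('c = %.2e F (an effective wiring capacitance)' % c)"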
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from urllib.request import urlopen\n", "\n", "#Define a function for downloading single values from the GOLEM database\n", "def get_parameter(url, shot, silent=False):\n", "    URL = 'http://golem.fjfi.cvut.cz/shots/%i/%s' % (shot, url)\n", "    if not silent:\n", "        print(URL)\n", "    f = urlopen(URL)\n", "    try:\n", "        return np.loadtxt(f)\n", "    except ValueError: # the data couldn't be converted to a row of numbers\n", "        return np.array([np.nan])\n", "\n", "#Load the time of the thyristor opening\n", "Tcd = get_parameter('Production/Parameters/Tcd', shot) * 1e-6 #s\n", "TBt = get_parameter('Production/Parameters/TBt', shot) * 1e-6 #s\n", "T0 = Tcd - TBt #time before which the parasitic signal is sampled\n", "if not (T0 > 0): #catches zero, negative and NaN; set SOME value so the script doesn't fail\n", "    print('Warning: T0 set arbitrarily.')\n", "    T0 = 0.001\n", "print('T0 = %.3f ms' % (T0*1000))\n", "\n", "#Plot the parasitic signal\n", "plt.figure('Parasitic signal sample', figsize=(8,6))\n", "t_sample = t[t < T0]\n", "I_sample = I[t < T0]\n", "plt.plot(t_sample*1000, I_sample*1000)\n", "plt.xlabel('$t$ [ms]')\n", "plt.ylabel('$I$ [mA]')\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "\n", "#Detect the extremes of the sweeping voltage\n", "#(NOTE: minimal reconstruction; the `order` window of 50 samples is an assumed noise-suppression scale)\n", "from scipy.signal import argrelextrema\n", "extremes = np.sort(np.concatenate((argrelextrema(V, np.greater, order=50)[0],\n", "                                   argrelextrema(V, np.less, order=50)[0])))\n", "t_ext = t[extremes] #times of the voltage extremes\n", "T = 2*np.median(np.diff(t_ext)) #sweep period (two extremes per period)\n", "\n", "#Average the parasitic current period by period over the pre-plasma window,\n", "#then tile the average over the whole signal and subtract it\n", "#(assumes uniform sampling and a stable sweep period)\n", "n_per = int(round(T / (t[1] - t[0]))) #samples per sweep period\n", "n_periods = I[t < T0].size // n_per\n", "if n_periods > 0:\n", "    parasitic = np.mean(I[:n_periods*n_per].reshape(n_periods, n_per), axis=0)\n", "    I = I - np.tile(parasitic, I.size//n_per + 1)[:I.size]\n", "else:\n", "    print('Warning: no full sweep period before plasma; parasitic current not removed.')\n", "\n", "#Pick the extremes bounding two I-V characteristics in the flat-top phase\n", "sample_ext = extremes[t_ext > 7e-3]\n", "\n", "#Pick samples for the first I-V characteristic\n", "sample_t1 = t[sample_ext[0] : sample_ext[1]]\n", "sample_I1 = I[sample_ext[0] : sample_ext[1]]\n", "sample_V1 = V[sample_ext[0] : sample_ext[1]]\n", "\n", "#Pick samples for the second I-V characteristic\n", "sample_t2 = t[sample_ext[1] : sample_ext[2]]\n", "sample_I2 = I[sample_ext[1] : sample_ext[2]]\n", "sample_V2 = V[sample_ext[1] : sample_ext[2]]\n", "\n", "#Plot the time evolution of the probe voltage\n", "plt.figure('One period of probe voltage', figsize=(8,4))\n", "plt.title('One period of probe voltage')\n", "plt.plot(sample_t1*1000, sample_V1, 'r.')\n", "plt.plot(sample_t2*1000, sample_V2, 'b.')\n", "plt.xlabel('$t$ [ms]')\n", "plt.ylabel('$V$ [V]')\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "\n", "#Plot the time evolution of the probe current\n", "plt.figure('One period of probe current', figsize=(8,4))\n", "plt.title('One period of probe current')\n", "plt.plot(sample_t1*1000, sample_I1*1000, 'r.')\n", "plt.plot(sample_t2*1000, sample_I2*1000, 'b.')\n", "plt.xlabel('$t$ [ms]')\n", "plt.ylabel('$I$ [mA]')\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "\n", "#Plot the two I-V characteristics\n", "plt.figure('Sample of the I-V characteristic', figsize=(8,4))\n", "plt.title('Sample of the I-V characteristic')\n", "plt.plot(sample_V1, sample_I1*1000, 'r.')\n", "plt.plot(sample_V2, sample_I2*1000, 'b.')\n", "plt.xlabel('$V$ [V]')\n", "plt.ylabel('$I$ [mA]')\n", "plt.grid(True)\n", "plt.axhline(c='k')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply the bin average to the $I-V$ characteristic\n", "\n",
"$I$-$V$ characteristics often contain a lot of fluctuations, which can prevent the exponential fit from converging. In the past, when fitting techniques were slow, this was alleviated by applying the *bin average* to the data.\n", "\n",
"Bin averaging breaks the data into individual \"bins\" and averages the samples within each bin. Typically, the x axis (here the biasing voltage $V$) is split into even parts and all the samples within a given part (bin) are averaged. Each average is given an errorbar, calculated as the standard deviation of the averaged data. The errorbars can then be used as weights during the characteristic fitting.\n", "\n",
"Today's fitting techniques are, however, much more powerful than they used to be. Bin averaging no longer provides faster results but, on the contrary, distorts them. This is because its errorbars, pretty as they are, are not very representative of the actual uncertainties in the signal. It is much better to fit the $I-V$ characteristic as we collect it, sample by sample.\n",
"\n",
"We will demonstrate the difference between fitting the full and the bin-averaged $I-V$ characteristic in this notebook. We will, however, still use bin averaging to get a good first estimate of the plasma parameters; this improves the fit quality on the real data.\n", "\n",
"In the following, we calculate the bin average of the two $I$-$V$ characteristics shown in the figure above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Define a function which performs bin averaging\n", "def bin_average(x, y, N_bins=100):\n", "    dx = (np.nanmax(x) - np.nanmin(x)) / N_bins\n", "    bins = np.linspace(np.nanmin(x) + dx/2, np.nanmax(x) - dx/2, N_bins)\n", "    y_binned = []\n", "    yerr = []\n", "    for bin_centre in bins:\n", "        y_samples = y[(bin_centre-dx/2 < x) & (x <= bin_centre+dx/2)]\n", "        y_binned.append(np.mean(y_samples))\n", "        yerr.append(np.std(y_samples))\n", "    return bins, np.array(y_binned), np.array(yerr)\n", "\n", "#Select data from both the above I-V characteristics and apply bin averaging to them\n", "mask = slice(sample_ext[0], sample_ext[2])\n", "V_binned, I_binned, Ierr = bin_average(V[mask], I[mask], N_bins=40)\n", "\n", "#Plot the bin-averaged I-V characteristic\n", "plt.figure('Bin-averaged I-V characteristic', figsize=(8,6))\n", "plt.title('Bin-averaged I-V characteristic')\n", "plt.errorbar(V_binned, I_binned*1000, yerr=Ierr*1000, color='r', marker='o', ls='None', capsize=3, elinewidth=0.5)\n", "plt.grid(True)\n", "plt.axhline(c='k')\n", "plt.xlabel('$V$ [V]')\n", "plt.ylabel('$I$ [mA]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit the bin-averaged $I-V$ characteristic\n", "\n",
"Next, we fit this binned $I$-$V$ characteristic with the exponential function and print the resulting plasma parameters.\n", "\n",
"Notice that only a part of the curve is used as fit input, in particular the data points whose probe current value is above $-2 I_{sat}$. This improves the fit stability by disregarding the more volatile datapoints near the electron branch of the $I$-$V$ characteristic."
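] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before fitting the measured characteristic, we can compare the two approaches under controlled conditions. The cell below is a small synthetic sanity check (all numbers are made up): it generates a noisy $I$-$V$ characteristic with a known $T_e$, fits it both sample by sample and bin-averaged, and prints the recovered temperatures. The `model` lambda mirrors the three-parameter formula (`fit3par` in the next cell)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Synthetic check with assumed ground-truth parameters (illustrative only)\n", "import scipy.optimize as optimization\n", "\n", "rng = np.random.default_rng(0)\n", "Isat_true, Vf_true, Te_true = 0.006, -8.0, 12.0\n", "V_syn = rng.uniform(-60, 20, 2000) #biasing voltages\n", "I_syn = Isat_true*(1 - np.exp((V_syn - Vf_true)/Te_true))\n", "I_syn = I_syn + rng.normal(0, 0.003, V_syn.size) #additive noise\n", "\n", "model = lambda x, a, b, c: a*(1 - np.exp((x - b)/c))\n", "popt_full, _ = optimization.curve_fit(model, V_syn, I_syn, p0=[0.005, -5, 15])\n", "bins, I_b, Ierr_b = bin_average(V_syn, I_syn, N_bins=40)\n", "popt_b, _ = optimization.curve_fit(model, bins, I_b, p0=[0.005, -5, 15], sigma=Ierr_b)\n", "print('Te: true %.1f eV, full fit %.1f eV, binned fit %.1f eV'\n", "      % (Te_true, popt_full[2], popt_b[2]))"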
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.optimize as optimization # for fitting\n", "\n", "#Define the 3-parameter fit function; its arguments are a=Isat, b=Vf and c=Te\n", "def fit3par(x,a,b,c):\n", "    return a*(1-np.exp((x-b)/c))\n", "\n", "#Trim the I-V characteristic\n", "Isat_estimate = np.mean(I_binned[:20])\n", "V_for_fit = V_binned[I_binned > -2*Isat_estimate]\n", "I_for_fit = I_binned[I_binned > -2*Isat_estimate]\n", "Ierr_for_fit = Ierr[I_binned > -2*Isat_estimate]\n", "\n", "#Fit the I-V characteristic\n", "p0 = [0.005, -5, 15] #initial guess of plasma parameters\n", "popt, pcov = optimization.curve_fit(fit3par, V_for_fit, I_for_fit, p0, sigma=Ierr_for_fit)\n", "\n", "#Keep a copy of the results for later\n", "popt_bin = popt\n", "pcov_bin = pcov\n", "\n", "#Plot the fitted I-V characteristic\n", "plt.figure('3-parameter fit of the binned I-V characteristic', figsize=(8,6))\n", "plt.title('3-parameter fit of the binned I-V characteristic')\n", "plt.errorbar(V_binned, I_binned*1000, yerr=Ierr*1000, color='k', marker='o', ls='None', capsize=3,\n", "             elinewidth=0.5, alpha=0.3, label='disregarded samples')\n", "plt.errorbar(V_for_fit, I_for_fit*1000, yerr=Ierr_for_fit*1000, color='r', marker='o', ls='None',\n", "             capsize=3, elinewidth=0.5, label='samples used for fitting')\n", "x = np.linspace(V_for_fit.min(), V_for_fit.max(), 100)\n", "plt.plot(x, fit3par(x,*popt)*1000, color='b')\n", "plt.grid(True)\n", "plt.legend()\n", "plt.axhline(c='k')\n", "plt.xlabel('$V$ [V]')\n", "plt.ylabel('$I$ [mA]')\n", "\n", "#Print the fit results\n", "print('Isat = %.1f mA' % (popt[0]*1000))\n", "print('V_fl = %.1f V' % popt[1])\n", "print('Te = %.1f eV' % popt[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit the full $I-V$ characteristic\n", "\n",
"We use the fit of the bin-averaged data as the initial guess for the fit of the full data."
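] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A side note on `scipy.optimize.curve_fit` (an addition, not part of the original analysis): when `sigma` is supplied, as in the binned fit above, the returned covariance matrix is by default rescaled by the reduced chi-square of the fit. Passing `absolute_sigma=True` instead treats the errorbars as absolute one-sigma uncertainties, which changes the covariance matrix we examine later. A minimal sketch:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Refit the binned characteristic, treating the bin errorbars as absolute 1-sigma errors\n", "popt_abs, pcov_abs = optimization.curve_fit(fit3par, V_for_fit, I_for_fit, p0,\n", "                                            sigma=Ierr_for_fit, absolute_sigma=True)\n", "print('Te = %.1f +- %.1f eV' % (popt_abs[2], np.sqrt(pcov_abs[2,2])))"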
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Name the data to fit\n", "V_sample = V[mask]\n", "I_sample = I[mask]\n", "\n", "#Trim the I-V characteristic\n", "Isat_estimate = popt[0] #Isat of the bin-averaged I-V characteristic\n", "V_for_fit = V_sample[I_sample > -2*Isat_estimate]\n", "I_for_fit = I_sample[I_sample > -2*Isat_estimate]\n", "\n", "#Fit the I-V characteristic\n", "p0 = popt_bin #initial guess of plasma parameters\n", "popt, pcov = optimization.curve_fit(fit3par, V_for_fit, I_for_fit, p0)\n", "\n", "#Keep a copy of the results for later\n", "popt_orig = popt\n", "pcov_orig = pcov\n", "\n", "#Plot the fitted I-V characteristic\n", "plt.figure('3-parameter fit of the full I-V characteristic', figsize=(8,6))\n", "plt.title('3-parameter fit of the full I-V characteristic')\n", "plt.errorbar(V_sample, I_sample*1000, color='k', marker='o', ls='None', alpha=0.3,\n", "             label='disregarded samples', zorder=0)\n", "plt.errorbar(V_for_fit, I_for_fit*1000, color='r', marker='o', ls='None',\n", "             label='samples used for fitting', zorder=0)\n", "x = np.linspace(V_for_fit.min(), V_for_fit.max(), 100)\n", "plt.plot(x, fit3par(x,*popt)*1000, color='b', zorder=5, label='fit')\n", "plt.grid(True)\n", "plt.legend()\n", "plt.axhline(c='k')\n", "plt.xlabel('$V$ [V]')\n", "plt.ylabel('$I$ [mA]')\n", "\n", "#Print the fit results\n", "print('Isat = %.1f mA (bin-averaged value: %.1f mA)' % (popt[0]*1000, p0[0]*1000))\n", "print('V_fl = %.1f V (bin-averaged value: %.1f V)' % (popt[1], p0[1]))\n", "print('Te = %.1f eV (bin-averaged value: %.1f eV)' % (popt[2], p0[2]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the fit result errorbars with the covariance matrix\n", "\n",
"The Python fitting function `scipy.optimize.curve_fit` returns, besides the fit result values `popt`, also the so-called covariance matrix `pcov`. This is a 2D matrix whose diagonal contains the squares of the one-sigma \"fit errors\". These serve as an estimate of the fit result errorbars." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Extract the diagonal components of the matrix and apply square root to them\n", "errs = np.sqrt(pcov.diagonal())\n", "\n", "#Print the fit results including the covariance matrix errorbars\n", "print('Isat = %.1f +- %.1f mA (bin-averaged value: %.1f mA)' % (popt[0]*1000, errs[0]*1000, popt_bin[0]*1000))\n", "print('V_fl = %.1f +- %.1f V (bin-averaged value: %.1f V)' % (popt[1], errs[1], popt_bin[1]))\n", "print('Te = %.1f +- %.1f eV (bin-averaged value: %.1f eV)' % (popt[2], errs[2], popt_bin[2]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the values obtained by fitting the bin-averaged $I-V$ characteristics may not fall within these errorbars.\n", "\n",
"These errorbars, however, may not be very representative of the uncertainty of the fit results. To get a real sense for the $I_{sat}$, $V_f$ and $T_e$ uncertainty due to the data fluctuation, we employ so-called *bootstrapping*.\n", "\n",
"## Investigate the fit result errorbars with bootstrapping\n", "\n",
"Bootstrapping ([Wikipedia article](https://en.wikipedia.org/wiki/Bootstrapping_(statistics))) is a simple and flexible tool for calculating errorbars. The general idea is as follows:\n", "\n",
"1. Calculate your quantity (here $I_{sat}$, $V_f$ and $T_e$) from your dataset (here the $I-V$ characteristic).\n",
"2. Create a large number of synthetic datasets, based on the original one.\n",
"3. Calculate your quantity for each of the synthetic datasets.\n",
"4. Look how your quantity varies between these datasets.\n", "\n",
"In other words, the quantity uncertainty (the $I_{sat}$, $V_f$ and $T_e$ errorbars) is gauged by how much the \"synthetic\" $I_{sat}$, $V_f$ and $T_e$ vary across the synthetic datasets. (A minimal stand-alone illustration of this recipe is given below.) Bootstrapping has a lot of advantages, some of which will be shown later in the notebook. Among them is that it can be applied to *any* dataset and *any* quantity you calculate from it. It makes no assumptions about the distribution function of the data (which, in other methods, is frequently assumed to be Gaussian) and you can adjust its precision easily by changing the number of synthetic datasets. Its major drawback is that it takes a lot of time, particularly if calculating your quantity is complicated (or, God forbid, cannot be automated). But in the case of our $I-V$ characteristics, the time needed is not that long. (In the current tests, the entire notebook takes about 30 seconds to execute on a personal computer.) Plus, the synthetic datasets are by nature independent, so the calculation can be easily parallelised.\n", "\n",
"The following function creates a number of synthetic $I-V$ characteristics (by default 100), fits them and returns the resulting samples of synthetic $I_{sat}$, $V_f$ and $T_e$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from arch import bootstrap\n", "\n", "def create_bootstrap_fit_replicates(I, V, p0=[0.005, -15, 15], bss=100, b=7):\n", "    '''\n", "    Return three arrays of bootstrap replicates of I_sat, V_f and T_e of a given I-V characteristic.\n", "    \n", "    p0 ... initial parameter estimates\n", "    bss ... number of bootstrap replicates to use for errorspan calculation\n", "    b ... average size of bootstrap block (if 0, normal sample-by-sample bootstrap is used)\n", "    '''\n", "\n", "    #Fit the original I-V characteristic\n", "    popt, pcov = optimization.curve_fit(fit3par, V, I, p0)\n", "    p0 = popt #initial guess for all the replicates\n", "\n", "    #Prepare variables and the bootstrap iterator\n", "    indexes = np.arange(V.size)\n", "    if b:\n", "        bs = bootstrap.StationaryBootstrap(b, indexes)\n", "        #something that will provide a number of bootstrapped indexes rather quickly\n", "    else:\n", "        bs = bootstrap.IIDBootstrap(indexes)\n", "    \n", "    #Perform bootstrapping\n", "    synthetic_results = []\n", "    for i_bs, ((bs_indexes,), _) in enumerate(bs.bootstrap(bss)):\n", "        #print('bootstrap', i_bs+1)\n", "        I_bs = I[bs_indexes]\n", "        V_bs = V[bs_indexes]\n", "        try:\n", "            popt, pcov = optimization.curve_fit(fit3par, V_bs, I_bs, p0)\n", "        except RuntimeError: #the fitting goes on for too long; it is probably failing\n", "            popt = np.array([np.nan]*3)\n", "        synthetic_results.append(popt)\n", "    \n", "    return np.transpose(np.array(synthetic_results))\n", "\n", "# Create the bootstrap replicates\n", "synth_Isat, synth_Vf, synth_Te = create_bootstrap_fit_replicates(\n", "    I_for_fit, V_for_fit, p0=popt, bss=1000, b=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we visualise the different fits by plotting them onto the $I-V$ characteristic."
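] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before that, here is the stand-alone illustration of the bootstrap recipe promised above (illustrative values only, plain NumPy): we bootstrap the mean of a noisy sample and read off both a standard-deviation errorbar and a 95% confidence interval." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Minimal bootstrap of a sample mean (illustrative only)\n", "rng = np.random.default_rng(1)\n", "data = rng.normal(5.0, 2.0, 200) #the original dataset\n", "means = np.array([rng.choice(data, data.size, replace=True).mean()\n", "                  for _ in range(1000)]) #means of 1000 synthetic datasets\n", "print('mean = %.2f +- %.2f' % (means.mean(), means.std()))\n", "print('95%% confidence interval: %.2f to %.2f' % tuple(np.percentile(means, [2.5, 97.5])))"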
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare the figure\n", "fig = plt.figure('Bootstrap 3-parameter fits of the real I-V characteristic', figsize=(8,6))\n", "fig.tight_layout()\n", "plt.title('Bootstrap 3-parameter fit of the real I-V characteristic')\n", "\n", "# Plot the probe data\n", "plt.errorbar(V_for_fit, I_for_fit*1000, color='r', marker='o', ls='None',\n", "             label='samples used for fitting', zorder=0)\n", "\n", "# Plot the synthetic fits\n", "x = np.linspace(V_for_fit.min(), V_for_fit.max(), 100)\n", "for i in range(synth_Isat.size):\n", "    p = [synth_Isat[i], synth_Vf[i], synth_Te[i]]\n", "    plt.plot(x, fit3par(x, *p)*1000, color='b', zorder=5, alpha=0.05)\n", "\n", "# Plot the original fit and label the plot\n", "plt.plot(x, fit3par(x, *popt_orig)*1000, color='y', zorder=10, label='original fit')\n", "plt.grid(True)\n", "plt.legend()\n", "plt.axhline(c='k')\n", "plt.xlabel('$V$ [V]')\n", "plt.ylabel('$I$ [mA]')\n", "plt.ylim(I_for_fit.min()*1000-1, I_for_fit.max()*1000+1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the original fit (yellow) remains in the middle of all the synthetic data fits (blue). This is a general feature of bootstrapping.\n", "\n",
"Next, we investigate the fit result variability directly by plotting histograms of the synthetic values $I_{sat}$, $V_f$ and $T_e$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Drop failed (NaN) replicates before histogramming\n", "ok = ~np.isnan(synth_Te)\n", "\n", "# Set the number of bins\n", "N = int(np.sqrt(synth_Isat[ok].size))\n", "\n", "#Prepare the figure\n", "fig, axs = plt.subplots(1, 3, num='Synthetic data histograms', figsize=(12,5), sharey=True)\n", "fig.tight_layout()\n", "fig.subplots_adjust(wspace=0)\n", "\n", "# Draw Isat histogram\n", "axs[0].set_title('Synthetic $I_{sat}$')\n", "axs[0].hist(synth_Isat[ok]*1000, bins=N)\n", "axs[0].axvline(np.nanmean(synth_Isat)*1000, c='k')\n", "axs[0].set_xlabel('$I_{sat}$ [mA]')\n", "axs[0].grid(True)\n", "axs[0].set_ylabel('sample count')\n", "\n", "# Draw Vf histogram\n", "axs[1].set_title('Synthetic $V_f$')\n", "axs[1].hist(synth_Vf[ok], bins=N)\n", "axs[1].axvline(np.nanmean(synth_Vf), c='k')\n", "axs[1].set_xlabel('$V_f$ [V]')\n", "axs[1].grid(True)\n", "\n", "# Draw Te histogram\n", "axs[2].set_title('Synthetic $T_e$')\n", "axs[2].hist(synth_Te[ok], bins=N)\n", "axs[2].axvline(np.nanmean(synth_Te), c='k')\n", "axs[2].set_xlabel('$T_e$ [eV]')\n", "axs[2].grid(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the probability distributions of the synthetic fit results are approximately Gaussian and their means lie close to the original fit values. This is another general feature of bootstrapping.\n", "\n",
"To capture the variability within these probability distributions, one can simply use the standard deviation. The distributions are approximately Gaussian, so their variability is symmetric and a symmetric standard deviation captures it well. However, sometimes it is more informative to use an alternative - the 95% *confidence interval*.\n", "\n",
"A confidence interval ([Wikipedia page](https://en.wikipedia.org/wiki/Confidence_interval)) is a measure of variability which says \"I am 95 % certain that the real value is somewhere within this interval\". In other words, \"ignoring the top 2.5 % and the bottom 2.5 % of values, this is the minimum and maximum value I expect from the variable\". Ignoring the top and bottom 2.5 % removes the outliers (the really far-fetched variable samples) yet still captures most of the variable's variability. We will test confidence intervals shortly.\n",
"\n",
"## Compare various ways to calculate $I_{sat}$, $V_f$ and $T_e$ and their errorbars\n", "\n",
"In the following, we compare several ways of determining the $I_{sat}$, $V_f$ and $T_e$ values and errorbars." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Results of bin averaging\n", "errs = np.sqrt(pcov_bin.diagonal())\n", "print('Data bin-averaged, covariance matrix errorbars:')\n", "print('Isat = %.1f +- %.1f mA' % (popt_bin[0]*1000, errs[0]*1000))\n", "print('V_fl = %.1f +- %.1f V' % (popt_bin[1], errs[1]))\n", "print('Te = %.1f +- %.1f eV' % (popt_bin[2], errs[2]))\n", "\n", "# Results of the whole data fit\n", "errs = np.sqrt(pcov_orig.diagonal())\n", "print('\\nFull data, covariance matrix errorbars:')\n", "print('Isat = %.1f +- %.1f mA' % (popt_orig[0]*1000, errs[0]*1000))\n", "print('V_fl = %.1f +- %.1f V' % (popt_orig[1], errs[1]))\n", "print('Te = %.1f +- %.1f eV' % (popt_orig[2], errs[2]))\n", "\n", "# Results of bootstrap with standard deviation errorbars\n", "# (nan-aware statistics: failed bootstrap fits are stored as NaN)\n", "print('\\nFull data, bootstrap fits with standard deviation errorbars:')\n", "print('Isat = %.1f +- %.1f mA' % (np.nanmean(synth_Isat)*1000, np.nanstd(synth_Isat)*1000))\n", "print('V_fl = %.1f +- %.1f V' % (np.nanmean(synth_Vf), np.nanstd(synth_Vf)))\n", "print('Te = %.1f +- %.1f eV' % (np.nanmean(synth_Te), np.nanstd(synth_Te)))\n", "\n", "# Results of bootstrap with 95% confidence intervals\n", "print('\\nFull data, bootstrap fits with 95% confidence intervals:')\n", "print('Isat = %.1f mA (%.1f to %.1f mA)' % (np.nanmean(synth_Isat)*1000,\n", "      np.nanpercentile(synth_Isat, 2.5)*1000, np.nanpercentile(synth_Isat, 97.5)*1000))\n", "print('V_fl = %.1f V (%.1f to %.1f V)' % (np.nanmean(synth_Vf),\n", "      np.nanpercentile(synth_Vf, 2.5), np.nanpercentile(synth_Vf, 97.5)))\n", "print('Te = %.1f eV (%.1f to %.1f eV)' % (np.nanmean(synth_Te),\n", "      np.nanpercentile(synth_Te, 2.5), np.nanpercentile(synth_Te, 97.5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which method you pick depends, generally, on the kind of data you have. Can you afford to make a lot of synthetic bootstrap datasets? What kind of plots do you want to make with the data? How much do you need to trust the results? In this notebook, we will go with the third option and take the standard deviation of the bootstrapped $I_{sat}$, $V_f$ and $T_e$ as the errorbars. This is, in part, because I'm a bit too lazy to deal with asymmetric Y axis errors at the moment. (Plus the symmetric standard deviation is easier to read.)\n", "\n",
"## Check correlations between the fit results\n", "\n",
"There is one more thing to consider when judging the fit results' uncertainty. Since $I_{sat}$, $V_f$ and $T_e$ come from a single fit, there can be trade-offs between them. For instance, a decrease in $I_{sat}$ may be partially compensated by a decrease in $T_e$. This correlation increases the overall uncertainty. Sadly, at the moment I don't know how to calculate exactly how much. So let's just plot the scatterplots and see if $I_{sat}$, $V_f$ and $T_e$ are correlated within a single $I-V$ characteristic and its synthetic replicates. The red star denotes the results of the original fit of the full data."
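] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick quantitative companion to the scatterplots, the correlation matrix of the bootstrap replicates can be computed directly. This is a minimal sketch added for illustration (rows and columns are ordered $I_{sat}$, $V_f$, $T_e$; replicates where the fit failed are dropped):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Correlation matrix of the bootstrap fit results (rows/columns: Isat, Vf, Te)\n", "samples = np.vstack((synth_Isat, synth_Vf, synth_Te))\n", "good = ~np.isnan(samples).any(axis=0) #drop failed replicates\n", "print(np.round(np.corrcoef(samples[:, good]), 2))"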
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare the figure\n", "fig, axs = plt.subplots(1, 3, num='Synthetic data scatterplots', figsize=(12,5), constrained_layout=True)\n", "#fig.tight_layout()\n", "#fig.subplots_adjust(wspace=0.3)\n", "\n", "# Draw Isat x Te scatterplot\n", "axs[0].set_title('Synthetic $I_{sat}$ vs $T_e$')\n", "c0 = axs[0].scatter(synth_Isat*1000, synth_Te, marker='v', c=synth_Vf)\n", "axs[0].plot([popt_orig[0]*1000], [popt_orig[2]], 'r*')\n", "axs[0].set_xlabel('$I_{sat}$ [mA]')\n", "axs[0].set_ylabel('$T_e$ [eV]')\n", "axs[0].grid(True)\n", "cbar = fig.colorbar(c0, ax=axs[0], orientation='horizontal')\n", "cbar.set_label('$V_f$ [V]')\n", "\n", "# Draw Vf x Te scatterplot\n", "axs[1].set_title('Synthetic $V_f$ vs $T_e$')\n", "c1 = axs[1].scatter(synth_Vf, synth_Te, marker='v', c=synth_Isat*1000)\n", "axs[1].plot([popt_orig[1]], [popt_orig[2]], 'r*')\n", "axs[1].set_xlabel('$V_f$ [V]')\n", "axs[1].set_ylabel('$T_e$ [eV]')\n", "axs[1].grid(True)\n", "cbar = fig.colorbar(c1, ax=axs[1], orientation='horizontal')\n", "cbar.set_label('$I_{sat}$ [mA]')\n", "\n", "# Draw Isat x Vf scatterplot\n", "axs[2].set_title('Synthetic $I_{sat}$ vs $V_f$')\n", "c2 = axs[2].scatter(synth_Isat*1000, synth_Vf, marker='v', c=synth_Te)\n", "axs[2].plot([popt_orig[0]*1000], [popt_orig[1]], 'r*')\n", "axs[2].set_xlabel('$I_{sat}$ [mA]')\n", "axs[2].set_ylabel('$V_f$ [V]')\n", "axs[2].grid(True)\n", "cbar = fig.colorbar(c2, ax=axs[2], orientation='horizontal')\n", "cbar.set_label('$T_e$ [eV]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the fit results *are*, indeed, correlated. In particular, an increase in $T_e$ can be compensated by an increase in $I_{sat}$ or a decrease in $V_f$. The latter two, $I_{sat}$ and $V_f$, are not strongly correlated. This means that part of the variation in $T_e$ is caused not by the probe data fluctuation but by the variation of $I_{sat}$ and $V_f$. This finding requires a more in-depth analysis before it can be properly reflected in the errorbars used in this notebook, so we leave it here as a point of interest and future research.\n", "\n",
"## Fit all $I-V$ characteristics throughout the discharge\n", "\n",
"Finally, we perform the fitting procedure described above throughout the discharge. The process is fully automatic. Occasionally the fit doesn't converge; this will be treated later."
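] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the per-characteristic fits (and the bootstrap replicates within each of them) are mutually independent, the loop below could in principle be parallelised, as remarked earlier. The cell below is only a toy demonstration of the principle, not wired into the analysis: a dummy fit is dispatched through a thread pool. (Threads only help while the workload releases the GIL; CPU-bound fits would need a process pool with importable worker functions.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Toy parallelisation sketch (illustrative only)\n", "from concurrent.futures import ThreadPoolExecutor\n", "\n", "def dummy_fit(seed):\n", "    #stand-in for one per-characteristic fit\n", "    rng = np.random.default_rng(seed)\n", "    x = np.linspace(0, 1, 100)\n", "    y = 3*x**2 + rng.normal(0, 0.1, x.size)\n", "    return np.polyfit(x, y, 2)[0] #leading coefficient, should be close to 3\n", "\n", "with ThreadPoolExecutor() as pool:\n", "    coeffs = list(pool.map(dummy_fit, range(8)))\n", "print(np.round(coeffs, 2))"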
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define fit constants\n", "start = get_parameter('Diagnostics/BasicDiagnostics/Results/t_plasma_start', shot) * 1e-3 #s\n", "end = get_parameter('Diagnostics/BasicDiagnostics/Results/t_plasma_end', shot) * 1e-3 #s\n", "n = 2 #number of I-V characteristics to bin together\n", "\n", "# Perform the fitting\n", "extremes_for_calculation = extremes[(start < t_ext) & (t_ext < end)]\n", "fit_results = []\n", "for i in range(extremes_for_calculation.size-n):\n", "    #Select full data sample\n", "    k1 = extremes_for_calculation[i] #index of the sample beginning\n", "    k2 = extremes_for_calculation[i+n] #index of the sample end\n", "    V_sample = V[k1 : k2]\n", "    I_sample = I[k1 : k2]\n", "    \n", "    #Bin-average the I-V characteristic and fit it to get a first guess\n", "    V_binned, I_binned, Ierr = bin_average(V_sample, I_sample, N_bins=40)\n", "    Isat_estimate = np.mean(I_sample[I_sample > 0])\n", "    mask = (I_binned > -2*Isat_estimate)\n", "    if np.sum(mask) == 0: #no data to fit\n", "        fit_results.append([np.nan]*6)\n", "        continue\n", "    #(p0 here is the bin-averaged fit result from the sample characteristic above)\n", "    try:\n", "        popt, pcov = optimization.curve_fit(fit3par, V_binned[mask], I_binned[mask], p0, sigma=Ierr[mask])\n", "    except RuntimeError: #the fit is taking too long and probably diverged\n", "        fit_results.append([np.nan]*6)\n", "        continue\n", "    \n", "    #Fit the full data\n", "    Isat_estimate = popt[0]\n", "    mask = (I_sample > -2*Isat_estimate)\n", "    if np.sum(mask) == 0: #no data to fit\n", "        fit_results.append([np.nan]*6)\n", "        continue\n", "    try:\n", "        popt, pcov = optimization.curve_fit(fit3par, V_sample[mask], I_sample[mask], popt)\n", "    except RuntimeError: #the fit is taking too long and probably diverged\n", "        fit_results.append([np.nan]*6)\n", "        continue\n", "    \n", "    #Create synthetic fit results using bootstrapping and process them for errorbars\n", "    synth_Isat, synth_Vf, synth_Te = create_bootstrap_fit_replicates(\n", "        I_sample[mask], V_sample[mask], p0=popt, bss=100, b=5)\n", "    err = [np.nanstd(synth_Isat), np.nanstd(synth_Vf), np.nanstd(synth_Te)]\n", "    fit_results.append(list(popt)+err)\n", "\n", "#Compile the fit results\n", "fit_results = np.transpose(np.array(fit_results))\n", "Isat, V_f, Te, Isat_err, Vf_err, Te_err = fit_results\n", "t_fit = np.convolve(t[extremes_for_calculation][:-1], np.ones(n), 'valid') / n + T/4\n", "    # moving average of t_ext by n samples, except for the last, closing extreme\n", "    # middle of the window where the used I-V characteristics were collected" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To improve the data quality, we remove the results where the fit evidently failed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Create a mask of results to be kept\n", "mask = np.array([True]*Isat.size)\n", "\n", "#Throw out fit results with larger than 50 % errorbar in Isat or Te\n", "# (not V_f, since that can be expected to pass through 0)\n", "mask = mask & (Isat_err < 0.5*abs(Isat)) & (Te_err < 0.5*abs(Te))\n", "\n", "#Throw out fit results with unreasonable/unphysical values\n", "mask = mask & (Isat > 0) & (Isat < 0.03)\n", "mask = mask & (V_f > -100) & (V_f < 30)\n", "mask = mask & (Te > 0) & (Te < 100)\n", "\n", "#Remove fits which don't meet all the criteria\n", "Isat[~mask] = np.nan\n", "V_f[~mask] = np.nan\n", "Te[~mask] = np.nan\n", "print('%i %% of the fits were discarded.' % (np.sum(~mask)/Isat.size * 100))"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We save the fit results for later use." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Markdown\n", "import os\n", "\n", "os.makedirs('Results', exist_ok=True) #make sure the output directory exists\n", "np.savetxt('Results/Swept_LP_fitting_output.txt',\n", "           np.transpose((t_fit, Isat, Isat_err, V_f, Vf_err, Te, Te_err)),\n", "           header=('Discharge %i Langmuir probe fit results\\n'\n", "                   't [s] | Isat [A] | Isat_err [A] | V_floating [V] | V_floating_err [V] | '\n", "                   'Te [eV] | Te_err [eV]' % shot) )\n", "Markdown(\"[Fit results](./Results/Swept_LP_fitting_output.txt)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's plot the time evolution of the fit results - the edge plasma parameters $I_{sat}$, $V_f$ and $T_e$ - during the whole discharge." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Prepare figure\n", "fig, axs = plt.subplots(3, 1, num='Time evolution of fit results', figsize=(8,8), sharex=True)\n", "fig.tight_layout()\n", "fig.subplots_adjust(hspace=0)\n", "\n", "#Plot the ion saturation current\n", "axs[0].errorbar(1000*t_fit, Isat*1000, color='g', marker='o', yerr=Isat_err*1000,\n", "                label='Ion saturation current', ls='None', capsize=3, elinewidth=0.5)\n", "axs[0].grid(True)\n", "axs[0].axhline(c='k')\n", "axs[0].set(ylabel='$I_{sat}$ [mA]', title='Swept Langmuir probe results, discharge #%i' % shot)\n", "axs[0].legend(loc=1)\n", "\n", "#Plot the floating potential\n", "axs[1].errorbar(1000*t_fit, V_f, color='b', marker='o', yerr=Vf_err,\n", "                label='Floating potential', ls='None', capsize=3, elinewidth=0.5)\n", "axs[1].grid(True)\n", "axs[1].axhline(c='k')\n", "axs[1].set(ylabel='$V_f$ [V]')\n", "axs[1].legend(loc=1)\n", "\n", "#Plot the electron temperature\n", "axs[2].errorbar(1000*t_fit, Te, color='r', marker='o', yerr=Te_err,\n", "                label='Electron temperature', ls='None', capsize=3, elinewidth=0.5)\n", "axs[2].grid(True)\n", "axs[2].axhline(c='k')\n", "axs[2].set(xlabel='$t$ [ms]', ylabel='$T_e$ [eV]')\n", "axs[2].legend(loc=1)\n", "\n", "# Save the figure\n", "plt.savefig('icon-fig')" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }