Data mining
The Golem database supports web access, but downloading a large number of scalar parameters one by one is very inefficient. It is therefore recommended to use the function get_history from the pygolem toolkit. The function get_history is in pygolem_lite.modules and is also used by the HistoricalAnalysis web page. Although only a Python version is implemented, the data can easily be saved to .mat files (see the second example).
Example of use:
from pygolem_lite.modules import get_history
from matplotlib.pyplot import plot, show
shots = range(10000, 10500)            # range of shot numbers
data = get_history("pressure", shots)  # one value per shot
plot(shots, data, ',')                 # ',' draws one pixel per shot
show()
Downloading data for breakdown studies:
from pygolem_lite.modules import get_history
from numpy import savez, mean, isnan
diags = ['plasma', 'gas_filling', 'pressure_initial', 'pressure', 'Ub',
         'Tb', 'Ucd', 'Tcd', 'Ust', 'Tst', 'preionization', 'breakdown_voltage',
         'loop_voltage_max', 'plasma_life', 'transformator_saturation']
shots = range(5000, 10900)  # range of shot numbers
data = dict()
for diag in diags:
    data[diag] = get_history(diag, shots)
    # percentage of shots for which the value is available (not NaN)
    print("Success rate %s - %s" % ((1 - mean(isnan(data[diag]))) * 100, diag))
for diag in ['plasma_status', 'session_name']:  # load string variables
    data[diag] = get_history(diag, shots, dtype="str")
savez('data', shots=shots, data=data)
# ! save data for matlab !
from scipy.io import savemat
data['shots'] = shots
savemat('data', data)
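The archive written by savez in the example above can be read back roughly as sketched below. Because data is a dict, NumPy stores it as a pickled 0-d object array, so loading needs allow_pickle=True and .item() to recover the dict. The helper name load_history_archive is hypothetical and not part of pygolem; the sketch assumes Python 3 and a file written as savez(path, shots=..., data=...).

```python
import numpy as np


def load_history_archive(path):
    """Return (shots, data) from an archive written by savez(path, shots=..., data=...)."""
    # allow_pickle=True is required because the dict is stored as a pickled object.
    with np.load(path, allow_pickle=True) as archive:
        shots = archive["shots"]
        # The dict was wrapped in a 0-d object array; .item() unwraps it.
        data = archive["data"].item()
    return shots, data
```

Only load archives from a trusted source this way, since unpickling can execute arbitrary code.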
Note:
- plasma_status: if it is nan => some serious failure of diagnostics
- plasma_life > 15 ms is probably an error
- loop_voltage_max < 5 V is probably a DAS error
- pressure > 100 mPa is unphysical (probably an opened chamber)
- transformator_saturation: if more than 0.8 and plasma == 1 => probably a false plasma detection
- session_name: some sessions should be avoided, i.e. Technological/, Vacuum/
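The rules in the note above can be turned into a boolean mask over the downloaded arrays, as in the minimal sketch below. It is not part of pygolem: the function name suspicious_shot_mask is hypothetical, and it assumes that the stored values use the units of the note (ms, V, mPa), that a missing plasma_status shows up as the string "nan", and that unwanted session names start with the listed prefixes. Adjust the thresholds if the database uses other units.

```python
import numpy as np


def suspicious_shot_mask(data, max_plasma_life=15.0, min_loop_voltage=5.0,
                         max_pressure=100.0, max_saturation=0.8,
                         bad_sessions=("Technological/", "Vacuum/")):
    """Return a boolean array that is True where a shot trips one of the rules."""
    def as_float(key):
        return np.asarray(data[key], dtype=float)

    status = np.asarray(data["plasma_status"], dtype=str)
    session = np.asarray(data["session_name"], dtype=str)
    # Comparisons with NaN are False, so missing numbers do not trip a rule.
    with np.errstate(invalid="ignore"):
        mask = np.char.lower(status) == "nan"                 # diagnostics failure
        mask |= as_float("plasma_life") > max_plasma_life     # assumed unit: ms
        mask |= as_float("loop_voltage_max") < min_loop_voltage  # assumed unit: V
        mask |= as_float("pressure") > max_pressure           # assumed unit: mPa
        mask |= ((as_float("transformator_saturation") > max_saturation)
                 & (as_float("plasma") == 1))                 # false plasma detection
    # Assumption: sessions to avoid are identified by the start of their name.
    mask |= np.array([s.startswith(bad_sessions) for s in session], dtype=bool)
    return mask
```

The shots to keep are then, for example, numpy.asarray(shots)[~suspicious_shot_mask(data)].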
Search closest shot script
This script finds the shots in the database that are closest to the values of the variables selected by the user. Example of use:
The five closest shots will be downloaded to the closest_shots file. Shots are searched for in a preloaded data file (shots 5000 to 10700).
List of inputs for closest shot search script:
diags = ["Ub","Ucd","Ubd", 'Tbd', 'Tcd', 'gas_filling', 'preionization', 'pressure_request',
'plasma', 'loop_voltage_max', 'plasma_life' , 'transformator_saturation', 'plasma_status', 'session_name']
Note:
- Plotting should be removed later
- If some variables are not user defined, they are treated as irrelevant dimensions
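How such a search might work is sketched below for numeric variables only. This is an illustration, not the actual script: the function name closest_shots and the distance metric (Euclidean distance with each variable scaled by its standard deviation) are assumptions. Variables that are absent from target contribute nothing to the distance, which is one way of treating them as irrelevant dimensions.

```python
import numpy as np


def closest_shots(shots, data, target, n=5):
    """Return the n shots nearest to target, together with their distances.

    target maps variable names to requested values; variables that are not
    listed in target are ignored.
    """
    shots = np.asarray(shots)
    dist2 = np.zeros(len(shots), dtype=float)
    for name, wanted in target.items():
        values = np.asarray(data[name], dtype=float)
        scale = np.nanstd(values)  # assumed normalisation of each dimension
        if not np.isfinite(scale) or scale == 0:
            scale = 1.0  # constant or empty column: avoid division by zero
        dist2 += ((values - wanted) / scale) ** 2
    # Shots with a missing value in a requested variable are sorted last.
    dist2[np.isnan(dist2)] = np.inf
    order = np.argsort(dist2, kind="stable")[:n]
    return shots[order], np.sqrt(dist2[order])
```

A call such as closest_shots(shots, data, {"Ub": 500, "Ucd": 400}) would use only those two variables; the values here are placeholders.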
Parallel data downloader:
download.py - the diagnostics and the shot range can be set up in the script
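The internals of download.py are not described here, so the following is only a generic sketch of how a download can be parallelised: the shot range is split into chunks that are fetched concurrently by a thread pool. The names download_parallel and fetch are hypothetical; fetch stands for any callable taking (diag, shots), for example a thin wrapper around get_history, provided that it is safe to call from several threads. The sketch assumes Python 3.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def download_parallel(fetch, diags, shots, chunk_size=100, workers=8):
    """Fetch every diagnostic for all shots, chunk by chunk, in parallel."""
    shots = list(shots)
    chunks = [shots[i:i + chunk_size] for i in range(0, len(shots), chunk_size)]
    jobs = [(diag, chunk) for diag in diags for chunk in chunks]
    # Threads suit this task because the work is dominated by network waits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the job order, so the chunks stay in shot order.
        results = list(pool.map(lambda job: fetch(*job), jobs))
    parts = {}
    for (diag, _), result in zip(jobs, results):
        parts.setdefault(diag, []).append(np.asarray(result))
    return {diag: np.concatenate(chunks_) for diag, chunks_ in parts.items()}
```

The chunk size and the number of workers are arbitrary defaults; too many simultaneous requests may overload the database server.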