h1. One-year fixed-term contract (CDD), Research Engineer, C++/Qt developer, visualization/machine learning for satellite data
%{color:red}POSITION FILLED%
The space plasma team of the "Laboratoire de Physique des Plasmas":http://www.lpp.fr (LPP) is recruiting a research engineer on a one-year fixed-term contract (CDD) to develop "SciQLOP":https://hephaistos.lpp.polytechnique.fr/redmine/projects/sciqlop/wiki, a graphical application for searching and visualizing plasma data measured by satellites in interplanetary space and in the magnetosphere. Combining statistical learning methods with an intuitive, efficient and modern interface, SciQLOP will be one of a kind worldwide.
h2. Context
For decades, satellite missions have been sent into space to measure the plasma and electromagnetic properties of our near-Earth and interplanetary environment. These data are currently stored in large public databases and described in standard formats. Exploring them and extracting intervals that show signatures of physical interest nevertheless remains difficult. The high variability of observational signatures, due to the very dynamic nature of the measured systems, makes search methods based on fixed rules very inefficient. Visual exploration is therefore almost unavoidable today, but it obviously has the drawbacks of being poorly reproducible, long and tedious. This is largely due to the lack of graphical tools for exploring databases in an intuitive, efficient and mission-independent way. This project aims at developing a graphical software enabling such exploration, with statistical learning methods at its core, for the smart and reproducible building of catalogs of scientifically interesting signatures. An open-source prototype has already been developed at the laboratory and relies on the C++ Qt API. The "machine learning" part of the project is carried out in collaboration with the "applied mathematics laboratory of École Polytechnique":http://www.cmap.polytechnique.fr.
h2. Job description and mission
You will join the Space Plasmas team at LPP as a research engineer. Your mission will be to develop the SciQLOP application. Besides the graphical interface for visualizing and cataloging data in an intuitive and efficient way, you will search for and propose machine learning methods allowing SciQLOP to learn to recognize interesting signatures and suggest them to the user. This research and development work will result from your interactions with the laboratory's scientists, experts in the analysis of these data and of the instruments producing them, and with the machine learning experts of the applied mathematics laboratory. The key points of the development are:
h2. Salary
2700€ gross per month, to be adjusted according to experience.
h2. Location
You will be based at the "Laboratoire de Physique des Plasmas, on the campus of École Polytechnique":http://www.lpp.fr/Comment-venir-au-LPP, in Palaiseau.
p=. !https://hephaistos.lpp.polytechnique.fr/redmine/attachments/download/1094/jpg_plan_lpp_X-81667.jpg!
h2. Your profile
You are motivated by graphical development and care about the ergonomics of your interfaces. You are very enthusiastic about developing a scientific software unique in the world and about being one of the pioneers of machine learning for the analysis of in situ space data. You are curious, show initiative and are highly autonomous. You enjoy sharing and teamwork.
h3. Recruitment level and experience
h3. Required skills and experience
h3. Desirable skills
Having the following skills is a great asset:
h2. Contact us/apply
"Nicolas Aunai, Alexis Jeandet":mailto:nicolas.aunai@lpp.polytechnique.fr?cc=alexis.jeandet@lpp.polytechnique.fr
h1. Research engineer position (1 year), C++/Qt developer, visualization/machine learning for satellite data
%{color:red}POSITION FILLED%
The space plasma team of the "Laboratory of Plasma Physics":http://www.lpp.fr is hiring a research engineer for one year to develop the GUI application "SciQLOP":https://hephaistos.lpp.polytechnique.fr/redmine/projects/sciqlop/wik, dedicated to the search and visualization of in situ spacecraft data measured in the magnetosphere and interplanetary space. Combining an intuitive and powerful user interface with machine learning methods, SciQLOP will be the first software of its kind for space data analysis.
h2. Context
For decades, satellite missions have been sent to space to measure plasma properties and electromagnetic fields in our nearby space environment. Although these data are continuously stored in large public databases in a single file format, exploring them and extracting intervals revealing signatures of physical interest remains quite difficult. The very dynamic nature of the observed systems results in a great variability of observational signatures, which makes methods based on a fixed set of rules, no matter how complex, very inefficient. Visual exploration is therefore almost unavoidable, but it comes with the drawbacks of being hardly reproducible, long and laborious, mainly because we lack graphical tools that would allow intuitive and efficient exploration independent of the mission from which the data originate. This project aims at developing such a graphical software, with machine learning methods at its core enabling feature recognition and smart cataloging of scientifically interesting intervals. A proof-of-concept graphical interface has already been developed at the laboratory, based on the C++ Qt framework. The machine learning capabilities will be based to a large extent on existing packages and developed in collaboration with the laboratory of applied mathematics of Ecole Polytechnique.
h2. Job description
You will be part of the space plasma team at LPP, as a research engineer. You will be in charge of developing the SciQLOP software. Besides the intuitive graphical interface for visualization and cataloging, you will have to find, propose and implement machine learning solutions allowing SciQLOP to learn and recognize features in the data and suggest them to the user. Collaboration with the experts in spacecraft data and instruments in our lab, and with the machine learning experts in the applied mathematics lab, will be essential for this research and development work. The key points of your development will be:
h2. Location
You will be based at the Laboratory of Plasma Physics, at "Ecole Polytechnique":http://www.lpp.fr/Comment-venir-au-LPP, in Palaiseau.
!https://hephaistos.lpp.polytechnique.fr/redmine/attachments/download/1094/jpg_plan_lpp_X-81667.jpg!
h2. Salary
Gross salary: 2700€/month, depending on experience and qualifications.
h2. You
You are motivated by developing graphical user interfaces and particularly attentive to their ergonomics. You are enthusiastic about developing a unique scientific application and being one of the pioneers of machine learning applications for space physics. You are curious, show initiative and work independently. You enjoy sharing and teamwork.
h3. Your experience and education
h3. Required skills
h3. Desirable skills
Having skills among the following is a great asset:
h2. Contact us/apply
"Nicolas Aunai, Alexis Jeandet":mailto:nicolas.aunai@lpp.polytechnique.fr?cc=alexis.jeandet@lpp.polytechnique.fr
Web Service Activity Documents (SOAP, WSDL, etc.)
Spacecraft orbits
WSDL
database web services
This page lists all past and current job opportunities associated with the SciQLOP project.
h1. Known time description
Time range to cover with doubles: dates from 01/01/1970 to 01/01/2100.
Resolutions of µs, ns or even ps might also be needed for TT2000.
Double (IEEE 754):
64 bits
min value: -1.7977E+308
max value: 1.7977E+308
Number of seconds per year = 60*60*24*365.25 = 31 557 600
Numbers for 100 years:
3 155 760 000 s = 3.15576e+9 s
3 155 760 000 000 ms = 3.15576e+12 ms
3 155 760 000 000 000 µs = 3.15576e+15 µs
3 155 760 000 000 000 000 ns = 3.15576e+18 ns
3 155 760 000 000 000 000 000 ps = 3.15576e+21 ps
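A double has a 53-bit significand, i.e. about 15 to 16 significant decimal digits; beyond that we may experience precision loss. Every integer up to 2^53 ≈ 9.007e+15 is exactly representable, so 100 years counted in µs (3.15576e+15) still fit, whereas the same span counted in ns or ps does not.
Recommendation: store time in QLop as microseconds since the Epoch (01-01-1970 00:00:00).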
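The figures above can be checked with a short C++ sketch. It relies only on the IEEE 754 property that every integer up to 2^53 is exactly representable in a double; the helper names (ticksInYears, isExactOver, toEpochMicros) are hypothetical and not part of SciQLOP.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not SciQLOP's API.
constexpr double kSecondsPerYear = 60.0 * 60.0 * 24.0 * 365.25;  // 31 557 600 s (Julian year)

// Every integer up to 2^53 is exactly representable in an IEEE 754 double.
constexpr double kExactIntegerLimit = 9007199254740992.0;        // 2^53

// Number of ticks of a unit (given as ticks per second) in `years` Julian years.
constexpr double ticksInYears(double years, double ticksPerSecond)
{
    return years * kSecondsPerYear * ticksPerSecond;
}

// True if every tick over `years` years maps to a distinct, exact double.
constexpr bool isExactOver(double years, double ticksPerSecond)
{
    return ticksInYears(years, ticksPerSecond) <= kExactIntegerLimit;
}

// (seconds, microseconds) since 1970-01-01T00:00:00 -> recommended representation.
inline double toEpochMicros(std::int64_t seconds, std::int64_t micros)
{
    assert(micros >= 0 && micros < 1000000);
    return static_cast<double>(seconds) * 1e6 + static_cast<double>(micros);
}
```

With these helpers, isExactOver(100, 1e6) holds while isExactOver(100, 1e9) does not, which is the reasoning behind the microsecond recommendation; two time stamps one microsecond apart in 2100 still differ by exactly 1.0.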
Known time description
|_.Mission Name |_.time var name |_.units |_.DEPEND |_.LABLAXIS |_.FIELDNAM |_.CATDESC |_.Type |_.VIRTUAL |_.nb of records |_.VAR_NOTES |
|Cluster FGM |time_tags__CDFNAME |ms |0 |UT |Universal Time |Interval centred time tag |CDF_EPOCH | |normal |field missing |
|Cluster HIA |time_tags__CDFNAME |ms |0 |UT |Center Time |Interval centred time tag |CDF_EPOCH | |normal |field missing |
||||||||||||
|Themis Efi,SCM |VARNAME_time |sec |TIME |UT |Same as time var name |UTC, in seconds since 01-Jan-1970 00:00:00 |CDF_DOUBLE | |normal |Unleaped seconds |
|Themis Efi,SCM |VARNAME_epoch |field missing |0 |UT |Same as time var name |Unrelated |CDF_EPOCH |true |0 |field missing |
|Themis Efi,SCM |VARNAME_dot0_epoch0 |msec |
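The time variables in this table come in two flavours: CDF_EPOCH values and seconds since 1970. A minimal sketch of their conversion to the recommended representation (µs since 1970), assuming the standard CDF convention that CDF_EPOCH counts milliseconds since 0000-01-01T00:00:00 (719 528 days before 1970-01-01, leap seconds ignored), and treating the THEMIS CDF_DOUBLE values as unleaped seconds as the VAR_NOTES column states. The function names are hypothetical.

```cpp
// Hypothetical conversion helpers, not SciQLOP's API.

// Days between 0000-01-01 and 1970-01-01 in the proleptic Gregorian calendar.
constexpr double kDaysYear0To1970 = 719528.0;
constexpr double kMsPerDay = 86400000.0;

// CDF_EPOCH value of 1970-01-01T00:00:00, i.e. 62 167 219 200 000 ms.
constexpr double kCdfEpochOf1970 = kDaysYear0To1970 * kMsPerDay;

// CDF_EPOCH (ms since year 0) -> microseconds since 1970-01-01.
constexpr double cdfEpochToEpochMicros(double cdfEpochMs)
{
    return (cdfEpochMs - kCdfEpochOf1970) * 1000.0;
}

// Unleaped seconds since 1970 (e.g. THEMIS VARNAME_time) -> microseconds since 1970.
constexpr double unixSecondsToEpochMicros(double seconds)
{
    return seconds * 1e6;
}
```

Subtracting the 1970 offset before scaling keeps the intermediate values small, so the result stays exact as long as the input is a whole number of milliseconds.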
The project proposal can be found here
h1. Job offers
This page lists past and current job offers related to the SciQLOP project.
h1. Plot and OpenGL in QtCharts
The following test was used to find the precision limits of the OpenGL rendering in QtCharts:
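```cpp
#include <QApplication>
#include <QMainWindow>
#include <QPainter>
#include <QVector>
#include <QtCharts/QChart>
#include <QtCharts/QChartView>
#include <QtCharts/QLineSeries>
#include <cstdint>

// Bit-level view of an IEEE 754 double (GCC/Clang bit-field layout, little-endian)
typedef struct __attribute__((packed)) dbl_str {
    uint64_t mant : 52;
    uint64_t exp : 11;
    uint64_t sign : 1;
} dbl_str;

typedef union dbl {
    double dblval;
    uint64_t intval;
    dbl_str strval;
} dbl;

QT_CHARTS_USE_NAMESPACE

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);

    // 2^20 abscissas starting at 1.5, each 2^(52-24) double ulps (i.e. 2^-24) apart
    QVector<double> timeVector;
    dbl offset;
    offset.strval.sign = 0;
    offset.strval.exp = 0b01111111111;  // biased exponent 1023: values in [1, 2)
    offset.strval.mant = 1ULL << 51;    // only the mantissa MSB set: start at 1.5
    for (int i = 0; i < (1 << 20); i++) {
        timeVector.append(offset.dblval);
        offset.strval.mant += 1 << (52 - 24);
    }

    // OpenGL-accelerated series with a repeating zigzag on y
    QLineSeries *seriesOGL = new QLineSeries();
    seriesOGL->setUseOpenGL(true);
    const double LUT[] = {0.0, 1.0, -1.0, 2.0, -2.0, 3.0};
    for (int i = 0; i < timeVector.count(); i++)
        seriesOGL->append(timeVector.at(i), LUT[i % 6]);

    // The original test used custom Chart/ChartView subclasses; the base classes suffice here
    QChart *chart = new QChart();
    chart->legend()->hide();
    chart->addSeries(seriesOGL);
    chart->createDefaultAxes();
    chart->setTitle("Simple line chart example");
    QChartView *chartView = new QChartView(chart);
    chartView->setRenderHint(QPainter::Antialiasing);

    QMainWindow window;
    window.setCentralWidget(chartView);
    window.resize(1400, 1300);
    window.show();
    return a.exec();
}
```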
By changing the mantissa increment from:
offset.strval.mant+=1<<(52-23);
to:
offset.strval.mant+=1<<(52-24);
we observed that the plot no longer took into account changes in the double mantissa beyond its 23 most significant bits (the 1<<(52-24) step acts on the 24th one), i.e. some points were stacked because the plot could not tell two different abscissa values apart.
This corresponds to the size of the float mantissa (23 bits).
We can then assume that the OpenGL plot uses floats.
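The same effect can be reproduced without OpenGL. This sketch assumes, as found in the QtCharts sources, that coordinates are cast from double to float: in [1, 2) consecutive floats are 2^-23 apart, so abscissas 2^-24 apart (the 1<<(52-24) step) collapse pairwise, while a 2^-23 step keeps them distinct. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// True if two doubles become indistinguishable once narrowed to float,
// which is what happens to the x values before the OpenGL rendering.
inline bool collapseAsFloat(double a, double b)
{
    return static_cast<float>(a) == static_cast<float>(b);
}

// Distance between the float nearest to x and the next float above it.
inline double floatSpacingAt(double x)
{
    assert(std::isfinite(x));
    const float f = static_cast<float>(x);
    return static_cast<double>(std::nextafter(f, INFINITY)) - static_cast<double>(f);
}
```

For instance, collapseAsFloat(1.5, 1.5 + std::ldexp(1.0, -24)) is true (the tie rounds to the even mantissa of 1.5), whereas collapseAsFloat(1.5, 1.5 + std::ldexp(1.0, -23)) is false, and floatSpacingAt(1.5) is 2^-23.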
We found where the doubles are cast to floats in the QtCharts code.
This takes place in glxyseriesdata.cpp, in GLXYSeriesDataManager::setPoints: the x and y of each point are cast to float.
The resulting float vector is then used by the vertex and fragment shader source code in glwidget.cpp, called from GLWidget::paintGL.
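This matters for the time representation recommended in the time description notes: a time stamp in µs since 1970 around 2017 is about 1.5e15, within [2^50, 2^51), where consecutive floats are 2^27 µs (about 134 s) apart, so samples one minute apart merge once cast to float. One possible mitigation, sketched here as an assumption and not as something QtCharts does, is to subtract a reference time before the cast and add it back when labelling the axis; a one-day window (8.64e10 µs) then keeps a float spacing of 2^13 µs, about 8 ms. rebaseForFloat is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not a QtCharts API): shifts time stamps (µs since 1970)
// by a reference time before narrowing them to float, so that the float
// precision is spent on the plotted window rather than on the offset from 1970.
// The reference must be added back when labelling the time axis.
inline std::vector<float> rebaseForFloat(const std::vector<double>& timesUs, double referenceUs)
{
    assert(std::isfinite(referenceUs));
    std::vector<float> rebased;
    rebased.reserve(timesUs.size());
    for (double t : timesUs)
        rebased.push_back(static_cast<float>(t - referenceUs));  // difference computed in double
    return rebased;
}
```

With t0 = 1.5e15 and t1 = t0 + 60e6 (one minute later), a direct cast maps both to the same float, while rebaseForFloat({t0, t1}, t0) yields 0 and 60 000 000 exactly.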
h1. SciQlop Status

Priorities: Google doc link

h2. Scientific Objectives and Performances

|_. Functionality |_. Status (% done) |
| SciQLOP should be portable, and an executable/setup should be available for Linux, Mac and Windows | 50 |
| Installing SciQLOP must be easy and must not require reading documentation or installing dependencies | 50 |
| When opened, SciQLOP should by default reload the previous state (data products, plots, plugins, etc.) | 0 |
| SciQLOP comes with good user documentation, including galleries, examples and tutorials, including video tutorials. | 50 |
| SciQLOP should offer the same functionalities on all platforms. | 50 |
| SciQLOP should offer the same performance on all platforms. | 50 |
| SciQLOP's GUI should remain light, beautiful and intuitive | 50 |
| Exploring databases and browsing data should be easier/faster than with other existing software | 90 |
| SciQLOP should provide efficient and transparent data browsing, access to user Python routines, and collaborative cataloging features with and without machine learning | 0 |
| SciQLOP should remain responsive when plotting 10 million points on a standard machine. | 90 |
| SciQLOP should have a limited list of data types it can handle and group them by common traits (scalars, vectors, images...). | 100 |
| SciQLOP may provide features depending on the data type, source, unit or any condition/property the user may define or program. | 50 |
| Each modification of the data between the source (server) and the plot should be documented | ? |
| SciQLOP should be usable by students as a training tool: provide users with knowledge toolkits associated with the quantities used | ? |
| SciQLOP should provide users with easy access to documentation on data, missions, instruments, etc. | 90 |
| SciQLOP should allow users to access Wikipedia plasma/mission-related articles and invite them to edit/add content. | 0 |
| SciQLOP should allow users to save and restore sessions. | 0 |
| A SciQLOP user session should contain all the data needed to restore its previous state. | 0 |
h2. Code redistribution

|_. Functionality |_. Status (% done) |
| SciQLOP source code will be GPLv2. | 100 |
| The SciQLOP source folder should contain the files README, INSTALL, COPYING and CHANGELOG. | 0 |
| All SciQLOP source files should contain the GPLv2 header. | 100 |
| All SciQLOP dependencies should be compatible with GPLv2. | 90 |
h2. Code versioning

|_. Functionality |_. Status (% done) |
| SciQLOP source code modifications should link features or bug fixes to code revisions | 50 |
| SciQLOP source code should be hosted on the laboratory server hephaistos1 with the Mercurial version control system. | 100 |
h2. Code Writing

|_. Functionality |_. Status (% done) |
| SciQLOP developer documentation, roadmap, issues, etc. will be managed in the Redmine application: https://hephaistos.lpp.polytechnique.fr/redmine/projects/sciqlop | 100 |
| Unit tests are developed for all modules and run after each important merge | 90 |
| SciQLOP's code is homogeneous in its syntax and philosophy to guarantee easy maintainability. | 100 |
| SciQLOP's code is self-explanatory; comments explain goals and methods rather than instructions. Comments record the development flow (hacks, todos, etc.) | 95 |
| SciQLOP's code is based on the Qt framework and plotting capabilities use the QtCharts API. | 100 |
h2. SciQLop Modules

h3. Core

h4. Database

|_. Functionality |_. Status (% done) |
| All SciQLOP native data types should be handled | 100 |
| The Database module should allow associating commands/functions with data/data types from the GUI. | 50 |
| The Database module should provide a way to view loaded products. | 100 |
| The Database module GUI should show whether a product is in use and by which module (Plot5, plugin2, PythonContext...). | 0 |
| The Database module GUI should show which plugin provides the data. | 0 |
| The origin of the data should be associated with metadata. E.g. if a product comes from a file, display the path and file metadata (dates, size, permissions etc.). If it comes from a data downloader, display which one, i.e. AMDA, CDAWeb etc. | 0 |
| Data products are accessible through a tree dynamically built with a filter | 100 |
| Users can easily access metadata on data products (mission, resolution, unit, etc.) by interacting with the tree | 95 |
| Default tree representations are proposed to the user - TBD | ? |
| Caveats and other useful information usually present in headers are available from the database GUI | 0 |
| The database representation should highlight data products when they are selected on plots | 0 |
| Any data imported into the user session, whether from a file or from the Space Data module, must be added to the database | 0 |
| Garbage collector - TBD | ? |
| Each data product in the database gets a unique identifier | 100 |
| Data products loaded into the database are immutable. | 100 |
| Users should be able to delete a product from the database through a contextual menu | 100 |
| Deleting a product from the database checks whether the product is still in use. If so, it warns the user that all plots will be updated accordingly | 95 (does not warn yet) |
| Multiple occurrences of the same data product (i.e. database entries differing by their UID, and possibly time interval) are grouped under the same data product name in the tree and appear by their UID/plot | 0 |
| The database also shows the variables currently defined in the Python session, in a separate tab. Although Python variables, like data products, contain data, they differ by being mutable and not associated with rich metadata such as origin, etc. | 0 |
| Users can manually add a Python function to the database from the context menu | 0 |
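Several Database requirements above (unique identifier, immutable products, in-use check before deletion) can be illustrated together. This is a hypothetical sketch, not SciQLOP's implementation: immutability is enforced by exposing only const accessors, UIDs come from an atomic counter, and "in use" is approximated with std::shared_ptr::use_count(), which is only a hint when other threads share the pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes, not SciQLOP's code.
class DataProduct {
public:
    DataProduct(std::string name, std::vector<double> timesUs, std::vector<double> values)
        : m_uid(nextUid()), m_name(std::move(name)),
          m_times(std::move(timesUs)), m_values(std::move(values))
    {
        assert(m_times.size() == m_values.size());
    }

    // Read-only interface: a product cannot be modified after creation.
    std::uint64_t uid() const { return m_uid; }
    const std::string& name() const { return m_name; }
    const std::vector<double>& times() const { return m_times; }
    const std::vector<double>& values() const { return m_values; }

private:
    // Monotonic counter shared by all products: each one gets a distinct UID.
    static std::uint64_t nextUid()
    {
        static std::atomic<std::uint64_t> counter{0};
        return ++counter;
    }

    std::uint64_t m_uid;
    std::string m_name;
    std::vector<double> m_times;
    std::vector<double> m_values;
};

class Database {
public:
    // Stores the product and returns a shared, read-only handle for plots/plugins.
    std::shared_ptr<const DataProduct> add(DataProduct product)
    {
        std::shared_ptr<const DataProduct> handle =
            std::make_shared<DataProduct>(std::move(product));
        m_products[handle->uid()] = handle;
        return handle;
    }

    // A product is "in use" if someone besides the database holds a handle.
    bool isInUse(std::uint64_t uid) const
    {
        const auto it = m_products.find(uid);
        return it != m_products.end() && it->second.use_count() > 1;
    }

    // Removes the product; returns true if it was still in use, so that the
    // caller can warn the user that the plots will be updated.
    bool remove(std::uint64_t uid)
    {
        const bool wasInUse = isInUse(uid);
        m_products.erase(uid);
        return wasInUse;
    }

    std::size_t size() const { return m_products.size(); }

private:
    std::map<std::uint64_t, std::shared_ptr<const DataProduct>> m_products;
};
```

A plot holding the handle returned by add() keeps its data alive after deletion from the database, and remove() reports that the product was still in use.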
h4. Data Downloader

|_. Functionality |_. Status (% done) |
| The data downloader should be a singleton. | 100 |
| The data downloader may implement most protocols (HTTP, FTP, WebDAV...). | ? |
| The data downloader should handle proxy servers. | 100 |
| The data downloader should know when data requested by the user is already available locally. | 0 |
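Two Data Downloader requirements, being a singleton and knowing what is already available locally, could look like the following. This is a hypothetical sketch: class and method names are illustrative, times are doubles in µs since 1970 as recommended in the time notes, the singleton is a function-local static (its initialization is thread-safe since C++11), and no locking is shown for the cache itself.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types, not SciQLOP's API.
struct TimeRange {
    double start;  // µs since 1970
    double end;    // exclusive
};

class DataDownloader {
public:
    // Single shared instance, created on first use.
    static DataDownloader& instance()
    {
        static DataDownloader downloader;
        return downloader;
    }
    DataDownloader(const DataDownloader&) = delete;
    DataDownloader& operator=(const DataDownloader&) = delete;

    // Records that [range.start, range.end) is now stored locally.
    void markCached(TimeRange range)
    {
        m_cached.push_back(range);
        std::sort(m_cached.begin(), m_cached.end(),
                  [](const TimeRange& a, const TimeRange& b) { return a.start < b.start; });
        // Merge overlapping or adjacent ranges so the cache stays minimal.
        std::vector<TimeRange> merged;
        for (const TimeRange& r : m_cached) {
            if (!merged.empty() && r.start <= merged.back().end)
                merged.back().end = std::max(merged.back().end, r.end);
            else
                merged.push_back(r);
        }
        m_cached = std::move(merged);
    }

    // Returns the sub-ranges of `request` that are not available locally.
    std::vector<TimeRange> missing(TimeRange request) const
    {
        assert(request.start <= request.end);
        std::vector<TimeRange> gaps;
        double cursor = request.start;
        for (const TimeRange& r : m_cached) {
            if (r.end <= cursor) continue;        // cached range entirely before the cursor
            if (r.start >= request.end) break;    // cached range entirely after the request
            if (r.start > cursor) gaps.push_back({cursor, r.start});
            cursor = std::max(cursor, r.end);
            if (cursor >= request.end) break;
        }
        if (cursor < request.end) gaps.push_back({cursor, request.end});
        return gaps;
    }

    void clearCache() { m_cached.clear(); }

private:
    DataDownloader() = default;
    std::vector<TimeRange> m_cached;  // sorted, non-overlapping
};
```

After caching [10, 20), [30, 40) and [20, 25), a request for [0, 50) yields the gaps [0, 10), [25, 30) and [40, 50): only those would need downloading.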
h4. Python Engine

|_. Functionality |_. Status (% done) |
| SciQLOP functionalities are scriptable through the IPython terminal. | 0 |
| Users can interact with the database through Python: they can pull data from the database into Python or push data from Python to the database. | 0 |
| Users can interact with any plot with Python | 0 |
| Users can grab data from a plot, or plot data, with Python | 0 |
| Data in the Python terminal is implemented as NumPy arrays and DataFrame objects, with time as index | 0 |

h4. GUI Manager

|_. Functionality |_. Status (% done) |
| The GUI manager should be a singleton | 0 |
| The GUI manager may provide an interface to populate menus from any module | 0 |
h3. Catalogs & Community functionalities

h3. Plot

h4. General

|_. Functionality |_. Status (% done) |
| SciQLOP should be able to plot all kinds of data relevant to space plasma missions (scalar, vector, tensor, spectrogram) | 100 |
| Each plot element (context: colorbars, ticks, titles, labels etc.; data: lines, style, dots, etc.) should be clickable and customizable | 40 |
| Controls on figure items should be identical in all kinds of plots (e.g. changing title, labels etc. is done identically for all kinds of plots). | 100 |
| A plot “style” (context and data style) can be copied/pasted onto another plot | 0 |
| Users may define custom plot styles and save/reload them as desired | 0 |
| Plots should provide visualization of available metadata (instrument modes, etc.) | 0 |
| Each plot/view should provide the context menu associated with the data type. | 50 |
| Each plot/view should allow the user to select a portion of data and perform specific operations on it, such as labelling it, building new plots or applying data analysis methods on it (e.g. get statistical quantities such as mean/std, etc.). | 50 |
| Data can be selected by dragging a zone on the plot: e.g. a time interval as a rectangle for time series, or a box in a 3D plot/view. | 0 |
| For time-dependent data, the plot/view may ask for more data when the user scrolls to its borders. | 0 |
| Users should experience no significant lag related to downloading data while scrolling windows | 0 |
| Each plot/view should allow the integration of custom controls (user-defined GUI widgets) provided by the currently plotted data. The control may be provided as a QWidget. | 0 |
| Each plot/view should be easily duplicable. | 0 |
| Figures/plots can be linked together or not. | 100 |
| Linked plots zoom/unzoom and scroll together (horizontally) | 100 |
| A lens effect enables the user to zoom on a part of the data without changing the plot range (like the lens effect in GIMP-like software). This can be done by adding a plot window overlapping the plot. | 0 |
| Any plot should come with a default legend, customizable by the user | 100 |
| Any plot can be dragged to a folder and exported as an image (eps, png, jpg) | 0 |
| Users can select multiple plots while holding the Shift key and drag them onto a folder to export images. SciQLOP asks the user whether multiple images or a summary plot should be produced. | 0 |
| The mouse wheel can be used to scroll in time when the figure is selected. | 100 |
| Plot adjustment controls appear at specific locations on mouse hover. This keeps the interface clear but still customizable. | 0 |
h4. Scalar Time series

|_. Functionality |_. Status (% done) |
| The contextual menu should give users rapid access to statistical functionalities such as mean(), std(). | 0 |
| Data holes are handled in line style mode as a line segment. | 100 |
| Line width can be adjusted with click + mouse wheel | 0 |
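For the mean()/std() entry of the contextual menu, a hypothetical helper could compute both over a selected time interval in a single pass. Welford's algorithm is chosen here for numerical stability, and NaN samples are treated as data holes and skipped; both are assumptions, not SciQLOP's documented behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type, not SciQLOP's API.
struct Stats {
    std::size_t count;
    double mean;
    double stddev;  // population standard deviation
};

// Mean and standard deviation of the samples with t0 <= time < t1.
inline Stats intervalStats(const std::vector<double>& timesUs,
                           const std::vector<double>& values,
                           double t0, double t1)
{
    assert(timesUs.size() == values.size());
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (timesUs[i] < t0 || timesUs[i] >= t1 || std::isnan(values[i]))
            continue;  // outside the selection, or a data hole
        ++n;
        const double delta = values[i] - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (values[i] - mean);
    }
    if (n == 0)
        return {0, NAN, NAN};
    return {n, mean, std::sqrt(m2 / static_cast<double>(n))};
}
```

For samples 2, 4, NaN, 4 selected over [0, 4), the helper skips the hole and returns a mean of 10/3 and a standard deviation of sqrt(8/9).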
h4. Vector Time series

|_. Functionality |_. Status (% done) |
| SciQLOP can plot one or more components of a vector as a function of time | 0 |
| SciQLOP is aware the quantity is part of a vector. If Bx is plotted, it knows which products represent By and Bz and can easily add them to the plot. | 0 |
| The contextual menu on a vector plot should allow users to easily transform their vector into another basis: user-provided, generic bases such as GSE, GSM, etc., or MVA. | 0 |
| Vectors displayed on linked plots will be represented in the same basis dynamically (a change of basis on one of the linked plots changes all of them) | 0 |
| Vectors can be plotted as 3D vectors, in 3D or as 2D projections, as a function of time. This representation will be useful for visualizing aspects specific to vectors, such as changes of direction (e.g. waves & discontinuities), not easily viewable in the time series format. | 0 |
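Changing basis amounts to projecting each vector on the new basis vectors. Below is a hypothetical sketch for a user-provided orthonormal basis expressed in the data's original frame; obtaining GSE/GSM rotations or MVA eigenvectors is not shown, those would only provide e1, e2 and e3.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical types and helpers, not SciQLOP's API.
using Vec3 = std::array<double, 3>;

inline double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Components of v in the orthonormal basis (e1, e2, e3): (v.e1, v.e2, v.e3).
inline Vec3 toBasis(const Vec3& v, const Vec3& e1, const Vec3& e2, const Vec3& e3)
{
    // Orthonormality check; the tolerance is arbitrary.
    const double tol = 1e-9;
    assert(std::fabs(dot(e1, e1) - 1.0) < tol && std::fabs(dot(e2, e2) - 1.0) < tol &&
           std::fabs(dot(e3, e3) - 1.0) < tol);
    assert(std::fabs(dot(e1, e2)) < tol && std::fabs(dot(e1, e3)) < tol &&
           std::fabs(dot(e2, e3)) < tol);
    return Vec3{{dot(v, e1), dot(v, e2), dot(v, e3)}};
}
```

For example, with e1 = (0, 1, 0), e2 = (-1, 0, 0) and e3 = (0, 0, 1), the vector (1, 2, 3) becomes (2, -1, 3). Applying the same basis to every sample of a time series is what would keep linked plots consistent.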
h4. Spectrograms

|_. Functionality |_. Status (% done) |
| SciQLOP should enable the same plot/context functionalities for all spectrograms, whatever the nature of the data (particles or waves); this includes color range, colormaps, etc. | 100 |
| Characteristic frequencies should be overplottable as time series. | 0 |
| Using WAMP for theoretical wave frequencies and damping should be possible | 0 |
| The plasma eigenmodes visualization toolkit can be called from the spectrogram contextual menu | 0 |
| Data holes are handled by showing the background color/texture | 100 |
| By clicking on the colorbar, the color range can easily be adjusted with cursors appearing on the colorbar on mouse hover | 0 |

h4. Orbits

|_. Functionality |_. Status (% done) |
| SciQLOP provides the user with three 2D projections of the magnetosphere/heliosphere and its key regions | 0 |
| Heliosphere/magnetosphere plots use different models (Tsyganenko, Shue, Parker spiral, etc.), easily interchangeable by the user. | 0 |
| Orbits of the satellites whose data is currently being viewed appear overplotted on the three 2D magnetospheric/heliospheric projections | 0 |
| The position of the spacecraft for the considered intervals must be clearly visible on the trajectory as a colored portion of the orbit, the color being associated with a particular plot (or set of linked plots). | 0 |
| Changing the interval range of a spacecraft on its orbit must change the plot time range accordingly and dynamically. | 0 |
| SciQLOP should offer interoperability with CDPP/3DView to view orbits in 3D | 0 |
h3. SpaceData

h4. General

|_. Functionality |_. Status (% done) |
| The satellite data provider should provide access to CDAWeb data through a REST API. | 100 |
| The satellite data provider may allow advanced pattern filtering for fast data retrieval | 10 |
| The satellite data provider should allow generic search/filtering among data fields. | 100 |
| The satellite data provider search/filter should accept regular expressions | ? |
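For the regular-expression requirement, a hypothetical filter over product names could rely on std::regex, whose default grammar is ECMAScript. The product names in the usage example are made up; an invalid pattern throws std::regex_error, which a GUI would catch to report the error to the user.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not SciQLOP's API: keeps the product names in which
// the user-supplied regular expression finds a match.
inline std::vector<std::string> filterProducts(const std::vector<std::string>& names,
                                               const std::string& pattern,
                                               bool caseInsensitive = true)
{
    if (pattern.empty())
        return names;  // empty filter: show every product
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (caseInsensitive)
        flags |= std::regex::icase;
    const std::regex re(pattern, flags);  // throws std::regex_error on an invalid pattern
    std::vector<std::string> matches;
    for (const std::string& name : names)
        if (std::regex_search(name, re))
            matches.push_back(name);
    return matches;
}
```

With the names THEMIS_A_FGM_GSE, themis_b_efi, Cluster_C1_FGM and MMS1_FPI_DIS, the pattern ^themis_.*_(fgm|efi) keeps the first two, since matching is case-insensitive by default.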
h3. App

h3. Machine Learning
h1. Testlatex
$\frac{x^2}{\sqrt{\cos(x)}}$
\begin{equation}
\frac{x^2}{\sqrt{\cos(x)}}
\end{equation}
SciQLOP (SCIentific Qt application for Learning from Observations of Plasmas) is an ergonomic and powerful tool enabling visualization and analysis of in situ spacecraft plasma data.
You can read the user guide.
Other resources:
Visit the SciQLOP download page
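```shell
brew install qt
brew install meson
# Add the Qt binaries to the PATH: open ~/.bashrc (e.g. with vim) and add the export line below
# (5.9.2 is the Qt version path used here; adjust it to the installed version)
vim ~/.bashrc
export PATH=/usr/local/Cellar/qt/5.9.2/bin:$PATH
source ~/.bashrc
# /myPath is a placeholder for your own directory
mkdir /myPath/SciQLOP
git clone https://hephaistos.lpp.polytechnique.fr/rhodecode/HG_REPOSITORIES/LPP/SciQLOP_Repos/SciQLop /myPath/SciQLOP
/myPath/SciQLOP/build_cfg/mac/build.sh
```
The SciQLOP app is then available in /tmp.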