This is a slightly different take on the discussion of Design Concept #1. I think there is nothing that actually conflicts between these, and I intend to work this into a more complete and whole design.
Framework for Data Collection Control Software
This is a draft design specification for software for the control and collection of experimental data at synchrotrons. The initial scope of the project is to replace data collection software for x-ray spectroscopy and microprobe beamlines at the APS and other North American synchrotrons, but there are also desires to provide a solution more general to other x-ray techniques and to facilitate better workflow to data visualization and analysis methods. This second desire for a more comprehensive solution includes needs both at the beamline and for later, more detailed analysis. This paper describes the motivation for one such approach, and discusses some of the design criteria and proposed implementation.
To limit the scope of this paper somewhat, the discussion here is focused on "end user data collection" program(s), and not so much on hardware, hardware interfaces, or lower-level control systems. It is assumed throughout this paper that the EPICS control system can be used as an underlying control system. While EPICS is not used universally, it is common at synchrotrons, and so provides a good starting point for a discussion of hardware interfaces. The strategies and solutions proposed here are not meant to be EPICS-specific, and it should become clear that using other systems should be possible.
Background and Motivation
There is a need for robust, easy to use software to control experiments at synchrotrons. There are currently many solutions, most of them home-built and specific to a particular beamline or facility. This is particularly true for EXAFS, x-ray spectroscopy, and x-ray microprobe beamlines. At the APS, for example, there is very nearly one control program in use per EXAFS and microprobe beamline. At other sources the beamlines may have a more uniform interface, but there is not uniformity across facilities.
The situation for diffraction beamlines is somewhat better, as many of these use Spec as a common control program. Spec is widely used at many beamlines at the APS and around the world, and is designed for controlling diffractometers. It should be seen as a system that we can learn from, but Spec is very old and not without problems, so we should not limit ourselves to "reinventing Spec".
In addition, protein crystallographers seem to have come to a common GUI interface for controlling experiments with Blu-ICE. I admit to being less familiar with this than I should be, but it appears to provide a common interface for end-users across many similar beamlines, even though it may use different lower-level control systems. This is an important achievement to emulate.
In some sense, the goals here are more ambitious than emulating Spec or Blu-ICE for x-ray spectroscopy and microprobe beamlines. This is partially because the measurements done at microprobe beamlines can be quite diverse, including many x-ray techniques (XRF, XRD, XAFS of several elements), and so require a wider range of scan types (motor scans, energy scans) and detector types within a single experiment. We want a system that is simple enough to allow end-users to do all of these measurements easily, but flexible enough to do relatively complicated experiments. We also want configuration and customization for a particular beamline to be fairly straightforward.
I would also like to push the data collection software to include better "flow" for data collection, visualization, and on-line analysis. Traditionally these have been left as different domains, partly because of computer resource limitations. I think the consensus is that the easier it is to analyze data during data collection, the better off we are. There is a wide diversity in data visualization and analysis tools available. Like data collection programs, most of these are written "in house" or by scientists and so have issues with longevity and support.
Somewhat related to this, since the data collection program initiates the flow of data, and the data visualization, processing, and analysis programs need to make use of the data as it exists, it is important for the data collection system to have well-designed data formats. In addition, I would like to see the "experiment" or "beam run" be viewed as a set of related data. Currently, data files (mostly ASCII or detector-specific) are written for individual measurements, without much effort to synchronize or organize data files. This makes it too easy to lose data or the "meta" information about the experiment.
Design Requirements, Goals, and Hopes for Data Collection Software
The principal goals here are to provide a single software system that can be used at many XAFS and microprobe beamlines around the world, and also be capable of running other beamlines. Given that, these sets of requirements are probably pretty obvious:
- Simple 1 dimensional step scans of motors
- Simple 2 dimensional step scans of motors (maps)
- XAFS scans in "step mode" (complicated set of energy points and integration times)
- Motor and XAFS scans in "slew mode"
- Support for multi-element MCA for XRF spectra, saved at each point in scan.
- Support for fast-scans with hardware (eg Struck)
- Support for video image as data
- XRD camera data.
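To illustrate why a step-mode XAFS scan is a "complicated set of energy points and integration times", here is a minimal Python sketch that builds such a point list. The function name xafs_step_points, the region boundaries, the step sizes, and the dwell times are all illustrative assumptions, not part of this design. The only standard ingredient is the conversion E - E0 = (hbar^2/2m) k^2, with hbar^2/2m approximately 3.81 eV Angstrom^2.

```python
import math

# hbar^2 / (2 m_e) in eV * Angstrom^2, so that E - E0 = KTOE * k**2
KTOE = 3.80998


def xafs_step_points(e0, pre=(-100.0, -20.0, 5.0, 1.0),
                     xanes=(-20.0, 30.0, 0.5, 1.0),
                     kmax=12.0, kstep=0.05, exafs_time=2.0):
    """Return a list of (energy_eV, dwell_time_s) for a step-mode XAFS scan.

    `pre` and `xanes` are (start, stop, step, time) tuples, with start and
    stop in eV relative to the edge energy `e0`.  The EXAFS region is
    uniform in k (1/Angstrom), from the end of the XANES region to `kmax`.
    All default values are illustrative assumptions.
    """
    points = []
    for start, stop, step, dwell in (pre, xanes):
        n = int(round((stop - start) / step))
        # The stop value is excluded here; the next region supplies it.
        points.extend((e0 + start + i * step, dwell) for i in range(n))
    # Switch to a uniform grid in k where the XANES region ends.
    k0 = math.sqrt(xanes[1] / KTOE)
    nk = int((kmax - k0) / kstep + 1e-9)
    for i in range(nk + 1):
        k = k0 + i * kstep
        points.append((e0 + KTOE * k * k, exafs_time))
    return points


if __name__ == "__main__":
    scan = xafs_step_points(9659.0)  # Zn K edge
    print(len(scan), "points from", scan[0][0], "to", round(scan[-1][0], 1), "eV")
```

A scan definition like this is small enough to be stored by name and reloaded later, which is part of why scan parameters fit naturally into the setup described in the rest of this document.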
In addition there are several "highly desirables" that I would call requirements:
- Ability to Script with a simple macro language
- Simple but flexible configuration and setup of a particular beamline, with custom controls per beamline (or even experiment).
- Being able to "Save Positions" or Settings of various beamline components or parameters.
- Automatic logging (with timestamps) of values of selected beamline components (ring current, mono temperature, etc). This "external data" is data that is not normally collected as part of the measurement, but can be very useful for diagnostics -- it can also be used to save state information (amplifier settings for detectors, sample name, etc) that does not change "rapidly" (see Note on Timing below).
- Remote access (with security!) that can interact with an existing experiment. Ideally, multiple clients on different networks should be able to control the experiment.
- GUI client that can combine video images of sample with beamline control and data collection.
Overview of Proposed Implementation
The solution I'm proposing consists of three key ingredients.
First, the solution must use a Client/Server arrangement to separate low-level control and acquisition from the client that requests the collection. This simplifies the design and makes the clients more flexible in their design. It can allow multiple clients (in different languages, on different computers, etc) to talk to a single server, so that remote access can be possible. It also keeps all data collection pinned to a single program (the Server) that can focus on data synchronization and formatting. For the most part, this document focuses on the Server and the communication between Clients and Server.
Second, the Server executes commands that are in a simple macro language -- a domain specific language. This means that the Clients need only to send commands in the macro language to the Server. I believe this will make the clients easier to write (and customize), and allows a "Spec-like" macro language to be used in "expert mode". It also provides a route toward including visualization and analysis software, both on-line and off-line, as the macro language can be expanded to be the center-point for accessing those functionalities as well. Specifically, I propose using the Python-like, Python-derived TDL (really needs a better name) that Tom Trainor and I came up with. It easily beats the "Spec language", is fully-featured, and can fall back to Python when needed. More details will be given elsewhere.
Third, the Server makes heavy use of a SQL-enabled relational database, which will hold all "commands" sent from client to server, all "state information" about the experiment, and a substantial amount of "external data". It would be possible to have *all* communication between Client and Server use such a database, using a Relational Database Management System (RDBMS), itself a client/server architecture. As discussed below (in Client/Server Communication) this is not the only possibility for communication. Still, this reliance on a database is, perhaps, the most radical ingredient in this approach, and is discussed in more detail below (in Why a Relational Database). There are a few advantages to using an SQL relational database server:
- Client-Server communication can be dead-simple. The client generates "command" strings to send to the server, and reads the database for state information. The server reacts to "commands" in the database, and writes state information back to it.
- Multiple-clients can be easily handled without conflicts. Clients can use many models (GUI, web) and toolsets.
- Security can be handled by the RDBMS.
- Logs and "metadata" are held in the database.
- Long Macros are broken into several "commands" which can then be interrupted, and altered after a sequence has begun.
By storing "metadata" in a database for an entire set of experiments (say, a beam run), this can also act as part of the dataset for the visualization and analysis system.
There is an additional ingredient that is also perhaps somewhat unique and I'd like to include here. This is the idea of Instruments, which are simply logical groupings of different motors or other control parameters into a named Instrument, so that settings of these parameters can be saved (by a name) and be restored as a group. For EPICS, an Instrument is a collection of PVs that may or may not be in the same record or of the same type.
A simple example Instrument would be a sample stage, consisting of 3 motors: X, Y, and Z. Defining an Instrument "Sample Stage" would allow saving the group of positions for all motors by a single name "Position 1", "My Sample", and so on. A key piece here is that these settings easily go into a relational database. In an EPICS environment, the Instruments could be any group of PVs, so that setting an amplifier gain could be implemented as picking an instrument name. Within the constructs of the macro language, one could have a function like:
EpicsInstrument_Define("Sample Stage", x='13IDC:m1', y='13IDC:m2', z='13IDC:m3')
to define the instrument. Saving the current position could be
EpicsInstrument_Save("Sample Stage", "this position", use_current=True)
or
EpicsInstrument_Save("Sample Stage", "position 2", x=22.000, y=33.130, z=0.100 )
Restoring to an old setting (ie, moving the motors to a saved position) could be
EpicsInstrument_Restore("Sample Stage", "position 2")
Of course, an instrument would be defined early in the process by the beamline scientist, and the other functions wrapped more simply:
SampleStage_Save("position 1")
SampleStage_MoveTo("position 2")
This ingredient may be viewed as an implementation detail, but I think it is important to mention here, as it allows for simple but flexible setup by the beamline scientist, and provides an easy client interface to experiment setup. In fact, there could be GUI-versions of these Instrument functions, so that a simple widget pane for managing an Instrument (saving, restoring settings) can automatically be generated from an instrument name.
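The Instrument idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the class name Instrument, the table layout, and the get_pv/put_pv callables (stand-ins for whatever channel-access calls the Server would really use) are all hypothetical, and sqlite3 is used here only so that the example is self-contained.

```python
import sqlite3


class Instrument:
    """A named group of PVs whose values are saved and restored together.

    `get_pv` and `put_pv` are caller-supplied callables standing in for
    real control-system reads and writes (an assumption of this sketch).
    """

    def __init__(self, db, name, get_pv, put_pv, **pvs):
        self.db, self.name = db, name
        self.get_pv, self.put_pv = get_pv, put_pv
        self.pvs = dict(pvs)  # component name -> PV name, e.g. x -> 13IDC:m1
        db.execute(
            "CREATE TABLE IF NOT EXISTS positions ("
            " instrument TEXT, position TEXT, component TEXT, value REAL,"
            " PRIMARY KEY (instrument, position, component))")

    def save(self, position, **values):
        """Save a named position; components not given are read live."""
        for comp, pv in self.pvs.items():
            val = values.get(comp)
            if val is None:
                val = self.get_pv(pv)
            # INSERT OR REPLACE is SQLite syntax; other servers differ.
            self.db.execute(
                "INSERT OR REPLACE INTO positions VALUES (?, ?, ?, ?)",
                (self.name, position, comp, val))
        self.db.commit()

    def restore(self, position):
        """Write every saved component value back to its PV."""
        rows = self.db.execute(
            "SELECT component, value FROM positions"
            " WHERE instrument = ? AND position = ?",
            (self.name, position)).fetchall()
        if not rows:
            raise KeyError(f"no saved position {position!r}")
        for comp, val in rows:
            self.put_pv(self.pvs[comp], val)

    def positions(self):
        """Return the saved position names for this instrument."""
        rows = self.db.execute(
            "SELECT DISTINCT position FROM positions"
            " WHERE instrument = ? ORDER BY position", (self.name,))
        return [r[0] for r in rows]


if __name__ == "__main__":
    fake_pvs = {"13IDC:m1": 1.0, "13IDC:m2": 2.0, "13IDC:m3": 3.0}
    stage = Instrument(sqlite3.connect(":memory:"), "Sample Stage",
                       fake_pvs.__getitem__, fake_pvs.__setitem__,
                       x="13IDC:m1", y="13IDC:m2", z="13IDC:m3")
    stage.save("position 2", x=22.000, y=33.130, z=0.100)
    stage.restore("position 2")
    print(fake_pvs)
```

Because saved positions are ordinary rows, a GUI pane could list them with the same query that positions() uses, without knowing anything else about the Instrument.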
Discussion of Principal Components and Technologies
SQL based RDBMS: For data collection, the full-server relational database management systems such as MySQL or PostgreSQL are most appropriate. It is not so much the full relational model of the database that is key here, but the features of a) server/multiple clients, b) data integrity, and c) transactional capabilities that are most important. The relational aspects essentially allow the data to be searched in arbitrary ways, and are a bonus. As an important side note, SQLite is not appropriate for this use. As tempting as SQLite is (and a likely candidate for visualization/analysis tools), it simply cannot handle networked access.
MySQL and Postgres require a server to be running, and so are appropriate for data collection, where the collecting machine can run such a server along with the data collection process(es). Once collected, however, the data should be transferred to an SQLite-based database (single file, no server, no additional access control) for off-line data analysis. That is, the data analysis and visualization programs should not care which RDBMS system is used. In addition, the data held in the databases should be easily transferable between RDBMS systems. This does slightly limit the kinds of data and DB features that can be used, but this should not be significant.
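As a rough sketch of the transfer step, the function below copies one table from any Python DB-API connection into an SQLite file. The name copy_table is hypothetical, and the sketch assumes the table holds only plain values (numbers, strings, None) and that the table name comes from a trusted source. Column types are deliberately dropped, which is one example of the "slight limit" on DB features mentioned above.

```python
import sqlite3


def copy_table(src, dst, table):
    """Copy all rows of `table` from DB-API connection `src` to sqlite3
    connection `dst`.  Returns the number of rows copied.

    Assumptions of this sketch: `table` is a trusted identifier, and the
    values are types that sqlite3 can store directly.
    """
    cur = src.cursor()
    cur.execute(f"SELECT * FROM {table}")
    # cursor.description gives one 7-item sequence per column; item 0 is the name.
    cols = [d[0] for d in cur.description]
    col_list = ", ".join(f'"{c}"' for c in cols)
    marks = ", ".join("?" for _ in cols)
    # Columns are created untyped; SQLite accepts this.
    dst.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    rows = cur.fetchall()
    dst.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({marks})', rows)
    dst.commit()
    return len(rows)


if __name__ == "__main__":
    # An in-memory SQLite database stands in for the collection server here.
    server = sqlite3.connect(":memory:")
    server.execute("CREATE TABLE state (name TEXT, value TEXT)")
    server.execute("INSERT INTO state VALUES ('sample', 'Zn foil')")
    offline = sqlite3.connect(":memory:")
    print(copy_table(server, offline, "state"), "row(s) copied")
```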
Why a Relational Database
Basically, because we want to be able to store state information and metadata, share these with multiple clients, and then save all the data for an experiment for later use. It would be conceivable to store all data in a relational database, but I think this is not the best approach. Instead, the "raw" data should be stored in individual files from various detectors or scans, and the database should hold pointers to these files.
Client / Server Communication Protocols
There are several accepted methods for client/server communications. This is one detail that I am still unsure about. I think this cannot be viewed as an implementation detail, as the communication is central to the whole system. We have these needs (at least):
- Multiple concurrent clients
- Clients from different networks.
- Clients written in different languages.
- The ability to communicate complex data two ways.
Given these requirements, using a simple socket approach to write our own system is probably too fragile to seriously consider. The most obvious approach might then be to use XML-RPC. I am sure that XML-RPC can work. We've done the tests. But this approach is not perfect, and at least one other possibility exists.
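For concreteness, here is a minimal XML-RPC sketch using only the Python standard library. It is not the test code referred to above; the function names run_command and get_history and the shape of the reply are assumptions made for illustration. Note that the Server has to keep the command history itself, since the transport carries no state between calls.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_server(host="127.0.0.1", port=0):
    """Start a toy command server in a background thread and return it.

    port=0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False,
                                allow_none=True)
    history = []  # the server must remember state itself

    def run_command(text):
        """Accept a macro-language command string (not executed here)."""
        history.append(text)
        return {"id": len(history), "status": "queued", "command": text}

    def get_history():
        """Let any client see what has been requested so far."""
        return list(history)

    server.register_function(run_command)
    server.register_function(get_history)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def connect(server):
    """Return a client proxy for a server made by start_server()."""
    host, port = server.server_address
    return ServerProxy(f"http://{host}:{port}/", allow_none=True)


if __name__ == "__main__":
    srv = start_server()
    client = connect(srv)
    print(client.run_command('SampleStage_MoveTo("position 2")'))
    print(client.get_history())
    srv.shutdown()
    srv.server_close()
```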
The most interesting possibility is to use the SQL RDBMS directly. That is, have the Clients write commands into a table in the Database and have the Server watch that database table for what to do next. Any Client can get state information from other database tables that the Server writes to, and multiple Clients can see what other clients have requested from the Server. This has some appeal, but is also not perfect.
Other possibilities are probably not worth discussing in detail: using EPICS directly, using sockets directly, etc.
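A minimal sketch of the command-table idea follows. The table layout, the status values, and the function names submit, cancel, and run_next are assumptions for illustration only. The sketch uses Python's built-in sqlite3 purely so that it is self-contained; a real data collection Server would need a networked RDBMS, and would also need proper locking if more than one process could claim commands.

```python
import sqlite3

COMMANDS_SCHEMA = """
CREATE TABLE IF NOT EXISTS commands (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    client    TEXT NOT NULL,
    command   TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending',
    submitted TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)"""


def submit(db, client, command):
    """Client side: queue one macro-language command, return its id."""
    cur = db.execute(
        "INSERT INTO commands (client, command) VALUES (?, ?)",
        (client, command))
    db.commit()
    return cur.lastrowid


def cancel(db, command_id):
    """Client side: withdraw a command that has not started yet."""
    cur = db.execute(
        "UPDATE commands SET status = 'cancelled'"
        " WHERE id = ? AND status = 'pending'", (command_id,))
    db.commit()
    return cur.rowcount == 1


def run_next(db, execute):
    """Server side: run the oldest pending command via `execute(text)`.

    Returns the command id, or None if nothing is pending.
    """
    row = db.execute(
        "SELECT id, command FROM commands"
        " WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    cmd_id, text = row
    db.execute("UPDATE commands SET status = 'running' WHERE id = ?",
               (cmd_id,))
    db.commit()
    try:
        execute(text)
        status = "done"
    except Exception:
        status = "failed"
    db.execute("UPDATE commands SET status = ? WHERE id = ?",
               (status, cmd_id))
    db.commit()
    return cmd_id
```

Because each step of a long macro is its own row, a client can cancel or reorder the steps that are still pending, and every command remains in the table afterwards as an archive of what was requested.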
But I think we need to think about XML-RPC vs. SQL. Here are some pros and cons for the two approaches:
XML-RPC Pros:
- any language.
- complex calls into many functions provided by Server.
- network enabled.
- securable by port number.
- two-way transfer of data easy.
XML-RPC Cons:
- stateless transport mechanism with HTTP.
- not richly securable.
- difficult to see other clients.
SQL Pros:
- any language.
- network enabled.
- rich security available
- two-way transfer of data is not too hard.
- state-ful communication -- any client can see the commands and state held in the database.
SQL Cons:
- need an RDBMS running with the Server.
Since we *want* to store commands for archival purposes and for other clients to see, an implementation using XML-RPC alone seems unnecessarily difficult.
For both these methods, getting the actual scientific data (not just the metadata) from the Server is a bit trickier, as it involves larger amounts of data and binary file formats. Here, XML-RPC is probably slightly better. A related discussion is how to handle communication for visualization and analysis programs. For that application, it's not clear that an SQL database is needed, and requiring an RDBMS is a big disadvantage. SQLite may be suitable, but it's not clear that it is preferred over XML-RPC.
It is conceivable to have both XML-RPC communication and direct access to the SQL database. Currently, I'm leaning toward SQL-only for data collection and XML-RPC for data analysis (using the data from the RDBMS translated to an SQLite database for stored metadata). This is still complicated, and I'm not committed to this.
Note on Timing
For some experiments and applications, precise and fast timing is a requirement. In general, this is hard to do with software control alone, especially for a GUI application controlling an experiment. In EPICS, for example, network latencies and record processing mean that reliable timing faster than 10 Hz is uncommon. To beat this rate, a hardware-only or hardware+dedicated control interface is needed.
The discussion here (including the proposed solution) avoids this altogether. If a solution for fast collection is available, it can be used. That is, the Server is not going to be the dedicated control interface for fast acquisition, but it can initiate a fast collection and read the data when it is completed.
This will be true for some other data collections too: the Server may simply initiate the process and look for completion of collection, but not actually collect the data.
In any case, it should not be assumed that anything in the Server happens faster than 10 Hz. It is probably safe to assume that responding at 1 Hz will happen reliably.
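The "initiate, then wait for completion" pattern can be sketched as below. The function name run_fast_collection and its callables are hypothetical; start, is_done, and read_data stand in for whatever the fast-acquisition hardware interface actually provides. The default poll interval of one second reflects the 1 Hz assumption above.

```python
import time


def run_fast_collection(start, is_done, read_data,
                        poll_interval=1.0, timeout=60.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Start a hardware-timed collection, poll slowly, then read it back.

    The Server does no fast timing itself: it only calls `start()`,
    checks `is_done()` about once per `poll_interval` seconds, and calls
    `read_data()` once the hardware reports completion.  `sleep` and
    `clock` are injectable so the loop can be exercised without waiting.
    """
    start()
    deadline = clock() + timeout
    while not is_done():
        if clock() >= deadline:
            raise TimeoutError("fast collection did not finish in time")
        sleep(poll_interval)
    return read_data()
```

The timeout matters in practice: without it, a stalled detector would leave the Server waiting forever and unable to react to new commands.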
Data files and formats
As discussed above, metadata about the experiment as a whole should be kept in an SQLite-compatible database. It is tempting to think that all the data should be kept in a database, but I think this is impractical. First, many cameras and diffraction area detectors essentially write out their own data files. In addition, many of these data sets are large -- a few MB each. Many of these files are in standard formats (jpg, tiff). For other data collections, use of "nearly standard" HDF5 and/or NeXus formats should be explored. I think the data collection program could spend a great deal of effort trying to get this raw data into the database, where it would sit only for retrieval as a single entity. That is, the data for a particular image or scan has a simple relation to the "metadata" that can be fully captured by storing the name of the disk file in the database.
This is more or less the "iTunes" model: the metadata about songs and artists is held in an SQLite database that points to disk files of the actual data (mp3, etc). Emphasizing the use of HDF5 files with a common set of tags would seem to provide "uniform enough" data sets for visualization and analysis programs.
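A minimal sketch of this "database points to files" model is shown below. The table name scan_files, its columns, and the helper functions are assumptions for illustration. Storing a size and checksum alongside the path is an extra suggestion of this sketch, not something the design above requires, but it makes it easy to detect lost or altered data files later.

```python
import hashlib
import os
import sqlite3

FILES_SCHEMA = """
CREATE TABLE IF NOT EXISTS scan_files (
    id        INTEGER PRIMARY KEY,
    scan_name TEXT NOT NULL,
    path      TEXT NOT NULL,
    nbytes    INTEGER NOT NULL,
    sha256    TEXT NOT NULL
)"""


def register_file(db, scan_name, path):
    """Record that `path` holds raw data for `scan_name`."""
    with open(path, "rb") as f:
        # Reads the whole file; large files would be hashed in chunks.
        digest = hashlib.sha256(f.read()).hexdigest()
    db.execute(
        "INSERT INTO scan_files (scan_name, path, nbytes, sha256)"
        " VALUES (?, ?, ?, ?)",
        (scan_name, path, os.path.getsize(path), digest))
    db.commit()


def files_for_scan(db, scan_name):
    """Return the data file paths recorded for one scan."""
    rows = db.execute(
        "SELECT path FROM scan_files WHERE scan_name = ? ORDER BY id",
        (scan_name,))
    return [r[0] for r in rows]


def missing_files(db):
    """Return recorded paths that no longer exist on disk."""
    rows = db.execute("SELECT path FROM scan_files ORDER BY id")
    return [r[0] for r in rows if not os.path.exists(r[0])]
```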
GUI client for data collection
For the end user, the GUI client at the beamline will be the principal point of interaction with the data collection system. Therefore, this interface program needs to be carefully constructed to be easy to use, while exposing the specific beamline functionality.
Sticking with a microprobe/EXAFS beamline as the generic starting point, the GUI should have windows or tabs that allow the following:
- video image of sample (possibly more than one view). Ideally, this can be interactive and tied to motion of the Sample Stage so that Stage coordinates can be correlated with image pixels. That allows choosing sample map regions, or "go to this spot" with mouse clicks.
- The ability to bring up a configuration window for a defined Instrument, with controls to adjust individual components and to save, restore, and browse Instrument Positions.
- a simple setup window for 2-dimensional maps
- a simple setup window for Energy scans (with EXAFS scan regions)
- ideally, scan parameters could be reloaded by name ("Big Map", "Zn XANES", etc) or from a previous scan.
- Quasi-live display of scan data during collection.
- Window for assisted Macro writing
In addition, this needs to be relatively easy to customize for the beamline scientist. Here, I recommend making heavy use of the scripting language capabilities and Instrument concept to allow a fairly universal GUI application to "discover" what Instruments to show and what functionality to present to the end user.