Jefferson Lab tests a next-generation data acquisition scheme
Nuclear physics experiments worldwide are becoming increasingly data intensive as researchers probe ever more deeply into the heart of matter. To get a better handle on the data, nuclear physicists are turning to artificial intelligence and machine learning methods to help sift through the torrent in real time.
A recent test of two systems that employ such methods at the U.S. Department of Energy's Thomas Jefferson National Accelerator Facility found that they can, indeed, enable real-time processing of raw data. Such systems could streamline data analysis, making it faster and more efficient, while also preserving more of the original data for future analysis than conventional systems do. An article describing this work was recently published in The European Physical Journal Plus.
Nuclear physics is demanding, and it’s getting more so every year. Advances in accelerators demand ever-more powerful software and computing resources to make sense of the extreme amounts of raw data that experiments produce.
Nowhere is this more true than at Jefferson Lab, home of the powerful Continuous Electron Beam Accelerator Facility, known as CEBAF. CEBAF is a DOE Office of Science user facility that blasts particle beams at specially chosen targets for study. The beam-target collisions trigger a cascade of subatomic particles. Much like a mighty microscope, these collisions probe deep into protons and neutrons to study quarks and gluons, the building blocks of the universe.
CEBAF has the highest luminosity in the world: no other particle accelerator generates more collisions. Because particle detectors record information about each cascade of subatomic particles thousands of times per second, these experiments generate enormous amounts of raw data. During experiments, Jefferson Lab’s four experimental halls combined can generate more than 90 terabytes of data every day.
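To put the daily figure in perspective, a short Python sketch can convert a daily volume into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and that data are produced evenly around the clock; real running is bursty, so peak rates would be higher, and because the figure is "more than 90 terabytes," the result is only a rough lower bound on the average.

```python
# Convert a daily data volume into an average sustained rate.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes) and data
# spread evenly over 24 hours, which real, bursty beam running is not.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def average_rate_gb_per_s(terabytes_per_day: float) -> float:
    """Return the average rate in gigabytes per second."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**9


rate = average_rate_gb_per_s(90)
print(f"{rate:.2f} GB/s, about {rate * 8:.1f} Gb/s")  # 1.04 GB/s, about 8.3 Gb/s
```

Under these simplifying assumptions, the four halls together average on the order of 1 gigabyte of data every second.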
Now, Jefferson Lab is at the leading edge of applying artificial intelligence and machine learning to develop a revolutionary, smart, software-based streaming readout (SRO) system that can process, in real time, the vast amounts of data that nuclear physics experiments produce.
A better handle on more data
To rein in the amount of data that must be analyzed, nuclear physicists have relied on hardware-based “triggered” systems to pre-sort data and keep only the information they are looking for. These systems record only the data that arrive within a short time window after the system detects certain particles or events. Such systems have been reliable workhorses in nuclear physics for half a century.
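As an illustration only, the idea of a trigger window can be sketched in a few lines of Python. This is not Jefferson Lab's hardware trigger logic: the `Hit` record, the single-channel threshold condition and the `window_ns` parameter are hypothetical simplifications chosen to show the principle that data outside a trigger window are discarded.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One hypothetical detector hit."""
    time_ns: int      # timestamp in nanoseconds
    channel: int      # readout channel that recorded the hit
    amplitude: float  # signal size, arbitrary units


def triggered_readout(hits: List[Hit], trigger_channel: int,
                      threshold: float, window_ns: int) -> List[Hit]:
    """Keep only hits that fall inside a window opened by a trigger.

    The trigger condition is a single toy rule: a hit on trigger_channel with
    amplitude above threshold opens the window [t, t + window_ns). Hits outside
    every window are thrown away and cannot be recovered later.
    """
    ordered = sorted(hits, key=lambda hit: hit.time_ns)
    trigger_times = [hit.time_ns for hit in ordered
                     if hit.channel == trigger_channel and hit.amplitude > threshold]
    return [hit for hit in ordered
            if any(t <= hit.time_ns < t + window_ns for t in trigger_times)]


stream = [Hit(0, 5, 0.2), Hit(100, 1, 3.0), Hit(120, 5, 0.8), Hit(400, 5, 2.5)]
kept = triggered_readout(stream, trigger_channel=1, threshold=1.0, window_ns=50)
print([hit.time_ns for hit in kept])  # [100, 120]
```

In this toy stream, the large hit at 400 ns on channel 5 is dropped because no trigger opened a window around it, which is exactly the kind of event a trigger can miss.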
Software-based streaming readout systems may have a less complex physical infrastructure, yet they promise to be far more powerful, efficient, fast and flexible. An SRO scheme maximizes the amount of physics that can be extracted from an experiment, from initial decisions about which data to save to simulating the physics in very complex detectors that can have millions of active readout channels.
Streaming readout is next-generation data acquisition. In experiments, it streams all data from each detector to a data center to be analyzed, tagged and filtered. The SRO automatically sifts through the enormous amount of data and makes initial decisions about which data should be saved from a detector’s hundreds to millions of active readout channels. Ultimately, the unnecessary background is filtered out, and the interesting bits are recorded.
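The streaming approach can be sketched in similar toy terms: every hit stays in the stream, software groups hits into candidate events by time proximity, tags each one, and only then applies a selection that looks at the whole event. The time-gap clustering, the `max_gap_ns` parameter and the total-amplitude cut are hypothetical stand-ins for real reconstruction and filtering, which involve far more detector information; this is not the SRO software used at Jefferson Lab.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One hypothetical detector hit."""
    time_ns: int      # timestamp in nanoseconds
    channel: int      # readout channel that recorded the hit
    amplitude: float  # signal size, arbitrary units


def cluster_in_time(hits: List[Hit], max_gap_ns: int) -> List[List[Hit]]:
    """Group the full hit stream into candidate events.

    A new group starts whenever the gap to the previous hit exceeds max_gap_ns.
    """
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: h.time_ns):
        if groups and hit.time_ns - groups[-1][-1].time_ns <= max_gap_ns:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def streaming_filter(hits: List[Hit], max_gap_ns: int,
                     min_total_amplitude: float) -> Tuple[List[Dict], List[Dict]]:
    """Tag every candidate event, then keep those passing a whole-event cut."""
    tagged = []
    for group in cluster_in_time(hits, max_gap_ns):
        total = sum(hit.amplitude for hit in group)
        tagged.append({
            "start_ns": group[0].time_ns,
            "n_hits": len(group),
            "total_amplitude": total,
            "keep": total >= min_total_amplitude,
        })
    kept = [event for event in tagged if event["keep"]]
    return tagged, kept


stream = [Hit(0, 5, 0.2), Hit(100, 1, 3.0), Hit(120, 5, 0.8), Hit(400, 5, 2.5)]
tagged, kept = streaming_filter(stream, max_gap_ns=50, min_total_amplitude=2.0)
print(len(tagged), [event["start_ns"] for event in kept])  # 3 [100, 400]
```

On this toy stream, the group starting at 400 ns is kept because the cut looks at the whole event rather than at a single trigger channel. The trade-off is that software must inspect everything, which is why real-time processing power matters so much for SRO.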
All of the front-end work done by an SRO means that the actual analysis of data can take a fraction of the time it otherwise would. SROs also offer other advantages.
For physicist and computer scientist David Lawrence, head of the lab’s Experimental Physics Software and Computing Infrastructure (EPSCI) group, the biggest advantage of SROs is having a holistic picture of data or events when deciding which data to keep.
“By having the streaming picture,” Lawrence said, “we can look at the whole of the event instead of just triggering on some small part of it and hope that we’re capturing all of the good events and not losing anything.”
Given CEBAF’s high luminosity, a streaming readout system has special appeal.
“One of the things that streaming readout really opens the door for is getting even higher luminosity — being able to disentangle events that may be starting to overlap a little, so you can pull out the really rare reactions that are taking place,” said Lawrence.
Still camera versus video camera
To demonstrate how a streaming readout framework would work compared to a triggered system, Jefferson Lab ran tests in two of its experimental halls in 2020.
To the physicists conducting the tests, the results came as no surprise.
“It was more a proof of principle rather than a true demonstration,” said Marco Battaglieri, senior scientist at the Istituto Nazionale di Fisica Nucleare (INFN) in Italy. At the time the tests were conducted, Battaglieri was a senior staff scientist at Jefferson Lab.
The difference between using a triggered or a streaming readout system, he said, is much like the difference between using a still photo camera or a video camera to record a horse race.
“If you were to start today to design a data acquisition framework, you’d probably think about having a streaming mode,” Battaglieri said.
The sentiment isn’t merely academic. The tests were also a key part of Jefferson Lab’s ongoing efforts to develop and validate a streaming readout solution for the $2.6 billion Electron-Ion Collider, or EIC. The EIC is a next-generation facility to be built at DOE’s Brookhaven National Laboratory in New York.
The EIC is on the frontier of fundamental physics, critical to the future of research and particle accelerator technology around the world, and Jefferson Lab is a major partner in the project.
Because the EIC will be built from scratch over the next decade, it’s not bound to existing computing infrastructure. So streaming readout, Battaglieri said, is the clear option.
‘On the front edge’
The SRO tests were conducted in Jefferson Lab’s Experimental Halls B and D, said Mariangela Bondi, a technologist at INFN.
Hall D used an EIC calorimeter prototype, while Hall B used a more sophisticated detector with more channels. The tests included a high-level analysis framework called JANA2, funded under the lab’s Laboratory Directed Research and Development (LDRD) initiative. LDRD encourages small projects that are on the forefront of science and technology.
In the spring and summer, researchers ran on-beam tests of increasing complexity in each hall, Bondi said, “and both of them showed very positive results.”
The tests demonstrated that each SRO performed as expected in comparison with traditional data acquisition systems. They also provided evidence that SROs have the potential to outperform those systems in data analysis and event reconstruction. An article describing their work and future perspectives was published in The European Physical Journal Plus.
The tests are also considered the first step toward developing an optimized framework for a streaming readout system at Jefferson Lab, with Hall B as the test bed.
“I’m extremely excited about this,” Lawrence said. “It’s being on the front edge of some new technology that we know is going to be what I would say is a standard operating procedure a generation from now.
“There are a lot of us that get very excited about not just the science, but the technical challenge of implementing this kind of thing. It’s a hard thing to do, but it’s very satisfying when you can get it all to work.”
Their efforts could not only advance physics by getting “more science per minute” out of accelerator operation, Lawrence said, but could also benefit the overall evolution of AI and machine learning that’s booming around the globe.
Streaming readout could also contribute to the development of self-driving machines and spin off into other fields that require great precision and millisecond reaction times, such as medical applications and radiation therapies.
Further Reading
Journal Article: Streaming readout for next generation electron scattering experiments
By Tamara Dietrich
Contact: Kandice Carter, Jefferson Lab Communications Office, kcarter@jlab.org