SoLID Software idea

(Created 03/27/15(Zwzhao), Last modified 06/29/15(Rsholmes))
SoLID software ideas page.
Feel free to record anything you want to share.
= general =
= simulation =
* [https://hallaweb.jlab.org/wiki/images/b/b7/Trajectories_and_replayability.pdf Trajectories and replayability] (Rich Holmes)
* Based on initial discussions on simulation I have added a summary of simulation requirements [https://hallaweb.jlab.org/wiki/images/e/e5/Solid_simulation_req.pdf Solid Simulation Req.] (Rakitha Beminiwattha)
* [https://hallaweb.jlab.org/wiki/images/a/a0/Simulation_output.pdf Simulation output] Initial ideas on data for output trees (Rich Holmes)
* some thoughts from Zhiwen Zhao
** For the output tree format, I think we definitely need a tree or a bank containing the event info. But the critical thing is that each sub-detector should be able to define its own easily and freely. Sub-detectors can have very different truth info and digitization, and the requirements can differ at different stages of a study.
** For background mixing, the challenge is that different backgrounds have very different rates. For example, the photon rate from pion0 decay can be more than an order of magnitude larger than the pion+ rate. A file with enough pion0 photons may then contain too few pion+ events for a study with meaningful statistics; if we increase the number of events by an order of magnitude, we have enough pion+ but waste a lot of computing time on pion0. This goes to an extreme when combining low-energy EM background with hadron background: running Geant4 with all physics turned on yields a huge amount of low-energy EM background but little hadron background. So far, what we have done is to run each background separately and record its rate at the entrance of the detectors, then run a separate simulation for each detector with the background thrown onto it accordingly. This may be OK for single-detector studies, but it is not good for a combined detector and DAQ study. We need to think of a solution from the top.
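The separate-runs-then-mix approach described above can be sketched as a weighted event mixer: each background is Poisson-sampled at its own measured rate inside the DAQ window, rather than generated together in one Geant4 run. The rates, background names, and window length below are invented placeholders, not SoLID results.

```python
import math
import random

# Hypothetical per-background rates at the detector entrance (Hz);
# placeholder numbers, not SoLID results.
rates_hz = {"em_low_energy": 5.0e8, "pi0_photon": 2.0e7, "pi_plus": 1.0e6}
window_s = 100e-9  # assumed 100 ns DAQ integration window


def poisson(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's algorithm;
    adequate for the modest means used here)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mixed_background_counts(rng=random):
    """For one signal event, draw how many background events of each
    type fall inside the DAQ window.  Each background is sampled at its
    own rate, so rare pion+ events are not starved by the far more
    frequent low-energy EM background."""
    return {name: poisson(rate * window_s, rng)
            for name, rate in rates_hz.items()}
```

A top-level mixer would then pull pre-generated events from the per-background files in these drawn multiplicities, which is what a combined detector and DAQ study would need.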
* thoughts from Thomas K Hemmick
** I like the fact that the scope of the discussion has shifted to the over-arching issues.  I agree both with Zhiwen that flexibility is critical and with Richard that there is such a thing as "enough rope to hang yourself".  We'll need judgement to decide the right number of rules, since too many and too few are both bad.
** Although my favorite means of enforcing rules is via the compiler (...you WILL implement ALL the purely virtual methods...), there is typically not enough oomph in that area to make all our designs self-fulfilling.
** Another way to approach this same discussion is to list the things that can (will) go wrong if we don't address them at the architecture level.  Here is a short list from me:
*** 1)  The simulation geometry is different from the real geometry.
*** 2)  The offline code applies corrections that fix real data but ruin simulated data.
*** 3)  I just got a whole set of simulated events from someone's directory, but they can't tell me the details on the conditions used for the simulations so I cannot be sure if I can use them.
*** 4)  The detector has evolved to use new systems and I need different codes to look at simulated (and real) events from these two periods.
*** 5)  I've lost track of how the hits relate to the simulation.
*** 6)  Sometimes the hit list for a reconstructed track includes a few noise hits that were pretty much aligned with the simulated track.  Do I consider this track as reconstructed or not?
*** 7)  The simulation is too good.  How should I smear out the simulated data to match the true detector performance?  Is there a standard architecture for accomplishing this?
*** 8)  My friend wrote some code to apply fine corrections/calibrations to the data.  I don't know whether I should or should not do this to simulated data. 
*** 9)  Now that we're a couple of years in, I realize that to analyze my old dataset, I need some form of a hybrid of code:  progression in some places, but old standard stuff in others.  What do I do?
*** 10)  Are we (as a collaboration) convinced that all the recent changes are correct?  Do we have a system for tracking performance instead of just lines of code change (e.g. benchmark code that double-checks impact of offline changes to simulations via nightly generating performance metric runs).
** In my opinion, this long list actually requires only a few design paradigms (the fewer the better) to address it and avoid these issues by design.  One example of a design paradigm is that the simulated output is required (administrative rule) to be self-describing.  We then define what it means to self-describe in terms of the information content (e.g. events, background, geometry, calibration assumptions, ...) and designate a (1) SIMPLE, (2) Universal, (3) Extensible format by which we will incorporate self-description into our files.  Another example of a design paradigm is portability.  We'll have to define portability (even to your laptop?) and then architect a solution.  PHENIX did poorly here by not distinguishing between "core routines" (that open ANY file and then understand/unpack its standard content) and "all other code" (DAQ, online monitoring, simulation, reconstruction, analysis).
** Here is an example set of few rules that can accomplish a lot:
*** (1)  Although there are many formats for input information and output information, only one "in memory" format can be allowed for any single piece of information.  Input translates to this and output translates from this.
*** (2)  All "in memory" formats for geometry, calibration, and the like will be built with streamers so that these have the *option* of being included in our output files.  The intention is that an output file containing the streamed geometry and calibration would be completely self describing and thereby require NO reference to external resources (static files, databases, etc...) when used by others.
*** (3)  The "in memory" formats will explicitly be coded to allow for schema evolution so that old files are compatible with new code.
** Tools like GEMC are, in my opinion, small internal engines that fit within our over-arching framework, as indicated by Rakitha.  To me, these are all basically the same, and I have literally no preference.  The key is how we fit such tools into an overall picture of the COMPLETE computing task that will lie before us.
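The rules listed above (one in-memory format, self-describing output, schema evolution) can be sketched in a few lines. The JSON container, the field names, and the MeV-to-GeV change are invented for illustration; a real implementation would use ROOT streamers as the text suggests.

```python
import json

FORMAT_VERSION = 2  # hypothetical current schema version


def write_file(path, events, geometry_tag, calibration_tag):
    """Embed the run conditions in the file itself, so reading it back
    requires no external database or static files (self-description)."""
    payload = {"meta": {"format_version": FORMAT_VERSION,
                        "geometry": geometry_tag,
                        "calibration": calibration_tag},
               "events": events}
    with open(path, "w") as f:
        json.dump(payload, f)


def migrate(payload):
    """Schema evolution: upgrade an older payload so new code can read
    old files (rule 3).  As an invented example, version 1 stored
    energies in MeV while version 2 stores GeV."""
    if payload["meta"]["format_version"] == 1:
        for event in payload["events"]:
            event["energy_gev"] = event.pop("energy_mev") / 1000.0
        payload["meta"]["format_version"] = 2
    return payload


def read_file(path):
    """Every input passes through one migration step, so only a single
    'in memory' format exists downstream (rule 1)."""
    with open(path) as f:
        return migrate(json.load(f))
```

The point of the sketch is the shape, not the format: any reader first normalizes to the one in-memory representation, and the metadata travels with the events.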
* [https://hallaweb.jlab.org/wiki/images/e/e8/Baffle_parameterization.pdf Baffle parameterization] (Rich Holmes)
** In the present version of the Perl scripts that define the More1 baffles, the baffle geometry is defined by a set of 950 parameters. In order to develop and study variant baffle geometries, a simpler parameterization is desirable. Fortunately these parameters are extremely redundant and can be reduced to a much smaller set. In the context of a future SoLID software framework, a smaller set like this could be regarded as a core description of the geometry, which the simulation and tracking code would use to construct their internal representations of the baffles.
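As an illustration of such a reduction, a handful of core numbers can be expanded back into a full per-plate parameter list. The parameter names and values below are invented stand-ins, not the actual reduced set from the linked note.

```python
# Invented core parameters standing in for a reduced baffle description.
core = {"n_sectors": 30, "n_plates": 11, "z_front_cm": 155.0,
        "z_spacing_cm": 10.0, "r_in_cm": 22.0, "r_out_cm": 100.0,
        "twist_deg_per_plate": 2.5}


def expand(core):
    """Expand the small core description into the per-plate parameter
    list a geometry script would consume, showing how a few numbers
    can replace hundreds of redundant ones."""
    sector_dphi = 360.0 / core["n_sectors"]
    plates = []
    for i in range(core["n_plates"]):
        plates.append({
            "z_cm": core["z_front_cm"] + i * core["z_spacing_cm"],
            "r_in_cm": core["r_in_cm"],
            "r_out_cm": core["r_out_cm"],
            "sector_dphi_deg": sector_dphi,
            # successive plates rotate by a fixed twist, wrapped to a sector
            "phi_offset_deg": (i * core["twist_deg_per_plate"]) % sector_dphi,
        })
    return plates
```

Simulation and tracking would each run this expansion themselves, guaranteeing they construct their internal baffle representations from the same core description.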
* [https://hallaweb.jlab.org/wiki/images/4/40/Baffles_from_external_parameters.pdf Baffles from external parameters] (Rich Holmes)
** I've been familiarizing myself with GEMC 2's provisions for using external parameters to define geometry. As a case study I have created a parameter-driven version of the More1 baffles Perl script. Following are a description of this baffle definition and some observations on its use. Issues regarding incorporation of such parameter-driven geometry into an overall software framework are discussed.
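Such a parameter-driven geometry ultimately reduces to reading a small table of named values and building volumes from it. The sketch below parses a pipe-delimited "name | value | description" table in the spirit of GEMC 2's external parameters; the exact GEMC column layout may differ, and the parameter names are invented.

```python
def load_parameters(lines):
    """Parse 'name | value | description' rows (comments start with #)
    into a dict of floats that geometry code can consume."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = [field.strip() for field in line.split("|")][:2]
        params[name] = float(value)
    return params


example_table = """
# hypothetical baffle parameters, not the real GEMC table
baffle_z_front | 155.0 | front face z in cm
baffle_n_sectors | 30 | azimuthal sectors
""".splitlines()
```

Keeping the table external to the Perl script is what lets variant geometries be produced without touching the geometry code itself, which is the framework-level point of the note above.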
= reconstruction =

Latest revision as of 19:44, 3 December 2020