Software Coordinating Committee Conference Call
February 22, 2007 1:00 PM EST
Recorder: C. DeTar

Present: Brower, DeTar, Levkova, Pochinski, Scholz, Joo, Edwards,
         Gottlieb, Fowler, Simone, Holmgren, Efstathiadis, Mawhinney,
         Basak, Clark, Jung
Absent:  Watson, Osborn, Zhang, Khoriaty, Renner

=====================================================================
** Action items

Preamble

COMMENT: Please send me slides to post on
http://super.bu.edu/~brower/workshop
This website is in our progress report to the DOE, so it is good to
have it as complete as possible.

NEWS: David Richards and I were at the SciDAC kickoff meeting Feb 5, 6
(see http://outreach.scidac.gov/kickoff/). All in all quite
interesting. Clearly a major focus was to see how the SAPs and C&I can
link up and collaborate.

TRANSLATION:
  SAPs = Scientific Application Partnerships (This is what we are!)
  C&I  = Centers for Enabling Tech & SciDAC Institutes (new SciDAC2 idea)

There are many potential opportunities in Performance (PERI),
Algorithms (TOPS), Graphics (Visualization & Data Management),
Threading technology (Computer Science Institute), Training (SciDAC
Outreach Center), etc. Too many, in fact, but we should exploit these
links when they are of mutual advantage. In fact we are already doing
this to a considerable degree, so we should advertise it. This is
certainly part of the new SciDAC paradigm, so it "pays" to get with
the program.

Fowler: As far as PERI is concerned... Chroma is the poster child for
hard-to-analyze performance. There is one MS thesis based on this and
possibly a paper. In one test 59% of the time was spent in template
code. The Livermore team is working on the analysis. Pat Worley is the
driver. Whether we will be able to improve performance remains to be
seen.

Brower: There is a group working on security. They could help us with
a Wiki page. They might also help us with a training workshop. See
http://outreach.scidac.gov/

============= Agenda =================

1.
Plan for minimal CRE implementation:
   (Results of meeting by Chip/Balint, Don, Statos)

Balint: We discussed a common documentation environment as opposed to
a common runtime document. We would standardize where it is easy to do
so. For example, we wanted to unify the actual run command. We agreed
to provide generic builds of the various SciDAC packages.

DeTar: The idea was to have further standardization be driven by the
user, rather than by management.

2. News on Asqtad/RHMC on QCDOC? (Chulwoo, Carleton)

DeTar: We are making progress.

Jung: We sorted out the problems with switchable precision.
** We also needed input from Mike Clark.

2b. Discussion of a propagator format standard.

Mawhinney: We would like to set a standard for propagator storage.

Edwards: As for the binary format, we store all 12 spin indices of
sources.

DeTar: That wouldn't be desirable for our big lattices because of
memory problems. We would want to split the propagator into one record
per source.

Mawhinney: To specify the source, what if we created a catalog that
describes the sources and used an index (source 1, source 2, etc.) to
identify which one we use?

DeTar: What if we kept the source in the file as a record, unless it
is really trivial? Then the metadata that goes with the source could
describe the parameters used to build it.

Edwards: Our propagator metadata has all the gauge field metadata and
lots of other data about smearing. We can reconstruct the propagator
from information in the file.

DeTar: We should pick a standard place in the file to put the metadata
and allow collaborations to do what they want there. Then we need a
place to publish the explanation of the metadata in enough detail that
someone else could reproduce it.

DeTar: We should have a subcommittee look at this. Perhaps Robert,
Bob, Jim and me.

Mawhinney: It is a matter of urgency for us.

DeTar: We should at least decide on the storage order and the place to
put the metadata. I'd hope we could do that quickly.
Edwards: ** I'll send around the LIME contents of our file and a
description of the storage order.

3. Shared Memory Model for multi-core (Don)

4. QMT Multi-Threading interface design and prototyping:
   (see QMT slides on http://super.bu.edu/~brower/workshop
    or http://super.bu.edu/~brower/scc )

Edwards: Jie has been working on QMT, but I haven't spoken with him
lately. Our near-term exercise was to test performance of the Wilson
Dslash and compare MPI on separate cores vs. threaded.

Brower: We have to look at the far term as well, where we will have
dozens of cores.

Holmgren: On Opterons you have to make sure you use local memory or
you take a hit. We have been looking at simple loops and comparing
OpenMP and locking processors. You have to touch each page of memory
before starting on a thread.

Edwards: Have you talked with Jie?

Holmgren: ** Not yet. I will.

Fowler: I'd like to volunteer UNC folks to look at performance. We
have done benchmarking and allocation experiments and are aware of the
problem with local memory. As the number of processors increases, the
ratio of local to remote goes down. The HyperTransport coherence
protocol is not very scalable.

Gottlieb: Is there any literature?

Fowler: No. It depends on the motherboard. John Mellor-Crummey from
Rice is working on multicore optimization. I'll ask him whether they
could help.

Brower: He mentioned SILK as a lightweight method to fork and spawn
threads. Andrew borrowed the thread locks (barriers) to do something
similar with Solaris.

Fowler: Perhaps making contact with the ISCADS folks would help.
** I'd be happy to act as a liaison.

5. Et al

Committee conference concluded at 2:20 PM EST.
Next call Mar 1 at 1:00 PM EST
======================================================================
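
Illustration for item 2b: a minimal sketch of the "one record per
source" idea, with a metadata record kept next to each data record.
Everything here is an assumption made for illustration. The record
framing, the tag names ("file-metadata", "source-metadata",
"propagator-data"), the JSON metadata and the function names are
hypothetical. This is not the LIME or QIO format and not any
collaboration's actual layout. The count of 12 sources is taken to
mean 4 spins x 3 colors.

```python
import io
import json
import struct

N_SOURCES = 12  # assumed: 4 spins x 3 colors


def write_record(stream, tag, payload):
    """Append one record: tag length, tag, payload length, payload."""
    tag_bytes = tag.encode("ascii")
    stream.write(struct.pack(">I", len(tag_bytes)))
    stream.write(tag_bytes)
    stream.write(struct.pack(">Q", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield (tag, payload) pairs, one record at a time, until end of stream."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (tag_len,) = struct.unpack(">I", header)
        tag = stream.read(tag_len).decode("ascii")
        (size,) = struct.unpack(">Q", stream.read(8))
        yield tag, stream.read(size)


def write_propagator(stream, file_metadata, solutions):
    """Write file-level metadata, then a metadata + data record pair per source."""
    write_record(stream, "file-metadata",
                 json.dumps(file_metadata).encode("utf-8"))
    for index, (source_info, values) in enumerate(solutions):
        meta = {"source_index": index, "source": source_info,
                "count": len(values)}
        write_record(stream, "source-metadata",
                     json.dumps(meta).encode("utf-8"))
        write_record(stream, "propagator-data",
                     struct.pack(f">{len(values)}d", *values))


def read_one_source(stream, wanted_index):
    """Return (metadata, values) for one source; other records are discarded."""
    meta = None
    for tag, payload in read_records(stream):
        if tag == "source-metadata":
            meta = json.loads(payload.decode("utf-8"))
        elif (tag == "propagator-data" and meta is not None
              and meta["source_index"] == wanted_index):
            return meta, list(struct.unpack(f">{meta['count']}d", payload))
    raise KeyError(wanted_index)


# Tiny stand-in data: two numbers per source instead of a lattice field.
buffer = io.BytesIO()
solutions = [({"type": "point", "spin": i // 3, "color": i % 3},
              [float(i), 0.5]) for i in range(N_SOURCES)]
write_propagator(buffer, {"ensemble": "example"}, solutions)
buffer.seek(0)
meta, values = read_one_source(buffer, 7)
```

Because each source sits in its own record, a reader only ever holds
one source's data in memory, which is the concern raised for large
lattices. The per-source metadata record is the "standard place" where
a collaboration could put whatever description it wants.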