Software Coordinating Committee Conference Call
January 11, 2007 1:00 PM EST
Recorder: C. DeTar

Present: Brower, Fowler, DeTar, Levkova, Holmgren, Simone, Gottlieb, Zhang, Mawhinney, Efstathiadis, Renner, Osborn, Joo, Pochinski, Jung, Khoriaty

Absent: Scholz, Edwards (NJ), Clark, Watson

=====================================================================

** Action items

1. Code Optimization tasks for Cray XT3/Opteron (Robert, Balint, Don et al)

Brower: The JLab group just got a substantial allocation on the ORNL Cray.
Joo: We will be ready to go into production on anisotropic lattices with as/at ~ 3. We are working on temporal preconditioning, which pulls out the temporal part of the Wilson operator and treats it exactly. This is good for anisotropy.

SSE/assembly vs compiler tweaking on Opteron

Brower: Is it worthwhile pursuing SSE on Opterons?
Holmgren: Probably, because the quad-core chips will come with a better instruction set. There is a simulator now, like the one for the PowerPC.
DeTar: Are you saying compilers just won't keep up with these architectural changes?
Holmgren: I'd want to look at what James has done before I say.
Joo: Andrew uses compiler directives and a C coding style that maps onto SSE.
Pochinski: I also use a couple of lines of assembly.
Brower: Will the QDP++ optimization be packaged for QLA?
Joo: It could be done in a variety of ways.
Joo: What is the best way to use assembler? For example, how do we avoid register conflicts?
Holmgren: In our approach we told gcc not to use a set of registers. Cray was helping us, but the clobber list they provided didn't seem to be enough.
Gottlieb: Whelan told me two of the routines were broken.
Joo: One could also use the Pathscale compiler, which is supposed to be good for the Opteron. But when I tried compiling Chroma, it got lost in the IPA.
Gottlieb: It worked for us on the Cray XT3. At least it wasn't strikingly slower.
Holmgren: We have a divergence in SSE strategies.
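The register-conflict question above concerns GCC extended inline assembly, where the clobber list tells the compiler which registers an asm statement overwrites. What follows is a minimal sketch, not taken from any of the codes discussed. The helper name `add2_sse` is hypothetical, and the example assumes x86-64 with a GCC-compatible compiler, falling back to plain C elsewhere. It adds two pairs of doubles with SSE2 instructions and declares the XMM registers it touches as clobbers.

```c
#include <assert.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0, 1.
   On x86-64 with a GCC-compatible compiler this uses extended inline
   assembly. The clobber list names xmm0 and xmm1 so the compiler knows
   the asm overwrites them and will not keep its own values there across
   the statement; "memory" says the asm reads and writes memory through
   the pointers, which are not declared as memory operands. */
static void add2_sse(const double *a, const double *b, double *out)
{
    assert(a && b && out);
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movupd (%0), %%xmm0\n\t"      /* xmm0 = {a[0], a[1]} */
        "movupd (%1), %%xmm1\n\t"      /* xmm1 = {b[0], b[1]} */
        "addpd  %%xmm1, %%xmm0\n\t"    /* xmm0 = xmm0 + xmm1 */
        "movupd %%xmm0, (%2)\n\t"      /* store xmm0 to out[0..1] */
        :                              /* no output operands */
        : "r"(a), "r"(b), "r"(out)     /* pointer inputs in general registers */
        : "xmm0", "xmm1", "memory");   /* clobber list */
#else
    /* Portable fallback for other targets. */
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```

A clobber list only covers a single asm statement: the compiler is free to reuse those registers once the statement ends. Reserving registers for an entire compilation is a different mechanism (GCC's `-ffixed-reg` option is one example), and the minutes do not say which mechanism was used.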
Holmgren: It would be good to establish some coherence.
Osborn: QLA uses compiler intrinsics.

Experience Database

Gottlieb: We need a database to collect our experience with compilers.
Gottlieb: Lots of things get changed to improve performance. It would be good to document the reasons and the results.
Ying: We are working on a performance database for PERI. It is still under development and won't be ready for a couple of months.
Brower: It would be good to have a suggestion for something sooner.
Gottlieb: How can we learn more about compiler intrinsics?
Pochinski: The gcc info page.
Osborn: The gcc man pages are good. The instruction guide is good. Shuffling is tricky.
Holmgren: I found some web tutorials. I will try to find them again. It would be good to put this sort of information on a projects page.
Holmgren: James, did you find cache bypasses useful?
Osborn: It was hard to predict.

BGL optimization

DeTar: What is the status of QMP/BGL?
Khoriaty: I hope to commit the test suite soon. It seems to be working.

Testing the alternatives

Fowler: Is there a small test bed for comparing optimization strategies?
Holmgren: We have been using the full inverters for Asqtad and DWF as the test jig.
DeTar: For Asqtad, Steve Gottlieb has a test suite.

2. QMP for BlueGene/L & P (James: What is the CPS need/use for QMP on BG/L?)

Jung: The p4 evolution at LLNL uses CPS with QMP/MPI. I am testing native QMP/BGL at BU.
Osborn: I'd like to see a performance comparison between QMP/MPI and native QMP.

Status of the BNL BGL

Stratos: Rumors are that the machine has been built at IBM. We are preparing the space, so it should arrive some time soon.

3. Level 3 Domain Wall Inverter for BlueGene/L & P (Andrew, Chulwoo, Edwards/Balint, James ....????)

Brower: It would be good to compare benchmarks so we can choose the best approach. Are we duplicating efforts?
Joo: Some use different conventions for DWF, so the effort is not entirely duplicated.
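One place such conventions can differ is the even/odd (checkerboard) decomposition. For domain-wall fermions two schemes are in common use: 4D checkerboarding, where a site's parity depends only on x+y+z+t, and 5D checkerboarding, where the fifth-dimension coordinate s is also included. The minutes do not say which scheme each code uses. The sketch below, with hypothetical lattice extents and function names, illustrates only the two parity rules and one simple way to index sites within a 4D checkerboard. Real codes differ in site ordering and data layout.

```c
#include <assert.h>

/* Hypothetical lattice extents; LX must be even for cb_index4d below. */
enum { LX = 4, LY = 4, LZ = 4, LT = 8 };

/* 4D checkerboarding: parity depends only on the 4D site, so every
   fifth-dimension slice s at that site has the same parity. 0 = even. */
static int parity4d(int x, int y, int z, int t)
{
    return (x + y + z + t) & 1;
}

/* 5D checkerboarding: the fifth coordinate s is included, so adjacent
   s-slices at the same 4D site lie on opposite checkerboards. */
static int parity5d(int x, int y, int z, int t, int s)
{
    return (x + y + z + t + s) & 1;
}

/* Index of a 4D site within its own checkerboard (0 .. V/2 - 1).
   Halving the lexicographic index works because LX is even: the sites
   x = 2m and x = 2m + 1 (same y, z, t) have opposite parity and the same
   quotient, so each quotient occurs exactly once per parity. */
static int cb_index4d(int x, int y, int z, int t)
{
    assert(x >= 0 && x < LX && y >= 0 && y < LY);
    assert(z >= 0 && z < LZ && t >= 0 && t < LT);
    int lex = x + LX * (y + LY * (z + LZ * t));
    return lex / 2;
}
```

Code written against one convention cannot simply be handed fields laid out in the other, which is one reason an inverter with private, internal checkerboarding is easier to use for standalone propagator work than inside another package's HMC.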
Jung: We checkerboard in the conventional way, which may pose a problem for some approaches.
Brower: Perhaps we can compare Chroma's inverter and Andrew's.
Jung: Will Andrew's scheme support external checkerboarding?
Pochinski: With my inverter, checkerboarding is internal and private.
Joo: That poses problems for Chroma HMC, but works for propagator calculations.
Brower: So first we should list the characteristics for comparison. Then we can decide how to proceed.
Joo: Andrew, let's work together to produce a comparison.
Jung: I will participate.
Osborn: Include us as well.
Brower: I'll contact Vranos. He is supposedly porting his DWF Dslash code to the P.

4. Status of Production for Asqtad on QCDOC (Dru, Carleton, ...)

Renner: We are doing shorter runs to get around the problem. We are now running only five trajectories at a time instead of ten.
Mawhinney: We will allow MILC to run a bit longer to compensate for production lost on the 8K. We'll need to know how many lattices were produced on the 8K and how many might have been produced on two 4Ks.
DeTar: We will provide those numbers.

Quarterly reports

Ying: Are we required to submit quarterly reports to you?
Brower: Bob Sugar will tell us what he needs.

Web site

Brower: We still seem to have my old, much-maligned web page.
Joo:

5. Left over from 2006: Minimal CRE Software Release & Control Page etc.

Committee conference call concluded at 2:30 PM EST. Next call: January 18 at 1:00 PM EST.
======================================================================