Date: Tue, 13 Sep 2011 17:46:44 +0200
From: m.diehl@mpie.de
To: wangleyu@msu.edu, lebenso@lanl.gov, denny.tjahjanto@imdea.org,
o.guevenc@mpie.de, n.jia@mpie.de, m.diehl@mpie.de, c.kords@mpie.de,
c.zambaldi@mpie.de, p.eisenlohr@mpie.de, f.roters@mpie.de
Subject: update: /home/svn/repos/DAMASK to 1000
Message-ID: <4e6f7ae4.Xjnd/szYAh8QCXTo%m.diehl@mpie.de>
User-Agent: Heirloom mailx 12.2 01/07/07
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
A processing/pre/FromEBSD/
A processing/pre/FromEBSD/Hex2Cub.cpp
A processing/pre/FromEBSD/SpectralMethodFromEBDS
A processing/pre/FromEBSD/patchFromReconstructedBoundaries
D processing/pre/patchFromReconstructedBoundaries
added two small (quick and dirty) tools that convert EBSD data to input files for the spectral method; moved them, together with patchFromReconstructedBoundaries, into a new folder.
Date: Tue, 13 Sep 2011 17:54:06 +0200
From: m.diehl@mpie.de
To: wangleyu@msu.edu, lebenso@lanl.gov, denny.tjahjanto@imdea.org,
o.guevenc@mpie.de, n.jia@mpie.de, m.diehl@mpie.de, c.kords@mpie.de,
c.zambaldi@mpie.de, p.eisenlohr@mpie.de, f.roters@mpie.de
Subject: update: /home/svn/repos/DAMASK to 1001
Message-ID: <4e6f7c9e.v9E4JVN2a6bg5tL8%m.diehl@mpie.de>
U code/DAMASK_spectral.f90
U code/DAMASK_spectral_interface.f90
U code/IO.f90
U code/crystallite.f90
U code/homogenization_RGC.f90
U code/lattice.f90
U code/makefile
U code/material.f90
U code/mesh.f90
did a lot of polishing:
- removed unnecessary "return" before end of subroutine or function
- changed undetermined array length (:) to (1:3)
To prevent problems with some code analysing tools:
- "3D oneliner loops" (with ";") only for "do" and "enddo" at the same time
- removed line continuation in OMP statements
made the makefile more flexible, removed heap-arrays switch
Date: Tue, 13 Sep 2011 17:57:07 +0200
From: m.diehl@mpie.de
To: wangleyu@msu.edu, lebenso@lanl.gov, denny.tjahjanto@imdea.org,
o.guevenc@mpie.de, n.jia@mpie.de, m.diehl@mpie.de, c.kords@mpie.de,
c.zambaldi@mpie.de, p.eisenlohr@mpie.de, f.roters@mpie.de
Subject: update: /home/svn/repos/DAMASK to 1002
Message-ID: <4e6f7d53.IEDDzo+JSzDWNSBr%m.diehl@mpie.de>
A documentation/Compiling/
A documentation/Compiling/Stack+usage.pdf
A documentation/ParallelizationAndTuning/
A documentation/ParallelizationAndTuning/BSC_tools_Overview.pdf
A documentation/ParallelizationAndTuning/Intro_Perf.pdf
A documentation/ParallelizationAndTuning/Kcachegrind.pdf
A documentation/ParallelizationAndTuning/LRZ210703_1.pdf
A documentation/ParallelizationAndTuning/LRZ210703_2.pdf
A documentation/ParallelizationAndTuning/MUST_Overview.pdf
A documentation/ParallelizationAndTuning/NPB-MZ-MPI-BT_Exercise.pdf
A documentation/ParallelizationAndTuning/PAPI.pdf
A documentation/ParallelizationAndTuning/PSC_Exercise_BT-MPI.pdf
A documentation/ParallelizationAndTuning/Paraver_Exercise.pdf
A documentation/ParallelizationAndTuning/Periscope_Overview.pdf
A documentation/ParallelizationAndTuning/SIONlib.pdf
A documentation/ParallelizationAndTuning/Scalasca_Examples.pdf
A documentation/ParallelizationAndTuning/Scalasca_Exercise_BTMZ.pdf
A documentation/ParallelizationAndTuning/Scalasca_Overview.pdf
A documentation/ParallelizationAndTuning/Scalasca_Patterns.pdf
A documentation/ParallelizationAndTuning/TAU.pdf
A documentation/ParallelizationAndTuning/VIHPS-TW8.pdf
A documentation/ParallelizationAndTuning/Vampir_Exercise.pdf
A documentation/ParallelizationAndTuning/Vampir_Overview.pdf
A documentation/ParallelizationAndTuning/instructions_periscope.pdf
A documentation/ParallelizationAndTuning/manualf06.pdf
added some information from the tuning workshop in Aachen regarding tuning/parallelization
added slides with information on how to prevent segmentation faults
* use "math_invert3x3" instead of "math_inv3x3" for inversion of Fe
* for dislocation stress calculation: first regular case, then special case of dead dislocations in central ip
* "dv_dtau" now given for each dislocation type, so is a (ns,4) array
* deleted unused variables in "_LpAndItsTangent"
* corrected contribution of deads in "_LpAndItsTangent"
* the NaN variables defined in math did not give a proper NaN value, so use 0.0/0.0 again
* neighbors with nonlocal constitution but local properties (i.e. /nonlocal/ flag not set) are also considered for incoming fluxes
made convergence independent of size and resolution,
polishing output in DAMASK_spectral.f90
added function to compute eigenvalues without eigenvectors and function to convert a 3x3 logical to a 9 vector in math.f90
removed obsolete variable in numerics.f90
corrected calculation of the stress BC. Depending on the given BC, the stiffness matrix is reduced and then inverted. The inverse is then filled up with zeros to full size and used for the calculation of the correct change of the deformation gradient. All calculations are done using dP/dF
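The reduce/invert/fill-with-zeros step described above can be sketched as follows. This is only an illustration in plain Python, not the Fortran code of DAMASK_spectral.f90. The function names, the 9x9 layout of dP/dF, the meaning of the mask (True = component is stress-controlled) and the way the change of F is computed from the stress residual are all assumptions.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # augment the matrix with the identity: [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("reduced stiffness is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def masked_compliance(dPdF, stress_controlled):
    """Reduce the 9x9 dP/dF to the stress-controlled components, invert it,
    and scatter the inverse back into a 9x9 matrix that is zero elsewhere.
    (Hypothetical helper; mask semantics are an assumption.)"""
    idx = [i for i, flag in enumerate(stress_controlled) if flag]
    compliance = [[0.0] * 9 for _ in range(9)]
    if not idx:
        return compliance
    reduced = [[dPdF[i][j] for j in idx] for i in idx]
    reduced_inv = invert(reduced)
    for a, i in enumerate(idx):
        for b, j in enumerate(idx):
            compliance[i][j] = reduced_inv[a][b]
    return compliance


def deformation_gradient_change(dPdF, stress_controlled, P_target, P_current):
    """Assumed update rule: dF = S * (P_target - P_current) on the masked components."""
    S = masked_compliance(dPdF, stress_controlled)
    residual = [t - c if flag else 0.0
                for t, c, flag in zip(P_target, P_current, stress_controlled)]
    return [sum(S[i][j] * residual[j] for j in range(9)) for i in range(9)]
```

Components that are not stress-controlled get a zero row and column, so they receive no correction of the deformation gradient.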
fixed bug in bc_temperature assignment that was hitting memory.
Temperature is taken from the first loadcase and evolves from there in an adiabatic fashion for the moment. I.e. T-specifications from later loadcases are ignored...
* dislocation flux is blocked if we encounter a sign change in the resolved shear stress from the central ip to the neighbor
* do not set density to zero if below certain threshold; this creates an artificial sink term
* damper initialized with one
* inversion of Mandelized stiffness tensor does not work, have to use plain tensor
* new functions in math that allow for conversion between Mandel and Plain tensors
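As an illustration of the two notations, here is a minimal sketch for second-order 3x3 tensors in plain Python. It is not the code in math.f90; the function names, the component ordering (11, 22, 33, 12, 23, 13 for Mandel, row-major for plain) and the choice of reading the upper-triangle component are assumptions. The point it shows is that the Mandel form stores only six components and therefore cannot represent a non-symmetric tensor, whereas the plain form keeps all nine.

```python
import math

# assumed Mandel ordering and weights (shear terms scaled by sqrt(2))
MANDEL_INDEX = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)]
MANDEL_WEIGHT = [1.0, 1.0, 1.0] + [math.sqrt(2.0)] * 3


def mandel_33_to_6(t):
    """3x3 tensor -> Mandel 6-vector (only the listed components are read)."""
    return [w * t[i][j] for (i, j), w in zip(MANDEL_INDEX, MANDEL_WEIGHT)]


def mandel_6_to_33(v):
    """Mandel 6-vector -> symmetric 3x3 tensor."""
    t = [[0.0] * 3 for _ in range(3)]
    for (i, j), w, x in zip(MANDEL_INDEX, MANDEL_WEIGHT, v):
        t[i][j] = x / w
        t[j][i] = x / w
    return t


def plain_33_to_9(t):
    """3x3 tensor -> plain 9-vector in row-major order, nothing is lost."""
    return [t[i][j] for i in range(3) for j in range(3)]


def plain_9_to_33(v):
    """Plain 9-vector -> 3x3 tensor."""
    return [[v[3 * i + j] for j in range(3)] for i in range(3)]
```

For a symmetric tensor both round trips return the input (up to rounding); for a non-symmetric one only the plain round trip does.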
You have to specify the job you are restarting from in the job description (cae). If you prepare your input file by hand, this is the first line after *Heading.
example: if the first job was using Oldjob.inp, the first entry in the job description needs to be Oldjob (without the .inp)
As with Marc, restart works only from the last converged increment, i.e. the restart writing should be specified like this:
*restart, write, frequency=1, overlay
Overlay is not essential but saves a lot of disk space and, as stated before, you can only restart from the last converged increment anyway.
* Marc: node displacements are added to initial node coordinates (mesh_node0) to get current node positions (mesh_node), then ip coordinates are deduced
* Abaqus: ip coordinates are directly updated, no update of node coordinates!
* Spectral: for the moment no update of either ip or node coordinates! passing only dummy values with initial ip coordinates
* replaced "dble" intrinsic function by "real" with pReal kind in constitutive_nonlocal.f90
* removed useless line breaks in output of state in CPFEM.f90
* Also added some more openmp directives to increase percentage of parallelized code.
* "implicit none" was missing in two subroutines of homogenization and constitutive.
0 : only version infos and all from "hypela2"/"umat"
1 : basic outputs from "CPFEM.f90", basic output from initialization routines, debug_info
2 : extensive outputs from "CPFEM.f90", extensive output from initialization routines
3 : basic outputs from "homogenization.f90"
4 : extensive outputs from "homogenization.f90"
5 : basic outputs from "crystallite.f90"
6 : extensive outputs from "crystallite.f90"
7 : basic outputs from the constitutive files
8 : extensive outputs from the constitutive files
If verbosity is equal to zero, none of the counters in debug are updated during the calculation (e.g. debug_StressLoopDistribution or debug_cumDotStateTicks). This might speed up parallel calculation, because all of these updates need critical statements, which slow down parallel computation considerably.
In order to keep it like that, please follow these simple rules:
DON'T use implicit array subscripts:
example: real, dimension(3,3) :: A,B
A(:,2) = B(:,1) <--- DON'T USE
A(1:3,2) = B(1:3,1) <--- BETTER USE
In many cases the use of explicit array subscripts is inevitable for parallelization. Additionally, it is an easy means to prevent memory leaks.
Enclose all write statements with the following:
!$OMP CRITICAL (write2out)
<your write statement>
!$OMP END CRITICAL (write2out)
Whenever you change something in the code and are not sure if it affects parallelization and leads to nonconforming behavior, please ask me and/or Franz to check this.
* removed input variables in constitutive_collectDotState and constitutive_postResults that are not needed anymore (because of recent changes in constitutive_nonlocal)
Now it is possible to compile a single precision spectral solver/crystal plasticity by replacing mesh.f90 and prec.f90 with mesh_single.f90 and prec_single.f90.
For the spectral method, just call "make precision=single" instead of "make". Use "make clean" every time you switch precision.
First attempt at implementing single precision crystal plasticity, not working yet.
polishing text about geometry construction.
polishing postResults, still having problems concerning machines without MSC installation
* dislocation flux and internal stress calculation now consistent with new definition of slip system lattice according to paper (polarity of screws inverted)
* now complaining when encountering an unknown nonlocal parameter in material.config
* use same error ID for all material parameters out of bounds
* symmetric flux calculation inside dotState can now be omitted (because of new treatment of periodicity)
* switching back to "local flux balance" (add leaving and entering fluxes at central MP, don't touch neighbor) instead of "flux distribution" (subtract leaving fluxes from central MP and add them at neighboring MP). This has the advantage that there is almost no need for CRITICAL statements in parallelization, so hopefully this results in some speed up.
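The difference between the two schemes can be sketched on a toy problem. The following plain Python example assumes a periodic 1D chain of material points where each point has one leaving flux to the right and one to the left; the function names and this setup are illustrative assumptions, not the nonlocal constitutive code. Both schemes give the same rates, but the local flux balance only ever writes to the entry of the central point, which is why it needs no locking between neighbors.

```python
def rates_by_flux_distribution(flux_right, flux_left):
    """Each point subtracts its leaving fluxes and adds them at its neighbors.
    Note the writes to neighboring entries (would need CRITICAL in parallel)."""
    n = len(flux_right)
    rate = [0.0] * n
    for i in range(n):
        rate[i] -= flux_right[i] + flux_left[i]
        rate[(i + 1) % n] += flux_right[i]  # write to right neighbor
        rate[(i - 1) % n] += flux_left[i]   # write to left neighbor
    return rate


def rates_by_local_balance(flux_right, flux_left):
    """Each point sums its own leaving fluxes and the fluxes entering from its
    neighbors; it only reads neighbor data and writes its own entry."""
    n = len(flux_right)
    rate = [0.0] * n
    for i in range(n):
        entering = flux_right[(i - 1) % n] + flux_left[(i + 1) % n]
        leaving = flux_right[i] + flux_left[i]
        rate[i] = entering - leaving
    return rate
```

The price of the local balance is that the flux between two points is evaluated from both sides.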
To enable this feature one has to add the following somewhere in the marc input file:
$mpie periodic x y z
for having periodicity in all directions
$mpie periodic z x
for having periodicity in x and z direction
etc.
Note that this only works for regular meshes!!!
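How such a line could be interpreted is sketched below in plain Python. The parser name, the exact matching of the keyword and the error handling are assumptions made for illustration; the actual reading is done by the Fortran mesh code.

```python
def parse_periodic(line):
    """Return (periodic_x, periodic_y, periodic_z) for a '$mpie periodic ...' line,
    or None if the line is not such a statement. Hypothetical helper."""
    tokens = line.split()
    if tokens[:2] != ["$mpie", "periodic"]:
        return None
    axes = tokens[2:]
    unknown = [a for a in axes if a not in ("x", "y", "z")]
    if unknown:
        raise ValueError("unknown periodic direction(s): " + " ".join(unknown))
    # order of the directions on the line does not matter
    return tuple(axis in axes for axis in "xyz")
```

With this sketch, "$mpie periodic z x" and "$mpie periodic x z" both yield periodicity in x and z only.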
postprocessing: renamed python/f2py module from "reconstruct" to "postprocessingMath", added some numerical operations to use for postprocessing.
* need to recalculate dislocation velocity in postResults, otherwise we take values of last perturbed state! So the following outputs were up to now showing the perturbed state: shearrate, dislocation velocity, all density rates!
mpie_spectral and numerics: added switch to prevent pre calculation of gamma_hat. slower, but saves memory
3Dvisualize: started to add support for gmsh (not fully working yet)
reconstruct: new version of f2py/Fortran subroutines for output of results from spectral method
removed storage of full Cauchy stress field from mpie_spectral.f90, only average is stored now
added Cauchy stress and von Mises equivalent calculation to spectral post.
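For reference, the two quantities can be computed with the standard continuum mechanics relations sigma = (1/det F) P F^T and sigma_vM = sqrt(3/2 s:s), where s is the deviator of sigma. The plain Python sketch below assumes that the deformation gradient F and the first Piola-Kirchhoff stress P are available per point; the function names are hypothetical and this is not the postprocessing code itself.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cauchy_from_first_piola(P, F):
    """sigma = 1/det(F) * P * F^T"""
    J = det3(F)
    return [[sum(P[i][k] * F[j][k] for k in range(3)) / J for j in range(3)]
            for i in range(3)]


def von_mises_stress(sigma):
    """sqrt(3/2 * s:s) with s the deviatoric part of sigma."""
    mean = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    dev = [[sigma[i][j] - (mean if i == j else 0.0) for j in range(3)]
           for i in range(3)]
    return math.sqrt(1.5 * sum(dev[i][j] ** 2 for i in range(3) for j in range(3)))
```

For uniaxial tension the von Mises equivalent equals the applied stress, and for pure shear tau it equals sqrt(3) times tau, which makes for a quick sanity check.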
renamed mpie_spectral2.f90 to mpie_spectral2d.f90 (testing file, not properly working at the moment)
changed file extension and variable names in mpie_spectral.f90 and mpie_spectral_interface.f90 from "mesh" to "geom". Removed direct output from mpie_spectral.f90, all output is now based on materialpoint_results(:,1,:)
* default value of the OMP_NUM_THREADS variable has to be restored at the end of mpie subroutine, since marc also seems to use and change(!) this
* usage: "export MPIE_NUM_THREADS=<number of threads>" to set variable in shell, then restart mentat and compile with option 3 (at the moment this does only work on ws 6, since all other workstations use compiler option "-save"; this puts all local variables by default in static memory, which is a killer for parallelization!)
* better use SINGLE (having an implicit barrier at the end) instead of MASTER construct
* deleted all explicit BARRIERs after do loops since parallel loop construct implies barrier at the end
* had to add some BARRIER constructs
* only the master thread is allowed to increase the state counter
yet parallelization seems not to give a significant decrease in calculation time with nonlocal model (because of too many CRITICAL statements?)
* also put a call to constitutive_microstructure at the start of each crystallite_integration subroutine like it was before. need that for nonlocal model in case of crystallite cutback
numerics: polishing
mpie_cpfem_marc: polishing
..powerlaw: aware of symmetryType function
crystallite: aware of symmetryType function, smaller leapfrog acceleration
IO: new warning 101
CPFEM: range of odd stress is now -1e15...+1e15, H_sym is used for stiffness
Major changes:
CPFEM.f90 =>
1. Moving the initialization out of CPFEM_general into a separate subroutine, which is called directly by hypela2 (Beware, the Abaqus version must also be modified to adapt to this change).
2. Restore primary state variables in CPFEM_init from binary files when requested (Marc flag: restart read).
3. Writing primary state variables into binary files (Marc flag: restart write).
FEsolving.f90 =>
1. Adding functions to recognize Marc restart flags: read and write and the corresponding restart file (parent job).
2. Change the initial value of cycleCounter to -1 in conjunction with the change made to the ping-pong scheme
homogenization_RGC.f90 =>
1. Just syntax polishing.
IO.f90 =>
1. Adding functions/subroutines to open binary files for writing the primary state variables for restart purpose.
mpie_cpfem_marc.f90
1. Modification of the general scheme for collection and calculation in order to accommodate the newly added restart feature.
* in Fixed Point Iteration: update dependent states after state preguess was missing; on the other hand, the first call to constitutive_microstructure was obsolete
* fluxes are now again calculated and distributed only! by the originating material point. this means that the central MP might change the dotState of its neighbor. have to see whether locks slow down parallel computation
* detection of grain boundary in constitutive_nonlocal_microstructure with the help of transmissivity
* enforce positive densities in constitutive_nonlocal_microstructure (needed because dotState does not create cutbacks for negative densities anymore)
* reset single mobile densities below certain threshold to zero (also done in constitutive_nonlocal_microstructure)
* constitutive_nonlocal_kinetics only gets local state variable as input, no need for the entire array here
* dv_dtau is always positive
* multiplication is only active when there is already some initial density of the respective type
added mpie_spectral2.f90, a version that should get the new algorithm proposed in 2010. Until now, it is the same as mpie_spectral.f90 (large strain formulation by Suquet et al.) but with c2c, c2c FFT
added some parameters for spectral method to numerics.f90 (tolerance)
changed error message concerning spectral method in IO.f90
corrected calculation of stress BC in mpie_spectral.f90