DAMASK_EICMD/processing/post/permuteData.py

#!/usr/bin/env python3

import os
import sys
from optparse import OptionParser

import numpy as np

import damask


scriptName = os.path.splitext(os.path.basename(__file__))[0]
scriptID   = ' '.join([scriptName,damask.version])


# --------------------------------------------------------------------
#                                MAIN
# --------------------------------------------------------------------

parser = OptionParser(option_class=damask.extendableOption, usage='%prog options [ASCIItable(s)]', description = """
Permute all values in given column(s).

""", version = scriptID)

parser.add_option('-l','--label',
                  dest = 'label',
                  action = 'extend', metavar = '<string LIST>',
                  help  ='column(s) to permute')
parser.add_option('-u', '--unique',
                  dest = 'unique',
                  action = 'store_true',
                  help = 'shuffle unique values as group')
parser.add_option('-r', '--rnd',
                  dest = 'randomSeed',
                  type = 'int', metavar = 'int',
                  help = 'seed of random number generator [%default]')

parser.set_defaults(label = [],
                    unique = False,
                    randomSeed = None,
                   )

(options,filenames) = parser.parse_args()

if len(options.label) == 0:
  parser.error('no labels specified.')

# --- loop over input files -------------------------------------------------------------------------

if filenames == []: filenames = [None]

for name in filenames:
  try:
    table = damask.ASCIItable(name = name)
  except IOError:
    continue
  damask.util.report(scriptName,name)

# ------------------------------------------ read header ------------------------------------------

  table.head_read()

# ------------------------------------------ process labels ---------------------------------------  

  errors  = []
  remarks = []
  columns = []
  dims    = []

  indices    = table.label_index    (options.label)
  dimensions = table.label_dimension(options.label)
  for i,index in enumerate(indices):
    if index == -1: remarks.append('label "{}" not present...'.format(options.label[i]))
    else:
      columns.append(index)
      dims.append(dimensions[i])

  if remarks != []: damask.util.croak(remarks)
  if errors  != []:
    damask.util.croak(errors)
    table.close(dismiss = True)
    continue
       
# ------------------------------------------ assemble header ---------------------------------------

  randomSeed = int(os.urandom(4).hex(), 16) if options.randomSeed is None else options.randomSeed   # random seed per file
  np.random.seed(randomSeed)

  table.info_append([scriptID + '\t' + ' '.join(sys.argv[1:]),
                     'random seed {}'.format(randomSeed),
                    ])
  table.head_write()

# ------------------------------------------ process data ------------------------------------------

  table.data_readArray()                                                                            # read all data at once
  for col,dim in zip(columns,dims):
    if options.unique:
      s = set(map(tuple,table.data[:,col:col+dim]))                                                 # generate set of (unique) values
      uniques = np.array(list(s))                                                                   # translate set to np.array (map() is lazy in python3)
      shuffler = dict(zip(s,np.random.permutation(len(s))))                                         # random permutation
      table.data[:,col:col+dim] = uniques[np.array([shuffler[tuple(x)] for x in
                                                    table.data[:,col:col+dim]])]                    # fill table with mapped uniques
    else:
      np.random.shuffle(table.data[:,col:col+dim])                                                  # shuffle rows of the selected column(s) in place

# ------------------------------------------ output result -----------------------------------------  

  table.data_writeArray()

# ------------------------------------------ output finalization -----------------------------------  

  table.close()                                                                                     # close ASCII tables
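
The `--unique` branch above exchanges whole groups of identical values rather than individual entries, which is what makes it useful for shuffling ordered grain indices. The following is a minimal standalone sketch of that idea using plain NumPy, not the script's own code path: the function name `shuffle_unique_groups` is hypothetical, and it assumes a local `np.random.RandomState` and `np.unique(..., axis=0)` instead of the global seed and the set/dict mapping used in the script.

```python
import numpy as np


def shuffle_unique_groups(data, seed=None):
    """Return a copy of 2D `data` in which every distinct row is
    consistently replaced by another distinct row (hypothetical helper)."""
    rng = np.random.RandomState(seed)                   # local generator; the script seeds the global one
    uniques, inverse = np.unique(data, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                       # flatten; the shape differs between NumPy versions
    perm = rng.permutation(len(uniques))                # random one-to-one relabeling of the unique rows
    return uniques[perm[inverse]]                       # look up the relabeled row for every entry


# one column of grain indices, stored as floats as in an ASCIItable
grains = np.array([[1.], [1.], [2.], [2.], [3.]])
shuffled = shuffle_unique_groups(grains, seed=0)
```

Entries that shared a value before the shuffle still share a value afterwards, and the set of values present in the column is unchanged. Passing the same `seed` reproduces the same relabeling, which mirrors the purpose of the `--rnd` option.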