DAMASK_EICMD/processing/post/filterTable.py

#!/usr/bin/env python
# -*- coding: UTF-8 no BOM -*-

import os,re,sys,fnmatch
import numpy as np
from optparse import OptionParser
import damask

scriptID = '$Id$'
scriptName = os.path.splitext(scriptID.split()[1])[0]

# --------------------------------------------------------------------
#                                MAIN
# --------------------------------------------------------------------

parser = OptionParser(option_class=damask.extendableOption, usage='%prog options [file[s]]', description = """
Filter rows according to condition and columns by either white or black listing.

Examples:
Every odd row whose x coordinate is non-negative -- " #ip.x# >= 0.0 and #_row_#%2 == 1 "
All rows where the string column labeled 'foo' equals 'bar' -- " #s#foo# == \"bar\" "

""", version = scriptID)

parser.add_option('-w','--white',
                  dest   = 'whitelist',
                  action = 'extend', metavar = '<string LIST>',
                  help   = 'whitelist of column labels (a,b,c,...)')
parser.add_option('-b','--black',
                  dest   = 'blacklist',
                  action = 'extend', metavar='<string LIST>',
                  help   = 'blacklist of column labels (a,b,c,...)')
parser.add_option('-c','--condition',
                  dest   = 'condition', metavar='string',
                  help   = 'condition to filter rows')

parser.set_defaults(condition = '',
                   )

(options,filenames) = parser.parse_args()

# --- loop over input files -------------------------------------------------------------------------

if filenames == []: filenames = ['STDIN']

for name in filenames:
  if not (name == 'STDIN' or os.path.exists(name)): continue
  table = damask.ASCIItable(name = name, outname = name+'_tmp',
                            buffered = False)
  table.croak('\033[1m'+scriptName+'\033[0m'+(': '+name if name != 'STDIN' else ''))

# ------------------------------------------ assemble info ---------------------------------------  

  table.head_read()                                                                                 # read ASCII header info
  table.info_append(scriptID + '\t' + ' '.join(sys.argv[1:]))                                       # record script ID and command-line arguments in header

# ------------------------------------------ process data ---------------------------------------  

  specials = {'_row_': 0}                                                                           # special labels usable in condition; _row_ counts data rows
  labels = []
  positions = []

  for position,label in enumerate(table.labels):
    if    (options.whitelist is None or     any([   position in table.label_indexrange(needle) \
                                                 or fnmatch.fnmatch(label,needle) for needle in options.whitelist])) \
      and (options.blacklist is None or not any([   position in table.label_indexrange(needle) \
                                                 or fnmatch.fnmatch(label,needle) for needle in options.blacklist])):  # a label to keep?
      labels.append(label)                                                                          # remember name...
      positions.append(position)                                                                    # ...and position

  if len(labels) > 0 and options.whitelist is not None and options.blacklist is None:               # check whether reordering is possible
    position = np.zeros(len(labels))
    for i,label in enumerate(labels):                                                               # check each selected label
      match = [   positions[i] in table.label_indexrange(needle) \
               or fnmatch.fnmatch(label,needle) for needle in options.whitelist]                       # which whitelist items do match it
      position[i] = match.index(True) if np.sum(match) == 1 else -1                                 # unique match --> store which

    sortedIndex = np.lexsort((labels,position))                                                     # sort by whitelist position first, then by label
    order = range(len(labels)) if position[sortedIndex[0]] < 0 else sortedIndex                     # skip reordering if non-unique, i.e. smallest position is "-1"
  else:
    order = range(len(labels))                                                                      # maintain original order of labels
  
  interpolator = []
  condition = options.condition                                                                     # copy per file, might be altered
  for position,operand in enumerate(set(re.findall(r'#(([s]#)?(.+?))#',condition))):                # find three groups
    condition = condition.replace('#'+operand[0]+'#',
                                          {  '': '{%i}'%position,
                                           's#':'"{%i}"'%position}[operand[1]])
    if operand[2] in specials:                                                                      # special label ?
      interpolator += ['specials["%s"]'%operand[2]]
    else:
      try:
        interpolator += ['%s(table.data[%i])'%({  '':'float',
                                                's#':'str'}[operand[1]],
                                               table.labels.index(operand[2]))]
      except ValueError:                                                                            # label not among the column labels
        parser.error('column %s not found...\n'%operand[2])

  evaluator = "'" + condition + "'.format(" + ','.join(interpolator) + ")"
  
# ------------------------------------------ assemble header ---------------------------------------

  table.labels_clear()
  table.labels_append(np.array(labels)[order])                                                      # update with new label set
  table.head_write()

# ------------------------------------------ process and output data ------------------------------------------

  positions = np.array(positions)[order]
  outputAlive = True
  while outputAlive and table.data_read():                                                          # read next data line of ASCII table
    specials['_row_'] += 1                                                                          # count row
    if condition == '' or eval(eval(evaluator)):                                                    # valid row ?
      table.data = [table.data[position] for position in positions]                                 # retain filtered columns
      outputAlive = table.data_write()                                                              # output processed line

# ------------------------------------------ finalize output -----------------------------------------

  table.close()                                                                                     # close input ASCII table (works for stdin)

  if name != 'STDIN': os.rename(name+'_tmp',name)                                                   # overwrite old one with tmp new