Module thirdparty.DSV

DSV.py - Cliff Wells, 2002
  Import/export DSV (delimiter-separated values, a generalization of CSV).

$Id: DSV.py 3878 2007-01-09 22:28:37Z djpham $
Modified by Joe Pham <djpham@bitpim.org> to accommodate wxPython 2.8+

Basic use:

   from DSV import DSV

   data = file.read()
   qualifier = DSV.guessTextQualifier(data) # optional
   data = DSV.organizeIntoLines(data, textQualifier = qualifier)
   delimiter = DSV.guessDelimiter(data) # optional
   data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
   hasHeader = DSV.guessHeaders(data) # optional

If you already know the delimiter, qualifier, etc., you may skip the optional
'guessing' steps. They rely on heuristics: they seem to work well, but there
is no guarantee that they are correct. They are best used to make a good
guess about the data structure and then let the user confirm it.

For that purpose there is a 'wizard' to aid in this process (use it in lieu
of the above code; requires wxPython):

   from DSV import DSV

   dlg = DSV.ImportWizardDialog(parent, -1, 'DSV Import Wizard', filename)
   dlg.ShowModal()
   results = dlg.ImportData() # may return None
   dlg.Destroy()
   if results is not None:
       headers, data = results

The dlg.ImportData() method may also take a function as an optional argument
specifying what it should do about malformed rows.  See the example at the bottom
of this file. A few common functions are provided in this file (padRow, skipRow,
useRow).
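
A custom handler for malformed rows could look like the sketch below. It
relies only on the four-argument signature shared by padRow, skipRow and
useRow. The name fitRow is hypothetical, and the return convention (return
the row to keep, or None to drop it) is an assumption that this page does
not confirm.

```python
def fitRow(oldrow, newrow, columns, maxColumns):
    """Hypothetical handler: force every row to exactly `columns` fields."""
    if not newrow:
        # Assumed convention: returning None drops the row.
        return None
    if len(newrow) < columns:
        # Pad short rows with empty strings.
        return newrow + [''] * (columns - len(newrow))
    # Truncate long rows (a no-op when the length already matches).
    return newrow[:columns]
```

Such a function would be handed to dlg.ImportData() as its optional argument.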

Requires Python 2.0 or later
Wizards tested with wxPython 2.2.5/NT 4.0, 2.3.2/Win2000 and Linux/GTK (RedHat 7.x)

Classes
ImportWizardDialog  
ImportWizardPanel_Delimiters  
InvalidData  
InvalidDelimiter  
InvalidNumberOfColumns  
InvalidTextQualifier  

Function Summary
  exportDSV(input, delimiter, textQualifier, quoteall)
PROTOTYPE: exportDSV(input, delimiter = ',', textQualifier = '"', quoteall = 0) DESCRIPTION: Exports to DSV (delimiter-separated values) format.
  guessDelimiter(input, textQualifier)
PROTOTYPE: guessDelimiter(input, textQualifier = '"') DESCRIPTION: Tries to guess the delimiter.
  guessHeaders(input, columns)
PROTOTYPE:...
  guessTextQualifier(input)
PROTOTYPE:...
  importDSV(input, delimiter, textQualifier, columns, updateFunction, errorHandler)
PROTOTYPE: importDSV(input, delimiter = ',', textQualifier = '"', columns = 0, updateFunction = None, errorHandler = None) DESCRIPTION: parses lines of data in CSV format.
  modeOfLengths(input)
PROTOTYPE: modeOfLengths(input) DESCRIPTION: Finds the mode (most frequently occurring value) of the lengths of the lines.
  organizeIntoLines(input, textQualifier, limit)
PROTOTYPE: organizeIntoLines(input, textQualifier = '"', limit = None) DESCRIPTION: Takes raw data (as from file.read()) and organizes it into lines.
  padRow(oldrow, newrow, columns, maxColumns)
pads all rows to the same length with empty strings
  skipRow(oldrow, newrow, columns, maxColumns)
skips any inconsistent rows
  useRow(oldrow, newrow, columns, maxColumns)
returns row unchanged

Function Details

exportDSV(input, delimiter=',', textQualifier='"', quoteall=0)

PROTOTYPE:
  exportDSV(input, delimiter = ',', textQualifier = '"', quoteall = 0)
DESCRIPTION:
  Exports to DSV (delimiter-separated values) format.
ARGUMENTS:
  - input is list of lists of data (as returned by importDSV)
  - delimiter is character used to delimit columns
  - textQualifier is character used to delimit ambiguous data
  - quoteall is boolean specifying whether to quote all data or only data
    that requires it
RETURNS:
  data as string
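
The effect of quoteall can be sketched as follows. This is not the module's
code: export_rows_sketch is a hypothetical name, and both the newline row
separator and the doubling of embedded qualifiers are assumptions borrowed
from common CSV practice.

```python
def export_rows_sketch(rows, delimiter=',', text_qualifier='"', quote_all=False):
    """Join rows into one string, quoting fields only where needed."""
    def quote(field):
        text = str(field)
        needs_quoting = (quote_all
                         or delimiter in text
                         or text_qualifier in text
                         or '\n' in text)
        if not needs_quoting:
            return text
        # Assumption: an embedded qualifier is escaped by doubling it.
        escaped = text.replace(text_qualifier, text_qualifier * 2)
        return text_qualifier + escaped + text_qualifier

    return '\n'.join(delimiter.join(quote(f) for f in row) for row in rows)
```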

guessDelimiter(input, textQualifier='"')

PROTOTYPE:
  guessDelimiter(input, textQualifier = '"')
DESCRIPTION:
  Tries to guess the delimiter.
ARGUMENTS:
  - input is raw data as string
  - textQualifier is a character used to delimit ambiguous data
RETURNS:
  single character or None
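
One plausible heuristic for this kind of guess is to look for a character
that occurs the same number of times on every line. The sketch below shows
that idea only; it is not the algorithm DSV uses, the name and the candidate
list are assumptions, and it ignores text qualifiers entirely.

```python
def guess_delimiter_sketch(raw, candidates=',\t;|:'):
    """Return the candidate that appears equally often (at least once) per line."""
    lines = [line for line in raw.splitlines() if line]
    if not lines:
        return None
    best, best_count = None, 0
    for ch in candidates:
        counts = set(line.count(ch) for line in lines)
        if len(counts) == 1:
            # Same count on every line; prefer the most frequent candidate.
            count = counts.pop()
            if count > best_count:
                best, best_count = ch, count
    return best
```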

guessHeaders(input, columns=0)

PROTOTYPE:
  guessHeaders(input, columns = 0)
DESCRIPTION:
  Decides whether row 0 is a header row
ARGUMENTS:
  - input is a list of lists of data (as returned by importDSV)
  - columns is either the expected number of columns in each row or 0
RETURNS:
  - true if data has header row

guessTextQualifier(input)

PROTOTYPE:
  guessTextQualifier(input)
DESCRIPTION:
  Tries to guess whether the text qualifier (a character delimiting ambiguous
  data) is a single quote, a double quote, or absent (None).
ARGUMENTS:
  - input is raw data as a string
RETURNS:
  single character or None

importDSV(input, delimiter=',', textQualifier='"', columns=0, updateFunction=None, errorHandler=None)

PROTOTYPE:
  importDSV(input, delimiter = ',', textQualifier = '"', columns = 0,
            updateFunction = None, errorHandler = None)
DESCRIPTION:
  parses lines of data in CSV format
ARGUMENTS:
  - input is a list of strings (built by organizeIntoLines)
  - delimiter is the character used to delimit columns
  - textQualifier is the character used to delimit ambiguous data
  - columns is the expected number of columns in each row or 0
  - updateFunction is a callback function called once per record (could be
    used for updating progress bars). Its prototype is
       updateFunction(percentDone)
       - percentDone is an integer between 0 and 100
  - errorHandler is a callback invoked whenever a row has an unexpected number
    of columns. Its prototype is
       errorHandler(oldrow, newrow, columns, maxColumns)
          where
          - oldrow is the unparsed data
          - newrow is the parsed data
          - columns is the expected length of a row
          - maxColumns is the length of the longest row in the data
RETURNS:
  list of lists of data
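
A progress callback matching the updateFunction(percentDone) prototype might
be built as in this sketch. The factory name, the step parameter and the
reported attribute are all hypothetical; only the single integer argument
comes from the description above.

```python
def make_progress_printer(step=10):
    """Build a callback that reports at most once per `step` percent."""
    reported = []

    def updateFunction(percentDone):
        # Report only when progress crosses into a new step-sized bucket.
        bucket = percentDone // step
        if not reported or bucket > reported[-1]:
            reported.append(bucket)
            print('%d%% done' % percentDone)

    # Expose the buckets seen so far, for inspection.
    updateFunction.reported = reported
    return updateFunction
```

The returned function would be passed as updateFunction; throttling the
output this way keeps a callback that fires once per record from flooding
the display.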

modeOfLengths(input)

PROTOTYPE:
  modeOfLengths(input)
DESCRIPTION:
  Finds the mode (most frequently occurring value) of the lengths of the lines.
ARGUMENTS:
  - input is list of lists of data
RETURNS:
  mode as integer
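
The computation can be illustrated with collections.Counter. This is a
sketch of the idea rather than the module's implementation: the function
name is hypothetical, and returning 0 for empty input is an assumption.

```python
from collections import Counter


def mode_of_lengths_sketch(rows):
    """Return the most common row length, or 0 for empty input."""
    if not rows:
        return 0
    lengths = Counter(len(row) for row in rows)
    # most_common(1) yields a one-item list of (length, frequency).
    return lengths.most_common(1)[0][0]
```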

organizeIntoLines(input, textQualifier='"', limit=None)

PROTOTYPE:
  organizeIntoLines(input, textQualifier = '"', limit = None)
DESCRIPTION:
  Takes raw data (as from file.read()) and organizes it into lines.
  Newlines that occur within text qualifiers are treated as normal
  characters, not line delimiters.
ARGUMENTS:
  - input is raw data as a string
  - textQualifier is a character used to delimit ambiguous data
  - limit is an integer specifying the maximum number of lines to organize
RETURNS:
  list of strings
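
The treatment of newlines inside text qualifiers can be sketched with a
simple toggle. This is an illustration, not the module's code: the function
name is hypothetical, only '\n' is treated as a line break, and a doubled
qualifier is assumed to be an escaped qualifier (which the toggle handles
because it flips twice).

```python
def split_qualified_lines(raw, text_qualifier='"', limit=None):
    """Split raw text on newlines that fall outside qualified text."""
    lines = []
    current = []
    inside = False  # True while between a pair of qualifiers
    for ch in raw:
        if text_qualifier and ch == text_qualifier:
            inside = not inside
            current.append(ch)
        elif ch == '\n' and not inside:
            lines.append(''.join(current))
            current = []
            if limit is not None and len(lines) >= limit:
                return lines
        else:
            current.append(ch)
    if current:
        # Keep a final line that lacks a trailing newline.
        lines.append(''.join(current))
    return lines
```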

padRow(oldrow, newrow, columns, maxColumns)

pads all rows to the same length with empty strings

skipRow(oldrow, newrow, columns, maxColumns)

skips any inconsistent rows

useRow(oldrow, newrow, columns, maxColumns)

returns row unchanged

Generated by Epydoc 2.1 on Fri Aug 15 18:58:27 2008 http://epydoc.sf.net