DSV.py - Cliff Wells, 2002
  Import/export DSV (delimiter separated values, a generalization of CSV).

$Id: DSV.py 3878 2007-01-09 22:28:37Z djpham $
Modified by Joe Pham <djpham@bitpim.org> to accommodate wxPython 2.8+

Basic use:

    from DSV import DSV

    data = file.read()
    qualifier = DSV.guessTextQualifier(data) # optional
    data = DSV.organizeIntoLines(data, textQualifier = qualifier)
    delimiter = DSV.guessDelimiter(data) # optional
    data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
    hasHeader = DSV.guessHeaders(data) # optional

If you know the delimiters, qualifiers, etc, you may skip the optional 'guessing' steps as they rely on heuristics anyway (although they seem to work well, there is no guarantee they are correct). What they are best used for is to make a good guess regarding the data structure and then let the user confirm it.

As such there is a 'wizard' to aid in this process (use this in lieu of the above code - requires wxPython):

    from DSV import DSV

    dlg = DSV.ImportWizardDialog(parent, -1, 'DSV Import Wizard', filename)
    dlg.ShowModal()
    headers, data = dlg.ImportData() # may also return None
    dlg.Destroy()

The dlg.ImportData() method may also take a function as an optional argument specifying what it should do about malformed rows. See the example at the bottom of this file. A few common functions are provided in this file (padRow, skipRow, useRow).

Requires Python 2.0 or later
Wizards tested with wxPython 2.2.5/NT 4.0, 2.3.2/Win2000 and Linux/GTK (RedHat 7.x)
Version: 1.4
InvalidDelimiter
InvalidTextQualifier
InvalidData
InvalidNumberOfColumns

ImportWizardPanel_Delimiters
    CLASS(SUPERCLASS):
      ImportWizardPanel_Delimiters(wx.Panel)
    DESCRIPTION:
      A wx.Panel that provides a basic interface for validating and changing the parameters for importing a delimited text file.

ImportWizardDialog
    CLASS(SUPERCLASS):
      ImportWizardDialog(wx.Dialog)
    DESCRIPTION:
      A dialog allowing the user to preview and change the options for importing a file.
__version__ =
Bugs/Caveats:
PROTOTYPE:
  guessTextQualifier(input)
DESCRIPTION:
  Tries to guess if the text qualifier (a character delimiting ambiguous data) is a single or double-quote (or None).
ARGUMENTS:
  - input is raw data as a string
RETURNS:
  single character or None
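The kind of guess described above can be illustrated with a deliberately naive frequency count. This is a standalone sketch, not DSV's heuristic: the name `guess_text_qualifier` and the "at least two occurrences" rule are assumptions made for illustration, and apostrophes in ordinary text will fool it.

```python
def guess_text_qualifier(data):
    """Naive guess: the quote character that occurs most often, or None.

    Hypothetical helper for illustration; it does not call DSV.
    """
    counts = {q: data.count(q) for q in ('"', "'")}
    best = max(counts, key=counts.get)
    # A qualifier has to open and close, so require at least one pair.
    return best if counts[best] >= 2 else None
```

A guess like this is best shown to the user for confirmation rather than trusted blindly, which matches the advice in the module description.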
PROTOTYPE:
  guessDelimiter(input, textQualifier = '"')
DESCRIPTION:
  Tries to guess the delimiter.
ARGUMENTS:
  - input is raw data as string
  - textQualifier is a character used to delimit ambiguous data
RETURNS:
  single character or None
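One plausible way to guess a delimiter is to count each candidate character outside qualified text and prefer the one whose per-line count is most consistent. The sketch below is an independent illustration under that assumption. The candidate list, the scoring rule, and the names `CANDIDATES`, `count_outside_quotes` and `guess_delimiter` are all hypothetical, and it takes a list of lines (as in the "Basic use" sequence) rather than a raw string.

```python
from collections import Counter

CANDIDATES = [",", "\t", ";", "|", ":"]  # assumed candidate set


def count_outside_quotes(line, char, qualifier):
    """Count occurrences of char that are not inside qualified text."""
    inside = False
    total = 0
    for c in line:
        if qualifier is not None and c == qualifier:
            inside = not inside
        elif c == char and not inside:
            total += 1
    return total


def guess_delimiter(lines, text_qualifier='"'):
    """Return the candidate with the most consistent per-line count, or None."""
    best, best_score = None, 0
    for cand in CANDIDATES:
        counts = [count_outside_quotes(l, cand, text_qualifier) for l in lines]
        nonzero = [n for n in counts if n > 0]
        if not nonzero:
            continue
        # Score: how many lines share the most common non-zero count.
        score = Counter(nonzero).most_common(1)[0][1]
        if score > best_score:
            best, best_score = cand, score
    return best
```

Ignoring characters inside the text qualifier is what keeps a value such as "x;y" from voting for the semicolon.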
PROTOTYPE:
  modeOfLengths(input)
DESCRIPTION:
  Finds the mode (most frequently occurring value) of the lengths of the lines.
ARGUMENTS:
  - input is list of lists of data
RETURNS:
  mode as integer
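The mode of the row lengths can be computed in a few lines with the standard library. This is a sketch of the idea, not DSV's code; `mode_of_lengths` is a hypothetical name and the tie-breaking behaviour is that of `collections.Counter`, which may differ from DSV's.

```python
from collections import Counter


def mode_of_lengths(rows):
    """Return the most frequently occurring row length in a list of lists."""
    counts = Counter(len(row) for row in rows)
    # most_common(1) yields [(length, occurrences)]; an empty input raises IndexError.
    return counts.most_common(1)[0][0]
```

The result is a reasonable value for the expected number of columns when the caller does not know it in advance.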
PROTOTYPE:
  guessHeaders(input, columns = 0)
DESCRIPTION:
  Decides whether row 0 is a header row.
ARGUMENTS:
  - input is a list of lists of data (as returned by importDSV)
  - columns is either the expected number of columns in each row or 0
RETURNS:
  - true if data has header row
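The page does not say how the header decision is made. One common heuristic, sketched here purely as an assumption, is to treat row 0 as a header when it contains no numeric cells while at least one column below it is entirely numeric. The names `is_number` and `guess_headers` are hypothetical.

```python
def is_number(text):
    """True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_headers(rows):
    """Guess whether rows[0] is a header row (illustrative heuristic only)."""
    if len(rows) < 2:
        return False
    header, body = rows[0], rows[1:]
    # Header cells are expected to be labels, so a numeric cell rules row 0 out.
    if any(is_number(cell) for cell in header):
        return False
    # Look for a column whose body cells are all numeric.
    for col in range(len(header)):
        cells = [row[col] for row in body if col < len(row)]
        if cells and all(is_number(c) for c in cells):
            return True
    return False
```

Data made up entirely of text columns defeats this heuristic, which is one reason the result is worth confirming with the user.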
PROTOTYPE:
  organizeIntoLines(input, textQualifier = '"', limit = None)
DESCRIPTION:
  Takes raw data (as from file.read()) and organizes it into lines. Newlines that occur within text qualifiers are treated as normal characters, not line delimiters.
ARGUMENTS:
  - input is raw data as a string
  - textQualifier is a character used to delimit ambiguous data
  - limit is an integer specifying the maximum number of lines to organize
RETURNS:
  list of strings
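The rule that newlines inside text qualifiers do not end a line can be sketched with a single pass that tracks whether the scanner is currently inside qualified text. This is a standalone illustration, not DSV's implementation; `organize_into_lines` is a hypothetical name, and normalizing "\r\n" to "\n" is an assumption.

```python
def organize_into_lines(data, text_qualifier='"', limit=None):
    """Split raw text into logical lines, keeping qualified newlines intact."""
    lines = []
    current = []
    inside = False  # True while between an opening and closing qualifier
    for ch in data.replace("\r\n", "\n"):
        if text_qualifier is not None and ch == text_qualifier:
            inside = not inside
            current.append(ch)
        elif ch == "\n" and not inside:
            lines.append("".join(current))
            current = []
            if limit is not None and len(lines) >= limit:
                return lines
        else:
            current.append(ch)
    if current:
        lines.append("".join(current))
    return lines
```

A doubled qualifier inside a field toggles the flag twice, so the scanner ends up in the same state it started in.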
PROTOTYPE:
  importDSV(input, delimiter = ',', textQualifier = '"', columns = 0, updateFunction = None, errorHandler = None)
DESCRIPTION:
  Parses lines of data in CSV format.
ARGUMENTS:
  - input is a list of strings (built by organizeIntoLines)
  - delimiter is the character used to delimit columns
  - textQualifier is the character used to delimit ambiguous data
  - columns is the expected number of columns in each row or 0
  - updateFunction is a callback function called once per record (could be used for updating progress bars). Its prototype is
      updateFunction(percentDone)
      - percentDone is an integer between 0 and 100
  - errorHandler is a callback invoked whenever a row has an unexpected number of columns. Its prototype is
      errorHandler(oldrow, newrow, columns, maxColumns)
    where
      - oldrow is the unparsed data
      - newrow is the parsed data
      - columns is the expected length of a row
      - maxColumns is the longest row in the data
RETURNS:
  list of lists of data
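The two callbacks can be written as ordinary functions matching the prototypes above. The sketch below assumes that whatever the error handler returns is used in place of the malformed row; this page does not state that, so treat it as an assumption. `pad_or_trim` and `report_progress` are hypothetical names, not functions shipped with DSV.

```python
def pad_or_trim(oldrow, newrow, columns, maxColumns):
    """Force a parsed row to exactly `columns` cells.

    Assumes the returned list replaces the malformed row (not confirmed here).
    """
    if len(newrow) < columns:
        return newrow + [""] * (columns - len(newrow))
    return newrow[:columns]


def report_progress(percentDone):
    """Minimal progress callback: percentDone is an integer from 0 to 100."""
    print("imported %d%%" % percentDone)


# Intended use, following the documented signature (requires the DSV module):
#   rows = DSV.importDSV(lines, delimiter=',', textQualifier='"', columns=3,
#                        updateFunction=report_progress,
#                        errorHandler=pad_or_trim)
```

The module also provides its own handlers (padRow, skipRow, useRow), which are the safer choice when they fit.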
PROTOTYPE:
  exportDSV(input, delimiter = ',', textQualifier = '"', quoteall = 0)
DESCRIPTION:
  Exports to DSV (delimiter-separated values) format.
ARGUMENTS:
  - input is list of lists of data (as returned by importDSV)
  - delimiter is character used to delimit columns
  - textQualifier is character used to delimit ambiguous data
  - quoteall is boolean specifying whether to quote all data or only data that requires it
RETURNS:
  data as string
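The distinction between quoting everything and quoting only what requires it can be sketched as follows. This is an independent illustration: `export_rows` is a hypothetical name, and both the definition of "requires quoting" (the value contains the delimiter, the qualifier, or a newline) and the doubling of embedded qualifiers follow the usual CSV convention rather than anything stated on this page.

```python
def export_rows(rows, delimiter=",", text_qualifier='"', quote_all=False):
    """Serialize a list of lists into delimiter-separated text."""
    out = []
    for row in rows:
        fields = []
        for value in row:
            text = str(value)
            needs_quoting = (
                quote_all
                or delimiter in text
                or text_qualifier in text
                or "\n" in text
            )
            if needs_quoting:
                # Assumed escaping: double any embedded qualifier, then wrap.
                escaped = text.replace(text_qualifier, text_qualifier * 2)
                text = text_qualifier + escaped + text_qualifier
            fields.append(text)
        out.append(delimiter.join(fields))
    return "\n".join(out)
```

Quoting only when needed keeps the output compact, while quoting everything is simpler for strict consumers to parse.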
__version__
Bugs/Caveats:
Why another CSV tool?:
To do:
Generated by Epydoc 3.0.1 on Sun Jan 24 16:19:54 2010 | http://epydoc.sourceforge.net |