csv2csv

Copyright 2004 - 2005, Stuart Udall

overview
important bits
installation
configuration and startup
controls and methods
issues and limitations
planned improvements
revision history
latest version

version 2.02: September 17, 2005


 
  overview next section top of page

CSV2CSV reformats CSV files, and in the process, allows for various modifications. Columns from a source CSV can be copied verbatim, or merged together, or re-ordered; columns can also be deleted, new columns can be added, and static data can be appended and prepended to existing columns.

If a record has too many or too few fields, or if a field is too long, a form will be displayed where manual corrections can be made.

No changes are made to the source CSV; each modified record is written to a new CSV file.

Summary of features

  • supports up to 500 different rulesets
  • interactive problem record and field editors
  • option to ignore first x lines of sourcefile (usually fieldnames)
  • option to output customised fieldnames as first line of outfile
  • can amalgamate multiple columns into one (with customisable separator character)
  • can add new columns
  • can append/prepend data to columns
  • can substitute one item of data for another
  • can clean up numeric data
  • can strip whitespace
  • can insert today's date into the data
  • supports delimited, partially-delimited or undelimited sourcefiles
  • input/output filesize limited only by available diskspace
  • full logging


 
  important bits next section top of page

  • requires Windows 9X/NT4/2K/XP
  • This program is LICENSED SOFTWARE and may not be copied or distributed without prior written permission of the author.
  • Please see the license agreement included with the software for the complete terms and conditions of use of the software.


 
  installation next section top of page

  1. run the self-extracting distribution archive; extract to a directory of your choice

 
  configuration and startup next section top of page

To execute CSV2CSV, type the following at the command prompt:

CSV2CSV rulesetname

..where rulesetname is the name of the ruleset to process.

If CSV2CSV is to be used repeatedly, the above command can be placed in a batchfile, and an icon for the batchfile created on the desktop. This allows one-click execution of a given ruleset.

Rulesets are essentially named groups of configuration commands. Rulesets allow you to process input files in different formats, without having to reconfigure the software each time. Up to 500 different rulesets are supported. Usually, one ruleset per output file is required.

The rulelist command, in the main section of the INI file, tells CSV2CSV which rulesets to process. The rulesets are then defined as individual sections in the INI file. Each ruleset can contain the following commands:

infilethe full name and path of the input file
outfilethe full name and path of the output file
infieldstitles for columns (fields) in input file
outfieldstitles for columns (fields) in output file
outfieldlengthsthe maximum lengths of each field in the output file
writefieldsset to YES to write the supplied outfieldnames as the first line of the output file
skipfirst# number of lines of sourcefile to skip
separatorstring to insert between dataitems of amalgamated columns
delimitedset to YES if dataitems are delimited (surrounded by another character, usually "quotation marks")
stripwhitespaceset to YES to remove leading and trailing spaces from dataitems
filternumericset to YES to remove all characters except numbers and decimal places from ALL formatted monetary values
dateformatset to YY to output the $$todaysdate reserved word with two-digit years
substituteuse to alter the value of a data item matching a certain criteria
columncommands to define the handling of each column of input data


 
  controls and methods next section top of page

field names

CSV2CSV assigns names to each column in both the source and target CSV. This is done using the infields and outfields settings. These settings list the names of each column, with the first column being assigned the first name, the second column the second name, and so on. Example:

infields=username,add1,telephone,amount,custcode
outfields=custcode,name,address,phone,balance

CSV2CSV does not require that the input file contain fieldnames. If it does, however, use the skipfirst setting to ignore them.

The fieldnames set in the infields setting need not actually match the fieldnames given on the first line of the CSV file, if any. The fieldnames set in the infields setting are used to control CSV2CSV itself, in particular by the column commands (see below). However, there must be one fieldname in the fieldnames set in the infields setting for each field in the input file.

field lengths

CSV2CSV filters overlength input data, according to the fieldlengths specified in the setting outfieldlengths. Each parameter on the outfieldlengths line matches the fieldname in the same parameter position in the outfields setting. Example:

outfields=name,address,phone
outfieldlengths=30,50,20

This tells CSV2CSV that maximum length of the name field is 30 characters, the maximum length of the address field is 50 characters, and the maximum length of the phone field is 20 characters.

To specify a formatted monetary value (see below), eg, xxxxxx.xx, to permit values of up to 999999 pounds and 99 pence, use the notation 6.2. Example:

outfields=name,address,phone,balance,status
outfieldlengths=30,50,20,6.2,5

If the filternumeric setting is used to strip illegal characters from formatted monetary values, the length of the filtered value is checked against the permitted field length.

If an overlength field is detected, a form will be displayed where the field contents can be edited until they are the correct length. If the form is aborted (if Escape is pressed), then the data will be truncated according to the rules below:

  • If CSV2CSV encounters an overlength string value, it truncates the right-most portion of it, retaining as much as possible of the data, starting from the left.
  • If CSV2CSV encounters an overlength numeric value (that is, a field specified by 6.2 notation or similar), it does not truncate the data, it instead ignores the entire line of CSV input data. This was done to prevent records with missing zeros from being output, as would occur if the data were truncated.

formatted monetary values

This term refers to data in the input file that represents a currency value, for example "1111.11". These are processed slightly differently, in that their fieldlengths are entered in the INI file in x.y format (see above). This is done to permit both sides of the decimal to be checked; otherwise a value of 111.111 could be parsed to the output file, which is not a legal value.

The filternumeric setting can be used to strip all invalid characters from all formatted monetary values in the input file. For example, the values in the input file may be something like "£1,111.11" (including the quotes). This notation is problematic for CSV2CSV and may also be problematic for the tools that next work on the data. Setting filternumeric to YES will convert the example value to simply "1111.11" (not including the quotes).

field substitution

The substitute command tells CSV2CSV to replace input data matching certain criteria with other data in the output file. The syntax of the command is:

substitute=columnname,input text,output text

If, in any row, the input text is detected in the column defined by columnname, it will be replaced with the output text. Example:

substitute=name,Accounts Department,Accts Dept

Only one substitute command is currently permitted.

column commands

commandMAP
purposecopies a column from the source to the target CSV
syntaxMAP sourcecolumnname TO targetcolumnname
examplemap ref to acct_number
 
commandAMALGAMATE
purposemerges multiple columns from the source to a single column in the target CSV
syntaxAMALGAMATE sourcecolumnname[,sourcecolumnname,...] TO targetcolumnname
exampleamalgamate name,company to name
 
commandADD
purposeadds a new column to the target CSV
syntaxADD value TO targetcolumnname
exampleadd $ to currency
 
commandPREPEND
purposeprepends new data to an existing column in the target CSV
syntaxPREPEND value TO targetcolumnname
exampleprepend ABC123 to custref
 
commandAPPEND
purposeappends new data to an existing column in the target CSV
syntaxAPPEND value TO targetcolumnname
exampleappend ABC123 to custref
 

There is no DELETE command. To delete a column from the source CSV (that is, to prevent it from appearing in the output file), don't reference the column in any MAP, AMALGAMATE, ADD, APPEND or PREPEND commands. The column must still be listed in the infields setting, however if it is not mentioned in any column commands, it will be ignored, and thus removed from the output file.

missing fields

If a line of incoming data contains too many or too few fields, a form will be displayed where corrections to each field can be made.

If the record editing form appears, this is usually due to the line mentioned being incorrectly formatted in some way. For example, it may be missing a comma somewhere, or a quotation mark, or have too many of these.

If the record editing form is aborted (if Escape is pressed), then the line of data will not be processed, and a message will appear in the logfile:

error: too many/too few fields on line 168 of [infile.csv] - input line ignored

If the record editing form appears for every record in the input file, either the input file is missing a field (that is, an entire column of data is not present in the file), OR, the infields setting in the INI file contains too many fieldnames.

reserved words

CSV2CSV currently supports a single reserved word - this being $$todaysdate. This word may be used in the value parameter of the ADD, APPEND and PREPEND commands. If CSV2CSV finds this word in the value parameter of the above commands, it will substitute the current date for the word. In this way, the current date can be included in the output data. Sample usage:

column=append $$todaysdate to invnum

Notes:

The logfile will contain details about any errors reported, as well as general usage stats. Check it if errors are displayed during operation.

If the outfile exists, it will be overwritten when CSV2CSV executes.

If the user aborts processing, any partially-created outfile will be removed.


 
  issues and limitations next section top of page

  • the use of fieldnames containing embedded spaces is NOT recommended, although they may work in some circumstances


 
  planned improvements next section top of page

  • bad character filtering for all fields (currently only available with formatted numeric values)
  • configurable delimiting character (currently hardcoded as ASCII 34)
  • allow manual missing fields and overlength correction to be disabled
  • allow fieldlength filtering to be disabled
  • add support for truncation from the left (retaining from the right)?
  • truncating a delimited field loses the closing delimiter
  • shore up support for fieldnames with embedded spaces?

 
  revision history top of page

November 2, 20040.01initial development
?0.02
?0.03
?0.04
March 22, 20050.05added delimiter support
April 4, 20050.06added interactive problem record editor
April 19, 20050.07added view error/view data buttons to editor
May 10, 20051.08commercial release
July 15, 20051.09added bad character filtering for formatted numeric values
July 31, 20051.10added the PREPEND command
September 14, 20052.00recompiled as a 32-bit console application
September 15, 20052.01added the APPEND command
September 17, 20052.02added the $$todaysdate reserved word