CSV2file

Copyright 2003 - 2005, Stuart Udall

overview
important bits
installation
configuration and startup
controls and methods
issues and limitations
planned improvements
revision history
latest version

version 2.01: October 2, 2005


 
  overview

CSV2file generates a series of files from a template, an INI file, and a Comma-Separated-Value (CSV) file. One file per row in the CSV file is generated. The template determines what each file contains. The CSV data is inserted in each file at the points specified by the template.

The software can generate text files of any format, including HTML, XML, Javascript, and PHP - it simply parses whatever is in the template.

CSV2file can also replace variables in the template with static strings defined in the INI file.

CSV2file can also generate an index, which hyperlinks to each of the pages it generates. The index is fully customisable.

CSV2file can upload all the files it generates to an FTP server.

CSV2file detects whether an item in the previous index has been dropped from the current index. If so, it will remove the file from the FTP server.

CSV2file generates a logfile, so its activity can be examined later.


 
  important bits

  • Requires Windows 95 or higher.
  • FTP support requires CURL.EXE (included)
  • This program is licensed according to the license document LICENSE.PDF, bundled with the program.

 
  installation
  1. Extract the contents of the distribution archive to a directory of your choice. The directory structure present in the distribution archive is not essential and may be reconfigured at will; however, the included sample module expects the directory structure to be as supplied.

 
  configuration and startup
  1. Using a text editor such as Notepad, edit CSV2file.INI (in the same directory as the CSV2file program file) to suit your configuration. A summary of settings is below:

    setting         meaning
    logfile         name of file to contain logged activity
    rulelist        list of active modules
    autorule        name of module to generate without prompting (for use in a script)

    Note: disable autorule by either leaving the setting blank, or commenting out the line (place a semicolon in front of the line to do this).

    Each module has a number of additional settings:

    setting         meaning
    sourcedata      the name of the CSV file containing the source data
    templatefile    the name of the file that contains the template
    targetdir       the name of the local directory to contain the generated files
    preparse        the commandline to a preparser, if any (leave blank to disable)
    datadesc        a list of columnnames, describing the data in the CSV file [1]
    toggletitles    a list of columnnames, describing the data in the togglefield [4]
    togglecolumn    the name of the column containing the togglefield (blank for none) [4]
    indexby         the name of the column by which to index [2]
    nameby          the name of the column by which to name the output files [6]
    indexwith       the name of the togglecolumns to index alongside the indexby field [7]
    indexfalse      the text to insert into the index when a togglecolumn is 0 [7]
    indextrue       the text to insert into the index when a togglecolumn is 1 [7]
    indexcellfalse  HTML to be inserted into the <TD> tag of the cell holding indexfalse text [7]
    indexcelltrue   HTML to be inserted into the <TD> tag of the cell holding indextrue text [7]
    indexleadHTML   HTML to be inserted before each indexitem [10]
    indextrailHTML  HTML to be inserted after each indexitem [10]
    indexheader     the optional name of the file containing the header of the indexpage
    indexfooter     the optional name of the file containing the footer of the indexpage
    linkleadHTML    HTML to be inserted before the filename in each link in the index [3]
    linktrailHTML   HTML to be inserted after the filename in each link in the index [3]
    linkcellHTML    HTML to be inserted in the <TD> tag before each link in the index [11]
    ftpserver       address of FTP server to receive published pages
    ftpusername     username to use when logging into FTP server
    ftpdir          directory on FTP server to hold published files
    ftpdummyfile    file on FTP server to download and discard [8]
    ftpindexname    the name of the file on the FTP server to hold the generated index [9]
    oldindex        the name of the local directory containing the previous index (this is updated automatically by the software)
    substitute      variablename,substitution_text,alternate_substitution_text [5]

[1] Note: The datadesc line is essentially a list of the titles or labels of each column in the CSV. These names must match the variables you use in the template. Example:

Your CSV, a product database, has rows and columns of data in it. Each row contains the data for one product, while each column contains a different type of data. Say the first column is the product code, the second is the product description, and the third is the price of the product. You arbitrarily label these prodcode, proddesc, and prodval, and construct your template so it says something like:

product description: $$proddesc

Now, as you've used $$proddesc in the template, the generator needs to know which column of CSV data this corresponds to. You tell it by using the datadesc line, a comma-delimited string that lists the column titles (variable names) as you used them in the template, and in the order they are listed in the CSV.

"As you used them in the template" means that the variablenames must be spelled identically. The program looks for an exact match between the variablenames used in the template and the variablenames listed in the datadesc line.

"In the order they are listed in the CSV" means that the variablenames listed on the datadesc line must occur in the same sequence as the CSV columns they refer to. For example, if the first column of your CSV contained the product code, the second column the product description, and the third the product price, you might use a datadesc line like this:

datadesc=prodcode,proddesc,prodval

Correct coding of the datadesc line is crucial to correct operation of the program.

Note: do not use leading $$ signs on the datadesc line. Although the datadesc line does contain variablenames, the only location these variablenames are prefixed with $$ is in the template.
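The positional matching described in note [1] can be sketched in Python. This is an illustration only, not CSV2file's actual code: the function name rows_as_variables and the sample data are hypothetical, and the sketch assumes a plain comma-separated file with no header row.

```python
import csv
import io

def rows_as_variables(csv_text, datadesc):
    """Pair each CSV column with the name at the same position in datadesc."""
    names = datadesc.split(",")  # names carry no leading $$ on the datadesc line
    for row in csv.reader(io.StringIO(csv_text)):
        # zip() matches the first name to the first column, and so on
        yield dict(zip(names, row))

# Hypothetical product data: code, description, price
sample_csv = "A100,Widget,9.95\nA200,Gadget,19.50\n"

for variables in rows_as_variables(sample_csv, "prodcode,proddesc,prodval"):
    print("product description:", variables["proddesc"])
```

If the names on the datadesc line were listed in a different order from the CSV columns, each name would silently pick up the wrong column, which is why the ordering matters.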

[2] Note: The indexby line selects which column of CSV data is used as the link text in the index. The link text is what is displayed in the index for the user to click on. If indexby is left blank, indexing is turned off.

[3] Note: The linkleadHTML line defines any HTML to be inserted immediately before the filename in the link to each page on the index. To understand this more clearly, consider a sample link in the index:

<a href=filename.htm>link text</a>

If you set linkleadHTML to ABCDEF, you'll see this in the index:

<a href=ABCDEFfilename.htm>link text</a>

Similarly, if you set linktrailHTML to XYZ, you'll see this in the index:

<a href=filename.htmXYZ>link text</a>

These settings are both optional; to disable, simply leave them blank. They are primarily useful when inserting Javascript into the anchor tag.
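As a rough Python illustration of note [3], the link can be thought of as a simple string concatenation. The function name index_link is hypothetical and this is not the program's own code; it only reproduces the examples given above.

```python
def index_link(filename, link_text, linklead="", linktrail=""):
    """Build an index link with optional text before and after the filename."""
    return f"<a href={linklead}{filename}{linktrail}>{link_text}</a>"

print(index_link("filename.htm", "link text"))
print(index_link("filename.htm", "link text", linklead="ABCDEF"))
print(index_link("filename.htm", "link text", linktrail="XYZ"))
```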

[4] Note: The togglefield is a field in the CSV data which contains a binary string, example: 01010101010. These strings can be used to compactly represent YES or NO variables in a CSV. CSV2file supports a single togglefield, currently 32 toggles wide. CSV2file is told which column in the CSV is the togglefield with the togglecolumn setting (give the name of the column - CSV columnnames are defined by the datadesc= line in the INI file). Each individual toggle in the togglefield (either a 1 or a 0) is given a name using the toggletitles setting. If there is no togglefield, simply leave the togglecolumn setting blank.

Note: toggletitles are read left-to-right, e.g. a togglefield of 100 with toggletitles defined as A,B,C would have values assigned as A=1, B=0, C=0.

The toggletitles setting is defined in a similar manner to datadesc. For example, if your togglefield was 0101, with the first column denoting colour screen, the second denoting inbuilt LAN, the third, onboard sound, and the fourth, Firewire capability, then your toggletitles line might be defined like this:

toggletitles=colour,LAN,sound,firewire

This done, these names can be then referred to as if they were names for other columns of CSV data. That is, to use the individual toggles in the template, simply use their columnnames (defined by toggletitles) as $$variables.
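The left-to-right naming of toggles can be sketched in Python as follows. The function name decode_toggles is hypothetical, and the sketch simply assumes that the togglefield is a string of 1 and 0 characters, as in the example above.

```python
def decode_toggles(togglefield, toggletitles):
    """Name each digit of the togglefield, reading left to right."""
    titles = toggletitles.split(",")
    # The first title takes the first digit, the second title the second, ...
    return {title: digit == "1" for title, digit in zip(titles, togglefield)}

toggles = decode_toggles("0101", "colour,LAN,sound,firewire")
print(toggles)
```

With the togglefield 0101, LAN and firewire come out true, while colour and sound come out false.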

When CSV2file detects a togglefield being used, it does NOT simply replace the variablename with the value of the toggle (which is either a 1 or a 0). Rather, it tests the value of the toggle, and substitutes prespecified text depending on whether the value is a 1 or a 0. If no substitution text is supplied, the toggle is ignored entirely. Substitution is detailed below.

[5] Note: Up to 200 substitute commands may be supplied in a single module. These commands essentially replace a variable in the template with data supplied in the INI file, instead of the CSV. The first parameter on the substitute= line is the name of the variable to replace. This variable must be listed in either datadesc or toggletitles (ie. the data to replace must be in the CSV). The second parameter is the text to replace the variable with. If the variablename is found in toggletitles, the second parameter is only inserted if the value of the variable (as defined by the togglefield coming out of the CSV for this particular row of CSV data) is 1. Otherwise, the third parameter is inserted instead (ie. when the value of the toggle in the CSV is 0). If the variable is not a toggle (ie. it was not found in toggletitles), the third parameter, if supplied, is ignored.
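The choice between the second and third parameter can be sketched in Python. This is a minimal illustration under stated assumptions, not the program's implementation: the function name resolve_substitute and the sample rules are hypothetical, the substitution texts are assumed to contain no commas, and the case where no substitution text is supplied is not modelled.

```python
def resolve_substitute(rule, toggles):
    """Return (variablename, replacement text) for one substitute= value.

    rule    -- the text after "substitute=", e.g. "firewire,Yes,No"
    toggles -- dict of toggle name -> True/False for the current CSV row
    """
    parts = rule.split(",")  # assumes no commas inside the texts themselves
    name = parts[0]
    text = parts[1]
    alternate = parts[2] if len(parts) > 2 else ""
    if name in toggles:
        # Toggle variable: second parameter for 1, third parameter for 0
        return name, (text if toggles[name] else alternate)
    # Ordinary variable: the third parameter, if supplied, is ignored
    return name, text

row_toggles = {"colour": False, "LAN": True, "sound": False, "firewire": True}
print(resolve_substitute("firewire,Firewire port fitted,No Firewire", row_toggles))
print(resolve_substitute("sound,Onboard sound,No sound", row_toggles))
print(resolve_substitute("prodcode,N/A,ignored", row_toggles))
```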

[6] Note: The nameby line selects which column of CSV data is used as the filename for each file generated. If nameby is left blank, a randomly-generated filename is used.

[7] Note: The indexwith line denotes which togglecolumns, if any, are indexed alongside each index entry. This feature is used to create an index comprised of a table of the CSV data. For example, if the source CSV contains a product name and a togglefield, the product name can be used as the index entry (defined with indexby); next to each name can be the fields you specify in indexwith. If the value of the specified togglecolumn(s) is false, the text supplied in the indexfalse line is inserted into the index. If the value is true, the text supplied in the indextrue is inserted instead. The togglefields are output in the order specified by the indexwith setting.

The indexcellfalse and indexcelltrue settings specify HTML to be inserted into the <TD> tag of the cell holding the above text. This allows the text to be aligned and styled.
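A hedged Python sketch of how the toggle cells of one index row might be assembled is shown below. The function name toggle_cells and the sample values are hypothetical. The sketch assumes that indexwith is a comma-delimited list and that the cell HTML is placed inside the <td> tag after a single space; the exact markup CSV2file emits may differ.

```python
def toggle_cells(toggles, indexwith, indextrue, indexfalse,
                 indexcelltrue="", indexcellfalse=""):
    """Build one table cell per toggle named in indexwith, in that order."""
    cells = []
    for name in indexwith.split(","):
        if toggles[name]:
            attrs, text = indexcelltrue, indextrue
        else:
            attrs, text = indexcellfalse, indexfalse
        # Only add the space when there is HTML to insert into the tag
        open_tag = f"<td {attrs}>" if attrs else "<td>"
        cells.append(f"{open_tag}{text}</td>")
    return cells

row_toggles = {"colour": False, "LAN": True}
print(toggle_cells(row_toggles, "LAN,colour", "yes", "no",
                   indexcelltrue="align=center"))
```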

[8] Note: The ftpdummyfile line specifies a file to download and discard. This is required in order to remove orphaned files from the FTP server, because the FTP module does not permit a delete operation without a filename also being supplied. The dummy file is not itself deleted; it is simply downloaded and ignored. As a download can take time, select a small file, perhaps robots.txt, which should be in the root of your webspace. This is indeed a "kludge": unfortunate, but necessary!

[9] Note: The ftpindexname line specifies a file to hold the generated index on the FTP server. When a set of pages is generated, the index is named "index.htm" on the local system. However, uploading this file to a webserver may overwrite some other index file. This setting therefore allows the name of the file on the server to be specified. If this setting is left blank, no index is uploaded at all.

[10] Note: The indexleadHTML line defines any HTML to be inserted immediately before the table cell containing each index item. To understand this more clearly, consider a sample table cell in the index:

<td>cell data</td>

If you set indexleadHTML to ABCDEF, you'll see this in the index:

ABCDEF<td>cell data</td>

Similarly, if you set indextrailHTML to XYZ, you'll see this in the index:

<td>cell data</td>XYZ

These settings are both optional; to disable, simply leave them blank. They are primarily useful when inserting Javascript into the table.

[11] Note: The linkcellHTML line defines any HTML to be inserted inside the <TD> tag of each cell in the index containing a link to the indexed subpages. To understand this more clearly, consider a sample table cell in the index:

<td>cell data</td>

If you set linkcellHTML to class=linkcell, you'll see this in the index:

<td class=linkcell>cell data</td>

This setting is optional; to disable, simply leave it blank. It is primarily useful when inserting CSS into the table.
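The cell examples in notes [10] and [11] can be reproduced with a short Python sketch. The function name link_cell is hypothetical, and the way the three settings combine in a single cell is an assumption; only the individual cases are taken from the examples above.

```python
def link_cell(cell_data, linkcell="", indexlead="", indextrail=""):
    """Wrap cell data in a <td>, applying the optional index settings."""
    # linkcellHTML goes inside the tag; lead and trail go outside the cell
    open_tag = f"<td {linkcell}>" if linkcell else "<td>"
    return f"{indexlead}{open_tag}{cell_data}</td>{indextrail}"

print(link_cell("cell data", indexlead="ABCDEF"))
print(link_cell("cell data", indextrail="XYZ"))
print(link_cell("cell data", linkcell="class=linkcell"))
```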

Note: You may use semicolons to add comments (which are not interpreted) to the main section of the INI file (which is at the very top of the file, before any [modules]). Simply insert a semicolon and everything on that line after the semicolon will be ignored. Example:

; this is a comment.

However, you may NOT use comments in any [module] section (where each module is defined). This is because semicolons may be used in the INI data itself (for example, if Javascript is used).
 
  controls and methods

  • Type

    CSV2file module [FTPpassword]

    from the command line to execute the program, where module is the name of the rule (defined in the INI file) to process, and where FTPpassword is the password for your FTP server. Omitting module is not permitted. Omitting FTPpassword causes FTP upload to be skipped; to generate a new set of files without uploading to the FTP server, don't enter a password.

  • To add a module:

    • Add its name to the rulelist entry in the INI file. The following rulelist line creates two modules called test1 and test2:
      rulelist=[test1],[test2]
      
    • You must then create a new section in the INI file to define the module. The simplest way to achieve this is to copy-and-paste an existing section. See configuration and startup for details on the contents of CSV2file.INI.

  • The templatefiles are built as follows:

    • to insert variables, name them $$variablename, and place them anywhere in the text, like this:
      text text text$$variablename text text text
      

      There's one catch. If the variable is not at the end of the line, you must leave a space after the variablename. This is required so the generator can distinguish the variablename from your other text. This space will not be rendered by the generator. To render a space immediately after a variablename, place two spaces after it, like this:

      text text text$$variablename  text text text
      

      If $$variablename was set to "apple", the generated text would thus be:

      text text textapple text text text
      

      If the second space was left out, the generated text would be:

      text text textappletext text text
      

      If the first space after $$variablename was left out, the match would not work at all, and you'd see your variablename rendered as text:

      text text text$$variablenametext text text
      

      ...and you'd see a warning in the logfile:

      ! CSV2file 0.11 Nov 14, 2004 14:20:23 [miscpage]: warning: variable [$$variablenametext] not found
      

      Variables not matched to the supplied data or to a reserved variable (see below) are rendered verbatim.

      If you see a variablename rendered as text even though you have listed it in the datadesc line or the toggletitles line, the name is not being recognised for some reason. Check that there are no typos in either the template or the INI file, that you're using two dollar signs prior to your variablename (eg. $$prodcode) in the template only, and that you have left a space after the variablename if there is any text after it.

      To expand on this, consider a template constructed like this:

      text text text$$variablenametext text text
      

      Although the intent of the template may be to leave no spaces before or after the variable text, what will in fact happen is that the generator will see a variable called $$variablenametext. This is because it uses everything between the $$ sign and the first space to occur afterwards as the variablename to match upon. Consequently, it will attempt to look up $$variablenametext in either datadesc or toggletitles and, upon failing to find it, insert the variablename itself into the text. To avoid this, be sure to leave a space between the variablename and any text that follows it. This space is never inserted into any generated pages. If you want no space to appear after the variable's value, use a single space only, as described above.

      If the variablename is the last text on the line, a trailing space is not required.

    • to insert a newline (CR/LF), use the reserved variable $$newline
    • to insert today's date, use the reserved variable $$todaysdate
    • the remaining text can be anything - HTML, javascript, CSS, XML, plaintext - whatever. Only $$variables are interpreted.
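The variable-matching rules above can be approximated in Python as follows. This is a sketch under stated assumptions, not the generator's real code: the names render_line, VARIABLE and RESERVED are hypothetical, lines are assumed to be processed one at a time without their line endings, and $$todaysdate is left out because the date format is not documented here.

```python
import re

# A name is everything after $$ up to the first space or the end of the line.
# The single optional space that ends the name is part of the match.
VARIABLE = re.compile(r"\$\$([^ \r\n]+) ?")

RESERVED = {"newline": "\r\n"}  # $$newline inserts a CR/LF

def render_line(line, values, warnings=None):
    """Replace $$variables in one template line (line ending excluded)."""
    known = {**RESERVED, **values}

    def replace(match):
        name = match.group(1)
        if name in known:
            return known[name]  # the delimiting space is not rendered
        if warnings is not None:
            warnings.append(f"variable [$${name}] not found")
        return match.group(0)   # unmatched names are rendered verbatim

    return VARIABLE.sub(replace, line)

values = {"variablename": "apple"}
print(render_line("text text text$$variablename  text text text", values))
print(render_line("text text text$$variablename text text text", values))
print(render_line("text text text$$variablenametext text text", values))
```

The three calls correspond to the two-space, one-space, and no-space templates discussed above.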

 
  issues and limitations
  • input files may not have long filenames
  • orphan detection is not currently operational

 
  planned improvements
  • conditional indexing (only index if data matches given criteria)

 
  revision history

December 6, 2003      0.1    initial development
December 21, 2003     0.2    improved templating engine, added draft documentation
December 24, 2003     0.3    improved configurability, added linkleadHTML and linktrailHTML
January 26, 2004      0.4    added substitution, support for togglefields, support for a preparser, and user-defined output filenames
February 5, 2004      0.5    added ability to generate a tabulated index from togglefield data (eg. compatibility chart)
February 11, 2004     0.6    added orphan handling and FTP support
February 12, 2004     0.7    bugfixes
May 22, 2004          0.8    added ability to skip indexing
May 23, 2004          0.9    optimised for use in a scripted environment
June 1, 2004          0.10   bugfix
June 12, 2004         0.11   bugfix
January 31, 2005      0.12   added support for linkcellHTML
May 10, 2005          1.12   commercial release
September 12, 2005    2.00   recompiled as a 32-bit console application
October 2, 2005       2.01   performance tweaks