csv2html

Copyright 2000 - 2005, Stuart Udall

overview
important bits
installation
configuration and startup
controls and methods
issues and limitations
planned improvements
revision history
latest version

version 2.02: September 24, 2005


 
  overview next section top of page

CSV2HTML is a database tabulator, amalgamator and ranking tool. It can read any Comma-Separated-Value-formatted file and turn it into a webpage. It can either simply parse the CSV data into a table, or it can sum the frequency of occurence of different CSV data items in a particular column. This frequency count can be used to make "most-popular" reports. Attractive charts can be included alongside the data to give added impact and meaning.

In summation mode, CSV2HTML can also amalgamate dataitems, such that the hitcount from one item can be added to the hitcount from another. CSV2HTML can also graph the resulting list, in HTML, next to the list itself.

CSV2HTML also includes conditional summation, whereby dataitems are only included in the analysis if they start with, end with, start and end with, or exactly match a particular string (eg. /tech/ or .htm).

CSV2HTML can exclude whole columns from inclusion in its tabulated reports, and exclude specific dataitems from inclusion in its summation reports.

CSV2HTML can also simply convert textfiles to HTML.

CSV2HTML generates a detailed logfile, so its activity can be examined later.

Feature summary:

  • supports up to 500 different reports, each independently customisable
  • can set threshold (to eliminate noise at the bottom of the list)
  • can amalgamate one line into another (in case an item was renamed)
  • can conditionally compile report to include only lines starting, ending, starting and ending, or exactly matching a given string
  • can exclude dataitems from reports
  • attractive reports, including gradient graphs, all colours customisable
  • can mark up plain text files
  • can automatically embed hyperlinks
  • graphs are stand-alone HTML and don't require any GIFs or other images or files to display


 
  important bits next section top of page

  • Requires Windows 95 or higher.
  • This program is LICENSED SOFTWARE and may not be copied or distributed without prior written permission of the author.
  • Please see the license agreement included with the software for the complete terms and conditions of use of the software.

 
  installation next section top of page
  1. Extract the contents of the distribution archive to a directory of your choice.

 
  configuration and startup next section top of page

CSV2HTML reads its configuration from the file CSV2HTML.INI. Settings are defined in the syntax settingname=settingvalue. The top of the INI file (described below as the "main section") contains settings which apply to all reports. Rulesets are then defined as individual sections, which follow the main section - these contain configuration data used only by that ruleset. Rulesets are essentially named groups of configuration commands. Rulesets allow you to produce many different reports, with customised settings for each, without having to reconfigure the software each time. Up to 500 different rulesets are supported.

When CSV2HTML runs, it converts the CSV-formatted input file to an HTML table. Each row and column of CSV data becomes a row and column of table data. Each item of CSV data becomes a table cell, with the data displayed inside the cell. It is thus possible to control the display of the CSV data by changing the various properties of the HTML table.

Using a text editor such as Notepad, edit CSV2HTML.INI to suit your configuration.

A summary of the settings in the main section of the INI is below:

settingnamemeaningexample
rulelistlist of reports to process[topengines]
logfilefull name and path of logfilec:\logs\csv2html.log
logIDstring to insert as prefix to log entriescustomer44
bodyHTMLHTML for BODY tagbgcolor=#BBBBBB text=#222222 vlink=navy link=blue
tableHTMLHTML for TABLE tagborder=0 cellpadding=2 cellspacing=0
headingbackcolour used for heading background#EEEEEE
rowcolcolour used for row background#999999
graphfullcolour used for filled graph barred
graphemptycolour used for empty graph bar
graphcolourscolour used for gradient filled graph barsblack,purple,navy,blue,cyan,green,yellow,orange,crimson,red
cellstyleCSS used to render each cell{font-family: Tahoma, Verdana, Arial, sans-serif; font-size: 13px; line-height: 13px;}
rawfontHTMLHTML used to render raw text<font face="Monospac821 BT" size=1>
rawfontstyleCSS used to render raw textfont-size: 10px; line-height: 12px;

  • The rulelist setting, in the main section of the INI file, tells CSV2HTML which rulesets to process. The syntax of the rulelist setting is:
    rulelist=[rule1],[rule2],[rule3]
    

    Rulesets can have any name, except they may not include the following characters: [];

    These names must match the names used in place of the [rulename] setting described below.

  • The logfile setting defines where CSV2HTML keeps a record of its activity. If the file does not exist, it will be created; if it does exist, it will be appended to.

  • The logID setting can be used to differentiate instances of CSV2HTML from each other. Multiple copies of CSV2HTML can be configured to write to the same logfile - to tell them apart this setting is used to insert a custom string in the logfile entry. If you set logID=customer44, you'd see this kind of entry in the log:
    - csv2html 2.01 Sep 15, 2005 16:30:40 [customer44/toputesa]: ranked 18 unique items from 2425
    

  • The bodyHTML setting can be used to insert custom HTML into the <body> tag of each report. This can be used to set background and link colours, and/or to apply a stylesheet. The text is inserted into the <BODY> tag itself; if you set bodyHTML to ABC=123, then your reports will be generated with this body tag: <BODY ABC=123>. Use this syntax to apply a stylesheet:
    bodyHTML=class=bodystyle
    

    The body tag in this case would be rendered as: <BODY class=bodystyle>

  • The tableHTML setting can be used to insert custom HTML into the <table> tag of each report. This can be used to set table spacing, border and other properties, and/or to apply a stylesheet. The text is inserted into the <TABLE> tag itself; if you set tableHTML to ABC=123, then your reports will be generated with this table tag: <TABLE ABC=123>. Use this syntax to apply a stylesheet:
    tableHTML=class=tablestyle
    

    The table tag in this case would be rendered as: <TABLE class=tablestyle>

  • The headingback setting is used to set a background colour for the table row containing fieldnames (which are defined by the heading setting described below).

  • The rowcol setting is used to set a background colour for every second row in the table. CSV2HTML produces reports with alternating row background colours (as they are easier to read). This setting controls the alternate colour. The first and third (and so on) are given no background colour at all (and thus assume the background colour of the page). To disable alternating row colouring, set this value to nothing (leave it blank).

  • The graphfull setting allows you to define the colour used to draw the graphs. Only one colour may be defined here. To specify multiple colours, use the graphcolours setting.

  • The graphempty setting allows you to define the colour used to draw the portion of the line not consumed by the graph's bar. Setting this to nothing (leaving it blank) causes this region to assume the colour of the page background (recommended).

  • The graphcolours setting allows you to define ten colours to be used when drawing the graph. The first colour is used for the lowest 10% of all graphed items; the second colour, for the graphed items between 20% and 10%, and so on, with the tenth colour being used for items in the top 10% of all listed items. If less than ten colours are specified, the colour defined in the graphfull setting will be used for all undefined colours. If graphcolours is not defined at all (left blank) then gradient graphing is disabled and a one-colour chart using the colour defined in the graphfull setting will be generated.

  • The cellstyle setting allows you to define a style to be applied to each cell of the table. This setting controls how each CSV dataitem is displayed. Raw CSS only may be supplied here. This setting is also used to control the display of totals and percentages (in summation mode only). Example syntax:
    cellstyle={font-family: Tahoma, Verdana, Arial, sans-serif; font-size: 13px; line-height: 13px;}
    

  • The rawfontHTML setting allows you to define HTML to be inserted before raw text is rendered. If you specify a <FONT> tag here, then the raw text will be rendered with those font properties. This setting is only used if rawtext is set to YES (see below). Alternatively, you can use the rawfontstyle setting to apply a stylesheet.

  • The rawfontstyle setting allows you to define a style to be applied to raw text. Raw CSS only may be supplied here. This setting is only used if rawtext is set to YES (see below). Alternatively, you can use the rawfontHTML setting to apply <FONT> tags and other HTML.

Each ruleset can contain the following commands:

settingnamemeaningexample
[rulename]the name of this rule; must be enclosed in [square brackets][myrule]
sourcethe name of the source CSV filec:\data\file.csv
targetthe name of the target HTML reportpagec:\data\file.htm
titlebartext to insert in the titlebar of the reportpageMost popular widgets (2005)
titletext to insert as a title on the reportpage (may include HTML)Most popular widgets (2005)
headHTMLoptional HTML to place into the <head> section of the output<script src=global.js></script>
headingcomma-delimited list of column headingsdescription,number of purchases,graph
stripcolcomma-delimited list of column numbers to exclude from the reportpage3,4
wrapcolcomma-delimited list of colum numbers to wrap (eg. to allow text to flow to a new line within a cell)1,7
sumset to yes to enable summation modeyes
columnthe number of the column to sum3
conditionalthe condition upon which to include this item in summation3,/sales/widgets/
thresholdthe minimum hitcount a dataitem must have to be included in the report10
graphset to yes to enable graphingyes
rawtextset to yes if input file is a plain text fileyes
autolinksummation mode: set to yes to enable
tabulation mode: a list of columns to autolink
1,2
autolinkprefixoptional HTML to place into the href field of the tag, before the URL/cgi-bin/offsite.cgi?
autolinksuffixoptional HTML to place into the href field of the tag, after the URL.htm
autolinkforceset to yes to force the content of a column to be autolinkedyes
rownumberingset to yes to enable numbering of each row in the table of the reportyes
delimitedset to yes if input file contains quoted stringsyes
amalgamatecommand specifying dataitems to merge/oldmodels/,[dead widgets]
excludecommand specifying dataitems to ignore[dead widgets]

A summary of ruleset settings is below. See the next section for details on conditional summation, amalgamation, exclusion and auto-linking.

  • [rulename] is not actually a setting, but the name of the ruleset. It must appear in the INI file immediately above the settings it pertains to. It must start on the first character of an otherwise blank line. It must be surrounded in [square brackets].

  • The source setting is used to define the name and location of the input file.

  • The target setting is used to define the name and location of the output file.

  • The titlebar setting is used to define the text in the browser's titlebar, displayed when viewing the report.

  • The title setting is used to define the title of the report, as seen immediately above the table when viewing the report.

  • The headHTML setting can be used to insert custom HTML into the <head> section of the report. This can be used to insert Javascripts, a stylesheet, or something else:
    headHTML=<link rel="stylesheet" href="/global.css" type="text/css">
    

  • The heading setting is a CSV-delimited list of heading titles. The first title is applied to the first column, the second title to the second column, and so on:
    heading=client,timestamp,requested URL,referring URL,useragent
    

  • The stripcol setting is a CSV-delimited list of column numbers to exclude from the report. This setting is ignored if CSV2HTML is in summation mode. If you only want to include columns 2,4 and 6 of your 6-column CSV in your report, you'd use this syntax:
    stripcol=1,3,5
    

  • The wrapcol setting is a CSV-delimited list of column numbers in which to permit word-wrapping. This setting is ignored if CSV2HTML is in summation mode. If you only want to allow columns 2 and 6 of your report to wrap, you'd use this syntax:
    wrapcol=2,6
    

  • The sum setting is set to YES to enable summation mode. This mode produces a listing of the frequency of occurence of each item in a particular column, in order of highest frequency. See below for more on summation. If this setting is left blank, or set to anything apart from YES, then summation mode will be disabled, and CSV2HTML will output a table consisting of the complete input file instead.

  • The column setting is used to select which column of the input file to process. This setting is used only in summation mode.

  • The threshold setting is used to prevent items with a low frequency count from being listed in the report. Items with frequency counts below the value of this setting will not be listed. Their numbers will still be counted in the totals, however (in contrast to exclude settings). All items below the threshold will be instead condensed into a single line in the totals section of the report, under "not listed". This option is only used if CSV2HTML is in summation mode.

  • The graph setting is set to YES to enable graphing. This option is only used if CSV2HTML is in summation mode.

  • The rawtext setting is set to YES to tell CSV2HTML that the input file is a raw text file, not a CSV file. This feature can be used to instantly turn any text file into a webpage.

  • The autolink setting is set to YES to enable automatic hyperlinking of the selected column content. The data in the cell is used as both the destination URL, and as the text displayed for the user to click on. Data that does not begin with / or http:// is not autolinked. As an example, an autolinked data item containing a value of 'http://www.yahoo.com' would be output as follows:
    <a href=http://www.yahoo.com>http://www.yahoo.com</a>
    

    In tabulation mode however, autolink is instead a CSV-delimited list of column numbers to autolink. To autolink columns 1 and 5 this syntax would be used:

    autolink=1,5
    

  • The autolinkprefix setting is used to insert extra text before the URL of an autolinked data item. If your data item was 'http://www.yahoo.com', you could insert a call to a link tracker like this:
    autolinkprefix=http://www.yourdomain.com/cgi-bin/tracker.cgi?
    

    The following would then be output as the destination URL:

    http://www.yourdomain.com/cgi-bin/tracker.cgi?http://www.yahoo.com
    

  • The autolinksuffix setting is used to insert extra text after the URL of an autolinked data item. If your data item was 'document1995-03-04', you could turn this into a filename like this:
    autolinksuffix=.htm
    

    The following would then be output as the destination URL:

    document1995-03-04.htm
    

  • The autolinkforce setting is set to YES to force autolinking to occur even when the data item does not start with / or http://.

  • The rownumbering setting is set to YES to include a number on each row in the report. This option works in tabulation mode only at present.

  • The delimited setting is set to YES if working with a quote-delimited CSV file - that is, one with data items like this:
    "abc","def","ghi"
    

 
  controls and methods next section top of page

  • Type CSV2HTML [reportname] from the command line to execute the program. If you leave [reportname] blank, the rulelist line in the INI file will be used, and reports for all rules will be generated. Otherwise, CSV2HTML will attempt to generate the report named [reportname].

  • If CSV2HTML is to be used repeatedly, the above command can be placed in a batchfile, and an icon for the batchfile created on the desktop. This allows one-click execution of a given report.

  • To add a report:

    • Add its name to the rulelist entry in the INI file. The following rulelist line creates two modules called test1 and test2:
      rulelist=[test1],[test2]
      
    • You must then create a new section in the INI file to define the module. The simplest way to achieve this is to copy-and-paste an existing section. See configuration and startup for details on the contents of CSV2HTML.INI.

About summation

CSV2HTML has two main modes, these being tabulation mode and summation mode. Tabulation refers to the relatively simple process of converting an entire CSV file into an HTML table; summation, however, selects a single column of the CSV data to work with.

In summation mode, the frequency of occurrence of each item of data in the selected column is computed. The frequency of occurrence simply means how many times the dataitem appears, in total, in the entire column. The items with the highest frequency count - that is, the ones that are listed in the column the most - are listed first in the report. Percentages are calculated, so as to permit easy comparison.

Summation mode is useful when you have data items repeating themselves throughout the CSV file, and you want to know which items appear how often. Summation can be used with webserver logfiles, for example, to determine which page in a given directory is the most frequently accessed.

About conditional summation

Conditional summation refers to the ability to include a given dataitem in the analysis only if it starts with or ends with a given string. There are four options for this setting:

optionformatdescription
startcolumnnumber,start,searchstringthe dataitem in columnnumber must start with searchstring
endcolumnnumber,end,searchstringthe dataitem in columnnumber must end with searchstring
startendcolumnnumber,startend,startsearchstring,endsearchstringthe dataitem in columnnumber must start with startsearchstring AND and with endsearchstring
endorcolumnnumber,endor,endsearchstring1,endsearchstring2,...the dataitem in columnnumber must end with endsearchstring1 OR endsearchstring2 OR ...

Note: up to 500 different endings can be included if the 'endor' option is used.

Note: leave no spaces, unless you wish them to be included in the searchstrings.

About amalgamation

Amalgamation refers to the ability to convert a dataitem containing a given string of alphanumeric characters (eg. index.html) to another string (eg My Homepage). Each rule can contain up to 500 different amalgamate= statements. Each statement is comprised of the string to test each dataitem for, and the string used to replace that dataitem's entry, if the string is found anywhere in the dataitem. The two strings are separated by a comma.

Amalgamation can be used to combine multiple dataitems into a single total. For example, the following two statements add the hitcounts from both main.htm and index.htm to My Homepage:

amalgamate=/main.htm,My Homepage
amalgamate=/index.htm,My Homepage

The above setting will also add /tunes/index.htm and /tunes/index.html to the hitcount for My Homepage, as both strings contain '/index.htm' which is the searchstring specified in the second amalgamate= statement. Use amalgamation carefully. :)

For precise control over amalgamation, CSV2HTML can test whether the dataitem exactly matches the searchstring, rather than the default 'if it contains the searchstring'. To tell CSV2HTML to test for an exact match, add two equals signs after the existing equals sign, on the amalgamate= line, such as:

amalgamate===/index.htm,My Homepage

The above setting will not add /tunes/index.htm and /tunes/index.html to the hitcount for My Homepage, as both strings are somehow different from '/index.htm' which is the searchstring specified in the above amalgamate= statement.

Note: Two extra equals signs are used in case a string needs to start with a single equals sign (which is possible and will be processed correctly). However, any string starting with two equals signs in the first part of the amalgamate= statement will be read as having none, and exact string matching will be used (ie. CSV2HTML will interpret your string as a 'match exact' command). Ah, the price paid for exact string matching. :)

Note: strings used to replace dataitems may contain HTML, XML or indeed anything at all, with a single exception:

Note: strings used to replace dataitems may not contain commas; any commas found will be converted to fullstops.

Note: the use of amalgamation is not required; a report will still be generated if no amalgamation statements are found.

About exclusion

Exclusion refers to the ability to block dataitems from being counted in final tallies. Exclusion can only be done in summation mode. Up to 500 exclusion statements are permitted. Exclusion is done on an exact match basis, including case. Exclusion is performed after amalgamation. To block any dataitems matching the string "http://www.baddomain.com/" from a report use this command:

exclude=http://www.baddomain.com/

 
  issues and limitations next section top of page

  • barcharts don't print out at present

 
  planned improvements next section top of page

  • indexing
  • archiving of reports
  • tidy up raw text settings and allow a stylesheet name to be defined
  • allow a stylesheet name to be defined for celldata

 
  revision history top of page

?, 20000.1initial development
?, 20000.2added BODY and TABLE customisation
January 10, 20010.3added amalgamation
February 25, 20010.4added graphing
November 26, 20010.5added conditional summation
January 31, 20020.6bugfixes and speed optimisations
October 13, 20020.7added raw textfile support and autolinking
?, 20030.8added autolinkprefix and autolinksuffix
December 30, 20030.9added headHTML setting
May 11, 20040.10added item reporting and debug option; errorhandling improvements
?, 20040.11added various stylesheet settings
?, 20050.12added delimited setting
May 10, 20051.13commercial release
June 4, 20051.14added autolinkforce option
September 3, 20052.00recompiled as a 32-bit console application
September 12, 20052.01added exclude command and gradient graph colouring
September 24, 20052.02added row numbering