Generic Excel Components
Introduction
The Excel Components are used to convert XSLX files to RTX files, insert formulas.
Prerequisites
Contents of the Component
Object Type | Name |
---|---|
Process Definition | CUS_XLSX_Append_RTX_Data |
Process Definition | CUS_XLSX_Insert_Formula |
Process Definition | CUS_XLSX_Optimize_Shared_String_Table |
CUS_XLSX_Append_RTX_Data
Accepts one or more RTX
data source files and writes them to one or more worksheets in a single target XLSX
file.
If the workbook does not exist, it will be created. The same applies for any worksheets in the workbook. New worksheets can be created at specific positions in the workbook, if required, and existing worksheets may be moved (right) to accommodate this.
For each data source, a header can be derived from either the RTX
data source or from a nominated template XLSX
file, worksheet and cell range. If headers are taken from RTX
, the column description is preferred, if this is not available, the column name will be used.
Headers and data can optionally be written to specific locations within the target file. However, if the location is not provided, default values will be applied. See the definition of the parameters IN_HEADER_START_CELL
and IN_DATA_START_CELL
for more information.
Note that when writing a header row, any data in the target cells will be replaced, this is different to the writing of data where rows are always inserted and, for existing rows that exist after the insertion point, their row numbers will be incremented (moved down the spreadsheet) by the number of new rows inserted.
When new data is inserted, this can affect existing references to those cells from named ranges and ranges in formulas. These ranges are maintained by the implementation, the following rule is applied; if data is inserted into rows covered by a cell range for such a reference, the range is expanded to include the new data where, if the data is inserted at the first row of the reference, the start row of the reference is incremented (the row numbers are adjusted, or moved down the spreadsheet), together with the data that it references, and the new data is not included in the referencing range.
Some support is provided for copying styles from a template worksheet. Currently only number formats are catered for. The format will be created in the target worksheet if it doesn't already exist and made available to numeric cells inserted there.
When handling dates, Excel does not support time zone information and, where time zones are encountered in input data, the value is converted to UTC
and time zone information is dropped.
This implementation loads only a small part of the spreadsheet definition at any one point in time and so exhibits minimal consumption of system memory.
File content (for parameters IN_RTX_FILE
, IN_SOURCE_TEMPLATE_FILE
and IN_EXCEL_FILE
) can be accessed through Job File, Document or file system references. However, it should be noted that Document references are supported only from specific product versions (i.e. 9.2.4.9, 9.2.6.4 and 9.2.8.0).
Limitations include but are not restricted to:
- Currently only supports paths to files, support for Scheduler Documents and Job File references is not provided.
- Does not allow for replacing existing data, will only insert or append.
- Cannot write data adjacent to existing data cells, will only write new rows. If data from two, or more, data sources are required side by side, this should be merged into a single data source prior to invoking this process.
- Where number formats can be copied from a template
XLSX
source, this implementation does not provide support for other styles including borders, fills, fonts, alignments or themes. - Extensive support for cell ranges in formulas and named cell ranges, and the maintenance of their row numbers, is provided but the implementation has not been tested with cell ranges associated with charts, pivot tables, print areas, etc.
- Support for maintaining references to cells from other workbooks is not provided.
- Does not support concurrent processes targeting the same
XLSX
workbook.
Known issues
- A bug exists in parameter validation where an error is reported, and the process is stopped with an error state, if a user attempts to read data from multiple data sources, write this data to multiple target worksheets, read column headers from a template worksheet but write headers to only some of the of the target worksheets. The error is not reported if headers are written to all target worksheets.
Process parameters
Some parameters are mandatory, some can accept only a single value, and some can accept single or multiple values.
Where multiple values are specified, these comprise a semi-colon (;) delimited list of values. Parameter multiplicity is subject to strict validation (see parameter multiplicity validation matrix). Where permitted, empty or null values can be specified in a list by just omitting the value, e.g. A1;;C1
All files are identified using a Job File, Document or file system references/file path (string value).
Parameter | Description | Documentation | Values |
---|---|---|---|
IN_RTX_FILE | Input RTX file(s) | (mandatory) One or more Job File, Document or fully qualified file system references fully qualified paths to RTX files. For the purposes of parameter validation, the number of RTX data sources will determine the potential multiplicity of all other parameters, i.e. if there are 3 data sources, other parameters will comprise 0, 1 or 3 values, as determined by their own multiplicity constraints. |
|
IN_USE_RTX_HEADER | Column headers from RTX file | This parameter is mutually exclusive to parameter IN_SOURCE_HEADER_CELL_RANGE . Does not take column headers from the RTX data sources by default. | X - Copy headers from all RTX data sources for writing to a single or multiple target worksheet(s). Column headers will be copied to the cell references specified in parameter IN_HEADER_START_CELL . This behavior can be overridden using different patterns of values for IN_HEADER_START_CELL . |
IN_SOURCE_TEMPLATE_FILE | The source workbook template file | The Job File, Document or fully qualified file system reference (fully qualified path) to a template XLSX workbook file from where examples of cell number formats and column header text can be copied to the target worksheets. This file may be the same as a file identified in parameter IN_EXCEL_FILE . Headers from a template worksheet and data formatting are not required by default. | The fully qualified Job File, Document or fully qualified file system reference path to a template XLSX workbook; headers from a template worksheet and / or data formatting should be copied from the source template worksheet. |
IN_SOURCE_TEMPLATE_SHEET | The source worksheet(s) for formats and headers | One or more values identifying the names of template worksheets in the template XLSX workbook. Supports a single empty value and empty values in a list, in which case, if parameter IN_SOURCE_TEMPLATE_FILE is also specified, the value(s) for IN_SHEET_NAME are used to replace the empty value(s) in this parameter at runtime. | Null (not set), single, or multiple values.
|
IN_SOURCE_FORMAT_CELL_RANGE | The source cell range for cell formats | One or more values identifying the range of cells, in the template worksheet, from where cell formatting is copied. The cell range must identify cells from a single row and the width of the range must match the number of columns in the corresponding RTX data source. | Null , single or multiple values:
|
IN_SOURCE_HEADER_CELL_RANGE | The source cell range for column headers | One or more values identifying the range of cells, in the template worksheet, from where column headers will be copied. The cell range must identify cells from a single row and the width of the range must match the number of columns in the RTX data. Supports a single empty value and empty values in a list, in which case, no header(s) will be written for that RTX data source. This parameter is mutually exclusive to parameter IN_USE_RTX_HEADER .Accepts Null , single or multiple values. | Null (empty) - Headers (from a template worksheet) are not required.RTX data sources, and written to one or more target worksheets
|
IN_EXCEL_FILE | The target Excel file | (Mandatory) The Job File, Document or fully qualified file system reference (fully qualified path) to the target XLSX workbook. The file will be created if it does not exist. All RTX data will be written to one or more worksheets in this file. | |
IN_SHEET_NAME | The target worksheet name | (Mandatory) One or more values identifying the names of worksheets, in the target XLSX workbook, where RTX data will be written. Accepts single or multiple values. | Single worksheet
RTX data source is written to a single worksheet, the order data is written is determined by the file order in parameter IN_RTX_FILE . When multiple values are specified, the number of values must match that in parameter IN_RTX_FILE . |
IN_SHEET_INDEX | The required worksheet index (if created new) | One or more values identifying the (1-based) sheet number where a worksheet should be created in the target workbook (if the worksheet does not currently exist). Worksheets will not be moved to the requested index if they already exist in the workbook. If the target worksheet already exists, this value will be ignored. When creating a worksheet in a specific location, other worksheets may be shifted right to accommodate it in the requested index. Accepts Null , single or multiple values. | Null - If a target worksheet is not found within a workbook, it will be created as the last worksheet in the workbook.A single index - if it doesn't yet exist, a new worksheet will be created at this (1-based) index. Multiple indexes
|
IN_HEADER_START_CELL | The target cell where headers will be written | One or more values identifying the cell where the header for the first column will be written on the target worksheet, headers for the remaining columns will be written to cells to the right of this. A header is written to the location specified in (or defaulted to by) this parameter. The data at that location is replaced, the header is not inserted before the existing row, as is the case for new data rows. The values entered here relate to the worksheet before any new rows are added to it. Therefore, if a header start location is specified as B10 , but 10 new rows of data are inserted at any row before that, the header will in fact be found at location B20 after processing is complete. Refer to the descriptions for input parameters IN_USE_RTX_HEADER , IN_SOURCE_TEMPLATE_SHEET and IN_SOURCE_TEMPLATE_SHEET when specifying a source for headers. The values specified in this parameter are interpreted differently depending on how many RTX data sources are written to the current worksheet:When only one data source is written to the worksheet in question (for multiple input values, this reverts to behavior as though only single parameter values were supplied).Accepts Null , single or multiple values. |
|
IN_DATA_START_CELL | The target cell where RTX data will be written | One or more values identifying the cell from where the writing of data will commence on the target worksheet. The values entered here relate to the worksheet before any new rows are added to it. Therefore, if a data start location is specified as B10 , but 10 new rows of data are inserted at any row before that, the new data will in fact be found at location B20 after this operation has completed. When data is inserted, any existing worksheet rows, where the row number is greater than or equal to that specified in this parameter, will be bumped down the worksheet (i.e. their row number will increase) by the total number of rows inserted. |
The values specified in parameter IN_DATA_START_CELL
are interpreted differently depending on how many RTX
data sources are written to the current worksheet:
- When only one data source is written to a worksheet (Multiple inputs revert to behavior as though only single parameter values were supplied).
Null
, single or multiple values.Null
- If
IN_HEADER_START_CELL
isNull
.- Data from the
RTX
file will be appended to the end of the worksheet.
- Data from the
- If
IN_HEADER_START_CELL
has a single value (or multiple values where only one is relevant to this worksheet)- Data from the
RTX
file will be written immediately after the header.- Except, where there is already data in the worksheet, after the header row, in which case data from the
RTX
file will be appended to the end of the worksheet. - Single cell reference
- Except, where there is already data in the worksheet, after the header row, in which case data from the
- Data from the
- Data from the
RTX
file will be written to this start cell.- Multiple cell references (Multiple can be considered not to apply when there is only one data source relevant to this worksheet)
- Data from the
RTX
file will be written to this start cell. - When more than one data source is written to a single worksheet.
Null
, single or multiple values.Null
- If
IN_HEADER_START_CELL
isNull
.- Data from the first
RTX
file will be appended to the end of the worksheet and data from all others will be written immediately after that.
- Data from the first
- If
IN_HEADER_START_CELL
has a single value.- Data from the first
RTX
file will be written immediately after the single header, and data from all others will be written immediately after that.- Except, where there is already data in the worksheet, after the header row, where data from the first
RTX
file will be appended to the end of the worksheet and data from all others will follow that.
- Except, where there is already data in the worksheet, after the header row, where data from the first
- Data from the first
- If
IN_HEADER_START_CELL
has multiple values.- Data from the first
RTX
file will be written immediately after the first header location specified.- Except, where there is already data in the worksheet, after the header row, where data from the first
RTX
file will be appended to the end of the worksheet.
- Except, where there is already data in the worksheet, after the header row, where data from the first
- Data from all remaining
RTX
data sources will be written immediately after their corresponding [adjusted] header location.- Except, where there is already data in the worksheet, after that header row, where data will be appended to the end of the worksheet.
- A null value, provided for this data source's
IN_HEADER_START_CELL
, will result in its data being appended after data written for the previousRTX
data source, i.e. with no header.- Single cell reference
- Data from the first
- Data from the first
RTX
file will be written to this start cell and data from all others will be written immediately after that.- Multiple cell references
- Data from the first
RTX
file will be written to its start cell. - Data from all remaining
RTX
data sources will be written immediately after their corresponding [adjusted] start cell.- Data start cells, for these remaining data sources, will be adjusted by the number of rows already written to any previous data start cell(s). That is, data start cells should be specified relative to the state of the worksheet before any rows are written as we [may] have no idea how many rows will be written from each data source. For example:
- Files
RTX1
andRTX2
both have 10 rows, data start cell 1 is specified asA5
and data start cell 2 isA10
. - Data from
RTX1
is written to rows 5-14. - Data start cell 2 is adjusted to
A20
(i.e. A(10 +RTX1
number of rows)). - Data from
RTX2
is written to rows 20-29. - This approach is deterministic and means the user does not have to be concerned with the number of rows in each of the
RTX
files in order to avoid conflicts between parameters.
- Files
- Data start cells, for these remaining data sources, will be adjusted by the number of rows already written to any previous data start cell(s). That is, data start cells should be specified relative to the state of the worksheet before any rows are written as we [may] have no idea how many rows will be written from each data source. For example:
CUS_XLSX_Insert_Formula
Inserts a formula into a worksheet. It will convert all cells, within a target cell range, from shared strings to numbers and apply the number format specified to that range. The number format will be created if it does not yet exist. It is expected that the converted cells are included in the cell range(s) for the formula, but this is not enforced. For the conversion of strings to numeric values, the user can specify both the decimal point and the thousands separator characters that are expected within the string data. This implementation loads only a small part of the spreadsheet definition at any one point in time and so exhibits minimal consumption of system memory. Limitations include but are not restricted to:
- Currently only supports paths to files, support for Scheduler Documents and Job File references is not provided.
- Does not support concurrent processes targeting the same
XLSX
workbook.
Process parameters
All files are identified using an absolute path (string value).
Parameter | Description | Documentation |
---|---|---|
IN_EXCEL_FILE | The target Excel file | (Mandatory) The fully qualified path to the target XLSX workbook. A mandatory, single value parameter. |
IN_SHEET_NAME | The worksheet name | (Mandatory) The name of the worksheet where the formula will be inserted. |
IN_DATA_CELL_RANGE | The range of cells that should be converted from strings to numbers | The cell range where any shared string cells will be converted to their numeric equivalent. All other cell types will be left unchanged. Note that the cell range can be specified open-ended, e.g. A3:A , which means the cell range include all cells in column A, from row 3 to the end of the worksheet. |
IN_DATA_CELL_FORMAT_SPECIFIER | The number format to be applied to the range of cells that should be converted from strings to numbers | (Mandatory) The number format applied to all numeric cells found in the cell range specified in IN_DATA_CELL_RANGE . |
IN_FORMULA_CELL_REFERENCE | The cell reference where the new formula should be inserted | (Mandatory) The location of the cell where the formula will be written. |
IN_FORMULA | The formula to insert | (Mandatory) The formula that will be inserted into the worksheet. |
IN_THOUSANDS_SEPARATOR | The thousands separator character | The thousands separator character to consider when converting text to a numeric value. If a value is not provided a default value of a comma (, ) is applied. |
IN_DECIMAL_POINT | The decimal point character | The decimal point character to consider when converting text to a numeric value. If a value is not provided a default value of a comma (, ) is applied. |
CUS_XLSX_Optimize_Shared_String_Table
Given a target workbook, will locate all in-line strings and convert them to shared strings, making use of existing values where possible or moving the string value to the shared string table. Limitations include but are not restricted to:
- Currently only supports paths to files, support for Scheduler Documents and Job File references is not provided.
- This process must load and maintain the entire shared string table in memory. This can result in the consumption of a large amount of system memory (as determined by the size of the target workbook and the number of string values used there).
- Does not support concurrent processes targeting the same
XLSX
workbook.
Process parameters
The only parameter is mandatory and can accept only a single value. The target file is identified using an absolute path (string value).
IN_EXCEL_FILE
- The target Excel file The fully qualified path to the target XLSX
workbook.