Metadata Node
Overview
Mapping Settings
Local Types Settings
Examples
Built-in Types
Scripting
Trademarks
Overview
The Metadata node provides a convenient mechanism for defining metadata and associating that metadata with one or more fields. Unlike the Type node where each field has its own metadata definition, the Metadata node separates the metadata definition from the mapping of those definitions to fields. In addition, the metadata definitions can be obtained from another Metadata node which means a stream can include a “single source of truth” for all metadata. This makes the management of metadata much simpler than can be achieved using the standard Type node.
The Metadata node also includes a user-friendly interface (UI) which organises definitions of flags, ranges and sets in a clear fashion. It also allows multiple fields to be mapped to a metadata definition with a single click. The UI also makes it simpler to define sets with large numbers of discrete values by allowing rows copied from text files or Microsoft® Excel® to be pasted directly into the UI.
Finally, the Metadata node includes a number of built-in types that occur frequently in data:
- 0/1 flag values (by default the Type node defines 0/1 values as an integer range)
- integer sets representing days of the week, months of the year etc.
- integer and real ranges for probabilities, percentages etc.
Mapping Settings
This tab defines where the metadata definitions are obtained from and how those definitions are mapped to fields.
Metadata mode
This defines where the node obtains its metadata definitions from:
- Provider: (the default setting) use metadata defined in the Local Types Settings tab
- Consumer: use metadata defined in the Metadata node specified by the Provider node setting
Provider node
When Metadata mode is set to Consumer, this specifies the ID of the Metadata node that provides the metadata.
Field types
This specifies which metadata definition is mapped to which fields. Select one or more fields in the Fields control and a single type in the Types control, then click the right arrow to add the selected fields and type to the Field types control. To remove mappings, select one or more rows in the Field types control and click the left arrow.
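In a script, the same one-type-to-many-fields mapping is a list of [field-name, type-id] pairs. The following is a minimal sketch only: `map_fields` is a hypothetical helper name (not part of the node), and the field names are invented for illustration.

```python
def map_fields(type_id, field_names):
    """Build [field-name, type-id] pairs that map every field to one type."""
    return [[name, type_id] for name in field_names]


# Hypothetical field names; sys_int_range_0_100 is one of the built-in type IDs.
mappings = map_fields("sys_int_range_0_100", ["score_a", "score_b"])
```

A list built this way could then be passed to the node with setPropertyValue("field_types", mappings) in an IBM SPSS Modeler script.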
Local Types Settings
This tab defines the custom flag, set and range types that can be used by the Mapping Settings when Metadata mode is set to Provider. The tab has 3 subsections, one for each of the 3 types of metadata: Ranges, Sets and Flags. Each subsection contains a table where each row represents a metadata definition.
Controls at the bottom of each table allow you to:
- Create a new type definition from scratch
- Clone/create a new type definition based on another definition (requires that a single row has been selected in the table)
- Edit an existing type definition (requires that a single row has been selected in the table)
- Delete type definitions (requires that one or more rows have been selected in the table)
Creating A New Type
All new types require:
- a name
- a unique ID
- the type of value (“storage”) – for sets and flags, this is either String, Integer or Real; for ranges, this is either Integer or Real
The remaining information depends on the metadata being created:
- for ranges, this is the lower and upper bounds of the range
- for sets, this is the list of valid values that make up the set in the data and (optionally) the value labels
- for flags, this is the true and false values that appear in the data and (optionally) the true and false value labels
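The common requirements listed above can be expressed as a small check. This is an illustrative sketch, not the node's own validation: the function name `check_new_type` and the error wording are assumptions, and only the storage options come from the list above.

```python
# Storage options per kind of metadata, as listed above.
ALLOWED_STORAGE = {
    "range": ("integer", "real"),
    "set": ("string", "integer", "real"),
    "flag": ("string", "integer", "real"),
}


def check_new_type(kind, name, type_id, storage):
    """Return a list of problems with the common settings of a new type."""
    if kind not in ALLOWED_STORAGE:
        return ["unknown kind: %s" % kind]
    problems = []
    if not name:
        problems.append("a name is required")
    if not type_id:
        problems.append("a unique ID is required")
    if storage not in ALLOWED_STORAGE[kind]:
        problems.append("storage '%s' is not valid for a %s" % (storage, kind))
    return problems
```

An empty list means the common settings look complete; uniqueness of the ID across existing types is not checked here.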
Cloning A Type
Sometimes it is simpler to start from an existing type when both types are similar to each other. Cloning creates a copy of an existing type so that the existing settings can be modified. Note that a new type ID must be specified.
Editing An Existing Type
This option allows an existing type to be edited, for example to modify the name or change the set members. Note that the unique ID cannot be modified.
Deleting Types
This option allows the selected types to be deleted.
Note that if a deleted definition is still mapped to one or more fields, those fields will be shown with an error symbol in the Field types setting. This also happens in any Metadata nodes that consume this node’s metadata. Fields mapped to non-existent types will have their metadata unchanged.
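Before deleting a type from a script, it may help to list the fields that would be left pointing at a missing definition. The sketch below assumes the mappings are available as [field-name, type-id] pairs; `find_unresolved` is a hypothetical helper, not something the node provides.

```python
def find_unresolved(field_types, known_type_ids):
    """Return the field names whose mapped type ID is not in known_type_ids."""
    known = set(known_type_ids)
    return [field for field, type_id in field_types if type_id not in known]


# Hypothetical mappings: "set_car_category" has been deleted from the provider.
field_types = [["income", "range_household_income"], ["car", "set_car_category"]]
unresolved = find_unresolved(field_types, ["range_household_income"])
```

Here `unresolved` contains the fields that would be flagged with an error symbol and left with unchanged metadata.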
Examples
Defining A Household Income Range
- Name: Change to Household Income. Note that as you type this, the ID will be updated
- ID: range_household_income
- Type: Change to Real
- Lower: Change to 0.0
- Upper: Change to 1000000.0 (i.e. one million)
Defining A Car Category Set
- Name: Change to Car Category. Note that as you type this, the ID will be updated
- ID: set_car_category
- Type: Keep as String
- Values: Enter the following values – one value per line, no commas:
  mpv
  suv
  hybrid
  electric
  other
- Value labels: Enter the following values – one value per line, no commas:
  MPV
  SUV
  Hybrid
  Electric
  Other
Defining A Purchased Flag
- Name: Change to Purchased. Note that as you type this, the ID will be updated
- ID: flag_purchased
- Type: Change to Integer
- True value: Change to 1
- True label: Change to Purchased
- False value: Change to 0
- False label: Change to Not purchased
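The three examples show the ID being filled in from the name as it is typed. The exact rule the UI applies is not documented here; the sketch below assumes it is the kind of type as a prefix followed by the lower-cased name with runs of non-alphanumeric characters replaced by underscores, which reproduces the three IDs above. `suggest_id` is a hypothetical name.

```python
import re


def suggest_id(kind, name):
    """Derive an ID such as 'range_household_income' (assumed rule, see above)."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return "%s_%s" % (kind, slug)
```

A script that creates types could use a helper like this to keep scripted IDs consistent with those produced through the UI, bearing in mind that the real rule may differ for names containing punctuation or non-ASCII characters.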
Built-in Types
A number of common type definitions are built in to the Metadata Provider node.
Type ID | Description | Storage | Measure |
---|---|---|---|
sys_int_range_0_10 | Integer range (0 – 10) | integer | range |
sys_int_range_1_10 | Integer range (1 – 10) | integer | range |
sys_int_range_1_12 | Integer range (1 – 12) | integer | range |
sys_int_range_0_23 | Integer range (0 – 23) | integer | range |
sys_int_range_0_59 | Integer range (0 – 59) | integer | range |
sys_int_range_0_100 | Integer range (0 – 100) | integer | range |
sys_int_range_1_100 | Integer range (1 – 100) | integer | range |
sys_real_range_0_1 | Real range (0.0 – 1.0) | real | range |
sys_real_range_0_10 | Real range (0.0 – 10.0) | real | range |
sys_real_range_1_10 | Real range (1.0 – 10.0) | real | range |
sys_real_range_0_100 | Real range (0.0 – 100.0) | real | range |
sys_real_range_1_100 | Real range (1.0 – 100.0) | real | range |
sys_set_hours_0_23 | 24 Hours: integer set (0 – 23) | integer | set |
sys_set_hours_1_24 | 24 Hours: integer set (1 – 24) | integer | set |
sys_set_time_0_59 | Minutes/seconds: integer set (0 – 59) | integer | set |
sys_set_time_1_60 | Minutes/seconds: integer set (1 – 60) | integer | set |
sys_set_days_0_6_mon | Days of week: integer set (0=Monday – 6=Sunday) | integer | set |
sys_set_days_1_7_mon | Days of week: integer set (1=Monday – 7=Sunday) | integer | set |
sys_set_days_0_6_sun | Days of week: integer set (0=Sunday – 6=Saturday) | integer | set |
sys_set_days_1_7_sun | Days of week: integer set (1=Sunday – 7=Saturday) | integer | set |
sys_set_month_day_1_31 | Day in month: integer set (1 – 31) | integer | set |
sys_set_month_1_12 | Months: integer set (1=January – 12=December) | integer | set |
sys_flag_1_0_tf | True/False: integer flag (1=True, 0=False) | integer | flag |
sys_flag_1_0_yn | Yes/No: integer flag (1=Yes, 0=No) | integer | flag |
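The built-in range IDs in the table follow a visible naming pattern (storage, then lower and upper bound). As an illustration only, and assuming that pattern rather than any documented contract, the bounds can be read back from an ID; `builtin_range_bounds` is a hypothetical helper.

```python
import re

# Matches IDs such as sys_int_range_0_23 or sys_real_range_0_1.
_RANGE_ID = re.compile(r"^sys_(int|real)_range_(\d+)_(\d+)$")


def builtin_range_bounds(type_id):
    """Return (storage, lower, upper) for a built-in range ID, else None."""
    match = _RANGE_ID.match(type_id)
    if match is None:
        return None
    storage = "integer" if match.group(1) == "int" else "real"
    convert = int if storage == "integer" else float
    return storage, convert(match.group(2)), convert(match.group(3))
```

For example, a script could use this to pick a built-in range for an hour-of-day field before adding the mapping to field_types.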
Scripting
Settings
Node type name: metadata_provider
Setting | Property | Type | Comment |
---|---|---|---|
Metadata mode | metadata_mode | provider or consumer | – |
Provider node | metadata_provider | String | – |
Field types | field_types | List of [field-name, type-id] | See below |
Local Types: Ranges | fixed_ranges | List of Range Definition | See below |
Local Types: Sets | fixed_sets | List of Set Definition | See below |
Local Types: Flags | fixed_flags | List of Flag Definition | See below |
– | rebuild_metadata | Boolean | Allows a script to force a rebuild of the node’s output data model |
Type Definitions
All type definitions contain the following values:
- ID (string): a unique ID for this type
- Name (string): a display name
- Last modified (string): a time stamp
- Value type (string): one of integer or real for ranges; one of string, integer or real for sets and flags
The remaining values depend on which type is being defined.
Note: the values in the type definitions are always defined as strings, regardless of what the defined value type is. This is needed because IBM SPSS Modeler requires a fixed type definition.
Range Definition
A range definition consists of the following fields:
- ID (string): a unique ID for this type
- Name (string): a display name
- Last modified (string): a time stamp (see Note On Last modified below)
- Value type (string): one of integer or real
- Lower bound (string): the lower bound as a string
- Upper bound (string): the upper bound as a string
For example:
# Assumes the function 'now()' has already been defined
rangeDef = ["my_range", "My range", now(), "real", "-100.0", "100.0"]
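Because the bounds must be passed as strings, a small builder can take numeric bounds and produce the list in the order given above. This is a sketch under assumptions: `make_range_def` is a hypothetical helper, the check that the lower bound does not exceed the upper bound is an added sanity check rather than a documented rule, and `now()` is defined here in the same way as elsewhere in this document.

```python
import time


def now():
    """Current time as a string time stamp."""
    return str(int(round(time.time() * 1000)))


def make_range_def(type_id, name, storage, lower, upper, modified=None):
    """Build a range definition list with the bounds converted to strings."""
    if storage not in ("integer", "real"):
        raise ValueError("range storage must be 'integer' or 'real'")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    convert = int if storage == "integer" else float
    stamp = modified if modified is not None else now()
    return [type_id, name, stamp, storage, str(convert(lower)), str(convert(upper))]


# A fixed time stamp is passed here only to make the result reproducible.
range_def = make_range_def("my_range", "My range", "real", -100, 100, modified="0")
```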
Set Definition
A set definition consists of the following fields:
- ID (string): a unique ID for this type
- Name (string): a display name
- Last modified (string): a time stamp (see Note On Last modified below)
- Value type (string): one of string, integer or real
- Values (list of string): the valid set values
- Value labels (list of string): the set labels in the same order as the values
Note: there appears to be an issue in IBM SPSS Modeler scripting which prevents value lists within structures from being processed correctly. For example:
# Assumes the function 'now()' has already been defined
setDef = ["my_set", "My set", now(), "integer", ["0", "1", "2"], ["Low", "Medium", "High"]]
does not work as expected and results in a set type with no values or value labels. To work around this, it is necessary to convert the list definition into an IBM SPSS Modeler structured value, and ensure the list values are also converted to valid Java® values (i.e. a Java list containing Java strings).
The following Python script snippet defines functions to do these tasks:
import java.lang.String
import java.util.ArrayList
def createStructure(node, propertyName):
    typeDef = node.getStructuredPropertyDefinition(propertyName)
    pf = modeler.script.session().getPropertyFactory()
    return pf.createDefaultStructuredValue(typeDef)

def fillStructure(structure, values):
    index = 0
    for value in values:
        structure = structure.changeAttributeValue(index, value)
        index += 1
    return structure

def toStringList(values):
    jlist = java.util.ArrayList()
    for value in values:
        jlist.add(java.lang.String(value))
    return jlist
# Convert the values and value labels to Java objects by calling
# "toStringList()" on them.
setDef = ["my_set", "My set", now(), "integer", toStringList(["0", "1", "2"]), toStringList(["Low", "Medium", "High"])]
# Now convert the definition into an SPSS Modeler structure.
# This requires access to a metadata node in order to get the
# structure definition e.g. if one already exists:
node = modeler.script.stream().findByType("metadata_provider", None)
# Need to specify which property to make sure the correct structure
# definition is created.
setTypeDef = createStructure(node, "fixed_sets")
setStructure = fillStructure(setTypeDef, setDef)
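Since the value labels must be in the same order as the values, it can be worth checking the two lists before converting them to Java objects. The following is a plain-Python sketch; `make_set_lists` is a hypothetical helper, and treating an empty label list as "no labels" is an assumption.

```python
def make_set_lists(values, labels=None):
    """Return (values, labels) as lists of strings, checking they line up."""
    labels = list(labels or [])
    if labels and len(labels) != len(values):
        raise ValueError("expected %d labels, got %d" % (len(values), len(labels)))
    return [str(v) for v in values], [str(label) for label in labels]


set_values, set_labels = make_set_lists([0, 1, 2], ["Low", "Medium", "High"])
```

The two returned lists would still need to be passed through toStringList() before being placed in the set definition.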
Flag Definition
A flag definition consists of the following fields:
- ID (string): a unique ID for this type
- Name (string): a display name
- Last modified (string): a time stamp (see Note On Last modified below)
- Value type (string): one of string, integer or real
- True value (string): the value representing true
- True label (string): the label for the true value
- False value (string): the value representing false
- False label (string): the label for the false value
For example:
# Assumes the function 'now()' has already been defined
flagDef = ["my_flag", "My flag", now(), "string", "y", "Yes", "n", "No"]
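A flag definition is positional, so it is easy to put a value in the wrong slot. As an illustration, the sketch below pairs each entry with the field it represents, in the order listed above; the key names and the helper `flag_as_dict` are hypothetical and used only for inspection.

```python
# Field order of a flag definition, as listed above (key names are illustrative).
FLAG_FIELDS = ("id", "name", "last_modified", "value_type",
               "true_value", "true_label", "false_value", "false_label")


def flag_as_dict(flag_def):
    """Pair each entry of a flag definition with the field it represents."""
    if len(flag_def) != len(FLAG_FIELDS):
        raise ValueError("a flag definition needs %d entries" % len(FLAG_FIELDS))
    return dict(zip(FLAG_FIELDS, flag_def))


# A fixed time stamp string is used here in place of now().
flag_info = flag_as_dict(["my_flag", "My flag", "0", "string", "y", "Yes", "n", "No"])
```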
Note On Last modified
The Last modified value is represented as the number of milliseconds since Jan 1st 1970, stored as a string. The value is stored as a string because IBM SPSS Modeler does not support a long value type, which would be required to represent the value accurately as a number.
The following Python script snippet defines a function called now() which generates a string in the correct format using the current time:
import time
def now():
    return str(int(round(time.time() * 1000)))

# Call now() when creating a type definition such as a flag:
flagDef = ["my_flag", "My flag", now(), "string", "y", "Yes", "n", "No"]
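When inspecting existing definitions it can be useful to turn the stored string back into a readable time. This sketch assumes the string holds milliseconds since Jan 1st 1970 (UTC), which is what now() produces by multiplying time.time() by 1000; `last_modified_to_datetime` is a hypothetical helper.

```python
import datetime


def last_modified_to_datetime(stamp):
    """Convert a Last modified string (assumed milliseconds) to a UTC datetime."""
    epoch = datetime.datetime(1970, 1, 1)
    return epoch + datetime.timedelta(milliseconds=int(stamp))
```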
Scripting Example
import time
import java.lang.String
import java.util.ArrayList
def now():
    return str(int(round(time.time() * 1000)))

def createStructure(node, propertyName):
    typeDef = node.getStructuredPropertyDefinition(propertyName)
    pf = modeler.script.session().getPropertyFactory()
    return pf.createDefaultStructuredValue(typeDef)

def fillStructure(structure, values):
    index = 0
    for value in values:
        structure = structure.changeAttributeValue(index, value)
        index += 1
    return structure

def toStringList(values):
    jlist = java.util.ArrayList()
    for value in values:
        jlist.add(java.lang.String(value))
    return jlist

main_node = modeler.script.stream().createAt("metadata_provider", u"Metadata Provider", 192, 192)
# Note that values are specified as strings even though the storage is real
rangeDef = ["temperature_range", "Temperature Range", now(), "real", "-20.0", "50.0"]
# Note that values are specified as strings even though the storage is integer
setDef = ["question_response", "User response", now(), "integer", \
    toStringList(["0", "1", "2", "3", "4"]), \
    toStringList(["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"])]
flagDef = ["yn_indicator", "Y/N indicator", now(), "string", "y", "Yes", "n", "No"]
# Remember to convert the basic set definition into a structure
setStructure = fillStructure(createStructure(main_node, "fixed_sets"), setDef)
main_node.setPropertyValue("fixed_flags", [flagDef])
main_node.setPropertyValue("fixed_sets", [setStructure])
main_node.setPropertyValue("fixed_ranges", [rangeDef])
# Finally map input fields to the types
main_node.setPropertyValue("field_types", [ \
    ["external_temp", "temperature_range"], \
    ["internal_temp", "temperature_range"], \
    ["question1", "question_response"], \
    ["question2", "question_response"], \
    ["question3", "question_response"], \
    ["would_buy_now", "yn_indicator"], \
    ["would_buy_later", "yn_indicator"] \
])
# Now create a consumer that can re-use the metadata definitions from the main metadata node
consumer_node = modeler.script.stream().createAt("metadata_provider", u"Metadata Consumer", 192, 384)
consumer_node.setPropertyValue("metadata_provider", main_node.getID())
consumer_node.setPropertyValue("metadata_mode", "consumer")
# Need to map the fields at this node to the consumed types
consumer_node.setPropertyValue("field_types", [ \
    ["indoor_temp", "temperature_range"], \
    ["outdoor_temp", "temperature_range"], \
    ["q1", "question_response"], \
    ["q2", "question_response"], \
    ["q3", "question_response"], \
])
Trademarks
IBM and SPSS are registered trademarks of International Business Machines Corp. Java is a registered trademark of Oracle Corp. and/or its affiliates.