Population Synthesis
Overview
GTAModelV4 uses a population synthesized from TTS (Transportation Tomorrow Survey) records.
Required Software and Dependencies
- MySQL Server 5.5 or greater
- Python 3.5 or greater with the required package dependencies
Installation and Setup
MySQL Server Installation
If you do not have a copy of MySQL Server, an installer is available for download at https://dev.mysql.com/downloads/mysql/. Version 5.5 or greater of the Community Edition is required.
During the installation process, keep note of the root password that you provide for the server installation. A valid database user and password are required to connect to the database server.
Server Configuration
It may be necessary to increase the size of the max_allowed_packet property before running the server. This property can be set in the server's configuration file. On Windows, the default configuration file loaded is my.ini, located in the root of the server installation directory.
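To check what a given my.ini currently sets, the file can be read with Python's standard configparser module. The following is a minimal sketch, assuming the option lives in the [mysqld] section and is written as max_allowed_packet; the sample file content and the 64M value are illustrative only, not a recommended setting.

```python
import configparser

# Illustrative my.ini content; the 64M value is an example, not a recommendation.
SAMPLE_MY_INI = """
[mysqld]
port=3306
max_allowed_packet=64M
"""

# MySQL option files accept K, M, and G suffixes for size values.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def read_max_allowed_packet(ini_text):
    """Return max_allowed_packet from the [mysqld] section in bytes, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # option files may contain flags without values
        strict=False,         # tolerate repeated sections or options
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(ini_text)
    raw = parser.get("mysqld", "max_allowed_packet", fallback=None)
    if not raw:
        return None
    raw = raw.strip().upper()
    if raw[-1] in _UNITS:
        return int(raw[:-1]) * _UNITS[raw[-1]]
    return int(raw)


print(read_max_allowed_packet(SAMPLE_MY_INI))  # prints 67108864
```

A real my.ini may contain directives that configparser does not understand (for example include lines), so this is only a quick inspection aid.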
Note
For more information about this server variable, please see the MySQL documentation at https://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet.
Creating a Database
Apart from the MySQL Server installation, a database must be created with the proper access privileges for the user account used to connect to the server.
(TODO)
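Until this section is filled in, the following sketch shows one possible way to prepare the SQL statements for creating a database and granting a user access to it. The database name, user name, and password are hypothetical, and granting ALL PRIVILEGES is an assumption; the exact privileges needed by the synthesis procedure are not specified here. The statements can be run from any MySQL client while connected as an administrative user.

```python
def build_setup_statements(database, user, password, host="localhost"):
    """Return SQL statements that create a database and grant a user access to it."""
    # Quoting is deliberately simple: names containing anything other than
    # letters, digits, and underscores are rejected rather than escaped.
    for name in (database, user):
        if not name.replace("_", "").isalnum():
            raise ValueError("Unsupported characters in name: {!r}".format(name))
    escaped_password = password.replace("\\", "\\\\").replace("'", "\\'")
    return [
        "CREATE DATABASE IF NOT EXISTS `{}`;".format(database),
        "CREATE USER '{}'@'{}' IDENTIFIED BY '{}';".format(user, host, escaped_password),
        # ALL PRIVILEGES is an assumption of this sketch, scoped to one database.
        "GRANT ALL PRIVILEGES ON `{}`.* TO '{}'@'{}';".format(database, user, host),
    ]


# Hypothetical names, for illustration only.
for statement in build_setup_statements("popsyn", "popsyn_user", "s3cret"):
    print(statement)
```

The database and user created this way would then be the values entered for DatabaseName, DatabaseUser, and DatabasePassword in the input configuration.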
Python Installation
A valid installation of Python 3.5+ is required. Python can be downloaded at https://www.python.org/downloads/ or obtained through the Anaconda distribution: https://www.anaconda.com/distribution/. The Anaconda distribution is preferred, as it offers some performance improvements for the packages used in gtamodel_popsyn (pandas, numpy).
Installing Package Dependencies
Restore package requirements with pip:
>pip install -r requirements.txt
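After installing, a quick check can confirm that the interpreter version and the packages named above are in place. This is a minimal sketch; it only checks pandas and numpy because those are the packages mentioned in this document, and requirements.txt may list others.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 5):
    raise SystemExit("Python 3.5 or greater is required")

# pandas and numpy are the packages named in this document.
print("Missing packages:", missing_packages(["pandas", "numpy"]))
```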
Input Configuration
GTAModel's population synthesis procedure requires defining its inputs through a JSON formatted configuration file. A blank / schema version of the required input format is provided for guidance in creating the properly structured input.
The default configuration JSON has the following format:
{
    "DatabaseName": "",
    "DatabasePassword": "",
    "DatabaseUser": "",
    "DatabaseServer": "",
    "PersonsSeedFile": "",
    "HouseholdsSeedFile": "",
    "OutputFolder": "",
    "MazLevelControls": "",
    "TazLevelControls": "",
    "MetaLevelControls": "",
    "Java64Path": "C:\\Path\\To\\Jre",
    "PopSyn3SettingsFile": "runtime\\config\\settings.xml",
    "CategoryMapping": {
        "Persons": [
            {
                "FieldName": "",
                "FieldValue": "",
                "MappedValue": 0
            }
        ],
        "Households": [
            {
                "FieldName": "",
                "FieldValue": "",
                "MappedValue": 0
            }
        ]
    }
}
Configuration Value | Description |
---|---|
DatabaseName | The database name to be used during the population synthesis procedure. |
DatabasePassword | The password to be used to connect to the database. |
DatabaseUser | The user account to be used to connect to the database. |
DatabaseServer | The IP address or host name of the database server. |
PersonsSeedFile | File path to the persons seed data to be used. This file is expected to be in CSV format. |
HouseholdsSeedFile | File path to the households seed data to be used. This file is expected to be in CSV format. |
OutputFolder | Path to the folder where all output files will be written, including log files. |
MazLevelControls | Input file path specifying control totals at MAZ level of geography. This file is autogenerated as part of the preconfiguration setup. |
TazLevelControls | Input file path specifying control totals at TAZ level of geography. This file is autogenerated as part of the preconfiguration setup. |
MetaLevelControls | Input file path specifying control totals at the META or regional level of geography. This file is autogenerated as part of the preconfiguration setup. |
Java64Path | Path location of an available JRE installation. There is no need to specify the installation bin folder, only the base path of the installation. |
PopSyn3SettingsFile | File location of the settings configuration file used by PopSyn3. This is an XML document that is transformed to include configuration values specified in config.json alongside other PopSyn3 specific configuration values. |
CategoryMapping | Contains a list of mapping values for households and persons that transform input attributes from a character value to an integer value. |
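The configuration keys described above can be checked before a run, and the CategoryMapping entries can be turned into a lookup from character values to integers. The sketch below is illustrative and is not the loader used by gtamodel_popsyn: treating every key as mandatory is an assumption, the function names are hypothetical, and the "Sex" field with its codes is an invented example.

```python
import json

# Keys taken from the default configuration format. Treating every key as
# mandatory is an assumption of this sketch.
REQUIRED_KEYS = [
    "DatabaseName", "DatabasePassword", "DatabaseUser", "DatabaseServer",
    "PersonsSeedFile", "HouseholdsSeedFile", "OutputFolder",
    "MazLevelControls", "TazLevelControls", "MetaLevelControls",
    "Java64Path", "PopSyn3SettingsFile", "CategoryMapping",
]


def load_config(text):
    """Parse configuration JSON and fail early if a known key is absent."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("Missing configuration keys: " + ", ".join(missing))
    return config


def build_category_lookup(entries):
    """Map (FieldName, FieldValue) pairs to their integer MappedValue."""
    return {
        (entry["FieldName"], entry["FieldValue"]): int(entry["MappedValue"])
        for entry in entries
    }


# Hypothetical example: the field name "Sex" and its codes are illustrative only.
example = {key: "" for key in REQUIRED_KEYS}
example["CategoryMapping"] = {
    "Persons": [
        {"FieldName": "Sex", "FieldValue": "F", "MappedValue": 1},
        {"FieldName": "Sex", "FieldValue": "M", "MappedValue": 2},
    ],
    "Households": [],
}
config = load_config(json.dumps(example))
persons_lookup = build_category_lookup(config["CategoryMapping"]["Persons"])
print(persons_lookup[("Sex", "M")])  # prints 2
```

Keying the lookup on the (FieldName, FieldValue) pair keeps identical character codes in different fields from colliding.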
Population Synthesis Procedure
Pre-processing Input Data
Starting a Run
The Python module gtamodel_popsyn is used from the command line to perform all steps required to complete the population synthesis. The default behaviour when executing the gtamodel_popsyn module from the command line is to perform all steps of the synthesis procedure:
- Input transformations.
- Control total calculations.
- Synthesize records.
- Write synthesized records to file.
- Generate a summary report.
Several command line arguments are available when running the module to target a specific step of the synthesis procedure. Command line arguments and their descriptions are listed below.
usage: __main__.py [-h] [-c CONFIG] [-i] [-d] [-o OUTPUT_ONLY]
                   [-r VALIDATION_REPORT_ONLY]
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path of the configuration file to use.
  -i, --input-process-only
                        Only generate synthesis files and don't run synthesis
                        procedure.
  -d, --database-only   Only initialize the database and tables required for
                        PopSyn3.
  -o OUTPUT_ONLY, --output-only OUTPUT_ONLY
                        Only write synthesized population from existing
                        database data.
  -r VALIDATION_REPORT_ONLY, --validation-report-only VALIDATION_REPORT_ONLY
                        Only generate a summary report from existing output
                        files. Pass the generated output folder to use.
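The usage text above is consistent with an argparse parser along the following lines. This is a reconstruction from the help output, not the module's actual source: it shows which options are plain flags (-i, -d) and which expect a value (-c, -o, -r). The prog name and the sample arguments are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command line options."""
    parser = argparse.ArgumentParser(prog="gtamodel_popsyn")
    parser.add_argument("-c", "--config",
                        help="Path of the configuration file to use.")
    parser.add_argument("-i", "--input-process-only", action="store_true",
                        help="Only generate synthesis files and don't run "
                             "synthesis procedure.")
    parser.add_argument("-d", "--database-only", action="store_true",
                        help="Only initialize the database and tables "
                             "required for PopSyn3.")
    parser.add_argument("-o", "--output-only",
                        help="Only write synthesized population from "
                             "existing database data.")
    parser.add_argument("-r", "--validation-report-only",
                        help="Only generate a summary report from existing "
                             "output files. Pass the generated output folder "
                             "to use.")
    return parser


# Example: run only the input processing step with a given configuration file.
args = build_parser().parse_args(["-c", "config.json", "-i"])
print(args.config, args.input_process_only)  # prints: config.json True
```

Because the usage line names __main__.py, the module is presumably started with something like python -m gtamodel_popsyn -c config.json, though the exact invocation may differ in your installation.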
Note
Please ensure that your MySQL server is started before starting the synthesis procedure.
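One way to confirm the server is accepting connections before a run is to try opening a TCP connection to it. The sketch below assumes the server listens on MySQL's default port, 3306, and that the host is the value configured as DatabaseServer; it checks reachability only and does not verify credentials.

```python
import socket


def server_is_reachable(host, port=3306, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "localhost" stands in for the DatabaseServer configuration value.
    print("MySQL reachable:", server_is_reachable("localhost"))
```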
Post-Processing
Post-processing is performed automatically in the standard execution of the gtamodel_popsyn module. To specifically target the post-processing step of gtamodel_popsyn, use either the -r or -o command line argument to generate only a report or only the output files, respectively.
When creating only the validation report, the relative path to the output folder of a previous run must be passed as the argument to -r.
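As an illustration, the command for a report-only run can be assembled in Python. This sketch assumes the module is started with python -m gtamodel_popsyn and that the path is interpreted relative to the current working directory; neither is confirmed here, and the folder name output/run1 is hypothetical.

```python
import os
import sys


def report_only_command(output_folder, start=None):
    """Build the argument list for a validation-report-only run."""
    # Convert the folder to a path relative to the working directory (assumed).
    relative = os.path.relpath(output_folder, start or os.getcwd())
    return [sys.executable, "-m", "gtamodel_popsyn", "-r", relative]


# Hypothetical output folder from a previous run.
command = report_only_command(os.path.join(os.getcwd(), "output", "run1"))
print(command[1:])
```

The resulting list can be passed to subprocess.run to launch the report step.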
Output Files
After the population synthesis procedure has completed, the synthesized population will be written to the output folder specified in the configuration settings.
Population Files
Households
Persons
Zonal Residence
Employment and Occupation Vectors
Log Files
Logging information from the different stages of execution is appended to several log files in the output directory:
1. event.log - Output from PopSyn3.
2. post-process.log - Output from the post-process stage of execution.
3. pre-process.log - Output from the pre-process stage of execution.
4. run.log - Output during execution, which contains a mix of GTAModel-specific information alongside PopSyn3 output.
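Since the logs are appended to across runs, the most recent entries are at the end of each file. The following sketch collects the last few lines of each log listed above from an output folder; the function name is hypothetical and the demonstration writes a throwaway run.log to a temporary directory.

```python
import tempfile
from pathlib import Path

# Log file names as listed in this section.
LOG_FILES = ["event.log", "post-process.log", "pre-process.log", "run.log"]


def tail_logs(output_folder, lines=5):
    """Return the last `lines` lines of each log file found in output_folder."""
    tails = {}
    for name in LOG_FILES:
        path = Path(output_folder) / name
        if path.is_file():
            text = path.read_text(errors="replace")
            tails[name] = text.splitlines()[-lines:]
    return tails


# Demonstration with a temporary folder standing in for OutputFolder.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "run.log").write_text("step 1\nstep 2\nstep 3\n")
    print(tail_logs(folder, lines=2))  # prints {'run.log': ['step 2', 'step 3']}
```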