Title: | Setup "targets" Workflows for "iquizoo" Data Processing |
---|---|
Description: | For "iquizoo" data processing, there is already a package called "preproc.iquizoo", but using it ultimately relies on a workflow. This package builds such workflows with tools from the "targets" package, which mimics the logic of "make" to automate the build process. |
Authors: | Liang Zhang [aut, cre] , Yujian Dai [ctb] |
Maintainer: | Liang Zhang <[email protected]> |
License: | Apache License (>= 2) |
Version: | 3.12.6 |
Built: | 2024-10-29 02:27:00 UTC |
Source: | https://github.com/psychelzh/tarflow.iquizoo |
Check if the database based on the given data source is ready
check_source(group = getOption("tarflow.group"))
group |
Section identifier in the |
TRUE if the database is ready, FALSE otherwise.
Clean users properties
clean_users_props(users, props)
users |
A data.frame containing the user properties. |
props |
A character vector of the users properties to keep. |
A data.frame containing the cleaned user properties.
This function is a wrapper around fetch_iquizoo(), used as a helper function to fetch data from the iQuizoo database.
fetch_data( project_id, game_id, ..., what = c("raw_data", "scores"), query = NULL, suffix_format = "%Y0101" )
project_id |
The project id to be bound to the query. |
game_id |
The game id to be bound to the query. |
... |
Further arguments passed to |
what |
What to fetch. Can be either "raw_data" or "scores". |
query |
A parameterized SQL query. A default query file is stored in the package, which is enough for most cases. You can also specify your own query file with this argument. See details for more information. |
suffix_format |
The format of the date suffix. See details for more information. |
The data is one of two types: raw data or scores. The raw data is the original data collected from the game, while the scores are calculated by the iQuizoo system. Although scores can also be calculated from the raw data, the pre-calculated scores are useful for quick analysis.
The data is separated by project date, so the table name is suffixed by the project date, which this function fetches from the database automatically. You can set the format of the date suffix with suffix_format, although you should not currently need to change it because it is unlikely to change in the future. Finally, this suffix is substituted into the query, which should contain an expression to inject the table name, i.e., "{table_name}".
A data.frame containing the fetched data.
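The table-name handling described in the details can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the project date, the `raw_data_` table prefix, the query text and the `?` parameter placeholders are all assumptions made for the example; only `suffix_format` and the "{table_name}" placeholder come from the documentation above.

```r
# Illustrative only: `project_date`, `table_base` and the query text are
# hypothetical; only `suffix_format` and "{table_name}" come from the docs.
project_date <- as.Date("2023-05-17")
suffix_format <- "%Y0101"

# "%Y" expands to the four-digit year; "0101" is copied literally.
suffix <- format(project_date, suffix_format)

table_base <- "raw_data_" # hypothetical table prefix
table_name <- paste0(table_base, suffix)

# A parameterized query that leaves a placeholder for the table name.
query <- "SELECT * FROM {table_name} WHERE project_id = ? AND game_id = ?;"

# fixed = TRUE treats the braces as literal text rather than a regex.
query_final <- sub("{table_name}", table_name, query, fixed = TRUE)

print(query_final)
```

With the default format, every project dated within the same year maps to the same suffix, which is consistent with the remark that the format rarely needs changing.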
Fetch result of query from iQuizoo database
fetch_iquizoo(query, ..., params = NULL, group = getOption("tarflow.group"))
query |
A character string containing SQL. |
... |
Further arguments passed to |
params |
The parameters to be bound to the query. Default to |
group |
Section identifier in the |
A data.frame containing the fetched data.
See fetch_iquizoo_mem() for a memoised version of this function.
fetch_iquizoo()
This function is a memoised version of fetch_iquizoo(). It is useful when the same query is called multiple times or you want to cache the result. See memoise::memoise() and fetch_iquizoo() for more details.
fetch_iquizoo_mem(cache = NULL)
cache |
The cache to be used. Default cache could be configured by
setting the environment variable |
A memoised version of fetch_iquizoo().
See fetch_iquizoo() for the original function.
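To make the effect of memoisation concrete, here is a minimal sketch using memoise::memoise() on a stand-in function. It assumes the memoise package is installed; `slow_query()` and the call counter are hypothetical and exist only so the example can run without a database connection.

```r
library(memoise)

# Hypothetical stand-in for an expensive database call; the counter records
# how often the function body actually runs.
n_calls <- 0
slow_query <- function(query) {
  n_calls <<- n_calls + 1
  data.frame(query = query, run = n_calls)
}

# Wrap the function so that repeated calls with the same arguments are
# answered from an in-memory cache.
slow_query_mem <- memoise(slow_query)

first <- slow_query_mem("SELECT 1;")
second <- slow_query_mem("SELECT 1;") # same argument: served from the cache
third <- slow_query_mem("SELECT 2;") # new argument: the body runs again

print(n_calls)
```

Because fetch_iquizoo_mem() is documented as returning the memoised function, it would presumably be used in the form `fetch_iquizoo_mem()(query, ...)`, although that call pattern is inferred from the signature rather than stated here.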
Get the names of the user properties.
get_users_props_names()
A character vector of the names.
Raw data fetched from the iQuizoo database is stored as JSON strings. This function parses the raw JSON strings into data.frame() objects and stores them in a list column.
parse_data(data, col_raw_json = "game_data", name_raw_parsed = "raw_parsed")
data |
The raw data. |
col_raw_json |
The column name storing raw json string data. |
name_raw_parsed |
The name used to store parsed data. |
A data.frame containing the parsed data.
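The following sketch shows the general idea of turning a column of JSON strings into a list column of data frames. It is not the implementation of parse_data(): it assumes the jsonlite package is installed, and the sample rows and their fields (`trial`, `rt`) are made up. Only the column names `game_data` and `raw_parsed` mirror the defaults shown in the usage above.

```r
library(jsonlite)

# Hypothetical raw data: one row per user, with trials logged as a JSON array.
data <- data.frame(
  user_id = c(1, 2),
  game_data = c(
    '[{"trial": 1, "rt": 512}, {"trial": 2, "rt": 430}]',
    '[{"trial": 1, "rt": 605}]'
  ),
  stringsAsFactors = FALSE
)

# Parse each JSON string into its own data.frame and keep the results in a
# list column named like the default of `name_raw_parsed`.
data$raw_parsed <- lapply(data$game_data, fromJSON)

str(data$raw_parsed[[1]])
```

Keeping the parsed trials in a list column leaves one row per user, so user-level columns stay aligned with their trial-level data.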
This function will create a MySQL option file at the given path. To ensure it works, set these environment variables before calling this function:
MYSQL_HOST: The host name of the MySQL server.
MYSQL_USER: The user name of the MySQL server.
MYSQL_PASSWORD: The password of the MySQL server.
setup_option_file(path = NULL, overwrite = FALSE, quietly = FALSE)
path |
The path to the option file. Default location is operating system
dependent. On Windows, it is |
overwrite |
Whether to overwrite the existing option file. |
quietly |
A logical value indicating whether messages should be suppressed. |
NULL (invisible).
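A minimal sketch of the preparation step might look like the following. The credential values are placeholders, and writing to a temporary path is a choice made for this example so that no existing option file is touched. The call into the package is guarded so that the snippet also runs where tarflow.iquizoo is not installed.

```r
# Placeholder credentials: replace with the real values for your server.
Sys.setenv(
  MYSQL_HOST = "localhost",
  MYSQL_USER = "analyst",
  MYSQL_PASSWORD = "change-me"
)

# Confirm that all three variables are visible to the current R session.
required <- c("MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWORD")
all_set <- all(nzchar(Sys.getenv(required)))

# Only call into the package when it is installed; a temporary path keeps
# the sketch away from any existing option file.
if (all_set && requireNamespace("tarflow.iquizoo", quietly = TRUE)) {
  tarflow.iquizoo::setup_option_file(path = tempfile(fileext = ".cnf"))
}
```

In practice it is usually safer to define such variables outside the script, for example in a user-level `.Renviron` file, so that passwords do not end up in version control.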
If you want to extract data based on your own parameters, you should use this function to set up your own SQL templates. Note that the SQL queries should be parameterized.
setup_templates( contents = NULL, users = NULL, raw_data = NULL, scores = NULL, progress_hash = NULL )
contents |
The SQL template file used to fetch contents. At least
|
users |
The SQL template file used to fetch users. Usually you don't need to change this. |
raw_data |
The SQL template file used to fetch raw data. See
|
scores |
The SQL template file used to fetch scores. See |
progress_hash |
The SQL template file used to fetch progress hash. Usually you don't need to change this. |
An S3 object of class tarflow.template with the options.
This target factory is the main part of the tar_prep_iquizoo() function. It fetches the raw data and scores for each project and task/game combination.
tar_fetch_data( contents, what = c("raw_data", "scores"), templates = setup_templates(), check_progress = TRUE )
contents |
The contents structure used as the configuration of data fetching. |
what |
What to fetch. |
templates |
The SQL template files used to fetch data. See
|
check_progress |
Whether to check the progress hash. If set as |
A list of target objects.
The user information is used to identify the users involved in the project.
tar_fetch_users( contents, subset_users_props = get_users_props_names(), templates = setup_templates(), check_progress = TRUE )
contents |
The contents structure used as the configuration of data fetching. |
subset_users_props |
The subset of user properties to be fetched. See
|
templates |
The SQL template files used to fetch data. See
|
check_progress |
Whether to check the progress hash. Set it as |
A list of target objects.
The progress hash stores the progress of the project, which is used to check whether the project is updated.
tar_prep_hash(contents, templates = setup_templates())
contents |
The contents structure used as the configuration of data fetching. |
templates |
The SQL template files used to fetch data. See
|
These objects are named as progress_hash_{project_id} for each project.
A list of target objects.
This target factory prepares a set of target objects used to fetch data from iQuizoo database, separated into static branches so that each is for a specific project and task/game combination. Further pre-processing on the fetched data can also be added if requested.
tar_prep_iquizoo( params, contents, ..., what = c("raw_data", "scores"), action_raw_data = c("all", "parse", "none"), combine = NULL, subset_users_props = get_users_props_names(), templates = setup_templates(), check_progress = TRUE, cache = NULL )
params, contents |
Used as the configuration of data fetching. These two
arguments are mutually exclusive. If |
... |
For future usage. Should be empty. |
what |
What to fetch. There are basically two types of data, i.e., raw
data and scores. The former is the logged raw data for each trial of the
tasks/games, and further actions on the fetched raw data can be specified
by |
action_raw_data |
The action to be taken on the fetched raw data. There
are two consecutive actions, i.e., raw data parsing and pre-processing. The
former will parse the |
combine |
Specify which targets to be combined. Note you should only
specify names from |
subset_users_props |
The subset of user properties to be fetched. See
|
templates |
The SQL template files used to fetch data. See
|
check_progress |
Whether to check the progress hash. Set it as |
cache |
The cache to be used in |
A list of target objects.
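As a rough sketch of how this factory might be used in a `_targets.R` script, the snippet below passes a small `contents` table instead of `params`. The structure of `contents` (a data.frame with `project_id` and `game_id` columns) and the id values are assumptions inferred from the arguments of fetch_data(); they are not confirmed by this page. The call is guarded so the snippet still runs where the package is not installed.

```r
# Hypothetical configuration: the column names and ids below are assumptions.
contents <- data.frame(
  project_id = c("1001", "1001"),
  game_id = c("2001", "2002"),
  stringsAsFactors = FALSE
)

pipeline <- NULL
if (requireNamespace("tarflow.iquizoo", quietly = TRUE)) {
  # `params` is omitted because it is mutually exclusive with `contents`.
  pipeline <- tarflow.iquizoo::tar_prep_iquizoo(
    contents = contents,
    what = "scores",
    check_progress = FALSE
  )
}

# In a `_targets.R` script, the list of targets would be the last expression.
pipeline
```

Requesting only "scores" keeps the example small; fetching raw data as well would bring in the parsing and pre-processing actions controlled by action_raw_data.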
This target factory is the main part of the tar_prep_iquizoo() function. It wrangles the raw data into a tidy format and calculates indices based on the parsed data.
tar_prep_raw( contents, action_raw_data = c("parse", "preproc"), name_data = "raw_data", name_parsed = "raw_data_parsed", name_indices = "indices" )
contents |
The contents structure used as the configuration of data fetching. |
action_raw_data |
The action to be taken on the fetched raw data. |
name_data |
The name of the raw data target. |
name_parsed |
The name of the parsed data target. |
name_indices |
The name of the indices target. |
A list of target objects.
This function creates a standard data fetching targets pipeline script for you to fill in.
use_targets_pipeline()
NULL (invisible). This function is called for its side effects.