Skip to content

Redesign key quantity extraction/insertion pipeline#205

Open
sbreitbart-NOAA wants to merge 23 commits intomainfrom
redesign-kqs
Open

Redesign key quantity extraction/insertion pipeline#205
sbreitbart-NOAA wants to merge 23 commits intomainfrom
redesign-kqs

Conversation

@sbreitbart-NOAA
Copy link
Copy Markdown
Collaborator

Since we've redesigned the figures and tables to be more flexible on the user's end (i.e., they can more powerfully customize what their plots show by specifying plot arguments), the existing pipeline to extract key quantities and then insert them into the captions/alt text template doesn't work for several quantities anymore. Basically, key quantities need to reflect the users' data summarization, grouping specifications, filtering, etc.

To fix this, I've begun redesigning this pipeline so that key quantities are:

  1. extracted in each plotting function (rather than most at once with write_captions(), and then the remainders with add_more_key_quants())
  2. exported to a csv listing ALL key quantities
  3. inserted into a csv with captions and alt text for figures and tables.

This pipeline should also resolve issues of key quantity calculation dependency, where the calculation of one kq depends on the prior calculation of another. If they're all calculated and stored in a single place, all can be referenced for future calculations of other quantities.

This will also facilitate sharing the kqs with SIS.

Ultimately, write_captions() and add_more_key_quants() should become redundant and removed.

@Schiano-NOAA I've just made a pilot draft focused on plot_landings(), which was a simple example. Could you please take a look and let me know if this is on the right track?

@github-actions
Copy link
Copy Markdown
Contributor

New version checklist

  • Package version in DESCRIPTION has been updated
  • Release notes have been drafted/published
  • Cheatsheet content has been updated (if applicable)
  • Cheatsheet version has been updated


### Make RDA ----
if (make_rda) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this chunk going to be unique to all plots? Like we won't be able to generalize this and make it more concise?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 102-108

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 102-108 are where key quantities are calculated for this plot (and I'd expect this is where each plot/table's KQs will be calculated too). The next couple of functions are where the generalization happens: in the exporting of the KQs into a csv, and the insertion of the KQs into the alt text/captions csv.

@Schiano-NOAA
Copy link
Copy Markdown
Collaborator

Schiano-NOAA commented Mar 16, 2026

I think this is a good start! Minor questions/comments in the code base.

I also noticed that there is an extra space after the year quantity and it would be nice to have similar kqs grouped together. In the example below, the year kqs are not placed together.

image

Oh! Also, please add the new csv created to the gitignore

@Schiano-NOAA
Copy link
Copy Markdown
Collaborator

Schiano-NOAA commented Mar 16, 2026

Also, what is the use of the key_quantities.csv for the user/function? Could it be a temporary file? Maybe you can make it an excel file and use tabs or something? Just a thought in order to reduce the number of files a user has to keep track of.

@sbreitbart-NOAA
Copy link
Copy Markdown
Collaborator Author

Also, what is the use of the key_quantities.csv for the user/function? Could it be a temporary file? Maybe you can make it an excel file and use tabs or something? Just a thought in order to reduce the number of files a user has to keep track of.

It's a file where they can find their model's key quantities. I think it should be permanent, especially because it could be useful when sending information to SIS. They won't have to edit it unless they need to add or edit a key quantity, but I don't expect they'll need to do that very often

@sbreitbart-NOAA
Copy link
Copy Markdown
Collaborator Author

I think this is a good start! Minor questions/comments in the code base.

I also noticed that there is an extra space after the year quantity and it would be nice to have similar kqs grouped together. In the example below, the year kqs are not placed together.
image

Oh! Also, please add the new csv created to the gitignore

Thanks for your review! I'll group the KQs and remove that space.

…accurate when relative = TRUE; update plot_indices() documentation
Copy link
Copy Markdown
Collaborator

@Schiano-NOAA Schiano-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it functions well and is a nice adjustment to the extrraction and replacement of key quantities. I am highly advocating to use a function to extract the quantities instead of repeating the same extraction for each function. I think it could be done as such:

# data = either returned data frame from process_data or a ggplot
# x = name of x axis column
# y = name of y-axis column
# additional_columns = a string of names of columns that are also included within the plot
# topic = label found in the key quantities template to be replaced
extract_quantities <- function(data, x, y, additional_columns = NULL, topic = "label") {
  if (class(x)[1] == "gg" | class(x)[2] == "ggplot") {
    # this indicates that the plot is relative
    my_quantities <- list(
     glue::glue("{label}_{y}_min") = ggplot2::ggplot_build(final)@data[[2]] |>
        as.data.frame() |>
        dplyr::pull(y) |>
        min() |>
        round(digits = 2)
      glue::glue("{label}_{y}_max") = ggplot2::ggplot_build(final)@data[[2]] |>
        as.data.frame() |>
        dplyr::pull(y) |>
        max() |>
        round(digits = 2)
    )
  } else {
    # indicate plot not relative
    my_quantities <- list(
      glue::glue("{label}_{x}_min") = min(data[[x]])
      glue::glue("{label}_{x}_max")  = max(data[[x]])
      glue::glue("{label}_{y}_min")  = min(data[[y]])
      glue::glue("{label}_{y}_max") = max(data[[y]])
    )
    if (!is.null(additional_columns)) {
      for (i in seq_along(additional_columns) {
        append(my_quantities,
          glue::glue("{label}_{i}_min") = min(data[[i]])
          glue::glue("{label}_{i}_max") = max(data[[i]])
        )
      } # close for loop
    } # close if add col
  } # close ifelse
  # potentially add check if rounding is needed and perform the round(x, digits = 3)
  my_quantities 
} # close fxn

quantities <- extract_quantities(data, x = "age", y = "year", additional_columns = "estimate", topic = "pop.baa")

# calculate & export key quantities
# need to change fxn to accept list
export_kqs(quantities)
    
# Add key quantities to captions/alt text
# need to change fxn to accept list
insert_kqs(quantities)

Hopefully this is a start to this fxn to build off of. I have not tested it and wrote it in this comment, so apologies if it's a little wack 😆

Thanks for doing this!

@sbreitbart-NOAA
Copy link
Copy Markdown
Collaborator Author

NOTE: Uncomment tests that were previously failing in asar::create_tables_doc() and asar::export_split_tbls() once this is merged; they should work after merging

sbreitbart-NOAA added a commit to nmfs-ost/asar that referenced this pull request Mar 27, 2026
Schiano-NOAA added a commit to nmfs-ost/asar that referenced this pull request Mar 27, 2026
* Update "ESP" meaning in glossary as per #423

* update cheatsheet to remove 'withr()' wrapper from add_a11y function (#424)

* Update special characters in glossary (#425)

* Update glossary so that meanings with underscores have backslash to escape underscore

* Update glossary so that meanings and acronyms have backslashes to escape special characters, and underscores in labels are converted to hyphens

* Update glossary so that underscores and parentheses in labels are converted to hyphens

* update affiliation with duplicate city for afsc seattle offices (#426)

* Update preamble prompt (#430)

* Update prompt about preamble as per #416 (comment)

* Update prompt about preamble to exactly match suggestion in #416 (comment)

* Update SAFE template table structure from flextable to asar (#432)

* Update a11y guide with info about how to escape special characters in alt text/captions csv

* Update readme example create_template() arg from 'author' to 'authors'

* Update SAFE template table structure from flextable to asar as per #428

* Update functions that relied on 'flextable' and convert to 'gt' (#429)

* Update a11y guide with info about how to escape special characters in alt text/captions csv

* Update readme example create_template() arg from 'author' to 'authors'

* Update ID_tbl_width_class() to work with gt tables, vs flextables

* Update create_tables_doc() and related functions so begin replacing flextable-based pipeline with gt (for identifying table width, and splitting/rotating where needed). Next step: split super-long tables where necessary

* create new function to get table length; update documentation

* Redesign tables doc workflow to work with >1 tables shown on portrait orientation and 1 table shown on landscape orientation

* Redesign tables doc workflow to work with >1 table shown on landscape orientation

* Redesign tables doc workflow to work with remaining types of tables

* update tests by removing flextable and replacing with gt

* sub flextable with gt in DESCRIPTION and devcontainer

* create tests for ID_tbl_length_class()

* Update tests

* Update captions for split tables by removing individual column names shown, and instead state number of table vs total number of tables

* Update render_lg_table() to work with varying essential column values, and redesign to better work with gt tables vs flextables; update documentation and tests

* Delete unusable, duplicated license

* Address warnings in R cmd check

* Comment out code that restricts table column width unnecessarily, since tab_options(table.width) is sufficient; add "tbl-pos: 't'" to table chunk options to anchor tables

* Edit add_child() to add line to tables chunk in skeleton to add  "clearpage" latex command

* Update captions for tables split by columns to remove mention of specific columns in the table

* Set col widths for tables; fix issue with extra-wide tables

* Add missing pipes to create_tables_doc()

* add gt_split into utils (#433)

* add gt_split into utils

* remove header from gt and add function dependencies in gt_split

* update documentation for gt_split in export

* update create_tables_doc to reference asar::gt_split and not gt

---------

Co-authored-by: Sam Schiano <125507018+Schiano-NOAA@users.noreply.github.com>

* Fix typo and update WORDLIST

* Allow messages about tables when running create_template() (#431)

* Update a11y guide with info about how to escape special characters in alt text/captions csv

* Update readme example create_template() arg from 'author' to 'authors'

* Allow messages in create_template() stating if tables aren't found, and how to make them, to match messaging produced for figures

* add float package to in-header tex (#427)

* update documentation

* Comment out tests that should pass once nmfs-ost/stockplotr#205 is merged

* change action so only comments when the PR is first opened (#435)

* update version in pkg and cheatsheet

* fix date in cheatsheet

* Add missing leading zeroes back to affiliation zip codes

---------

Co-authored-by: sbreitbart-NOAA <sophie.breitbart@noaa.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants