{"id":12885,"date":"2025-08-20T13:21:52","date_gmt":"2025-08-20T18:21:52","guid":{"rendered":"https:\/\/wcftr.commarts.wisc.edu\/?p=12885"},"modified":"2025-08-20T13:21:52","modified_gmt":"2025-08-20T18:21:52","slug":"teaching-a-computer-to-read-a-pressbook","status":"publish","type":"post","link":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/2025\/08\/20\/teaching-a-computer-to-read-a-pressbook\/","title":{"rendered":"Teaching a Computer to Read a Pressbook"},"content":{"rendered":"<p><strong>Ben Pettis<\/strong><\/p>\n<p>From the 1910s through the 1980s, Hollywood studios promoted their movies through the creation and dissemination of <a href=\"https:\/\/mediahist.org\/collections\/pressbooks\/\">pressbooks<\/a>\u2014bound pamphlets containing publicity materials, advertising layouts, accessories for sale, and other promotional tactics. These promotional booklets were sent to exhibitors and press outlets, making them vital nodes within the wider networks of film circulation and culture.<\/p>\n<figure id=\"attachment_12886\" class=\"wp-caption alignleft\" style=\"max-width: 216px;\" aria-label=\"Pressbook for Sh! The Octopus, Warner Bros. 
1937\"><a href=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-scaled.jpg\"><img loading=\"lazy\" class=\" wp-image-12886\" src=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-194x300.jpg\" alt=\"\" width=\"216\" height=\"334\" srcset=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-194x300.jpg 194w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-662x1024.jpg 662w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-768x1188.jpg 768w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-993x1536.jpg 993w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-1324x2048.jpg 1324w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-1200x1856.jpg 1200w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/pressbook-wb-sh-the-octopus_0000-scaled.jpg 1655w\" sizes=\"(max-width: 216px) 100vw, 216px\" \/><\/a><figcaption class=\"wp-caption-text\">Pressbook for Sh! The Octopus, Warner Bros. 1937<\/figcaption><\/figure>\n<p>One of the WCFTR\u2019s most prominent collections contains thousands of <a href=\"https:\/\/mediahist.org\/collections\/pressbooks\/\">pressbooks<\/a>, many of which have been digitized and are freely available online. The <a href=\"https:\/\/mediahist.org\/\">Media History Digital Library<\/a> holds pressbooks from more than 20 major and minor studios, spanning much of the 20th century. We are grateful to Matthew and Natalie Bernstein, Kelly and Kimberly Kahl, and Stephen P. 
Jarchow for their support of this multi-year digitization initiative.<\/p>\n<p>One of our current projects is examining how these publicity materials were used in newspapers across the United States. This project really has it all: extensive digital collections, traditional archival research, and sophisticated computational workflows\u2014including a custom machine learning model to identify and separate individual articles, images, and other components from each pressbook page.<\/p>\n<p>Wait, machine learning? Really?<\/p>\n<p>Yes! But that\u2019s not to say that the WCFTR team hasn\u2019t still been incredibly busy. The use of computational methods doesn\u2019t erase the work of archival research. It just changes what it looks like.<\/p>\n<figure id=\"attachment_12887\" class=\"wp-caption aligncenter\" style=\"max-width: 863px;\" aria-label=\"Data annotation involved manually reviewing thousands of images and tagging individual page components. Screenshot from Areyana Proctor\"><a href=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM.png\"><img loading=\"lazy\" class=\" wp-image-12887\" src=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-300x132.png\" alt=\"\" width=\"863\" height=\"380\" srcset=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-300x132.png 300w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-1024x451.png 1024w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-768x338.png 768w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-1536x676.png 1536w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-2048x902.png 2048w, 
https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/Screenshot-2025-07-15-at-10.00.52\u202fAM-1200x528.png 1200w\" sizes=\"(max-width: 863px) 100vw, 863px\" \/><\/a><figcaption class=\"wp-caption-text\">Data annotation involved manually reviewing thousands of images and tagging individual page components. Screenshot from Areyana Proctor<\/figcaption><\/figure>\n<p><a href=\"https:\/\/luminosoa.org\/chapters\/e\/10.1525\/luminos.212.t\">Earlier work<\/a> found that this smaller-scale segmentation would be necessary to make meaningful comparisons; comparing entire pressbook pages against entire newspaper pages, though less computationally demanding, yielded limited results. Separating the individual articles from each pressbook will enable more detailed comparisons. Our plan is to use AI to detect separate elements on each pressbook page, which will then be compared against articles from newspapers in the Library of Congress\u2019 <a href=\"https:\/\/www.loc.gov\/collections\/chronicling-america\/about-this-collection\/\">Chronicling America<\/a> collection. Before we could apply machine learning to the pressbooks, though, we had to \u201cteach\u201d the computer how to make sense of a pressbook. For several weeks, a team of incredible Comm Arts graduate students (<a href=\"https:\/\/commarts.wisc.edu\/staff\/whittemore-lore\/\">Lore FitzWhittemore<\/a>, <a href=\"https:\/\/commarts.wisc.edu\/staff\/proctor-areyana\/\">Areyana Proctor<\/a>, and <a href=\"https:\/\/commarts.wisc.edu\/staff\/riley-olivia\/\">Olivia Riley<\/a>) has been carefully reviewing scanned pressbooks and annotating thousands of pages to train a computer vision model.<\/p>\n<p>On paper, this data annotation process is straightforward. The team put together a sample of pressbooks: nearly 4,000 individual pages! 
Lore, Areyana, and Olivia then reviewed each page and used an <a href=\"https:\/\/roboflow.com\/\">online tool<\/a> to draw boundaries around each object on the page while also categorizing each object (for example, as an article, image, or headline).<\/p>\n<p>In practice, data annotation is an incredibly time-consuming and demanding task. Even with the team working at a steady pace, the process took nearly two months to complete. And while the work is repetitive and tedious, the annotators all recognized how simple decisions \u2013 such as whether text like \u201cTurn to the next page for an exciting layout\u2026\u201d should be categorized as a caption or a headline \u2013 could have important implications for how the later computer vision model would work. As Lore puts it, \u201cdata annotation is not a passive process; the data annotator is constantly making microdecisions about how content should be understood and classified.\u201d<\/p>\n<blockquote><p>\u201cIt really was surprising how infinite a set of micro-decisions appeared as we dug into the work.\u201d<\/p><\/blockquote>\n<p>Working with a team of UW-Madison graduate students who were already familiar with the pressbooks helped ensure that all three annotators assessed the pressbook scans similarly. \u201cGetting us all in the same room\u2014material or virtual\u2014was the best way to make sure we all labeled this diverse set of objects consistently,\u201d said Olivia. 
\u201cIt really was surprising how infinite a set of micro-decisions appeared as we dug into the work.\u201d<\/p>\n<figure id=\"attachment_12890\" class=\"wp-caption alignright\" style=\"max-width: 229px;\" aria-label=\"An initial test of the computer vision model was able to accurately identify elements on a page, as shown in this page from the pressbook for Disney\u2019s The Absent-Minded Professor (Buena Vista Distribution, 1961).\"><a href=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result.jpg\"><img loading=\"lazy\" class=\"size-medium wp-image-12890\" src=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result-229x300.jpg\" alt=\"Scanned page with several colored boxes identifying various page components\" width=\"229\" height=\"300\" srcset=\"https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result-229x300.jpg 229w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result-782x1024.jpg 782w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result-768x1006.jpg 768w, https:\/\/wcftr.commarts.wisc.edu\/wp-content\/uploads\/2025\/08\/25_result.jpg 823w\" sizes=\"(max-width: 229px) 100vw, 229px\" \/><\/a><figcaption class=\"wp-caption-text\">An initial test of the computer vision model was able to accurately identify elements on a page, as shown in this page from the pressbook for Disney\u2019s The Absent-Minded Professor (Buena Vista Distribution, 1961).<\/figcaption><\/figure>\n<p>After the data annotation process was completed, it was <a href=\"https:\/\/wcftr.commarts.wisc.edu\/index.php\/staff\/hansen-samuel\/\">Sam Hansen<\/a>\u2019s turn to work with the data. They wrote code to process the annotated page scans using the <a href=\"https:\/\/en.wikipedia.org\/wiki\/You_Only_Look_Once\">YOLO (You Only Look Once)<\/a> object detection algorithm. 
The YOLO algorithm uses a neural network to efficiently \u201clook\u201d at an image and detect separate objects. With adequate training data, these models can also classify those objects accurately. For the pressbooks project, we hope to use this approach so that a computer can \u201cread\u201d a pressbook page and automatically detect the separate articles and other elements on it. By using the <a href=\"https:\/\/chtc.cs.wisc.edu\/\">Center for High Throughput Computing<\/a>\u2019s resources, we will be able to process thousands of pressbooks in a matter of hours, rather than the months and years that it might take to conduct such work entirely by hand. This work is currently ongoing, and we look forward to sharing more in the coming weeks.<\/p>\n<p>High-throughput computing infrastructure and sophisticated machine vision workflows enable us to ask questions about Hollywood pressbooks, including who used them, how, and whether the publicity text, promotional photos, and ads from the pressbooks permeated American newspapers and magazines as intended. Computational methods can ingest and parse copious amounts of information and identify large-scale patterns that would be difficult, if not impossible, to recognize without data at this scale.<\/p>\n<p>At the same time, as our annotators became acutely aware, this approach offers an incredibly zoomed-out view of the source material. As Olivia put it, \u201cThe super-fast, surface-level work of annotation relies on a certain amount of gut instinct which is often precisely the hegemonic habit critical scholars of identity seek to combat in their analysis.\u201d As the project unfolds, we will continue to explore new technologies such as machine learning and computer vision while carefully assessing how to incorporate them alongside traditional methods of archival research.<\/p>\n<p>Keep an eye on the WCFTR blog as we share more project updates. 
Please be sure to also follow along via our social media or subscribe to our newsletter to keep up with all of the WCFTR\u2019s news and events!<\/p>\n<ul>\n<li aria-level=\"1\">Newsletter: <a href=\"https:\/\/wcftr.commarts.wisc.edu\/index.php\/newsletter\/\">Subscribe<\/a><\/li>\n<li aria-level=\"1\">Mastodon (Fediverse):<a href=\"https:\/\/hcommons.social\/@wcftr\"> @wcftr@hcommons.social<\/a><\/li>\n<li aria-level=\"1\">Facebook:<a href=\"https:\/\/www.facebook.com\/wicenterforfilmandtheaterresearch\/\"> facebook.com\/wicenterforfilmandtheaterresearch<\/a><\/li>\n<li aria-level=\"1\">Instagram:<a href=\"https:\/\/www.instagram.com\/wcftr_archive\/\"> @wcftr_archive<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Ben Pettis From the 1910s through the 1980s, Hollywood studios promoted their movies through the creation and dissemination of pressbooks\u2014bound pamphlets containing publicity materials, advertising layouts, accessories for sale, and other promotional tactics. 
These promotional &hellip;<\/p>\n","protected":false},"author":4,"featured_media":12890,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/posts\/12885"}],"collection":[{"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/comments?post=12885"}],"version-history":[{"count":5,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/posts\/12885\/revisions"}],"predecessor-version":[{"id":12893,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/posts\/12885\/revisions\/12893"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/media\/12890"}],"wp:attachment":[{"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/media?parent=12885"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/categories?post=12885"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wcftr.commarts.wisc.edu\/index.php\/wp-json\/wp\/v2\/tags?post=12885"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}