Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015.

Using BITS for Non-standard Content

Many objects published in a codex format in print do not easily translate as "books" online. After adopting BITS as its core model for traditional book content, Silverchair has created an initiative to leverage the same tag library and expand its implementation to load other book-like objects, non-standard content objects, and born-digital content onto a shared platform.

This paper is a case study of how BITS was adapted for our system to encode a library of defined "types" of non-standard content.

What we call a book in the print world can mean very different things in a digital format. Setting aside the usual variations in structure, many objects we call "books" have such fundamentally different needs in terms of design and interface that they resist standardization.

Fig. 1. American Academy of Pediatrics' Red Book in codex format.

At the risk of venturing into highly theoretical terrain, defining what constitutes a digital “book” is an essential first step in preparing a digital platform to represent such objects. It also frames a system for identifying outliers to that definition. In the print world, books are generally described as printed paper bound together along a spine. They have chapters that may be organized into larger, thematic Parts, and they tend to come with some front and back matter, such as forewords, tables of contents, and indices. The importance of representing that paratextual matter in an online version depends upon the context of the resource’s intended use or audience: some publishers are invested in archival representations of their content and in preserving every single aspect of the printed resource in its online version. Others are focused on delivering the body of the content to the user, with limited amounts of historical data translated online. For example, all of the books currently hosted on Silverchair’s SCM6 platform have their Tables of Contents dynamically generated based on the structure of the XML. But older editions of those same books, if they were to be saved for posterity in a resource like the Digital Public Library of America, might be more strictly described according to the original print resource.

No matter how close the translation from print to digital, however, I must note that a digital book is not and cannot be the same thing as a “codex.” There will always be aspects of print resources that are not easily remediated for consumption on a screen. For example, note the grey indicators on the pages of the American Academy of Pediatrics’ Red Book, which help the reader find the “Section” of the book they need (Fig. 1). These are extremely helpful in the context of a busy doctor’s office, but useless as special markers encoded in an XML file.

And then there are other reference books, such as exam prep guides, encyclopedias, and point-of-care references, that defy the organizational logic I outlined above. Finally, we cannot forget the emerging genre of born-digital content: it was never intended to be bound by covers, but often retains the structures and formatting of its print ancestors. For our clients, it was becoming important that their online host be flexible enough to accommodate all of these variations in content. For our own purposes, it was essential that we do so on a shared platform, in the most efficient and standardized way possible.

For Silverchair, the difference between a “book” and non-standard content objects boils down to differences in metadata and general user experience. A standard “TOC” page consists of the familiar thematic and hierarchical organization that accompanies a (generally) sequential reading experience. Although there may be variations in how that information is structured (Parts / Chapters / Sections), a standardized page template can be used to represent that material by default. Likewise, a standardized content display page can be employed to enable reading the body of the document. Across our platform, Silverchair hosts over 1200 books, many of which fall neatly into a similar template to the one shown in Figure 2.

Fig. 2. Standard Table of Contents.

Fig. 3. Standard Content Display.

Conversely, non-standard content has much more substantial browsing information and requires a more modular organizational strategy. We considered three main criteria when deciding whether an object should be treated as non-standard:

  • Does the object have different browse needs or does it require a sophisticated landing page?
  • Are there requirements around how the material is displayed, shown in search, or reported in analytics that differ from existing book templates?
  • Is the content actually ancillary textual material to a book or journal resource?

Take, for example, a glossary. Glossaries can be bound as individual resources of their own, or they can be included as back matter for larger book resources. In one scenario, we had a client who needed a mostly book-oriented site but also wished to include a glossary feature that would serve as a reference for all the books on the site while being maintained separately from the actual book XML. In the course of fulfilling these requirements for one client, we sought a solution that could be applicable to many others on our platform.

Even allowing for some variation in requirements and design elements across clients and sites, a glossary still has a predictable structure for which a template can be built. Looking at the roadmap of other projects for the year, our analysts began to amass a library of other non-standard content types like the glossary, and we worked to outline the particular requirements of each of those ‘types’, from XML to display. We referred to them all as “generic buckets of sections,” or, in acronym form, GBOS. GBOS objects, such as glossary entries, could then be assembled into GBOS Containers, which might refer to an original print resource, a product, or some other more general organizing principle. This generic container is the equivalent of a book or journal in our database.
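The relationship between GBOS objects and their Containers can be pictured as a small data model. The following Python sketch is only an illustration under stated assumptions: the class names, field names, and the "glossary-entry" type label are hypothetical and are not taken from Silverchair's database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GbosObject:
    """One non-standard content object, e.g. a single glossary entry."""
    object_id: str
    gbos_type: str  # hypothetical type label, e.g. "glossary-entry"
    title: str


@dataclass
class GbosContainer:
    """Generic container; plays the role a book or journal plays in the database."""
    container_id: str
    title: str
    objects: list[GbosObject] = field(default_factory=list)

    def add(self, obj: GbosObject) -> None:
        self.objects.append(obj)

    def titles_by_type(self) -> dict[str, list[str]]:
        """Group object titles by GBOS type, sorted alphabetically within each type."""
        grouped: dict[str, list[str]] = defaultdict(list)
        for obj in self.objects:
            grouped[obj.gbos_type].append(obj.title)
        return {gbos_type: sorted(titles) for gbos_type, titles in grouped.items()}


# Illustrative sample data only.
glossary = GbosContainer("glossary-1", "Site Glossary")
glossary.add(GbosObject("g2", "glossary-entry", "Phobia"))
glossary.add(GbosObject("g1", "glossary-entry", "Anxiety"))
```

The point of the sketch is that the container carries no assumptions about chapters or sequence; it only groups objects that share an organizing principle.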

Fig. 4. Visualizing Content Hierarchy.

It is at this point that we can descend from the lofty heights of theory and site architecture to consider textual encoding, and how the non-standard, or GBOS, approach dovetails with XML requirements. Without dwelling too much on the history of content at Silverchair, it is worth noting that the company began as a provider of highly customized individual sites. Content-wise, this meant that we dealt with numerous proprietary DTDs that were managed by our clients, or created internally based on the specific needs of each site. With the launch of the SCM6 platform (originally developed for journal-oriented content), the utility of implementing a common standard across clients became clear, and we adopted JATS (the “blue” tag set, for Journal Publishing) as the common language for journals and proceedings. When we expanded SCM6 to include book content in 2013, BITS was a logical choice for a common tag library: because it built upon existing JATS elements, it allowed us to expand SCM6’s capabilities incrementally, rather than starting all over with something new.

Even as the first BITS-encoded book went live on SCM6 in January 2014, it was becoming apparent that our clients’ needs were quickly evolving beyond journal and book content. The GBOS model offered us a library of ‘types’ to focus on, but no matter how unique the needs of each GBOS type might appear to be, we did not want an encoding solution that was highly customized or proprietary. It was essential that we continue on our trajectory of standardizing our content approach in such a way that it would still be straightforward for vendors and clients, without deviating too far from accepted libraries and standards.

We formed an initiative to investigate the best way forward, with the goal of determining how best to use BITS for GBOS content. It was clear that we did not want to craft anything brand-new, and folding in some other tag library could be prohibitively expensive. After all, many of the resources that seemed to fall into the category of non-standard content already had lives in print as books. If there were a way to use BITS, which was already quite useful for describing those common features, then progress could be made rapidly.

In the end, two approaches were evaluated: 1) extend BITS to include more generic elements that could be shared across non-standard objects, or 2) work within the current BITS library and change the meanings of some existing elements based on a content-type declaration in the XML.

Fig. 5. Option 1: Extending BITS.

At first, extending BITS to have a generic root element, <document>, was attractive. As a solution, it was intellectually tidy, and allowed our tools an immediate method for differentiating content types in the XML: <article> for JATS, <book> for BITS, and <document> for GBOS. However, such an ambitious initiative was resource-intensive, required significant updates to our content management and loading tools, and necessitated a deviation from BITS as a tag library. The preliminary documentation required to get our clients and vendors up to speed on our own extensions was one challenge; staying in sync as BITS evolved on a parallel track seemed another potential hazard.

The second option involved adapting existing BITS elements and structures to fit GBOS as well as books. This strategy would require no new elements to be added – only some new attributes and new rules governing those elements – and we would be able to stay in touch with BITS as a tag library. However, because we do import XML into a shared relational database, there was some risk identified in having the same element mean different things depending upon content type.
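To make the trade-off concrete, here is a minimal Python sketch of how a loader might tell book content from GBOS content when both arrive in BITS elements. It assumes the content-type declaration is an attribute on the root element and that its value is "gbos"; that value, the function name, and the returned labels are all hypothetical.

```python
import xml.etree.ElementTree as ET

# "gbos" is a hypothetical attribute value; the real declared value is not given here.
GBOS_CONTENT_TYPE = "gbos"


def record_kind(xml_text: str) -> str:
    """Decide how a loader should treat a file, using the root's content-type."""
    root = ET.fromstring(xml_text)
    if root.tag not in ("book", "book-part-wrapper"):
        raise ValueError(f"not a BITS root element: <{root.tag}>")
    if root.get("content-type") == GBOS_CONTENT_TYPE:
        return "gbos-object"
    return "book-content"


book_kind = record_kind("<book><book-meta/></book>")
gbos_kind = record_kind(
    '<book-part-wrapper content-type="gbos"><book-meta/></book-part-wrapper>'
)
```

Because the element names are identical in both cases, everything downstream of this check has to carry the declared content type along with the data, which is where the risk noted above comes from.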

Fig. 6. Option 2: Adapting BITS.

In the end, we chose to implement the second option. Figure 6 shows the metadata elements for a point-of-care topic card in the American Academy of Pediatrics’ Point-of-Care Quick Reference. The <book-part-wrapper> element serves as the root element for this GBOS object, with a now-required content-type attribute specifying that this is GBOS, not book, content. The <book-meta> element is streamlined to include only two required elements: a <book-id> and a <book-title>. These values determine the GBOS Container (or organizing principle) for this object. The first <book-part> element in the file contains metadata about the GBOS object itself (note the required book-part-type="object" attribute), including Publisher ID, DOI (if there is one), subject group information, title, publication history, et cetera. Whereas a glossary entry might be succinct (requiring one <sec> element at most), a point-of-care topic card such as the one shown here could be much more complex, requiring further "chapters" or other main divisions within the GBOS. Another required attribute was set aside for <book-part> in GBOS objects: content-type="main-division" flags important headers in the content so that they can serve a unique function in display. From this level down in the XML tagging, the elements are exactly the same between books and GBOS.
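The structure just described can be sketched with a small sample file and a Python reader. This is an illustration under assumptions, not the XML from Figure 6: the identifier values, the "gbos" attribute value, the division title "Diagnosis", and the function name are invented for the example, and <book-title> is placed inside <book-title-group> as BITS requires.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; ids, the "gbos" value, and "Diagnosis" are hypothetical.
SAMPLE = """
<book-part-wrapper content-type="gbos">
  <book-meta>
    <book-id book-id-type="publisher-id">poc-quick-ref</book-id>
    <book-title-group>
      <book-title>Point-of-Care Quick Reference</book-title>
    </book-title-group>
  </book-meta>
  <book-part book-part-type="object">
    <book-part-meta>
      <title-group><title>Phobias and Anxieties</title></title-group>
    </book-part-meta>
    <body>
      <book-part content-type="main-division">
        <book-part-meta>
          <title-group><title>Diagnosis</title></title-group>
        </book-part-meta>
        <body><sec><title>Signs</title><p>Sample text.</p></sec></body>
      </book-part>
    </body>
  </book-part>
</book-part-wrapper>
"""


def describe_gbos_object(xml_text: str) -> dict:
    """Pull out the container, object, and main-division metadata."""
    root = ET.fromstring(xml_text)
    if root.tag != "book-part-wrapper" or "content-type" not in root.attrib:
        raise ValueError("expected <book-part-wrapper> with a content-type attribute")
    obj = root.find("book-part[@book-part-type='object']")
    if obj is None:
        raise ValueError('missing <book-part book-part-type="object">')
    divisions = [
        part.findtext("book-part-meta/title-group/title")
        for part in obj.iter("book-part")
        if part.get("content-type") == "main-division"
    ]
    return {
        "content_type": root.get("content-type"),
        "container_id": root.findtext("book-meta/book-id"),
        "container_title": root.findtext("book-meta/book-title-group/book-title"),
        "object_title": obj.findtext("book-part-meta/title-group/title"),
        "main_divisions": divisions,
    }


summary = describe_gbos_object(SAMPLE)
```

Here `summary["container_id"]` identifies the GBOS Container, while `summary["main_divisions"]` lists the headers a display layer could turn into jump-links.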

The next two figures illustrate how this XML is displayed on the site. The “table of contents” is actually an A-Z browse feature powered by client-submitted subjects. These are organized by two main categories, Topics and Symptoms, with the list of links in the main column being semantic equivalents (or, in the XML, <subj-group> values) rather than the titles of the GBOS objects themselves. Note also that this particular GBOS Container has Front Matter associated with it, which indicates just how closely books and GBOS can align structurally. Clicking through the “Agoraphobia” link on the landing page takes the user to the “Phobias and Anxieties” topic, which looks quite book-like. The column on the left offers jump-links to the “main divisions” and section titles within the document for quick navigation.
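A subject-driven browse of this kind can be sketched in a few lines of Python. The sketch assumes that each category is carried in a subj-group-type attribute and that each link is a <subject> value; the attribute values, the subjects other than "Agoraphobia", and the function name are illustrative assumptions rather than the site's actual data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative metadata for one topic card; category names and most subjects are made up.
TOPIC_CARD_META = """
<book-part-meta>
  <subj-group subj-group-type="topics">
    <subject>Phobias</subject>
    <subject>Agoraphobia</subject>
  </subj-group>
  <subj-group subj-group-type="symptoms">
    <subject>Panic attacks</subject>
  </subj-group>
  <title-group><title>Phobias and Anxieties</title></title-group>
</book-part-meta>
"""


def build_browse_index(meta_fragments):
    """Map each category to alphabetized (subject, target object title) links."""
    index = defaultdict(list)
    for fragment in meta_fragments:
        meta = ET.fromstring(fragment)
        target = meta.findtext("title-group/title")
        for group in meta.findall("subj-group"):
            category = group.get("subj-group-type", "uncategorized")
            for subject in group.findall("subject"):
                index[category].append((subject.text, target))
    return {
        category: sorted(links, key=lambda link: link[0].casefold())
        for category, links in index.items()
    }


browse = build_browse_index([TOPIC_CARD_META])
```

Each link shows the subject term but points at the object that declares it, which is why “Agoraphobia” can lead to a page titled “Phobias and Anxieties.”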

Fig. 7. The Point-of-Care Quick Reference Landing Page.

Fig. 8. A Quick Reference Content Page.

I’d also like to show you another object with the same GBOS type (point-of-care topic card) from a different site, to give you a sense of the flexibility of the encoding. This topic card from Wolters Kluwer’s 5 Minute Consult is structured almost identically to the previous example at the XML level, but many of the components have been rearranged.

Fig. 9. Variations on a Point-of-Care Topic Content Page.

There are other GBOS types, however, that are not so book-like. To revisit the example I gave at the beginning of my talk, a glossary can be structured similarly in the XML, but act very differently on the site. In this image, the user has clicked a link to a glossary entry within the text of a book, which opens a modal displaying the contents of that object. Elsewhere on the site, a user could also access this information from the GBOS Container, or landing page for the full glossary. Each of these entries is an individual GBOS object (and, coincidentally, an individual XML file), but they are assembled in a simple A-Z browse, driven by title. By now the XML structure will look familiar: GBOS Container declarations in the <book-meta> element, glossary entry specifics (including <related-object> links to relevant book chapters) in the <book-part book-part-type="object"> element, and the short body of the entry itself in the <body> element.
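As a rough sketch of the glossary case, the Python below reads individual entry files and assembles a title-driven A-Z browse. The template, the use of a document-id attribute on <related-object>, the "gbos" value, the chapter ids, and the placeholder definitions are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical glossary-entry file; one entry per XML file, as described above.
ENTRY_TEMPLATE = """
<book-part-wrapper content-type="gbos">
  <book-meta>
    <book-id book-id-type="publisher-id">site-glossary</book-id>
    <book-title-group><book-title>Glossary</book-title></book-title-group>
  </book-meta>
  <book-part book-part-type="object">
    <book-part-meta>
      <title-group><title>{title}</title></title-group>
      <related-object document-id="{chapter_id}"/>
    </book-part-meta>
    <body><p>{definition}</p></body>
  </book-part>
</book-part-wrapper>
"""


def read_entry(xml_text):
    """Extract the title, related chapter ids, and body text of one entry."""
    root = ET.fromstring(xml_text)
    entry = root.find("book-part[@book-part-type='object']")
    return {
        "title": entry.findtext("book-part-meta/title-group/title"),
        "related": [ro.get("document-id") for ro in entry.iter("related-object")],
        "body": "".join(entry.find("body").itertext()).strip(),
    }


def a_to_z(entries):
    """Group entry titles under their initial letter, in alphabetical order."""
    ordered = sorted(entries, key=lambda e: e["title"].casefold())
    return {
        letter: [e["title"] for e in group]
        for letter, group in groupby(ordered, key=lambda e: e["title"][0].upper())
    }


files = [
    ENTRY_TEMPLATE.format(title=t, chapter_id=c, definition="Sample definition text.")
    for t, c in [("Phobia", "ch3"), ("Anxiety", "ch2"), ("Panic", "ch3")]
]
entries = [read_entry(f) for f in files]
browse = a_to_z(entries)
```

The same `read_entry` result could feed either presentation: the `body` text for a modal opened from a book chapter, or the `title` for the landing-page browse.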

Fig. 10. Glossary Link from Book.

Fig. 11. Glossary Landing Page.

On a practical level, when we receive XML for these two very different non-standard content types (the point-of-care topic and the glossary entry), we are able to leverage most of the same XSLTs we use for book loading, and the same basic tool framework. The only aspect of GBOS content loading that is not as dynamic as that of books is the original creation of the GBOS Container. These are not made on the fly, but are mapped out at the beginning of a project, with content analysts, designers, and developers working in partnership with the client to identify the non-standard content needs of a given site. GBOS is relatively new to our system, however, and as our library of GBOS types and display widgets grows, we anticipate streamlining and expediting this process. As of March 2015 we have a dozen GBOS Containers on the SCM6 platform, and we are learning more about how non-standard content interacts with our platform every day.

The digital realm is where non-standard content can really shine: interactions that were clumsy in print can be engineered online in ways that enhance a user’s experience. At the same time, many aspects of the print world are deeply embedded in our conception, and encoding, of that content for online use. Throughout the process of researching GBOS, and as we sought to continue using BITS to represent those documents in our system, again and again I returned to the concept of the skeuomorph: a design feature or structure that was useful at one point in an object’s history, but is retained long past its necessity. Think of a digital calendar designed to look like one you might hang on a wall, a smartphone that reproduces the rotary experience of dialing, or faux-wood grain on a station wagon. Using the <book> and <book-part-wrapper> elements to wrap content that is not a book is definitely a holdover from the print world. But for our case, I believe that we have managed to make the skeuomorph work to our advantage. However inaccurate the adaptation may seem at first glance, using BITS for non-standard content allowed Silverchair more functionality and flexibility in content remediation and display, in a fraction of the time it would have taken to conceive a completely new model. It is a solution that fits this interesting intermediary period in scholarly publishing, in which we inhabit both the print and digital worlds, and the two are inextricably connected.

Copyright 2015 by Dana Wheeles.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License

Bookshelf ID: NBK279829
