A List Apart: The Full Feed

Articles for people who make web sites.

http://feeds.feedburner.com/alistapart/main?format=xml

created: 2 Aug 2014 21:14:04 UTC ~ updated: 21 Apr 2019 00:26:41 UTC ~ rssv2 ~ TTL 15 min.

Just as we need to understand our content before we can recategorize it, we need to understand the system before we try to rebuild it.

Enter the structural audit: a review of the site focused solely on its menus, links, flows, and hierarchies. I know you thought we were done with audits back in Chapter 2, but hear me out! Structural audits have an important and singular purpose: to help us build a new sitemap.

This isn’t about recreating the intended sitemap—no, this is about experiencing the site the way users experience it. This audit is meant to track and record the structure of the site as it really works.

Setting up the template

First, we’re gonna need another spreadsheet. (Look, it is not my fault that spreadsheets are the perfect system for recording audit data. I don’t make the rules.)

Because this involves building a spreadsheet from scratch, I keep a “template” at the top of my audit files—rows that I can copy and paste into each new audit (Fig 4.1). It’s a color-coded outline key that helps me track my page hierarchy and my place in the auditing process. When auditing thousands of pages, it’s easy to get dizzyingly lost, particularly when coming back into the sheet after a break; the key helps me stay oriented, no matter how deep the rabbit hole.

Fig 4.1: I use a color-coded outline key to record page hierarchy as I move through the audit. Wait, how many circles did Dante write about?

Color-coding

Color is the easiest, quickest way to convey page depth at a glance. The repetition of black text, white cells, and gray lines can have a numbing effect—too many rows of sameness, and your eyes glaze over. My coloring may result in a spreadsheet that looks like a twee box of macarons, but at least I know, instantly, where I am.

The exact colors don’t really matter, but I find that the familiar mental model of a rainbow helps with recognition—the cooler the row color, the deeper into the site I know I must be.

The nested rainbow of pages is great when you’re auditing neatly nested pages—but most websites color outside the lines (pun extremely intended) with their structure. I leave my orderly rainbow behind to capture duplicate pages, circular links, external navigation, and other inconsistencies like:

  • On-page navigation. A bright text color denotes pages that are accessible via links within page content—not through the navigation. These pages are critical to site structure but are easily overlooked. Not every page needs to be displayed in the navigation menus, of course—news articles are a perfect example—but sometimes this indicates publishing errors.
  • External links. These are navigation links that go to pages outside the domain. They might be social media pages, or even sites held by the same company—but if the domain isn’t the one I’m auditing, I don’t need to follow it. I do need to note its existence in my spreadsheet, so I color the text as the red flag that it is. (As a general rule, I steer clients away from placing external links in navigation, in order to maintain a consistent experience. If there’s a need to send users offsite, I’ll suggest using a contextual, on-page link.)
  • Files. This mostly refers to PDFs, but can include Word files, slide decks, or anything else that requires downloading. As with external links, I want to capture anything that might disrupt the in-site browsing experience. (My audits usually filter out PDFs, but for organizations that overuse them, I’ll audit them separately to show how much “website” content is locked inside.)
  • Unknown hierarchy. Every once in a while, there’s a page that doesn’t seem to belong anywhere—maybe it’s missing from the menu, while its URL suggests it belongs in one section and its navigation scheme suggests another. These pages need to be discussed with their owners to determine whether the content needs to be considered in the new site.
  • Crosslinks. These are navigation links for pages that canonically live in a different section of the site—in other words, they’re duplicates. This often happens in footer navigation, which may repeat the main navigation or surface links to deeper-but-important pages (like a Contact page or a privacy policy). I don’t want to record the same information about the page twice, but I do need to know where the crosslink is, so I can track different paths to the content. I color these cells gray so they don’t draw my attention.

Note that coloring every row (and indenting, as you’ll see in a moment) can be a tedious process—unless you rely on Excel’s Format Painter, the paintbrush button on the Home tab. That tool applies all the right styles in just two quick clicks.

Outlines and page IDs

Color-coding is half of my template; the other half is the outline, which is how I keep track of the structure itself. (No big deal, just the entire point of the spreadsheet.)

Every page in the site gets assigned an ID. You are assigning this number; it doesn’t correspond to anything but your own perception of the navigation. This number does three things for you:

  1. It associates pages with their place in the site hierarchy. Decimals indicate levels, so the page ID can be decoded as the page’s place in the system.
  2. It gives each page a unique identifier, so you can easily refer to a particular page—saying “2.4.1” is much clearer than “you know that one page in the fourth product category?”
  3. You can keep using the ID in other contexts, like your sitemap. Then, later, when your team decides to wireframe pages 1.1.1 and 7.0, you’ll all be working from the same understanding.

Let me be completely honest: things might get goofy sometimes with the decimal outline. There will come a day when you’ll find yourself casually typing out “1.2.1.2.1.1.1,” and at that moment, a fellow auditor somewhere in the universe will ring a tiny gong for you.

In addition to the IDs, I indent each level, which reinforces both the numbers and the colors. Each level down—each digit in the ID, each change in color—gets one indentation.

I identify top-level pages with a single number: 1.0, 2.0, 3.0, etc. The next page level in the first section would be 1.1, 1.2, 1.3, and so on. I mark the homepage as 0.0, which is mildly controversial—the homepage is technically a level above—but, look: I’ve got a lot of numbers to write, and I don’t need those numbers to tell me they’re under the homepage, so this is my system. Feel free to use the numbering system that works best for you.
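If you ever move the outline out of the spreadsheet and into a script, the decimal IDs decode mechanically. Here is a minimal Python sketch, assuming the numbering described above (top-level pages written as 1.0, 2.0, and the homepage as 0.0). The function names and the sample label are hypothetical, not part of the template.

```python
def page_depth(page_id: str) -> int:
    """Return the outline depth encoded in a page ID such as "2.4.1"."""
    parts = page_id.split(".")
    # Top-level IDs carry a placeholder zero ("1.0", "0.0"),
    # so drop it before counting levels.
    if len(parts) == 2 and parts[1] == "0":
        parts = parts[:1]
    return len(parts)


def indented(page_id: str, label: str, unit: str = "  ") -> str:
    """Indent a row one unit per level below the top, mirroring the spreadsheet."""
    return f"{unit * (page_depth(page_id) - 1)}{page_id} {label}"


print(indented("2.4.1", "Widget details"))
```

In this sketch the homepage (0.0) sits at the same depth as the other top-level pages, which matches the numbering shortcut described above rather than the stricter view that it lives one level higher.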

Criteria and columns

So we’ve got some secret codes for tracking hierarchy and depth, but what about other structural criteria? What are our spreadsheet columns (Fig 4.2)? In addition to a column for Page ID, here’s what I cover:

  • URL. I don’t consistently fill out this column, because I already collected this data back in my automated audit. I include it every twenty entries or so (and on crosslinks or pages with unknown hierarchy) as another way of tracking progress, and as a direct link into the site itself.
  • Menu label/link. I include this column only if I notice a lot of mismatches between links, labels, and page names. Perfect agreement isn’t required; but frequent, significant differences between the language that leads to a page and the language on the page itself may indicate inconsistencies in editorial approach or backend structures.
  • Name/headline. Think of this as “what does the page owner call it?” It may be the H1, or an H2; it may match the link that brought you here, or the page title in the browser, or it may not.
  • Page title. This is for the name of the page in the metadata. Again, I don’t use this in every audit—particularly if the site uses the same long, branded metadata title for every single page—but frequent mismatches can be useful to track.
  • Section. While the template can indicate your level, it can’t tell you which area of the site you’re in—unless you write it down. (This may differ from the section data you applied to your automated audit, taken from the URL structure; here, you’re noting the section where the page appears.)
  • Notes. Finally, I keep a column to note specific challenges, and to track patterns I’m seeing across multiple pages—things like “Different template, missing subnav” or “Only visible from previous page.” My only caution here is that if you’re planning to share this audit with another person, make sure your notes are—ahem—professional. Unless you enjoy anxiously combing through hundreds of entries to revise comments like “Wow haha nope” (not that I would know anything about that).
Fig 4.2: A semi-complete structural audit. This view shows a lot of second- and third-level pages, as well as pages accessed through on-page navigation.

Depending on your project needs, there may be other columns, too. If, in addition to using this spreadsheet for your new sitemap, you want to use it in migration planning or template mapping, you may want columns for new URLs, or template types. 
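For anyone who would rather generate the blank sheet than build it by hand, the columns above can be sketched as a small record type and written out as CSV. This is only an illustration using Python's standard library: the field names are my own shorthand for the columns listed, and the sample row is invented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AuditRow:
    # One spreadsheet row; every column except the page ID is optional.
    page_id: str
    url: str = ""
    menu_label: str = ""
    name_headline: str = ""
    page_title: str = ""
    section: str = ""
    notes: str = ""


def to_csv(rows) -> str:
    """Serialize audit rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AuditRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


print(to_csv([AuditRow(page_id="1.0", name_headline="About", section="About")]))
```

Extra columns for migration planning (new URLs, template types) would simply be more fields on the same record.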

You can get your own copy of my template as a downloadable Excel file. Feel free to tweak it to suit your style and needs; I know I always do. As long as your spreadsheet helps you understand the hierarchy and structure of your website, you’re good to go.

Gathering data

Setting up the template is one thing—actually filling it out is, admittedly, another. So how do we go from a shiny, new, naive spreadsheet to a complete, jaded, seen-some-stuff spreadsheet? I always liked Erin Kissane’s description of the process, from The Elements of Content Strategy:

Big inventories involve a lot of black coffee, a few late nights, and a playlist of questionable but cheering music prominently featuring the soundtrack of object-collecting video game Katamari Damacy. It takes quite a while to exhaustively inventory a large site, but it’s the only way to really understand what you have to work with.

We’re not talking about the same kind of exhaustive inventory she was describing (though I am recommending Katamari music). But even our less intensive approach is going to require your butt in a seat, your eyes on a screen, and a certain amount of patience and focus. You’re about to walk, with your fingers, through most of a website.

Start on the homepage. (We know that not all users start there, but we’ve got to have some kind of order to this process or we’ll never get through it.) Explore the main navigation before moving on to secondary navigation structures. Move left to right, top to bottom (assuming that is your language direction) over each page, looking for the links. You want to record every page you can reasonably access on the site, noting navigational and structural considerations as you go.

My advice as you work:

  • Use two monitors. I struggle immensely without two screens in this process, which involves constantly switching between spreadsheet and browser in rapid, tennis-match-like succession. If you don’t have access to multiple monitors, find whatever way is easiest for you to quickly flip between applications.
  • Record what you see. I generally note all visible menu links at the same level, then exhaust one section at a time. Sometimes this means I have to adjust what I initially observed, or backtrack to pages I missed earlier. You might prefer to record all data across a level before going deeper, and that would work, too. Just be consistent to minimize missed links.
  • Be alert to inconsistencies. On-page links, external links, and crosslinks can tell you a lot about the structure of the site, but they’re easy to overlook. Missed on-page links mean missed content; missed crosslinks mean duplicate work. (Note: the further you get into the site, the more you’ll start seeing crosslinks, given all the pages you’ve already recorded.)
  • Stick to what’s structurally relevant. A single file that’s not part of a larger pattern of file use is not going to change your understanding of the structure. Neither is recording every single blog post, quarterly newsletter, or news story in the archive. For content that’s dynamic, repeatable, and plentiful, I use an x in the page ID to denote more of the same. For example, a news archive with a page ID of 2.8 might show just one entry beneath it as 2.8.x; I don’t need to record every page up to 2.8.791 to understand that there are 791 articles on the site (assuming I noted that fact in an earlier content review).
  • Save. Save frequently. I cannot even begin to speak of the unfathomable heartbreak that is Microsoft Excel burning an unsaved audit to the ground.  
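One practical wrinkle with decimal IDs and the x placeholder: a plain alphabetical sort puts 2.10 before 2.8 and has no idea where 2.8.x belongs. The sketch below shows one way to sort IDs in outline order. Sorting is my assumption about how you might use the IDs outside the spreadsheet, and the function name and sample IDs are hypothetical.

```python
def outline_key(page_id: str):
    """Sort key that compares page IDs level by level, numerically."""
    key = []
    for part in page_id.split("."):
        if part == "x":
            # "More of the same" placeholder: sorts after numbered siblings.
            key.append((1, 0))
        else:
            key.append((0, int(part)))
    return tuple(key)


ids = ["2.8.x", "2.10", "2.8", "1.0", "2.0", "2.9", "0.0"]
print(sorted(ids, key=outline_key))
# ['0.0', '1.0', '2.0', '2.8', '2.8.x', '2.9', '2.10']
```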

Knowing which links to follow, which to record, and how best to untangle structural confusion—that improves with time and experience. Performing structural audits will not only teach you about your current site, but will help you develop fluency in systems thinking—a boon when it comes time to document the new site.

The FAQ as Advice Column (17 Apr 2019 13:51:16)

Dear A List Apart,

I have a problem that may be harming my content strategy career. In my current position, no one likes FAQs … except for me. The question-and-answer format is satisfying and efficient. Whenever I mention adding an FAQ section to a website, though, I receive numerous suggestions that I should wean myself off FAQs one question at a time or go cold turkey.

Perhaps that is overdoing it, but sometimes I feel like defending FAQs by pen, sword, or Dothraki horde. Should I keep my addiction to myself, or should I embrace this oddity and champion a format I believe in?

Signed,
FAQ Fanatic

Dear FAQ Fanatic,

You’re not wrong: FAQs are as out of vogue as a fat footer. You’re also not alone. As an aspiring advice columnist, I’ve been wondering why the format is so unpopular even though it remains on many websites. In fact, in a recent A List Apart piece, Richard Rabil, Jr. listed the FAQ as one of many legitimate patterns of organization that you can use when writing.

To address your query properly, I propose some soul-searching through a series of FAQs about FAQs. Let’s tackle the toughest question first.

Can I trust FAQs?

If you are a content strategist or information architect, chances are good you’ve been burned. Lisa Wright nails every single reason why the FAQ can be bad news. It is a poor excuse for a proper content strategy that would generate “purposeful information” across a website. For example, if you see an FAQ, you know right away that the website is duplicating content, which often leads to discrepancies. FAQs may also lead to a bigger design issue: accordion abuse. The typical FAQ design involves expand-and-collapse features. In theory, this makes it easier for users to scan to find what they need. But in a content migration or consolidation, I’ve seen desperate freelancers or webmasters shove entire web pages under a question just to make an old page fit a new design. If a user is coming to an FAQ for a quick-hit answer, as is often the case, imagine how horrifying it can be to expand a question and see an answer the length of David Foster Wallace’s Infinite Jest tucked underneath.
How many times have you opened an FAQ accordion and been overwhelmed by the novella beneath?

Can the FAQ and I still be friends?

Ah, you must be a content author. If you’re a content author on a budget, under a deadline, or both, the FAQ will become your bestie—whether you planned on it or not. In my experience, teams bust out the FAQs not because they are lazy but because they find them to be a reliable way to structure content. When I worked at an agency, a few of my projects were microsites that weren’t so “micro.” Some clients wanted a small site on a minimal CMS within an even more minimal timeline, but the content kept ballooning, leaving no time for true content modeling. The only way to build the content on time was to use the FAQ as a content model or spine. Like you, I now work with people who avoid FAQs. Since my current agency specializes in site redesigns for higher-ed clients, it’s expected that the information has more structure to begin with—and it usually does. Plus, my particular agency gives information architects and content strategists more time than the norm. From the get-go, our sitemaps, wireframes, and patterns serve as a stable foundation for the content. Sometimes, though, even the most stable foundations won’t prevent the appearance of an FAQ. If a content team doesn’t get enough time to inventory their content, they’ll probably encounter numerous FAQs. They’ll need to figure out a way to get that content over to the new site somehow … which means those FAQs aren’t going anywhere.

So, do I have to quit FAQs cold turkey?

No. The FAQ structure has held up for so long because it is a brilliant pattern. Think the Socratic method. Or the catechism. Or Usenet. Or “F.A.Q.s about F.A.Q.s.” Or—you guessed it—“Dear Prudence,” “Dear Sugar,” or any other popular advice column. Users will always have questions, and they will always want answers. What makes FAQs troublesome is incorrect or lazy use. Lisa Wright has already shared what not to do, but perhaps the best way to start an FAQ is to choose each question with great care. For example, advice columnists spend plenty of time selecting what questions they will answer each week. In general, listeners want to hear the advice columnist flexing their mental muscles to resolve the most complicated situations. If you’re using FAQs correctly, start with the best content possible, and align that content with what both content authors and content consumers want. Content authors can rely on the Q&A structure to deliver quality content on a regular basis while reassuring content consumers that they are receiving the best answers.

FAQ-appropriate content

What is the best content for an FAQ? Thus far, I’ve discussed choosing your questions wisely and keeping your answers short (and, yes, shorter than the answers of a typical advice columnist).

Since I’ve worked in higher ed, I’ve had the chance to speak with people who support admissions and enrollment, and they spend most of their time answering frequently asked questions from students and parents. In one stakeholder interview session with the staff of a community college, it became clear that the questions the staff handled fell into two camps: the questions people ask over and over and the head-scratching edge cases. For example, questions about transcripts or financial aid awards are timeless. As for the edge cases, a full-time student might ask whether they can get a discount on a yoga class offered through a community education program.

For the common content, FAQs shouldn’t repeat what’s already on the website, but they are called “frequently asked questions” for a reason. As long as you provide the content only twice—once in the FAQ and once on a relevant content page—you’re fine. Your authors shouldn’t have to manage anything past that.

With an edge case, the question might be so specific that the answer wouldn’t have a clear home on any page—in my example, the yoga class question would straddle full-time registration and community education. Therefore, even though the off-the-wall question isn’t “frequently asked,” it can still live in the FAQ, and if the edge cases pile up (as they can in the world of higher ed), then you could shift these questions to a blog, which could provide a source of fresh content.

I wouldn’t have known about the full-time student who wants to take a community ed class if it hadn’t emerged during the stakeholder interview. For that reason, you want to talk to customers or students and ask them what questions they’ve had in the past. If you don’t have time for that, read over user research to find out what users typically ask. Or use Google Search Console to look at the search queries that lead to your site, and figure out how well your site answers those questions. You may find that many of the queries leading to your site are written as questions. In fact, according to a study by Moz and Jumpshot, questions make up approximately 8% of search queries, so this research may help you populate your FAQ. And if you’re looking for inspiration, you could try a tool like Answer the Public (free to access UK data; a monthly payment required for other regions). Type in a keyword like “college applications,” and the tool will serve up a range of questions people have asked in their search queries.

The final way to articulate what works for an FAQ is to describe what doesn’t work. If your answer begins to spin into a narrative instead of a straightforward answer, you might need to add a separate page of content to your sitemap. And if your answer starts to sound too much like marketing copy, then it belongs elsewhere on the site. FAQs exist for those who are further along in the sales process or those who are already sold. Continuing to sell to that audience in an FAQ will only annoy them.

When you know which questions you’re going to cover, you can start to refine the language for your main audiences: authors and consumers.
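If you have a pile of search queries in hand (say, a list copied out of a search-query report), a rough first pass at finding the question-shaped ones can be automated. The sketch below is a heuristic of my own, not a feature of any tool mentioned here: the word list is an assumption, and the sample queries are invented.

```python
# Assumed list of words that commonly open a question; extend it to taste.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "do", "does", "is", "are", "should", "will",
}


def looks_like_question(query: str) -> bool:
    """Guess whether a search query is phrased as a question."""
    text = query.strip().lower()
    if not text:
        return False
    if text.endswith("?"):
        return True
    return text.split()[0] in QUESTION_WORDS


def question_queries(queries):
    """Keep only the queries that look like questions, in their original order."""
    return [q for q in queries if looks_like_question(q)]


sample = [
    "how do i send transcripts",
    "financial aid deadline",
    "Can I take a yoga class?",
    "campus map",
]
print(question_queries(sample))
```

A filter like this will miss plenty (and catch a few false positives), so treat its output as a reading list for a human, not as a finished FAQ.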

FAQs for content authors: your in-house reference desk

A clever content author can use an FAQ as a core research document. Armed with a CMS that has a decent back-end search, a content author will have a much easier time keeping content aligned and fact-checked if the FAQ itself is treated as a trustworthy source of information. For that reason, what Wright calls “documentation-by-FAQ” might not be the worst situation in the world, depending on how much content you’re working with. If you actually have someone tending the FAQ like a garden, your content will always change, but you can be sure of its accuracy. To convince your more skeptical peers of the value of maintaining your FAQ page or database, tell them that the FAQ is a content opportunity that may save them time. Think of how delightful it is when you get your “Dear Prudence” newsletter or a podcast notification for the latest Han and Matt Know It All. Whenever you add a new question or update a new fact, spread the word among your users. These updates can help feed the social-media-marketing content beast while proving that you want to keep users informed and engaged.

FAQs for content consumers: give them power

Speaking of keeping users informed and engaged, a good FAQ can help the audience even more than it helps content authors. The best way to ensure that the FAQ works for the audience is to give them more control over the questions and answers they see. Most FAQs, including those on higher-ed sites, chunk up the FAQs by content category, tuck answers into accordions, and stop right there. More effective FAQs, though, provide other forms of interaction. Some allow users to refine information through filters, searches, and tags so the user isn’t stuck opening and closing accordion windows. For example, the website for Columbia Undergraduate Admissions has a fairly standard format, but the FAQ answers are tagged, so users have another option for navigating through the information. Other higher-ed services, like the syndicated financial aid web channel FATV, answer common FAQs with videos. Changing up the format and providing text, video, and audio options help prospective students feel like they are receiving more personal attention. Beyond higher-ed FAQs, Amazon encourages users to vote FAQs up and down, Reddit-style, which can lead to fun interactions and enables the users to rate the quality—or humor—of the information they receive.
Amazon’s more interactive FAQ model, in which users can vote on answers and search for questions.
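To make the idea of filters, searches, and tags a little more concrete, here is a minimal sketch of an FAQ list that can be narrowed by tag or by search text. It does not reflect how any of the sites mentioned are built; the `Faq` type, the function name, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Faq:
    question: str
    answer: str
    tags: set = field(default_factory=set)


def filter_faqs(faqs, tag=None, search=""):
    """Return FAQs matching an optional tag and an optional search phrase."""
    needle = search.strip().lower()
    results = []
    for faq in faqs:
        if tag is not None and tag not in faq.tags:
            continue
        if needle and needle not in f"{faq.question} {faq.answer}".lower():
            continue
        results.append(faq)
    return results


faqs = [
    Faq("How do I send transcripts?", "Use the registrar form.", {"admissions"}),
    Faq("When are aid awards announced?", "In spring.", {"financial aid"}),
]
for match in filter_faqs(faqs, tag="financial aid"):
    print(match.question)
```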
You can also remind skeptics that FAQs aren’t always what they expect. The FAQ format has experienced a renaissance in the form of our newly beloved voice gadgets. Some content creators are even using their existing FAQs as the foundation for their Alexa skills. For example, georgia.gov created an Alexa skill by transforming its “Popular Topics” FAQ database, working with Acquia to structure the Q&A format so Alexa can answer common questions from Georgia residents. When describing the project, user experience designer Rachel Hart writes:

If you say “No” to an FAQ question, Alexa skips to the next FAQ, and the next, until you say something sounds helpful or Alexa runs out of questions.

When the user chooses what they want to hear, they need to know exactly what they’re committing to. We need to make sure that our [labeling]—for both titles and FAQs—is clear.

Read that again, dear FAQ fanatic. The complications for Alexa skills arise in the labeling, not in the FAQ itself. In fact, it’s the FAQ content that makes Alexa skills like the one for georgia.gov possible.
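The flow Hart describes (offer a label, move on after a “No,” stop at the first “Yes” or when the questions run out) is simple enough to sketch in a few lines. This is plain Python for illustration only: it is not the georgia.gov skill and does not use any Alexa API, and the function name and sample labels are made up.

```python
def walk_faqs(faqs, wants_to_hear):
    """Offer each FAQ label in turn; return the first answer the user accepts.

    faqs: list of (label, answer) pairs.
    wants_to_hear: callable taking a label and returning True for "yes".
    Returns None when the user declines every question.
    """
    for label, answer in faqs:
        if wants_to_hear(label):
            return answer
    return None


faqs = [
    ("Renewing a driver's license", "Answer about license renewal."),
    ("Registering to vote", "Answer about voter registration."),
]
print(walk_faqs(faqs, lambda label: "vote" in label))
```

Notice that the only thing the user ever judges is the label, which is exactly why unclear labeling is where this kind of flow breaks down.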

So I can make peace with my quirky love of the FAQ?

Indeed. Let your FAQ flag fly. FAQs—or dialogues that convey information—will always exist in some way, shape, or form. As for the accordion, though, the jury is out.

Writing for Designers (17 Apr 2019 13:51:16)

Shit. The writing. We forgot about the writing. The thing, the design thing…it needs words! Oh man, so many words. I thought somebody…wasn’t the client going to…shit. We’ve got to get the writing done. We’ve got to get the writing done! How are we going to get the writing done?!

Don’t worry, friend. I’m here. We’ll get the writing done. The first step is to accept a hard truth: someone has to do the writing.

Some teams seem to build their whole process around not writing. They fill wireframes with lorem ipsum (that fake Latin text that confuses stakeholders) and write “CTA goes here” on their buttons. I’ve been handed my share of comps where anything remotely word-based was represented by a bunch of squiggly lines.

You know that comic about how to draw an owl? Step one: draw some circles. Step two: draw the rest of the fucking owl. That’s you with your squiggly lines. Rude.

Everything left unwritten is a mystery box of incomplete design. These mysteries beget other mysteries, and pretty soon you’ve got dozens of screens of things that kinda-sorta-maybe make sense but none of them can really be final because you never wrote the words. Choosing words and writing what appears in an interface forces us to name components, articulate choices, and explain things to the user. It’s part of design. We know this, don’t we? We knew it at the beginning of the design project, and yet here we are. Why did we wait?

Writing is part of design

Words are one of the most powerful design materials available. They convey deeply complex meanings in a compact space. They load fast. They’re easy to manipulate and easy to transmit. And the best part is, you don’t have to invent any of them! You just have to use them. Sometimes words get written off (see what I did there) as mere “details” in our designs. True details can wait until the end of your design process. Words, however, are deeply integrated throughout the user’s experience of your design. Look at your favorite app, site, or interface. Take all the words away and what do you have? Not much! Even if the particular thing you’re designing seems light on words, take a broader view and you’ll find words hiding everywhere:
  • error messages and recovery flows
  • confirmation screens
  • user-visible metadata like page titles and search engine descriptions
  • transactional emails
  • in-app user assistance
  • support documentation
  • changelogs
  • feature descriptions and marketing copy
These are as much a part of the design as the layout, graphics, and animations. Designs depend on words.

Even if your design were simple, beautiful, and intuitive, writing can take it one step further. Writing can reinforce how you want users to think about your design. Writing can explain the approach or philosophy that underpins your design. Writing can guide users through complex processes. Writing can even help cover for the quirks and compromises in our designs—hopefully not our first resort, but valuable nonetheless.

Sometimes the writing isn’t done because we’re trying to solve everything with “pure design.” Supposed UX thought leaders throw around baloney like “Good design doesn’t need explanation” and “If you have to use words, you’ve failed.” Come on. I hope my pilot knows what all those switches in the cockpit do, but I also hope they’re labeled, just in case.

To keep things simple in this book, we’ll be talking about three general categories of writing you might have to do to support your design work:
  • Interface copy: Often referred to as UI copy or microcopy, this is the text that’s deeply integrated within the interface, like labels for form fields, text on buttons, navigation labels on a website, error messages, and similar. It’s often made of single words or short phrases. If the interface would “break” or be extremely hard to use if you removed this text, we’ll call it interface copy.
  • Product copy: Writing that’s integral to the function of the site/product/app/experience, but not necessarily a direct part of the interface—the body of an onboarding email, for instance, or a description of updates to an application in a changelog. This is content focused on helping/supporting the reader.
  • Marketing copy: Longer-form writing that is primarily filling a sales or promotional sort of role. This is content focused on persuading the reader.
Depending on your product and organization, you might have many more buckets of content, or you may find the lines especially blurry even between these three. That’s okay! These buckets will just make things easier while we talk about writing in this book. Cool? Cool. (Oh, and “copy” is just a way to distinguish words written by a designer from the more generic idea of “text,” which could be just about anything in your system, including user-generated input.)

Writing is always hard

If you know someone who makes writing look easy, you’re right. They make it look easy. You can’t plan well for a difficult journey if you assume it’s going to be an easy journey. Accepting that writing is hard is an important step toward making it easier and getting it done. Writing is hard because it’s personal. Even if you’re writing about something you don’t feel strongly about, or even something you disagree with, it’s still your writing. The words you write carry a little echo of you. To get the writing done, you’re going to have to be a little vulnerable. Maybe a lot. Writing is even hard for writers—and since most people don’t realize that, they make it even harder on writers. They don’t give writers enough time to write. They don’t provide enough information to work with. They say things that minimize the difficulty of the task and the skill required to complete it. “You’re so creative! This should be easy, right? Shoot me something back before lunch.” Ugh. Unfortunately, there’s no special potion you can take to help you get the writing done, and even the most beautifully retro hipster typewriter still needs you to operate the keys.

Workflow gets the writing done

So if magic won’t help you get the writing done, what will? In design contexts, a useful way to think about writing is workflow. Workflow is a big-picture idea that accommodates all kinds of different processes, techniques, and tools. If following a recipe is a process, making dinner is a workflow. A dinner-making workflow has obvious phases—plan the meal, prep the ingredients, mix and cook things, finish and serve the meal. The specific steps and outcomes vary depending on the meal, but the basic workflow remains the same. This is also a useful way to think about design writing. No matter what you’re cooking up—no matter how custom the request and how many dietary restrictions your stakeholders might have—you’ll follow the same basic workflow each time you do the writing:
  1. Prepare (to write)
  2. Compose (the words)
  3. Edit (what you wrote)
  4. Finish (the damn writing)
Planning your workflow means choosing the tools, techniques, people, and processes that will be part of each of these four phases. Until this framework becomes old hat, I recommend explicitly planning your writing workflow. Planning is how you avoid getting stuck. You might not immediately know every single tool, step, and person you’ll need to get the writing done. But knowing even a few things, and giving yourself a basic map to follow to get the writing done, will help you learn what’s missing. Planning your workflow doesn’t need to be a long process—or even something you share with other people. You can create a formal, structured worksheet to plan it out (Fig 0.1), you could sketch it out on a whiteboard or in a notebook (Fig 0.2), or simply make some notes at the top of a new document. The important thing is to think about how you’re going to get the writing done before you start writing.
An example of a structured worksheet with the assignment and writer details at the top, and space to add details for preparation, composition, editing, and finishing including tools, steps, processes, and more.
Fig 0.1: A structured worksheet can help you plan your writing workflow before you start, and serve as an anchor in the storm throughout the project. If you go this route, I recommend customizing it to suit the particulars of your organization.
A simpler example of a simpler workflow with four quadrants for preparation, composition, editing, and finishing handwritten on a piece of paper.
Fig 0.2: Planning your workflow on paper doesn’t have to take long, and it’s a nice break from staring at screens. Plus, you can cross things off as you go! (Always satisfying.)

You can write

Mr. Hays, my high school choir teacher, was a great recruiter. When he’d ask people to try out for choir, they’d protest with some version of “Oh, no, I can’t sing.” Nonsense, he’d say: “If you can talk, you can sing. It’s all the same muscles!” And, more often than not, he’d pull that student right over to a piano and demonstrate to them that they could, in fact, sing. In case you’re skeptical, worried, or unsure about whether or not you can handle this, here’s my pitch for writing: writing is just thinking plus typing. You can think. You can type (or otherwise get text into a computer). So yes, you can write. We’re going to get into all kinds of methods about how to compose and refine text throughout this book. But at the end of the day, writing is just thinking plus typing. Have some thoughts in your head, then write them down. Do this over and over until the writing is done. Every other tip, trick, method, and process is just an improvement or distillation of this basic approach. And more good news: writing is more like design than you might think. Common design activities like framing the problem, identifying constraints, and exploring solutions are part of writing, too. Many of the methodologies one might use in UX work can be part of a writing workflow: stakeholder interviews, user research, content auditing, ideation workshops, critiques, and more. Writing is always hard, yes. But it gets easier. Good? Good. We're making progress already. It’s time to Prepare.

Designing for Cognitive Differences . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

Inclusive design is designing to be inclusive of as many users as possible, considering all aspects of diversity in users. With increased understanding, compassionate discussions around how to design for disabilities are becoming increasingly common in the web industry. But even with this growth, there are misconceptions: accessibility is still frequently thought of as “design for blind people” when it’s so much more than that. Users with limited motor functions and those who are hearing-impaired require separate considerations, for instance. But accessibility and inclusiveness also mean considering more than just physical symptoms. What about users with cognitive differences like inattention, anxiety, and depression? Many affective and anxiety disorders qualify as disabilities, with inattention causing challenges on the web as well. Whatever the cause, inattention, anxiety, and depression can have a major impact on internet usage for users dealing with them. The unique issues presented by cognitive differences and the design considerations they require can be tricky to understand for people who have never dealt with them. Through this article, I’ll share some methods to accommodate these users’ unique needs.

Inattention

Inattention is often regarded as a joke in our industry (and just about everywhere else), but it can be a serious impediment for people who struggle with it. While Attention Deficit Hyperactivity Disorder (ADHD) is a common culprit, affecting 4.4% of adults, it’s not the only source of inattention. Bipolar disorder (estimated at 2.8% of adults), major depression (6.7% of adults), and anxiety disorders (19.1% of adults) can cause occasional inattention. More common conditions such as stress or sleep deprivation can cause inattention in people who don’t experience it as regularly. I’m quite familiar with inattention because I have bipolar disorder, which frequently causes inattention in manic phases. The term inattention is a bit of a misnomer, because it implies that those suffering from it have trouble paying attention to anything. It’s more accurate to say that we have to pay attention to everything: we have trouble tuning things out, and the more things that are competing for our attention, the harder it is for us to focus on anything. Designers who are able to focus normally rarely see the things that cause problems for users with inattention, but these things are everywhere, and they can make the web much harder for us to use. Some design considerations that make a site more inclusive of users with inattention are obvious, such as adding an option to mute notifications at certain times. Others are less so, such as giving users the ability to turn off design features that are distracting them.

Drowning in the ocean of motion

I was recently reading an article on search engine optimization, and the author saw fit to incorporate animated GIFs throughout the article. The GIFs, looped infinitely and placed prominently, didn’t add anything of substance. Worse, as I was already struggling through a manic episode, the GIFs actually prevented me from reading the article; I had to open Chrome DevTools and hide all of the GIFs to get through the content. Motion is everywhere. This simple fact of the modern internet makes designers smile, while users with inattention issues cringe. Motion distracts people with inattention even when neurotypical people (those without atypical neurological patterns) are fine. Most users struggling with inattention won’t use Chrome DevTools to make your site usable for them; they’ll simply leave and probably end up on a competitor’s site. I cringe anytime I see an article with pointless animation, and often just click the back button. Even though I’m sure the designer or author saw the motion as beneficial, it can distract users who struggle with inattention from what they came to your site to do. Motion isn’t always bad. Sometimes you need to use subtle motion to draw attention to something, such as when a user has to click a button before changes are applied. User-initiated motions, such as hover and click effects, usually don’t distract. Your website or app doesn’t need to be a static, motionless wasteland. But if you’re going to distract your users with motion that they don’t initiate, it had better accomplish something. Unnecessary motion, like the animated GIFs I mentioned above, is nothing but a barrier for these users. If the motion is actually accomplishing something, you have to ask if what you’re drawing attention to is worth sacrificing other content on the page in return. Designers and content developers tend to use motion (autoplayed videos, animated GIFs, and CSS animations) simply to be cute or expressive.
Inclusive design would use motion only to improve clarity so as not to exclude users struggling with inattention. If motion would significantly improve the experience for neurotypical users, but hurt it for users with inattention, you can give users the option to turn off motion, allowing them to choose which would be best for them.

Designing forms for inattention

Forms add layers of interactivity and are often at the center of what we want users to do on our websites or apps; and yet, forms are often hard to use for users who struggle with inattention. Poor design reduces clarity and increases errors; some interactions take so long that they become extremely difficult for those of us with inattention. Rather than slapping on a quick fix or letting ease of implementation define the user experience, we need to fix design issues to be more inclusive of these users. In my twelve years in the industry, there’s a phrase I hear way too often: “Why can’t the users just follow the directions?” This doesn’t show a problem with the user, but with the site or app. The problem isn’t with the directions—it’s with the design. If users are making mistakes on a form, our first instinct is to add instructions before it. There are two problems here:
  • Most people will not read the instructions. Stats show that of the $13.8 billion of technical gadgets that were returned to the store by consumers in 2007, only 5% were due to faulty products. The rest were because users did not understand how to use the products. Users hate reading instructions.
  • Your form is so complicated that it requires instructions. A better solution would be to fix the design of the form itself so you’re not attempting to solve a design problem with content.
If most users are making mistakes on a form, users with inattention will struggle even more. When this happens, figure out exactly where the errors are occurring, and fix the design of the form to target that error. For instance, if you’re receiving the wrong data for a field, it’s a sign that form labels are unclear; if you have inline-only labels, adding regular labels outside of the fields will do more than adding an explanatory note. Taking steps like this will make the process less confusing, reducing the need to have long instructions. If an explanation is needed, add it adjacent to the form field where users are having trouble, not at the top of the form where users will likely ignore it. The best option is to simplify the form so that explanations are not needed. Inattention also makes sustained concentration considerably more difficult, and the longer your form or process is, the harder it will be for users with inattention to complete in one sitting. If it is more than two steps or pages, add the functionality to save progress and come back later to finish it. Please, please, please don’t have your multi-page form time out quickly—if they come back from a break and find that your form has lost their progress, they probably won’t be starting over.

Anxiety

Anxiety is a fairly common problem for adults. Among adults, 19.1% have an anxiety disorder, but anxiety can also result from other things, like taking certain medications, withdrawal from drugs or alcohol, prolonged stress, or chronic pain. As common as anxiety is, you’d think we’d be better at designing for it than we are. Anxiety has been described as knowing that you turned off the stove, but having to turn your car around to check anyway. Users with anxiety fear that they will do something wrong when interacting with your site or app. To counteract this, provide reassurance that what they’re doing is the right thing, and make the experience forgiving if they do the wrong thing. Reassuring them reduces stress and helps to retain anxious users who are more likely to leave in the middle of a difficult process.

Let users think like users

Nobody goes to your site not knowing why they’re there. If users go to your site to solve a problem, they need to know where to find the solution. The problem may be common to all users, but users with anxiety will struggle more when they can’t find the answers they need or when the way forward is unclear. One of the biggest culprits of unclear user flow is basing the user experience on your company’s understanding of the problem. Companies have their own internal terminology and organizational structures to address these problems internally. Users likely won’t understand any of this and shouldn’t require a glossary of industry terms or internal structures in order to use your website or app. Define clear paths for users to solve common problems, and design them to address the user’s concerns; don’t give a list of the types of data you accept or organize things according to how your company receives them. If you have multiple types of users using your site (for instance, parents applying for school as well as school administrators), define clear user paths for each. Remember that many of your users will not always start on the homepage of your site. If the user paths are only clear on the homepage, then they’re not clear. Provide clear wayfinding. Even once anxious users are on the path to their solution, they need to know they’re heading in the right direction. On each step of a process, state not only what step they’re on, but what the end of that path is. Remember, anxious users may have a need to keep checking to make sure they’re in the right spot—don’t make them click the back button to do that.

There’s no anxiety like form anxiety

With a good chunk of anxiety being caused by the fear that you’re doing something wrong, forms are a huge stressor for anxious users. A lack of clarity on forms really harms usability and accessibility for users with anxiety, sometimes causing them to stop the process altogether. Improving clarity and providing reassurance can go a long way in reducing anxiety in these users. Every form and action should be clearly labeled with a headline that plainly states what the form does. I occasionally struggle with anxiety, and there are times when I have to glance up at the headline to double-check that I’m filling out the right form. Similarly, submit buttons should clearly state what happens when users click them. Submit buttons should have copy like “send message,” “complete purchase,” “continue to the next step,” or “sign up for our newsletter.” One of the worst things you can do with a submit button is have it just say “submit.” There’s a trend for designers to get overly clever with form labels: inline-only labels, labels that only appear when their field has focus, or even labels that start inside their field and then animate elsewhere. I’ve never encountered a situation where I was glad these overly clever solutions were in place. A label exists not only to tell users what information to put in the field, but also to confirm to users who have already filled out the form that their information is in the right place. Inline-only form labels make this impossible and cause undue stress to anxious users. Labels that cover up auto-filled text (common with labels that start inside form fields and then move somewhere else) cause similar problems. Form labels are not a medium for creative expression; they’re a tool for users to know how to use a form. This basic functionality should not be hindered. If you’re asking the user for any personal information, privacy is a huge concern, especially for users suffering from social anxiety who dread getting unexpected phone calls. 
Include a prominent link to your privacy policy on the form itself so it’s easy to find. Also, if it’s not immediately obvious why a piece of information is needed in your form, like a phone number, add a bit of help text to explain it. (For example, clicking a “Why do we need this?” link displays a “We need your phone number to call you in case of a mix-up with your order” tooltip.) If you don’t have a good reason for asking for a piece of personal information or can’t clearly explain why you need it, get rid of the field. And your job is not done once the user has submitted the form. Confirmation messages can be either a huge relief or a huge source of stress for anxious users. I can’t tell you how many times I’ve submitted a form online and the confirmation message just says, “Your data was submitted.” For users with anxiety, this can start the stress cycle all over again. What data? Submitted where? What if I messed something up? Confirmation messages should state:
  • what action was taken (“Thank you for signing up for our newsletter!”);
  • what data was posted (“Your email address, brandon.gregory@myemail.com, has been added to our distribution list.”);
  • and what the user should do if they made a mistake (“If you want to stop receiving our newsletter at any time, you can unsubscribe on your user profile.”).
Adding this little bit of reassurance can really help users struggling with anxiety to avoid undue stress.
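As a rough illustration of the three-part checklist above, here is a minimal sketch in Python. The function name `build_confirmation` and its parameter names are hypothetical, invented for this example; the only idea taken from the article is that a confirmation should not ship without all three parts.

```python
def build_confirmation(action_taken, data_posted, recovery_step):
    """Join the three parts of a confirmation message, refusing to omit any.

    All names here are illustrative assumptions, not part of any framework.
    """
    parts = {
        "action_taken": action_taken,    # what action was taken
        "data_posted": data_posted,      # what data was posted
        "recovery_step": recovery_step,  # what to do after a mistake
    }
    # Reject messages like "Your data was submitted." that skip a part.
    missing = [name for name, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError("confirmation is missing: " + ", ".join(missing))
    return " ".join(text.strip() for text in parts.values())
```

Calling it with the three example sentences from the list returns them joined into one message, while passing an empty string for any part raises an error during development instead of reaching an anxious user.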

Depression

Depression is not something we think about often in design, but it impacts how a lot of people use the web. About 6.7% of adults have major depression, and 2.8% of adults have bipolar disorder, which involves severe depression at times. Additionally, temporary or even long-term depression can be caused by traumatic events, drug use, or certain medications. The book Design for Real Life, by Sara Wachter-Boettcher and Eric Meyer (excerpt here), reminds us that we can’t just design for happy users. Some of our users will be in crisis: having their order mishandled, desperately needing information that’s not readily available, or just having an exceptionally bad day. For users with depression, any ordinary day has the potential to be an exceptionally bad day or crisis, and minor annoyances in user experience can become overwhelming.

Keep it easy

Depression is thought of as a psychological condition, but it also has physical side effects. For instance, depression actually impairs contrast perception—the world really does look gray for users dealing with depression. Fatigue and physical pain are common and can be hard to deal with. Everything is harder with depression. If your site or app is hard to use, many depressed users will simply not use it. A lot of the shortcuts we take in the web industry add up to insurmountable challenges for these users. A great example of this is unnecessary user registrations. Registering for a user account is a lengthy (and, for depressed users, exhausting) process. If it’s not absolutely required for a user task, you’re punishing depressed users (and probably everyone else too). If your site has a checkout process, make sure users can check out as a guest. Forcing a user to register for an account just to look at the content (I’m looking at you, Pinterest) is a great way to make sure depressed users will never look at your content. Long sign-up processes, unforgiving forms, and loss of data can quickly make depressed users give up altogether. Minor annoyances such as these can slide through the design-and-build process for our sites and apps, and impact depressed users much more than neurotypical ones. If content requires significant effort to locate, it will also be ignored by depressed users. Large blocks of endless content, like wall-to-wall tiles, force users to sift through it to find what they’re looking for. Long videos without accompanying text (that is searchable) can similarly be a deterrent. Assuming that users are so in love with your content that they will read or view every bit of it is naïve and creates a significant barrier for depressed users (and can also hinder users with inattention).

Chat can be a lifesaver

I get severely depressed three to six months out of the year, and talking to people is one of the hardest things I have to do. The effort required to carry on an actual conversation is immense, and it prevents me from doing a lot of things that I would ordinarily be doing. Add to that stress the stress of a botched order or customer service fiasco, and I sometimes get so stressed out, I can’t make phone calls that I need to. In these situations, any place that lets me contact them via chat instead of a phone call gains my eternal gratitude. A great example of this is the National Suicide Prevention Hotline (because if anyone knows how to design for depressed users, it’s this group), who opened their online chat in 2013. By 2015, the chat lines were open 24 hours a day. Chat lines are unfortunately frequently clogged, partly due to the influx of users and partly due to a lack of funding, but the number of chat operators is growing each year. Chat lines attract a different demographic: while the phone line is roughly a 50-50 split between male and female, the chat line is 78–80% female (70% of the total were women under 25). The article linked to in the paragraph above revealed some other interesting stats. The National Suicide Prevention Hotline is not the only crisis center that has caught onto this. The National Domestic Violence Hotline launched chat in 2013 and now receives 1,000–1,500 chats a month. The Rape, Abuse & Incest National Network (RAINN) has implemented a chat on their site, and they’ve found that chat users typically go into more depth about their traumatic issues than callers. And, like the National Suicide Prevention Hotline, both of these organizations are looking to scale up their chat services due to how popular they are. Businesses that regularly work with users in crisis have realized that chat is a vital tool for their users and are rapidly expanding their chat services to accommodate. 
Your business may not exclusively deal with crisis users, but with depression affecting a significant portion of the population, any day can be a crisis day for these users. If you have a phone line but not a chat, consider adding one. If you have a chat line and it’s constantly clogged, consider expanding the service.

Disability takes many forms, as should inclusive solutions

Far from being just about impaired vision and wheelchairs, disability takes many forms, and accessibility and inclusive design need to take just as many. In our industry, compassionate discussion around physical disabilities has been a huge benefit, and cognitive differences need to be part of the conversation too. Removing unnecessary distractions, reassuring users that they’re doing the right thing, and keeping things easy for users who are struggling are things we can do to accommodate these users and make them feel like we actually want them to use our sites and apps. As one of these users myself, I can say we would really appreciate your efforts. This can be just as important as including alt text for your images.

From URL to Interactive . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

Imagine, if you will, that you’re behind the wheel of a gorgeous 1957 Chevy Bel Air convertible, making your way across the desert on a wide open highway. The sun is setting, so you’ve got the top down, naturally. The breeze caresses your cheek like a warm hand as your nose catches a faint whiff of … What was that? The car lurches and chokes before losing all power. You coast, ever more slowly, to a stop. There’s steam rising from the hood. Oh jeez. What the heck just happened? You reach down to pop the hood, and open the door. Getting out, you make your way around to the front of the car. As you release the latch and lift the bonnet, you get blasted in the face with even more steam. You hope it’s just water. Looking around, it’s clear the engine has overheated, but you have no idea what you’re looking at. Back home you’ve got a guy who’s amazing with these old engines, but you fell in love with the luxurious curves, the fins, the plush interior, the allure of the open road. A tumbleweed rolls by. In the distance a buzzard screeches.

What’s happening under the hood?

Years ago, my colleague Molly Holzschlag used a variant of this story to explain the importance of understanding our tools. When it comes to complex machines like cars, knowing how they work can really get you out of a jam when things go wrong. Fail to understand how they work and you could end up, well, buzzard food. At the time, Molly and I were trying to convince folks that learning HTML, CSS, and JavaScript was more important than learning Dreamweaver. Like many similar tools, Dreamweaver allowed you to focus on the look and feel of a website without needing to burden yourself with knowing how the HTML, CSS, and JavaScript it produced actually worked. This analogy still applies today, though perhaps more so to frameworks than WYSIWYG design tools. If you think about it, our whole industry depends on our faith in a handful of “black boxes” few of us fully understand: browsers. We hand over our HTML, CSS, JavaScript, images, etc., and then cross our fingers and hope they render the experience we have in our heads. But how do browsers do what they do? How do they take our users from a URL to a fully-rendered and interactive page? To get from URL to interactive, we’ve assembled a handful of incredibly knowledgeable authors to act as our guides. This journey will take place in five distinct legs, delivered over the course of a few weeks. Each will provide you with details that will help you do your job better.

Leg 1: Server to Client

Ali Alabbas understands the ins and outs of networking, and he kicks off this journey with a discussion of how our code gets to the browser in the first place. He discusses how server connections are made, caching, and how Service Workers factor into the request and response process. He also discusses the “origin model” and how to improve performance using HTTP/2, Client Hints, and more. Understanding this aspect of how browsers work will undoubtedly help you make your pages download more quickly. Read the article

Leg 2: Tags to DOM

In the second installment, Travis Leithead—a former editor of the W3C’s HTML spec—takes us through the process of parsing HTML. He covers how browsers create trees (like the DOM tree) and how those trees become element collections you can access via JavaScript. And speaking of JavaScript, he’ll even get into how the DOM responds to manipulation and to events, including touch and click. Armed with this information, you’ll be able to make smarter decisions about how and when you touch the DOM, how to reduce Time To Interactive (TTI), and how to eliminate unintended reflows. Read the article

Leg 3: Braces to Pixels

Greg Whitworth has spent much of his career in the weeds of browsers’ CSS mechanics, and he’s here to tell us how they do what they do. He explains how CSS is parsed, how values are computed, and how the cascade actually works. Then he dives into a discussion of layout, painting, and composition. He wraps things up with details concerning how hit testing and input are managed. Understanding how CSS works under the hood is critical to building resilient, performant, and beautiful websites. Read the article

Leg 4: Var to JIT

One of JavaScript’s language designers, Kevin Smith, joins us for the final installment in this series to discuss how browsers compile and execute our JavaScript. For instance, what do browsers do when tearing down a page when users navigate away? How do they optimize the JavaScript we write to make it run even faster? He also tackles topics like writing code that works in multiple threads using workers. Understanding the inner processes browsers use to optimize and run your JavaScript can help you write code that is more efficient in terms of both performance and memory consumption. Read the article

Leg 5: Semantics to Screen Readers

Now that our page is generated, we need to understand how screen readers access it. Front-end developer Melanie Richards takes us through a step-by-step journey. She covers a wide array of screen readers, which vary greatly and are highly customizable by users. By understanding the nuances of accessibility APIs, thorough testing approaches, and the wealth of resources available, site creators can create the most widely accessible content for the most users possible. Read the article

Let’s get going

I sincerely hope you’ll join us on this trip across the web and into the often foggy valley where browsers turn code into experience.

Server to Client . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

Before anything can happen in a browser, it must first know where to go. There are multiple ways to get somewhere: entering a URL in the address bar, clicking (or tapping) on a link on a page or in another app, or clicking on a favorite. No matter the case, these all result in what’s called a navigation. A navigation is the very first step in any web interaction, as it kicks off a chain reaction of events that culminates in a web page being loaded.

Initiating the request

Once a URL has been provided to the browser to load, a few things happen under the hood.

Check for HSTS

First, the browser needs to determine if the URL specifies the HTTP (non-secure) scheme. If it’s an HTTP request, the browser needs to check if the domain is in the HSTS list (HTTP Strict Transport Security). This list is made up of both a preloaded list and a list of previously visited sites that opted in to using HSTS; both are stored in the browser. If the requested HTTP host is in the HSTS list, a request is made to the HTTPS version of the URL instead of HTTP. This is why you’ll notice that even if you try to type http://www.bing.com into a modern browser, it will send you to https://www.bing.com instead.
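The HSTS check described above can be sketched in a few lines of Python. This is a simplified model rather than any browser's actual implementation: the two sets and their contents are illustrative assumptions, every entry is treated as if it covered its subdomains, and ports and expiry times are ignored.

```python
from urllib.parse import urlsplit

# Illustrative stand-ins for the two lists the browser stores.
PRELOADED_HSTS = {"bing.com"}     # assumed entry, shipped with the browser
LEARNED_HSTS = {"example.org"}    # assumed entry, learned from earlier visits


def is_hsts_host(host):
    """Return True if the host or any parent domain is in either list.

    Simplification: treats every entry as if it applied to subdomains.
    """
    labels = host.lower().split(".")
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return bool(candidates & (PRELOADED_HSTS | LEARNED_HSTS))


def upgrade_if_hsts(url):
    """Rewrite an http:// URL to https:// when its host is an HSTS host."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not is_hsts_host(parts.hostname or ""):
        return url  # already secure, or the host never opted in
    return parts._replace(scheme="https").geturl()
```

With these assumed lists, `upgrade_if_hsts("http://www.bing.com")` returns the `https://` form, while a URL for a host in neither list is returned unchanged.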

Check for service workers

Next, the browser needs to determine if a service worker is available to handle the request; this is especially important when the user is offline and does not have a network connection. Service workers are a relatively new feature in browsers. They enable offline-capable web sites by allowing interception of network requests (including the top-level request) so the requests can be served from a script-controlled cache. A service worker can be registered when a page is visited, a process that records the service worker registration and URL mapping to a local database. Determining whether a service worker is installed is as simple as looking up the navigated URL in that database. If a service worker exists for that given URL, it will be allowed to handle responding to the request. If the new Navigation Preload feature is available in the browser, and the site makes use of it, the browser will also consult the network for the initial navigation request at the same time. This is beneficial because it keeps the browser from blocking on a potentially slow service worker startup. If there is no service worker to handle the initial request (or if Navigation Preload is being used), the browser moves on to consulting the networking layer.
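To make the lookup concrete, here is a small Python model of it. The `REGISTRATIONS` table, its URLs, and both function names are hypothetical. Matching on the longest registered scope prefix is an assumption layered on top of the description above, which only says the navigated URL is looked up in the database.

```python
# Hypothetical registration database: scope URL prefix -> worker script URL.
REGISTRATIONS = {
    "https://example.com/": "https://example.com/sw.js",
    "https://example.com/app/": "https://example.com/app/sw.js",
}


def find_service_worker(url, registrations=REGISTRATIONS):
    """Return the worker whose scope is the longest prefix of url, or None."""
    matching = [scope for scope in registrations if url.startswith(scope)]
    if not matching:
        return None
    return registrations[max(matching, key=len)]


def route_navigation(url, navigation_preload=False):
    """List who gets asked for the response, in the order described above."""
    worker = find_service_worker(url)
    steps = []
    if worker is not None:
        steps.append("service worker: " + worker)
    # No worker, or Navigation Preload in use: the network is consulted too.
    if worker is None or navigation_preload:
        steps.append("network")
    return steps
```

In a real browser the service worker and the preload request run in parallel; the list returned here only records that both are consulted.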

Check the network cache

The browser, via the network layer, will check if there’s a fresh response in its cache. Freshness is usually defined by the Cache-Control header in the response, where setting a max-age defines how long the cached item is considered fresh, and setting no-store indicates that it should not be cached at all. And of course, if the browser finds nothing in its network cache, then a network request will be required. If there is a fresh response in the cache, it is returned for the purposes of loading the page. If there’s a resource found but it’s not fresh, the browser may convert the request to a conditional revalidation request, which contains an If-Modified-Since or If-None-Match header that tells the server what version of the content the browser already has in its cache. The server can either tell the browser that its copy is still fresh by returning an HTTP 304 (Not Modified) with no body, or tell the browser that its copy is stale by returning an HTTP 200 (OK) response with the new version of the resource.
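The decision above (use the cache, revalidate, or fetch) can be modeled roughly as follows. This is a sketch under stated assumptions: the cache entry is a plain dictionary with hypothetical keys (`stored_at`, `cache_control`, `etag`, `last_modified`), times are plain seconds, and only `max-age` and `no-store` are considered. Real caches also account for other directives and headers.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def should_store(cache_control):
    """A response marked no-store must not be written to the cache."""
    return "no-store" not in parse_cache_control(cache_control)


def cache_decision(entry, now):
    """Return (action, request_headers) for a navigation at time `now`.

    `entry` is a hypothetical cache record, or None when nothing is cached.
    """
    if entry is None:
        return ("fetch", {})
    directives = parse_cache_control(entry.get("cache_control", ""))
    max_age = int(directives.get("max-age", 0))
    if now - entry["stored_at"] < max_age:
        return ("use_cache", {})  # still fresh: no network request needed
    # Stale: ask the server whether our copy is still good.
    headers = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return ("revalidate" if headers else "fetch", headers)
```

A 304 answer to the `revalidate` request would mean the stored body can be reused, and a 200 would replace it with the new version.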

Check for connection

If there’s a previously established connection for the host and port for the request, the connection will be reused rather than establishing a new one. If not, the browser consults the networking layer to understand if it needs to do a DNS (Domain Name System) lookup. This would involve looking through the local DNS cache (which is stored on your device), and, depending on the freshness of that cache, remote name servers may also be consulted (they can be hosted by Internet Service Providers), which would eventually result in the correct IP address for the browser to connect to. In some cases, the browser may be able to predict which domains will be accessed, and connections to those domains can be primed. The page can hint to the browser which domains to prime connections to by using resource hints such as rel="preconnect" on the link tag. One scenario where resource hints are helpful is when a user is on a Bing search results page, and there is an expectation that the first few search results are the most likely to be visited. In this case, priming connections to those domains avoids paying the cost of a DNS lookup and connection setup later on when those links are clicked.

Establish connection

The browser can now establish a connection with the server so the server knows it will be both sending to and receiving from the client. If we’re using TLS, we need to perform a TLS handshake to validate the certificate provided by the server.

Send the request to the server

The first request that will go over this connection is the top-level page request. Typically, this will be an HTML file that gets served from the server back to the client.

Handle the response

As the data is being streamed over to the client, the response data is analyzed. First, the browser checks the headers of the response. HTTP headers are name-value pairs that are sent as part of the HTTP response. If the headers of the response indicate a redirect (e.g., via the Location header), the browser starts the navigation process all over again and returns to the very first step of checking if an HSTS upgrade is required. If the server response is compressed or chunked, the browser will attempt to decompress and dechunk it. As the response is being read, the browser will also kick off writing it to the network cache in parallel. Next, the browser will attempt to understand the MIME type of the file being sent to the browser, so it can appropriately interpret how to load the file. For instance, an image file will just be loaded as an image, while HTML will be parsed and rendered. If the HTML parser is engaged, the contents of the response are scanned for URLs of likely resources to be downloaded so that the browser can start those downloads ahead of time before the page even begins to render. This will be covered in more detail by the next post in this series. By this point, the requested navigation URL has been entered into the browser history, which makes it available for navigation in the back and forward functionality of the browser. Here’s a flowchart that gives you an overview of what’s been discussed so far, with a bit more detail:
Flowchart showing the path from server to client
As you know, the page will continue to make requests, because there are many sub-resources on the page that are important for the overall experience, including images, JavaScript, and style sheets. Additionally, there are resources that are referenced within those sub-resources, such as background images (referenced in CSS) or other resources initiated by fetch(), import(), or AJAX calls. Without these, we would just have a plain page without much interactivity. As you’ve seen in both the explanation earlier and the flowchart, each resource that is requested is in part impacted by the browser’s caching policies.

Caching

As mentioned previously, the browser manages a network cache, which allows previously downloaded resources to be reused in many cases. This is particularly useful for largely unchanging resources, such as logos and JavaScript from frameworks. It’s important to take advantage of this cache as much as possible, because it can help reduce the number of outgoing network requests by instead reusing the locally available cached resource. In turn, this helps minimize the otherwise laborious and latency-prone operations that are required, improving the page load time. Of course, the network cache has a quota that impacts both how many items will be stored and how long they’ll be stored for. This doesn’t mean that the website doesn’t get a say in the matter. Cache-Control headers in responses control the browser’s caching logic. In some cases, it’s prudent to tell the browser to not cache an item at all (such as with Cache-Control: no-store), because it is expected to always be different. In other cases, it makes sense to have the browser cache the item indefinitely via Cache-Control: immutable (typically alongside a long max-age), because the response for a given URL will never change. In such a case, it makes sense to use different URLs to point to different versions of the same resource rather than changing the resource at the same URL, since the cached version would always be used. Of course, the network cache is not the only type of cache in the browser. There are programmatic caches that can be leveraged via JavaScript. Specifically, in the example of the service worker given above, an initial resource request for the top-level page can be intercepted by the service worker, which can then respond with a cached item that the site stored in one of its programmatic caches. This is useful, because it gives the web site more control over what cached items to use when.
These caches are origin-bound, which means that each domain has its own sandboxed set of caches it can control that are isolated from the caches of another domain.

Origin model

An origin is simply a tuple consisting of the scheme/protocol, the hostname, and the port. For instance, https://www.bing.com:443 has the HTTPS protocol, www.bing.com hostname, and 443 as the port. If any of those are different when compared to another origin, they are considered to be different origins. For instance, https://images.bing.com:443 and http://www.bing.com:80 are different origins. The origin is an important concept for the browser, because it defines how data is sandboxed and secured. In most cases, for security purposes, the browser enforces a same-origin policy, which means that one origin cannot access the data of another origin—both would need to be the same origin. Specifically, in the caching case presented earlier, neither https://images.bing.com:443 nor http://www.bing.com:80 can see the programmatic cache of the other. If bing.com wanted to load a JavaScript file that is from microsoft.com, it would be making a cross-origin resource request on which the browser would enforce the same-origin policy. To allow this behavior, microsoft.com would need to cooperate with bing.com by specifying CORS (Cross-Origin Resource Sharing) headers that enable bing.com to be able to load the JavaScript file from microsoft.com. It’s good practice to set the correct CORS headers so browsers can appropriately deal with the cross-origin resource requests.
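To make the tuple comparison concrete, here is a small sketch using the standard URL class, which is available in browsers and in Node.js. The sameOrigin helper is a name made up for this example; it only compares origins and says nothing about how a browser enforces the policy.

```javascript
// Two URLs share an origin when scheme, hostname, and port all match.
// URL.origin normalizes default ports, so ":443" on https and ":80" on http
// are treated the same as leaving the port out.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(new URL("https://www.bing.com:443/search").origin);
console.log(sameOrigin("https://www.bing.com:443/search", "https://www.bing.com/maps"));
console.log(sameOrigin("https://images.bing.com:443", "http://www.bing.com:80"));
```

The second comparison fails on two counts at once: both the scheme and the hostname differ.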

Conclusion

Now that you know how we go from the server to the client—and all the details in between—stay tuned to learn about the next step in loading a web page: how we go from HTML tags to the DOM.

Tags to DOM. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

In our previous segment, “Server to Client,” we saw how a URL is requested from a server and learned all about the many conditions and caches that help optimize delivery of the associated resource. Once the browser engine finally gets the resource, it needs to start turning it into a rendered web page. In this segment, we focus primarily on HTML resources, and how the tags of HTML are transformed into the building blocks for what will eventually be presented on screen. To use a construction metaphor, we’ve drafted the blueprints, acquired all the permits, and collected all the raw materials at the construction site; it’s time to start building!

Parsing

Once content gets from the server to the client through the networking system, its first stop is the HTML parser, which is composed of a few systems working together: encoding, pre-parsing, tokenization, and tree construction. The parser is the part of the construction project metaphor where we walk through all the raw materials: unpacking boxes; unbinding pallets, pipes, wiring, etc.; and pouring the foundation before handing off everything to the experts working on the framing, plumbing, electrical, etc.

Encoding

The payload of an HTTP response body can be anything from HTML text to image data. The first job of the parser is to figure out how to interpret the bits just received from the server. Assuming we’re processing an HTML document, the decoder must figure out how the text document was translated into bits in order to reverse the process.
Binary-to-text representation
Characters | D | O | M
ASCII values | 68 | 79 | 77
Binary values | 01000100 | 01001111 | 01001101
Bits | 8 | 8 | 8
(Remember that ultimately even text must be translated to binary in the computer. Encoding (in this case, ASCII encoding) defines that a binary value such as “01000100” means the letter “D,” as shown in the figure above.) Many possible encodings exist for text, and it’s the browser’s job to figure out how to properly decode the text. The server should provide hints via Content-Type headers, and the leading bits themselves can be analyzed (for a byte order mark, or BOM). If the encoding still cannot be determined, the browser can apply its best guess based on heuristics. Sometimes the only definitive answer comes from the (encoded) content itself in the form of a <meta> HTML tag. In the worst-case scenario, the browser makes an educated guess and then later finds a contradicting <meta> tag after parsing has started in earnest. In these rare cases, the parser must restart, throwing away the previously decoded content. Browsers sometimes have to deal with old web content (using legacy encodings), and a lot of these systems are in place to support that. When saving your HTML documents for the web today, the choice is clear: use UTF-8 encoding. Why? It nicely supports the full Unicode range of characters, has good compatibility with ASCII for single-byte characters common to languages like CSS, HTML, and JavaScript, and is likely to be the browser’s fallback default. You can tell when encoding goes wrong, because text won’t render properly (you will tend to get garbage characters or boxes where legible text is usually visible).
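A quick way to see decoding in action is to hand raw bytes to the standard TextDecoder. The sketch below decodes the three bytes from the table, checks for a UTF-8 byte order mark, and shows what happens when UTF-8 bytes are read one byte per character instead. The hasUtf8Bom helper and the variable names are made up for this example.

```javascript
// The bytes from the table: 68, 79, 77.
const bytes = new Uint8Array([68, 79, 77]);
const text = new TextDecoder("utf-8").decode(bytes);
console.log(text);

// A UTF-8 byte order mark is the three-byte sequence EF BB BF.
function hasUtf8Bom(buffer) {
  return (
    buffer.length >= 3 &&
    buffer[0] === 0xef &&
    buffer[1] === 0xbb &&
    buffer[2] === 0xbf
  );
}

// "é" takes two bytes in UTF-8. Decoding those bytes correctly gives one
// character; treating each byte as its own character gives two wrong ones.
const eAcute = new Uint8Array([0xc3, 0xa9]);
const right = new TextDecoder("utf-8").decode(eAcute);
const wrong = String.fromCharCode(...eAcute);
console.log(right, wrong);
```

The mismatched pair at the end is the kind of garbage text you see on a page whose encoding was guessed wrong.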

Pre-parsing/scanning

Once the encoding is known, the parser starts an initial pre-parsing step to scan the content with the goal of minimizing round-trip latency for additional resources. The pre-parser is not a full parser; for example, it doesn’t understand nesting levels or parent/child relationships in HTML. However, the pre-parser does recognize specific HTML tag names and attributes, as well as URLs. For example, if you have an <img src="https://somewhere.example.com/images/dog.png" alt=""> somewhere in your HTML content, the pre-parser will notice the src attribute, and queue a resource request for the dog picture via the networking system. The dog image is requested as quickly as possible, minimizing the time you need to wait for it to arrive from the network. The pre-parser may also notice certain explicit requests in the HTML such as preload and prefetch directives, and queue these up for processing as well.
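As a toy illustration of scanning without understanding nesting, the sketch below pulls src URLs out of img and script tags with a regular expression. Real pre-parsers are far more careful than this; the scanForResources name, the regular expression, and the /app.js URL are all assumptions made for the example.

```javascript
// Toy pre-parser: collects src URLs from <img> and <script> tags.
// It ignores nesting entirely and only handles double-quoted attributes.
function scanForResources(html) {
  const urls = [];
  const pattern = /<(?:img|script)\b[^>]*?\bsrc\s*=\s*"([^"]*)"/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    urls.push(match[1]); // the captured attribute value
  }
  return urls;
}

const html =
  '<div><p><img src="https://somewhere.example.com/images/dog.png" alt=""></p></div>' +
  '<script src="/app.js"></script>';
console.log(scanForResources(html));
```

Each URL found this way could be queued with the networking system long before the real parser reaches the tag.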

Tokenization

Tokenization is the first half of parsing HTML. It involves turning the markup into individual tokens such as “begin tag,” “end tag,” “text run,” “comment,” and so forth, which are fed into the next stage of the parser. The tokenizer is a state machine that transitions between the different states of the HTML language, such as “in tag open state” (<|video controls>), “in attribute name state” (<video con|trols>), and “after attribute name state” (<video controls|>), doing so iteratively as each character in the HTML markup text document is read. (In each of those example tags, the vertical pipe illustrates the tokenizer’s position.)
Diagram showing HTML tags being run through a tokenizer to create tokens
The HTML spec (see “12.2.5 Tokenization”) currently defines eighty separate states for the tokenizer. The tokenizer and parser are very adaptable: both can handle and convert any text content into an HTML document—even if code in the text is not valid HTML. Resiliency like this is one of the features that has made the web so approachable by developers of all skill levels. However, the drawback of the tokenizer and parser’s resilience is that you may not always get the results you expect, which can lead to some subtle programming bugs. (Checking your code in the HTML validator can help you avoid bugs like this.) For those who prefer a more black-and-white approach to markup language correctness, browsers have an alternate parsing mechanism built in that treats any failure as a catastrophic failure (meaning any failure will cause the content to not render). This parsing mode uses the rules of XML to process HTML, and can be enabled by sending the document to the browser with the “application/xhtml+xml” MIME type (or any XML-based MIME type that uses elements in the HTML namespace). Browsers may combine the pre-parser and tokenization steps together as an optimization.
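To get a feel for the state-machine approach, here is a deliberately tiny tokenizer with four states instead of eighty. It is a sketch only: attributes, comments, and every error-handling rule are left out, and the state names and token shapes are simplified inventions rather than the ones in the spec.

```javascript
// Toy tokenizer: reads one character at a time and switches between
// four states ("data", "tagOpen", "startTagName", "endTagName").
function tokenize(html) {
  const tokens = [];
  let state = "data";
  let buffer = "";

  const flushText = () => {
    if (buffer) tokens.push({ type: "text", value: buffer });
    buffer = "";
  };

  for (const ch of html) {
    switch (state) {
      case "data":
        if (ch === "<") {
          flushText();
          state = "tagOpen";
        } else {
          buffer += ch;
        }
        break;
      case "tagOpen":
        if (ch === "/") {
          state = "endTagName";
        } else {
          buffer = ch;
          state = "startTagName";
        }
        break;
      case "startTagName":
      case "endTagName":
        if (ch === ">") {
          const type = state === "startTagName" ? "startTag" : "endTag";
          tokens.push({ type, name: buffer });
          buffer = "";
          state = "data";
        } else {
          buffer += ch;
        }
        break;
    }
  }
  if (state === "data") flushText(); // emit any trailing text
  return tokens;
}

console.log(tokenize("<p>sincerely<p>The authors</p>"));
```

Notice that the tokenizer happily emits two start tags for p in a row; deciding what that means is the job of the next stage.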

Parsing/tree construction

The browser needs an internal (in-memory) representation of a web page, and the DOM standard defines exactly what shape that representation should take. The parser’s responsibility is to take the tokens created by the tokenizer in the previous step, and create and insert the objects into the Document Object Model (DOM) in the appropriate way (specifically using the twenty-three separate states of its state machine; see “12.2.6.4 The rules for parsing tokens in HTML content”). The DOM is organized into a tree data structure, so this process is sometimes referred to as tree construction. (As an aside, Internet Explorer did not use a tree structure for much of its history.)
Diagram showing tokens being turned into the DOM
HTML parsing is complicated by the variety of error-handling cases that ensure that legacy HTML content on the web continues to have compatible structure in today’s modern browsers. For example, many HTML tags have implied end tags, meaning that if you don’t provide them, the browser auto-closes the matching tag for you. Consider, for instance, this HTML:
<p>sincerely<p>The authors</p>
The parser has a rule that will create an implied end tag for the paragraph, like so:
<p>sincerely</p><p>The authors</p>
This ensures the two paragraph objects in the resulting tree are siblings, rather than a single paragraph object created by ignoring the second open tag. HTML tables are perhaps the most complicated case, where the parser’s rules attempt to ensure that tables have the proper structure. Despite all the complicated parsing rules, once the DOM tree is created, all of the parsing rules that try to create a “correct” HTML structure are no longer enforced. Using JavaScript, a web page can rearrange the DOM tree in almost any way it likes, even if it doesn’t make sense! (For example, adding a table cell as the child of a <video> tag.) The rendering system becomes responsible for figuring out how to deal with any weird inconsistencies like that. Another complicating factor in HTML parsing is that JavaScript can add more content to be parsed while the parser is in the middle of doing its job. <script> tags contain text that the parser must collect and then send to a scripting engine for evaluation. While the script engine parses and evaluates the script text, the parser waits. If the script evaluation includes invoking the document.write API, a second instance of the HTML parser must start running (reentrantly). To quickly revisit our construction metaphor, <script> and document.write require stopping all in-progress work to go back to the store to get some additional materials that we hadn’t realized we needed. While we’re away at the store, all progress on the construction is stalled. All of these complications make writing a compliant HTML parser a non-trivial undertaking.
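The implied end tag rule for paragraphs can be sketched with a stack of open elements. This toy builder implements just that one rule and nothing else from the spec; the token and node shapes are hypothetical and chosen only to keep the example short.

```javascript
// Builds a small tree from tokens. The only special rule implemented is
// that a <p> start tag closes a <p> that is currently open.
function buildTree(tokens) {
  const root = { name: "#root", children: [] };
  const open = [root]; // stack of open elements, innermost last

  for (const token of tokens) {
    if (token.type === "startTag") {
      if (token.name === "p" && open[open.length - 1].name === "p") {
        open.pop(); // implied </p>
      }
      const node = { name: token.name, children: [] };
      open[open.length - 1].children.push(node);
      open.push(node);
    } else if (token.type === "endTag") {
      // Close the nearest matching open element, if there is one.
      const index = open.map((n) => n.name).lastIndexOf(token.name);
      if (index > 0) open.length = index;
    } else {
      open[open.length - 1].children.push({ name: "#text", value: token.value });
    }
  }
  return root;
}

// Tokens for: <p>sincerely<p>The authors</p>
const tree = buildTree([
  { type: "startTag", name: "p" },
  { type: "text", value: "sincerely" },
  { type: "startTag", name: "p" },
  { type: "text", value: "The authors" },
  { type: "endTag", name: "p" },
]);
console.log(tree.children.map((n) => n.name));
```

Both paragraphs end up as direct children of the root, which is the sibling structure described above.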

Events

When the parser finishes, it announces its completion via an event called DOMContentLoaded. Events are the broadcast system built into the browser that JavaScript can listen and respond to. In our construction metaphor, events are the reports that various workers bring to the foreman when they encounter a problem or finish a task. Besides DOMContentLoaded, a variety of events signal significant state changes in the web page, such as load (meaning parsing is done, and all the resources requested by the parser, like images, CSS, video, etc., have been downloaded) and unload (meaning the web page is about to be closed). Many events are specific to user input, such as the user touching the screen (pointerdown, pointerup, and others), using a mouse (mouseover, mousemove, and others), or typing on the keyboard (keydown, keyup, and keypress). The browser creates an event object in the DOM, packs it full of useful state information (such as the location of the touch on the screen, the key on the keyboard that was pressed, and so on), and “fires” that event. Any JavaScript code that happens to be listening for that event is then run and provided with the event object. The tree structure of the DOM makes it convenient to “filter” how frequently code responds to an event by allowing events to be listened for at any level in the tree (i.e., at the root of the tree, in the leaves of the tree, or anywhere in between). The browser first determines where to fire the event in the tree (meaning which DOM object, such as a specific <input> control), and then calculates a route for the event starting from the root of the tree, then down each branch until it reaches the target (the <input> for example), and then back along the same path to the root. Each object along the route then has its event listeners triggered, so that listeners at the root of the tree will “see” more events than specific listeners at the leaves of the tree.
Diagram showing a route being calculated for an event, and then event listeners being called
Some events can also be canceled, which provides, for example, the ability to stop a form submission if the form isn’t filled out properly. (A submit event is fired from a <form> element, and a JavaScript listener can check the form and optionally cancel the event if fields are empty or invalid.)
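The route calculation is easy to picture with plain objects. The sketch below models only the route itself (down from the root to the target, then back up); the node objects with name and parent fields are hypothetical stand-ins for DOM objects, and listener registration and cancellation are left out.

```javascript
// Computes the route an event takes: root ... target ... root.
function eventRoute(target) {
  const ancestors = [];
  for (let node = target.parent; node; node = node.parent) {
    ancestors.unshift(node); // ends up ordered from the root downward
  }
  return [...ancestors, target, ...ancestors.slice().reverse()];
}

const root = { name: "html", parent: null };
const form = { name: "form", parent: root };
const input = { name: "input", parent: form };
const button = { name: "button", parent: form };

console.log(eventRoute(input).map((n) => n.name).join(" > "));
console.log(eventRoute(button).map((n) => n.name).join(" > "));
```

Both routes pass through html and form, which is why listeners placed near the root see events for every descendant, while a listener on input only sees its own.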

DOM

The HTML language provides a rich feature set that extends far beyond the markup that the parser processes. The parser builds the structure of which elements contain other elements and what state those elements have initially (their attributes). The combination of the structure and state is enough to provide both a basic rendering and some interactivity (such as through built-in controls like <textarea>, <video>, <button>, etc.). But without the addition of CSS and JavaScript, the web would be very boring (and static). The DOM provides an additional layer of functionality both to the elements of HTML and to other objects that are not related to HTML at all. In the construction metaphor, the parser has assembled the final building—all the walls, doors, floors, and ceilings are installed, and the plumbing, electrical, gas, and such, are ready. You can open the doors and windows, and turn the lights on and off, but the structure is otherwise quite plain. CSS provides the interior details—color on the walls and baseboards, for example. (We’ll get to CSS in the next installment.) JavaScript enables access to the DOM—all the furniture and appliances inside, as well as the services outside the building, such as the mailbox, storage shed and tools, solar panels, water well, etc. We describe the “furniture” and outside “services” next.

Element interfaces

As the parser is constructing objects to put into the tree, it looks up the element’s name (and namespace) and finds a matching HTML interface to wrap around the object. Interfaces add features to basic HTML elements that are specific to their kind or type of element. Some generic features include:
  • access to HTML collections representing all or a subset of the element’s children;
  • the ability to search the element’s attributes, children, and parent elements;
  • and importantly, ways to create new elements (without using the parser), and attach them to (or detach them from) the tree.
For specific elements like <table>, the interface contains additional table-specific features for locating all the rows, columns, and cells within the table, as well as shortcuts for removing and adding rows and cells from and to the table. Likewise, <canvas> interfaces have features for drawing lines, shapes, text, and images. JavaScript is required to use these APIs—they are not available using HTML markup alone. Any DOM changes made to the tree via the APIs described above (such as the hierarchical position of an element in the tree, the element’s state by toggling an attribute name or value, or any of the API actions from an element’s interface) after parsing ends will trigger a chain-reaction of browser systems whose job is to analyze the change and update what you see on the screen as soon as possible. The tree maintains many optimizations for making these repeated updates fast and efficient, such as:
  • representing common element names and attributes via a number (using hash tables for fast identification);
  • collection caches that remember an element’s frequently-visited children (for fast child-element iteration);
  • and sub-tree change-tracking to minimize what parts of the whole tree get “dirty” (and will need to be re-validated).

Other APIs

The HTML elements and their interfaces in the DOM are the browser’s only mechanism for showing content on the screen. CSS can affect layout, but only for content that exists in HTML elements. Ultimately, if you want to see content on screen, it must be done through HTML interfaces that are part of the tree. (For those wondering about the Scalable Vector Graphics (SVG) and MathML languages: those elements must also be added to the tree to be seen, but I’ve skipped them for brevity.) We learned how the parser is one way of getting HTML from the server into the DOM tree, and how element interfaces in the DOM can be used to add, remove, and modify that tree after the fact. Yet, the browser’s programmable DOM is quite vast and not scoped to just HTML element interfaces. The scope of the browser’s DOM is comparable to the set of features that apps can use in any operating system. Things like (but not limited to):
  • access to storage systems (databases, key/value storage, network cache storage);
  • devices (geolocation, proximity and orientation sensors of various types, USB, MIDI, Bluetooth, Gamepads);
  • the network (HTTP exchanges, bidirectional server sockets, real-time media streaming);
  • graphics (2D and 3D graphics primitives, shaders, virtual and augmented reality);
  • and multithreading (shared and dedicated execution environments with rich message passing capabilities).
The capabilities exposed by the DOM continue to grow as new web standards are developed and implemented by major browser engines. Most of these “extra” APIs of the DOM are out of scope for this article, however.

Moving on from markup

In this segment, you’ve learned how parsing and tree construction create the foundation for the DOM: the stateful, in-memory representation of the HTML tags received from the network. With the DOM model in place, services such as the event model and element APIs enable web developers to change the DOM structure at any time. Each change begins a sequence of “re-building” work of which updating the DOM is only the first step. Going back to the construction analogy, the on-site raw materials have been formed into the structural framing of the building and built to the right dimensions with internal plumbing, electrical, and other services installed, but with no real sense yet of the building’s final look—its exterior and interior design. In the next installment, we’ll cover how the browser takes the DOM tree as input to a layout engine that incorporates CSS and transforms the tree into something you can finally see on the screen.

Braces to Pixels. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

Doesn’t CSS seem like magic? Well, in this third installment of “URL to Interactive” we’ll look at the journey that your browser goes through to take your CSS from braces to pixels. As a bonus, we’ll also quickly touch on how end-user interaction affects this process. We have a lot of ground to cover, so grab a cup of <insert your favorite drink’s name here>, and let’s get going.

Parsing

Similar to what we learned about HTML in “Tags to DOM,” once CSS is downloaded by the browser, the CSS parser is spun up to handle any CSS that it encounters. This can be CSS within individual documents, inside of <style> tags, or inline within the style attribute of a DOM element. All the CSS is parsed out and tokenized in accordance with the syntax specification. At the end of this process, we have a data structure with all the selectors, properties, and properties’ respective values. For example, consider the following CSS:
.fancy-button {
	background: green;
	border: 3px solid red;
	font-size: 1em;
}
That will result in the following data structure for easy utilization later in the process:
Selector | Property | Value
.fancy-button | background-color | rgb(0,128,0)
.fancy-button | border-width | 3px
.fancy-button | border-style | solid
.fancy-button | border-color | rgb(255,0,0)
.fancy-button | font-size | 1em
One thing that is worth noting is that the browser exploded the shorthands of background and border into their longhand variants, as shorthands are primarily for developer ergonomics; the browser only deals with the longhands from here on. After this is done, the engine continues constructing the DOM tree, which Travis Leithead also covers in “Tags to DOM”; so go read that now if you haven’t already, I’ll wait.
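Shorthand expansion can be sketched as a simple split. This version assumes the border value always has exactly three parts in width, style, color order, which real CSS does not require; the expandBorder name and the two-entry color table are made up for the example.

```javascript
// Minimal named-color table, just enough for this example.
const NAMED_COLORS = {
  red: "rgb(255,0,0)",
  green: "rgb(0,128,0)",
};

// Expands "border: <width> <style> <color>" into its longhand properties.
// Assumes all three parts are present and in that order.
function expandBorder(value) {
  const [width, style, color] = value.trim().split(/\s+/);
  return {
    "border-width": width,
    "border-style": style,
    "border-color": NAMED_COLORS[color] || color,
  };
}

console.log(expandBorder("3px solid red"));
```

A real engine accepts the parts in any order and fills in initial values for anything that is omitted.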

Computation

Now that we have parsed out all styles within the readily available content, it’s time to do style computation on them. All values have a standardized computed value that we try to reduce them to. When leaving the computation stage, any dimensional values are reduced to one of three possible outputs: auto, a percentage, or a pixel value. For clarity, let’s take a look at a few examples of what the web developer wrote and what the result will be following computation:
Web developer wrote | Computed value
font-size: 1em | font-size: 16px
width: 50% | width: 50%
height: auto | height: auto
width: 506.4567894321568px | width: 506.46px
line-height: calc(10px + 2em) | line-height: 42px
border-color: currentColor | border-color: rgb(0,0,0)
height: 50vh | height: 540px
display: grid | display: grid
Now that we’ve computed all the values in our data store, it’s time to handle the cascade.
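Several rows of that table can be reproduced with a few lines of arithmetic. The sketch below assumes a 16px font size and a 1080px-tall viewport, which is what the table’s results imply; the computeLength name is made up, and calc() and keywords such as currentColor are not handled.

```javascript
// Reduces a dimension to auto, a percentage, or a pixel value.
// fontSize and viewportHeight are assumed context values, in pixels.
function computeLength(value, { fontSize = 16, viewportHeight = 1080 } = {}) {
  if (value === "auto" || value.endsWith("%")) {
    return value; // these pass through computation unchanged
  }
  const match = /^(-?\d*\.?\d+)(px|em|vh)$/.exec(value);
  if (!match) {
    throw new Error(`Unsupported value: ${value}`);
  }
  const number = parseFloat(match[1]);
  const unit = match[2];
  let px;
  if (unit === "em") px = number * fontSize;
  else if (unit === "vh") px = (number / 100) * viewportHeight;
  else px = number;
  return `${Math.round(px * 100) / 100}px`; // keep two decimal places
}

console.log(computeLength("1em"));
console.log(computeLength("50vh"));
console.log(computeLength("506.4567894321568px"));
console.log(computeLength("50%"));
```

Percentages and auto are left alone here because they cannot be resolved until layout knows the size of the containing block.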

Cascade

Since the CSS can come from a variety of sources, the browser needs a way to determine which styles should apply to a given element. To do this, the browser uses a formula called specificity, which counts the number of tags, classes, ids, and attribute selectors utilized in the selector, as well as the number of !important declarations present. Styles on an element via the inline style attribute are given a rank that wins over any style from within a <style> block or external style sheet. And if a web developer utilizes !important on a value, the value will win over any CSS no matter its location, unless there is a !important inline as well.
Graphic showing a hierarchy for determining CSS priority
To make this clear, let’s show a few selectors and their resulting specificity scores:
Selector | Specificity score
li | 0 0 0 0 1
li.foo | 0 0 0 1 1
#comment li.foo.bar | 0 0 1 2 1
<li style="color: red"> | 0 1 0 0 0
color: red !important | 1 0 0 0 0
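For the selector-based part of the score (the last three columns: ids, classes and attributes, tags), a rough counter fits in a few lines. This sketch relies on simple regular expressions and skips pseudo-classes, pseudo-elements, and the universal selector; both function names are hypothetical.

```javascript
// Returns [ids, classesAndAttributes, tags] for a simple selector.
function specificity(selector) {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+|\[[^\]]+\]/g) || []).length;
  // A tag name appears at the start or right after a combinator.
  const tags = (selector.match(/(^|[\s>+~])[a-zA-Z][\w-]*/g) || []).length;
  return [ids, classes, tags];
}

// Compares two scores column by column, left to right.
function compareSpecificity(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

console.log(specificity("li"));
console.log(specificity("li.foo"));
console.log(specificity("#comment li.foo.bar"));
```

Because the comparison goes column by column, a single id outweighs any number of classes, and a single class outweighs any number of tags.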
So what does the engine do when the specificity is tied? Given two or more selectors of equal specificity, the winner will be whichever one appears last in the document. In the following example, the div would have a blue background.
div {
	background: red;
}

div {
	background: blue;
}
Let’s expand on our .fancy-button example a little bit:
.fancy-button {
	background: green;
	border: 3px solid red;
	font-size: 1em;
}

div .fancy-button {
	background: yellow;
}
Now the CSS will produce the following data structure. We’ll continue building upon this throughout the article.
Selector | Property | Value | Specificity score | Document order
.fancy-button | background-color | rgb(0,128,0) | 0 0 0 1 0 | 0
.fancy-button | border-width | 3px | 0 0 0 1 0 | 1
.fancy-button | border-style | solid | 0 0 0 1 0 | 2
.fancy-button | border-color | rgb(255,0,0) | 0 0 0 1 0 | 3
.fancy-button | font-size | 16px | 0 0 0 1 0 | 4
div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1 | 5

Understanding origins

In “Server to Client,” Ali Alabbas discusses origins as they relate to browser navigation. In CSS, there are also origins, but they serve different purposes:
  • user: any styles set globally within the user agent by the user;
  • author: the web developer’s styles;
  • and user agent: anything that can utilize and render CSS (to most web developers and users, this is a browser).
The cascade power of each of these origins ensures that the greatest power lies with the user, then the author, and finally the user agent. Let’s expand our dataset a bit further and see what happens when the user sets their browser’s font size to a minimum of 2em:
Origin | Selector | Property | Value | Specificity score | Document order
Author | .fancy-button | background-color | rgb(0,128,0) | 0 0 0 1 0 | 0
Author | .fancy-button | border-width | 3px | 0 0 0 1 0 | 1
Author | .fancy-button | border-style | solid | 0 0 0 1 0 | 2
Author | .fancy-button | border-color | rgb(255,0,0) | 0 0 0 1 0 | 3
Author | .fancy-button | font-size | 16px | 0 0 0 1 0 | 4
Author | div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1 | 5
User | * | font-size | 32px | 0 0 0 0 0 | 0

Doing the cascade

When the browser has a complete data structure of all declarations from all origins, it will sort them in accordance with specification. First it will sort by origin, then by specificity, and finally, by document order.
Origin ⬆ | Selector | Property | Value | Specificity score ⬆ | Document order ⬇
User | * | font-size | 32px | 0 0 0 0 0 | 0
Author | div .fancy-button | background-color | rgb(255,255,0) | 0 0 0 1 1 | 5
Author | .fancy-button | background-color | rgb(0,128,0) | 0 0 0 1 0 | 0
Author | .fancy-button | border-width | 3px | 0 0 0 1 0 | 1
Author | .fancy-button | border-style | solid | 0 0 0 1 0 | 2
Author | .fancy-button | border-color | rgb(255,0,0) | 0 0 0 1 0 | 3
Author | .fancy-button | font-size | 16px | 0 0 0 1 0 | 4
This results in the “winning” properties and values for the .fancy-button (the higher up in the table, the better). For example, from the previous table, you’ll note that the user’s browser preference settings take precedence over the web developer’s styles. Now the browser finds all DOM elements that match the denoted selectors, and hangs the resulting computed styles off the matching elements, in this case a div for the .fancy-button:
Property | Value
font-size | 32px
background-color | rgb(255,255,0)
border-width | 3px
border-color | rgb(255,0,0)
border-style | solid
If you wish to learn more about how the cascade works, take a look at the official specification.
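The sort described in this section can be sketched as a three-key comparison followed by picking the first declaration seen for each property. The origin ranking below follows the order given in this article (user, then author, then user agent); the declaration shape, the three-column scores, and the function names are assumptions made for the example.

```javascript
// Origin order as described in this article: user beats author beats user agent.
const ORIGIN_RANK = { "user": 2, "author": 1, "user agent": 0 };

// Column-by-column comparison of [ids, classes, tags] scores.
function compareSpecificity(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Sorts best-first by origin, specificity, then document order, and keeps
// the first (winning) value for each property.
function cascade(declarations) {
  const sorted = declarations.slice().sort(
    (a, b) =>
      ORIGIN_RANK[b.origin] - ORIGIN_RANK[a.origin] ||
      compareSpecificity(b.specificity, a.specificity) ||
      b.order - a.order
  );
  const winners = {};
  for (const d of sorted) {
    if (!(d.property in winners)) winners[d.property] = d.value;
  }
  return winners;
}

const declarations = [
  { origin: "author", property: "background-color", value: "rgb(0,128,0)", specificity: [0, 1, 0], order: 0 },
  { origin: "author", property: "border-width", value: "3px", specificity: [0, 1, 0], order: 1 },
  { origin: "author", property: "border-style", value: "solid", specificity: [0, 1, 0], order: 2 },
  { origin: "author", property: "border-color", value: "rgb(255,0,0)", specificity: [0, 1, 0], order: 3 },
  { origin: "author", property: "font-size", value: "16px", specificity: [0, 1, 0], order: 4 },
  { origin: "author", property: "background-color", value: "rgb(255,255,0)", specificity: [0, 1, 1], order: 5 },
  { origin: "user", property: "font-size", value: "32px", specificity: [0, 0, 0], order: 0 },
];

console.log(cascade(declarations));
```

The yellow background wins on specificity, and the 32px font size wins on origin even though its selector has the lowest score in the list.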

CSS Object Model

While we’ve done a lot up to this stage, we’re not done yet. Now we need to update the CSS Object Model (CSSOM). The CSSOM resides within document.styleSheets, and we need to update it so that it represents everything that has been parsed and computed up to this point. Web developers may utilize this information without even realizing it. For example, when calling into getComputedStyle(), the same process denoted above is run, if necessary.

Layout

Now that we have a DOM tree with styles applied, it’s time to begin the process of building up a tree for visual purposes. This tree is present in all modern engines and is referred to as the box tree. In order to construct this tree, we traverse down the DOM tree and create zero or more CSS boxes, each having a margin, border, padding and content box. In this section, we’ll be discussing the following CSS layout concepts:
  • Formatting context (FC): there are many types of formatting contexts, most of which web developers invoke by changing the display value for an element. Some of the most common formatting contexts are block (block formatting context, or BFC), flex, grid, table-cells, and inline. Some other CSS can force a new formatting context, too, such as position: absolute, using float, or utilizing multi-column.
  • Containing block: this is the ancestor block that you resolve styles against.
  • Inline direction: this is the direction in which text is laid out, as dictated by the element’s writing mode. In Latin-based languages this is the horizontal axis, and in CJK languages this is the vertical axis.
  • Block direction: this behaves exactly the same as the inline direction but is perpendicular to that axis. So, for Latin-based languages this is the vertical axis, and in CJK languages this is the horizontal axis.

Resolving auto

Remember from the computation phase that dimension values can be one of three values: auto, percentage, or pixel. The purpose of layout is to size and position all the boxes in the box tree to get them ready for painting. As a very visual person myself, I find examples can make it easier to understand how the box tree is constructed. To make it easier to follow, I will not be showing the individual CSS boxes, just the principal box. Let’s look at a basic “Hello world” layout using the following code:
<body>
<p>Hello world</p>
<style>
	body {
		width: 50px;
	}
</style>
</body>
Diagram showing an HTML body, a CSS box, and a property of width with a value of 50 pixels
The browser starts at the body element. We produce its principal box, which has a width of 50px, and a default height of auto.
Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph
Now the browser moves on to the paragraph and produces its principal box, and since paragraphs have a margin by default, this will impact the height of the body, as reflected in the visual.
Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now a line box appended to the end
Now the browser moves on to the text of “Hello world,” which is a text node in the DOM. As such, we produce a line box inside of the layout. Notice that the text has overflowed the body. We’ll handle this in the next step.
Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now a line box appended to the end, which has an arrow pointing back to the paragraph CSS box
Because “world” does not fit and we haven’t changed the overflow property from its default, the engine reports back to its parent where it left off in laying out the text.
Diagram showing a tree with a CSS box for the body and a CSS box for a paragraph, and now two line boxes appended to the end
Since the parent has received a token that its child wasn’t able to complete the layout of all the content, it clones the line box, which includes all the styles, and passes the information for that box to complete the layout. Once the layout is complete, the browser walks back up the box tree, resolving any auto or percentage-based values that haven’t been resolved. In the image, you can see that the body and the paragraph now encompass all of “Hello world” because their heights were set to auto.
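The walkthrough above can be sketched in a few lines. This is a rough approximation, not how an engine measures text: it assumes every character is 8px wide and every line box is 18px tall (both numbers are invented), it ignores the paragraph’s default margin, and layoutText is a hypothetical helper.

```javascript
// Assumed metrics for illustration only; real engines measure actual glyphs.
const CHAR_WIDTH = 8;
const LINE_HEIGHT = 18;

function layoutText(text, availableWidth) {
  const lines = [];
  let current = "";
  for (const word of text.split(" ")) {
    const candidate = current ? current + " " + word : word;
    if (current && candidate.length * CHAR_WIDTH > availableWidth) {
      // The word does not fit: finish this line box and continue in a new one.
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  // Walking back up the tree: an auto height resolves to the content height.
  return { lines, height: lines.length * LINE_HEIGHT };
}

// The body is 50px wide, so "Hello world" cannot fit on a single line.
const helloLayout = layoutText("Hello world", 50);
```

Under these assumptions, “Hello world” ends up in two line boxes, and the auto height of the body and paragraph resolves to the height of both lines.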

Dealing with floats

Now let’s get a little bit more complex. We’ll take a normal layout where we have a button that says “Share It,” and float it to the left of a paragraph of Latin text. The float itself is what is considered to be a “shrink-to-fit” context. It is referred to as “shrink-to-fit” because the box will shrink down around its content if the dimensions are auto. Float boxes are one type of box that matches this layout type, but there are many others, such as absolutely positioned boxes (including position: fixed elements) and table cells with auto-based sizing. Here is the code for our button scenario:
<article>
	<button>SHARE IT</button>
	<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam pellentesq</p>
</article>
<style>
	article {
		min-width: 400px;
		max-width: 800px;
		background: rgb(191, 191, 191);
		padding: 5px;
	}

	button {
		float: left;
		background: rgb(210, 32, 79);
		padding: 3px 10px;
		border: 2px solid black;
		margin: 5px;
	}

	p {
		margin: 0;
	}
</style>
Diagram of a box tree with a CSS box for an article, a CSS box for a button floated left, and a line box
The process starts off by following the same pattern as our “Hello world” example, so I’m going to skip to where we begin handling the floated button.
Diagram of a box tree with a CSS box and a line box that calculates the maximum and minimum width for the button
Since a float creates a new block formatting context (BFC) and is a shrink-to-fit context, the browser does a specific type of layout called content measure. This mode is identical to regular layout with one important difference: it is done in infinite space. During this phase, the browser lays out the tree of the BFC at both its largest and smallest widths. In this case, it is laying out a button with text, so its narrowest size, including all other CSS boxes, will be the size of the longest word. At its widest, it will be all of the text on one line, with the addition of the CSS boxes. Note: The color of the buttons here is not literal. It is for illustrative purposes only.
Diagram of a box tree with a CSS box for an article, a CSS box for a button floated left, and a line box, with the CSS box for the button now communicating the min and max width back up to the CSS box for the article
Now that we know that the minimum width is 86px, and the maximum width is 115px, we pass this information back to the parent box for it to decide the width and to place the button appropriately. In this scenario, there is space to fit the float at max size so that is how the button is laid out.
Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The CSS box for the article is communicating the min and max width for the button to the paragraph.
In order to ensure that the browser adheres to the standard and the content wraps around the float, the browser changes the geometry of the article BFC. This geometry is passed to the paragraph to use during its layout.
Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The paragraph has not been parsed yet and is on one line overflowing the parent container.
From here the browser follows the same layout process as it did in our first example—but it ensures that any inline content’s inline and block starting positions are outside of the constraint space taken up by the float.
Diagram of a box tree with a CSS box for an article with two branches: a CSS box for a button floated left and a CSS box for a paragraph. The paragraph has now been parsed and broken into four lines, and there are four line boxes in the diagram to show this.
As the browser continues walking down the tree and cloning nodes, it moves past the block position of the constraint space. This allows the final line of text (as well as the one before it) to begin at the start of the content box in the inline direction. And then the browser walks back up the tree, resolving auto and percentage values as necessary.
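The width decision for the floated button can be written down compactly. The sketch below uses the shrink-to-fit formula from CSS 2.1 (the smaller of the maximum content width and the larger of the minimum content width and the available width) with the 86px and 115px measurements from this example. The function name and the available widths are invented for illustration.

```javascript
// minContent: width with every possible line break taken (the longest word).
// maxContent: width with all of the text on one line.
// available: the space the parent can offer the float.
function shrinkToFitWidth(minContent, maxContent, available) {
  return Math.min(Math.max(minContent, available), maxContent);
}

// Plenty of room: the button is laid out at its maximum width.
const roomyWidth = shrinkToFitWidth(86, 115, 400);
// Less room than the maximum: the button takes what is available.
const tightWidth = shrinkToFitWidth(86, 115, 100);
// Less room than the minimum: the button keeps its minimum and overflows.
const crampedWidth = shrinkToFitWidth(86, 115, 50);
```

In our scenario the article is at least 400px wide, so the first case applies and the button is laid out at its maximum size.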

Understanding fragmentation

One final aspect to touch on for how layout works is fragmentation. If you’ve ever printed a web page or used CSS Multi-column, then you’ve taken advantage of fragmentation. Fragmentation is the logic of breaking content apart to fit it into a different geometry. Let’s take a look at the same example utilizing CSS Multi-column:
<body>
	<div>
		<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras nibh orci, tincidunt eget enim et, pellentesque condimentum risus. Aenean sollicitudin risus velit, quis tempor leo malesuada vel. Donec consequat aliquet mauris. Vestibulum ante ipsum primis in faucibus
		</p>
	</div>
<style>
	div {
		columns: 2;
		column-fill: auto;
		height: 300px;
	}
</style>
</body>
Diagram of a box tree showing a CSS box for a body and a multicol box for a div
Once the browser reaches the multicol formatting context box, it sees that it has a set number of columns.
Diagram of a box tree showing a CSS box for a body and a multicol box for a div, now with a fragmentainer CSS box created under the div
It follows a similar cloning model to the one from before, and creates a fragmentainer with the correct dimensions to adhere to the author’s desire for their columns.
Diagram of a box tree showing a CSS box for a body and a multicol box for a div, now with a CSS box for each column and a line box for each line within each column
The browser then lays out as many lines as possible by following the same pattern as before. Then the browser creates another fragmentainer and continues the layout to completion.
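As a rough sketch of that loop, the function below fills one fragmentainer with as many line boxes as fit and then starts the next one. It assumes the lines have already been broken and all share one line height. The 20px line height and the fragmentLines name are assumptions, and real fragmentation also has to honor break rules that are ignored here.

```javascript
function fragmentLines(lines, fragmentainerHeight, lineHeight) {
  const linesPerFragmentainer = Math.floor(fragmentainerHeight / lineHeight);
  if (linesPerFragmentainer < 1) {
    throw new RangeError("fragmentainer is too short to hold a single line");
  }
  const fragmentainers = [];
  // Fill one fragmentainer completely before creating the next one.
  for (let i = 0; i < lines.length; i += linesPerFragmentainer) {
    fragmentainers.push(lines.slice(i, i + linesPerFragmentainer));
  }
  return fragmentainers;
}

// Twenty hypothetical line boxes flowing into 300px-tall columns.
const twentyLines = Array.from({ length: 20 }, (_, i) => `line ${i + 1}`);
const columnContents = fragmentLines(twentyLines, 300, 20);
```

With these numbers, the first column holds fifteen lines and the second holds the remaining five.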

Painting

OK, so let’s recap where we are at this point. We’ve taken all the CSS content, parsed it, cascaded it onto the DOM tree, and completed layout. But we haven’t applied color, borders, shadows, and similar design treatments to the layout. Adding these is known as painting. Painting is roughly standardized by CSS, and to put it concisely (you can read the full breakdown in CSS 2.2 Appendix E), you paint in the following order:
  • background;
  • border;
  • and content.
So if we take our “SHARE IT” button from earlier and follow this process, it will look something like this:
Graphic showing progressive passes of a box: first the background, then the border, then the content
Once this is completed, it is converted to a bitmap. That’s right—ultimately every layout element (even text) becomes an image under the hood.

Concerning the z-index

Now, most of our websites don’t consist of a single element. Moreover, we often want to have certain elements appear on top of other elements. To accomplish this, we can harness the power of the z-index to superimpose one element over another. This may feel like how we work with layers in our design software, but the only layers that exist are within the browser’s compositor. It might seem as though we’re creating new layers using z-index, but we’re not—so what are we doing? What we’re doing is creating a new stacking context. Creating a new stacking context effectively changes the order in which you paint elements. Let’s look at an example:
<body>
<div id="one">
	Item 1
</div>
<div id="two">
	Item 2
</div>
<style>
body {
	background: lightgray;
}
div {
	width: 300px;
	height: 300px;
	position: absolute;
	background: white;
	z-index: 2;
}
#two {
	background: green;
	z-index: 1;
}
</style>
</body>
Without z-index utilization, the document above would be painted in document order, which would place “Item 2” on top of “Item 1.” But because of the z-index, the painting order is changed. Let’s step through each phase, similar to how we stepped through our earlier layouts.
Diagram of a box tree with a basic layout representing a root stacking context. One box has a z-index of one, another box has a z-index of 2.
The browser starts with the root box; we paint in the background.
The same layout, but the box with the z-index of 1 is now rendering.
The browser then traverses, out of document order, to the lower-level stacking context (which in this case is “Item 2”) and begins to paint that element following the same rules from above.
The same layout, but the box with the z-index of 2 is now rendering on top of the previous box
Then it traverses to the next highest stacking context (which in this case is “Item 1”) and paints it according to the order defined in CSS 2.2.
The z-index has no bearing on color, just on which element is visible to users and, hence, which text and color are visible.
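The painting order for this example can be sketched as a sort followed by a walk. This is a deliberately reduced model: it assumes every box is a positioned element with a non-negative z-index, whereas the full algorithm in CSS 2.2 Appendix E has more phases. The paintSteps function and its data shape are hypothetical.

```javascript
// The two positioned divs from the example, in document order.
const stackedBoxes = [
  { id: "one", zIndex: 2, order: 0, background: "white", content: "Item 1" },
  { id: "two", zIndex: 1, order: 1, background: "green", content: "Item 2" },
];

function paintSteps(rootBackground, boxes) {
  // The root stacking context paints its background first.
  const steps = [`root background: ${rootBackground}`];
  // Lower z-index paints first; document order breaks ties.
  const sorted = [...boxes].sort((a, b) => a.zIndex - b.zIndex || a.order - b.order);
  for (const box of sorted) {
    // Within each box: background, then border, then content.
    steps.push(`#${box.id} background: ${box.background}`);
    if (box.border) steps.push(`#${box.id} border: ${box.border}`);
    steps.push(`#${box.id} content: ${box.content}`);
  }
  return steps;
}

const itemPaintSteps = paintSteps("lightgray", stackedBoxes);
```

“Item 2” is painted before “Item 1” even though it comes later in the document, so “Item 1” ends up on top.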

Composition

At this stage, we have a minimum of a single bitmap that is passed from painting to the compositor. The compositor’s job is to create a layer, or layers, and render the bitmap(s) to the screen for the end user to see. A reasonable question to ask at this point is, “Why would any site need more than one bitmap or compositor layer?” Well, with the examples that we’ve looked at thus far, we really wouldn’t. But let’s look at an example that’s a little bit more complex. Let’s say that in a hypothetical world, the Office team wants to bring Clippy back online, and they want to draw attention to Clippy by having him pulsate via a CSS transform. The code for animating Clippy could look something like this:
<div class="clippy"></div>
<style>
.clippy {
	width: 100px;
	height: 100px;
	animation: pulse 1s infinite;
	background: url(clippy.svg);
}

@keyframes pulse {
	from {
		transform: scale(1, 1);
	}
	to {
		transform: scale(2, 2);
	}
}
</style>
When the browser reads that the web developer wants to animate Clippy on an infinite loop, it has two options:
  • It can go back to the repaint stage for every frame of the animation, and produce a new bitmap to send back to the compositor.
  • Or it can produce two different bitmaps, and allow the compositor to do the animation itself on only the layer that has this animation applied.
In most circumstances, the browser will choose option two and produce the following (I have purposefully simplified the number of layers Word Online would produce for this example):
Diagram showing a root composite layer with Clippy on his own layer
Then it will re-compose the Clippy bitmap in the correct position and handle the pulsating animation. This is a great win for performance as in many engines the compositor is on its own thread, and this allows the main thread to be unblocked. If the browser were to choose option one above, it would have to block on every frame to accomplish the same result, which would negatively impact performance and responsiveness for the end user.
A diagram showing a layout with Clippy, with a chart of the process of rendering. The Compose step is looping.
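To illustrate option two, here is a toy model of a compositor-driven animation: Clippy is painted once, and each frame only applies a different scale to the same bitmap. It assumes linear easing to keep the math simple (CSS animations default to a different timing function), and runPulse and its return shape are invented for this sketch.

```javascript
function runPulse(frameCount) {
  let paints = 0;
  // Paint once: rasterize Clippy into a bitmap that lives on its own layer.
  const layer = { bitmap: "clippy.svg", width: 100, height: 100 };
  paints += 1;

  const frames = [];
  for (let i = 0; i < frameCount; i++) {
    const progress = frameCount > 1 ? i / (frameCount - 1) : 0;
    // From scale(1, 1) to scale(2, 2); linear easing is an assumption.
    const scale = 1 + progress;
    // Compose only: reuse the bitmap and change how it is drawn.
    frames.push({
      bitmap: layer.bitmap,
      scale,
      width: layer.width * scale,
      height: layer.height * scale,
    });
  }
  return { paints, frames };
}

const pulse = runPulse(5);
```

However many frames are produced, the paint count stays at one, which is why the main thread is left free for other work.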

Creating the illusion of interactivity

As we’ve just learned, we took all the styles and the DOM, and produced an image that we rendered to the end user. So how does the browser create the illusion of interactivity? Well, as I’m sure you’ve guessed by now, it reruns parts of the same process. Let’s take a look at an example using our handy “SHARE IT” button:
button {
    float: left;
    background: rgb(210, 32, 79);
    padding: 3px 10px;
    border: 2px solid black;
}

button:hover {
    background: teal;
    color: black;
}
All we’ve added here is a pseudo-class that tells the browser to change the button’s background and text color when the user hovers over the button. This raises the question: how does the browser handle this? The browser constantly tracks a variety of inputs, and while those inputs are moving it goes through a process called hit testing. For this example, the process looks like this:
A diagram showing the process for hit testing. The process is detailed below.
  1. The user moves the mouse over the button.
  2. The browser fires an event that the mouse has been moved and goes into the hit testing algorithm, which essentially asks the question, “What box(es) is the mouse touching?”
  3. The algorithm returns the box that is linked to our “SHARE IT” button.
  4. The browser asks the question, “Is there anything I should do since a mouse is hovering over you?”
  5. It quickly runs style/cascade for this box and its children and determines that, yes, there is a :hover pseudo-class with paint-only style adjustments inside of the declaration block.
  6. It hangs those styles off of the DOM element (as we learned in the cascade phase), which is the button in this case.
  7. It skips past layout and goes directly to painting a new bitmap.
  8. The new bitmap is passed off to the compositor and then to the user.
To the user, this effectively creates the perception of interactivity, even though the browser is just swapping a red image for a teal one.
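The hit testing steps above can be approximated as follows. This is a sketch under simplifying assumptions: boxes are plain rectangles listed in paint order, the coordinates are invented, and the list of layout-affecting properties is a small hypothetical sample rather than anything an engine actually uses.

```javascript
// A small, assumed sample of properties whose changes would require layout.
const LAYOUT_PROPERTIES = new Set(["width", "height", "padding", "margin", "border", "float", "font-size"]);

// Returns the topmost box under the point, or null if nothing is hit.
function hitTest(boxes, x, y) {
  let hit = null;
  for (const box of boxes) {
    const inside =
      x >= box.x && x < box.x + box.width &&
      y >= box.y && y < box.y + box.height;
    // Boxes are in paint order, so a later match sits on top of earlier ones.
    if (inside) hit = box;
  }
  return hit;
}

// Decides which phases must rerun when the box's :hover styles apply.
function phasesForHover(box) {
  const changed = Object.keys(box.hoverStyles);
  if (changed.length === 0) return [];
  const needsLayout = changed.some((property) => LAYOUT_PROPERTIES.has(property));
  return needsLayout
    ? ["style", "layout", "paint", "composite"]
    : ["style", "paint", "composite"];
}

const pageBoxes = [
  { name: "article", x: 0, y: 0, width: 400, height: 100, hoverStyles: {} },
  { name: "button", x: 10, y: 10, width: 115, height: 30, hoverStyles: { background: "teal", color: "black" } },
];

const hovered = hitTest(pageBoxes, 20, 20);
const hoverPhases = phasesForHover(hovered);
```

Because the button’s :hover rule only changes background and color, the sketch skips layout, just as the browser does in step 7.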

Et voilà!

Hopefully this has removed some of the mystery from how CSS goes from the braces you’ve written to rendered pixels in your browser. In this leg of our journey, we discussed how CSS is parsed, how values are computed, and how the cascade actually works. Then we dove into a discussion of layout, painting, and composition. Now stay tuned for the final installment of this series, where one of the designers of the JavaScript language itself will discuss how browsers compile and execute our JavaScript.

var to JIT. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

In our previous article we described how the browser uses CSS to render beautiful pixels to the user’s screen. Although modern CSS can (and should!) be used to create highly interactive user experiences, for the last mile of interactivity, we need to dynamically update the HTML document. For that, we’re going to need JavaScript.

Bundle to bytecode

For a modern web application, the JavaScript that the browser first sees will typically not be the JavaScript written by a developer. Instead, it will most likely be a bundle produced by a tool such as webpack. And it will probably be a rather large bundle containing a UI framework such as React, various polyfills (libraries that emulate new platform features in older browsers), and an assortment of other packages found on npm.
The first challenge for the browser’s JavaScript engine is to convert that big bundle of text into instructions that can be executed on a virtual machine. It needs to parse the code, and because the user is waiting on JavaScript for all that interactivity, it needs to do it fast.
At a high level, the JavaScript engine parses code just like any other programming language compiler. First, the stream of input text is broken up into chunks called tokens. Each token represents a meaningful unit within the syntactic structure of the language, similar to words and punctuation in natural written language. Those tokens are then fed into a top-down parser that produces a tree structure representing the program. Language designers and compiler engineers like to call this tree structure an AST (abstract syntax tree). The resulting AST can then be analyzed to produce a list of virtual machine instructions called bytecode.
JavaScript is run through the abstract syntax tree, which produces byte code
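To get a feel for the pipeline, here is a toy version that handles only arithmetic expressions. It is a sketch, not a description of any real engine: the token format, the AST node shapes, and the PUSH, ADD, SUB, MUL, and DIV instructions are all invented for illustration, and error handling is left out.

```javascript
// Step 1: break the input text into tokens (numbers, operators, parentheses).
function tokenize(source) {
  return source.match(/\d+|[-+*\/()]/g) || [];
}

// Step 2: a top-down (recursive descent) parser that builds an AST.
function parse(tokens) {
  let position = 0;

  function primary() {
    const token = tokens[position++];
    if (token === "(") {
      const node = expression();
      position++; // skip the closing ")"
      return node;
    }
    return { type: "Number", value: Number(token) };
  }

  function term() {
    let node = primary();
    while (tokens[position] === "*" || tokens[position] === "/") {
      const operator = tokens[position++];
      node = { type: "Binary", operator, left: node, right: primary() };
    }
    return node;
  }

  function expression() {
    let node = term();
    while (tokens[position] === "+" || tokens[position] === "-") {
      const operator = tokens[position++];
      node = { type: "Binary", operator, left: node, right: term() };
    }
    return node;
  }

  return expression();
}

// Step 3: walk the AST and emit instructions for a simple stack machine.
const OPCODES = { "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV" };

function compile(node, output = []) {
  if (node.type === "Number") {
    output.push(["PUSH", node.value]);
  } else {
    compile(node.left, output);
    compile(node.right, output);
    output.push([OPCODES[node.operator]]);
  }
  return output;
}

const program = compile(parse(tokenize("1 + 2 * 3")));
```

For the input above, the multiplication is emitted before the addition because the parser gives it higher precedence.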
The process of generating an AST is one of the more straightforward aspects of a JavaScript engine. Unfortunately, it can also be slow. Remember that big bundle of code we started out with? The JavaScript engine has to parse and build syntax trees for the entire bundle before the user can start interacting with the site. Much of that code may be unnecessary for the initial page load, and some may not even be executed at all!
Fortunately, our compiler engineers have invented a variety of tricks to speed things up. First, some engines parse code on a background thread, freeing up the main UI thread for other computations. Second, modern engines will delay the creation of in-memory syntax trees for as long as possible by using a technique called deferred parsing or lazy compilation.
It works like this: if the engine sees a function definition that might not be executed for a while, it will perform a fast, “throwaway” parse of the function body. This throwaway parse will find any syntax errors that might be lurking within the code, but it will not generate an AST. Later, when the function is called for the first time, the code will be parsed again. This time, the engine will generate the full AST and bytecode required for execution. In the world of JavaScript, doing things twice can sometimes be faster than doing things once!
The best optimizations, though, are the ones that allow us to bypass doing any work at all. In the case of JavaScript compilation, this means skipping the parsing step completely. Some JavaScript engines will attempt to cache the generated bytecode for later reuse in case the user visits the site again. This isn’t quite as simple as it sounds. JavaScript bundles can change frequently as websites are updated, and the browser must carefully weigh the cost of serializing bytecode against the performance improvements that come from caching.

Bytecode to runtime

Now that we have our bytecode, we’re ready to start execution. In today’s JavaScript engines, the bytecode that we generated during parsing is first fed into a virtual machine called an interpreter. An interpreter is a bit like a CPU implemented in software. It looks at each bytecode instruction, one at a time, and decides what actual machine instructions to execute and what to do next.
The structure and behavior of the JavaScript programming language is defined in a document formally known as ECMA-262. Language designers like to call the structure part “syntax” and the behavior part “semantics.” The semantics of almost every aspect of the language are defined by algorithms that are written using prose-like pseudo-code. For instance, let’s pretend we are compiler engineers implementing the signed right shift operator (>>). Here’s what the specification tells us:
ShiftExpression : ShiftExpression >> AdditiveExpression
  1. Let lref be the result of evaluating ShiftExpression.
  2. Let lval be ? GetValue(lref).
  3. Let rref be the result of evaluating AdditiveExpression.
  4. Let rval be ? GetValue(rref).
  5. Let lnum be ? ToInt32(lval).
  6. Let rnum be ? ToUint32(rval).
  7. Let shiftCount be the result of masking out all but the least significant 5 bits of rnum, that is, compute rnum & 0x1F.
  8. Return the result of performing a sign-extending right shift of lnum by shiftCount bits. The most significant bit is propagated. The result is a signed 32-bit integer.
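Translated almost line for line, the steps above might look like the sketch below. It takes already-evaluated values, so the GetValue steps (1 through 4) are skipped. It leans on two well-known idioms, value | 0 for ToInt32 and value >>> 0 for ToUint32, and performs the final shift as a floor division so that the operator is not used to define itself. The function name is made up.

```javascript
function signedRightShift(lval, rval) {
  const lnum = lval | 0;          // step 5: ToInt32(lval)
  const rnum = rval >>> 0;        // step 6: ToUint32(rval)
  const shiftCount = rnum & 0x1f; // step 7: keep the least significant 5 bits
  // Step 8: a sign-extending right shift of a signed 32-bit integer is the
  // same as dividing by 2 ** shiftCount and rounding toward negative infinity.
  return Math.floor(lnum / 2 ** shiftCount);
}

const shiftedNegative = signedRightShift(-8, 1);
const shiftedWrapped = signedRightShift(1, 33); // 33 & 0x1F is 1
```

Comparing the results with the built-in operator for a handful of inputs is a quick way to check that the translation is faithful.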
In the first six steps we convert the operands (the values on either side of the >>) into 32-bit integers, and then we perform the actual shift operation. If you squint, it looks a bit like a recipe. If you really squint, you might see the beginnings of a syntax-directed interpreter.
Unfortunately, if we implemented the algorithms exactly as they are described in the specification, we’d end up with a very slow interpreter. Consider the simple operation of getting a property value from a JavaScript object. Objects in JavaScript are conceptually like dictionaries. Each property is keyed by a string name. Objects can also have a prototype object.
A JavaScript object with a prototype, an arrow pointing to an object.prototype, an arrow pointing to obj, an arrow pointing to obj2
If an object doesn’t have an entry for a given string key, then we need to look for that key in the prototype. We repeat this operation until we either find the key that we’re looking for or get to the end of the prototype chain. That’s potentially a lot of work to perform every time we want to get a property value out of an object!
The strategy used in JavaScript engines for speeding up dynamic property lookup is called inline caching. Inline caching was first developed for the language Smalltalk in the 1980s. The basic idea is that the results from previous property lookup operations can be stored directly in the generated bytecode instructions.
To see how this works, let’s imagine that the JavaScript engine is a towering gothic cathedral. As we step inside, we notice that the engine is chock full of objects swarming around. Each object has an identifiable shape that determines where its properties are stored. Now, imagine that we are following a series of bytecode instructions written on a scroll. The next instruction tells us to get the value of the property named “x” from some object. You grab that object, turn it over in your hands a few times to figure out where “x” is stored, and find out that it is stored in the object’s second data slot. It occurs to you that any object with this same shape will have an “x” property in its second data slot. You pull out your quill and make a note on your bytecode scroll indicating the shape of the object and the location of the “x” property. The next time you see this instruction you’ll simply check the shape of the object. If the shape matches what you’ve recorded in your bytecode notes, you’ll know exactly where the data is located without having to inspect the object. You’ve just implemented what’s known as a monomorphic inline cache!
But what happens if the shape of the object doesn’t match our bytecode notes? We can get around this problem by drawing a small table with a row for each shape we’ve seen. When we see a new shape, we use our quill to add a new row to the table. We now have a polymorphic inline cache. It’s not quite as fast as the monomorphic cache, and it takes up a little more space on the scroll, but if there aren’t too many rows, it works quite well.
If we end up with a table that’s too big, we’ll want to erase the table, and make a note to remind ourselves to not worry about inline caching for this instruction. In compiler terms, we have a megamorphic callsite.
In general, monomorphic code is very fast, polymorphic code is almost as fast, and megamorphic code tends to be rather slow. Or, in haiku form:
One shape, flowing wind
Several shapes, jumping fox
Many shapes, turtle
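For readers who prefer code to cathedrals, the quill-and-scroll bookkeeping might be sketched like this. Everything here is an assumption made for illustration: objects are modeled as a shape (a map from property name to slot index) plus an array of values, the four-row limit is arbitrary, and createPropertyReader is not a real engine API.

```javascript
// Assumed limit on table rows before giving up; real engines choose their own.
const MAX_CACHE_ROWS = 4;

function createPropertyReader(name) {
  let rows = []; // the notes on the scroll: { shape, slot }
  let megamorphic = false;
  const stats = { hits: 0, misses: 0 };

  function read(object) {
    if (!megamorphic) {
      const row = rows.find((r) => r.shape === object.shape);
      if (row) {
        stats.hits++;
        return object.values[row.slot]; // fast path: shape matches our notes
      }
    }
    // Slow path: inspect the object's shape to find where the property lives.
    stats.misses++;
    const slot = object.shape.slots[name];
    if (!megamorphic) {
      if (rows.length < MAX_CACHE_ROWS) {
        rows.push({ shape: object.shape, slot });
      } else {
        rows = []; // too many shapes: erase the table and stop caching
        megamorphic = true;
      }
    }
    return object.values[slot];
  }

  function state() {
    if (megamorphic) return "megamorphic";
    if (rows.length === 0) return "uninitialized";
    return rows.length === 1 ? "monomorphic" : "polymorphic";
  }

  return { read, state, stats };
}

const shapeA = { slots: { y: 0, x: 1 } }; // "x" lives in the second data slot
const shapeB = { slots: { x: 0 } };
const readX = createPropertyReader("x");
const firstRead = readX.read({ shape: shapeA, values: [10, 20] });  // miss: records shapeA
const secondRead = readX.read({ shape: shapeA, values: [30, 40] }); // hit: same shape
const thirdRead = readX.read({ shape: shapeB, values: [7] });       // miss: adds a second row
```

After these three reads the cache holds two rows, so it has gone from monomorphic to polymorphic. Feeding it enough additional shapes would push it over the assumed limit and into the megamorphic state.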

Interpreter to just-in-time (JIT)

The great thing about an interpreter is that it can start executing code quickly, and for code that is run only once or twice, this “software CPU” performs acceptably fast. But for “hot code” (functions that are run hundreds, thousands, or millions of times) what we really want is to execute machine instructions directly on the actual hardware. We want just-in-time (JIT) compilation.
As JavaScript functions are executed by the interpreter, various statistics are gathered about how often the function has been called and what kinds of arguments it is called with. If the function is run frequently with the same kinds of arguments, the engine may decide to convert the function’s bytecode into machine code.
Let’s step once again into our hypothetical JavaScript engine, the gothic cathedral. As the program executes, you dutifully pull bytecode scrolls from carefully labeled shelves. For each function, there is roughly one scroll. As you follow the instructions on each scroll, you record how many times you’ve executed the scroll. You also note the shapes of the objects encountered while carrying out the instructions. You are, in effect, a profiling interpreter.
When you open the next scroll of bytecode, you notice that this one is “hot.” You’ve executed it dozens of times, and you think it would run much faster in machine code. Fortunately, there are two rooms full of scribes that are ready to perform the translation for you. The scribes in the first room, a brightly lit open office, can translate bytecode into machine code quite fast. The code that they produce is of good quality and is concise, but it’s not as efficient as it could be. The scribes in the second room, dark and misty with incense, work more carefully and take a bit longer to finish. The code that they produce, however, is highly optimized and about as fast as possible. In compiler-speak, we refer to these different rooms as JIT compilation tiers. Different engines have different numbers of tiers depending on the tradeoffs they’ve chosen to make.
You decide to send the bytecode to the first room of scribes. After working on it for a bit, using your carefully recorded notes, they produce a new scroll containing machine instructions and place it on the correct shelf alongside the original bytecode version. The next time you need to execute the function, you can use this faster set of instructions.
The only problem is that the scribes made quite a few assumptions when they translated our scroll. Perhaps they assumed that a variable would always hold an integer. What happens if one of those assumptions is invalidated? In that case we must perform what’s known as a bailout. We pull the original bytecode scroll from the shelf, and figure out which instruction we should start executing from. The machine code scroll disappears in a puff of smoke and the process starts again.
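The profile, compile, and bail out cycle can be simulated in plain JavaScript. This sketch produces no machine code, of course. The “compiled” version is just a closure that guards its integer assumption, the threshold of three calls is arbitrary, and createTieredAdd and BAILOUT are hypothetical names.

```javascript
const BAILOUT = Symbol("bailout");

function createTieredAdd(hotThreshold = 3) {
  const profile = { calls: 0, onlyIntegers: true, bailouts: 0 };
  let compiled = null;

  // The profiling interpreter: generic, and it takes notes as it goes.
  function interpret(a, b) {
    profile.calls++;
    if (!Number.isInteger(a) || !Number.isInteger(b)) profile.onlyIntegers = false;
    return a + b; // generic semantics: numbers add, strings concatenate
  }

  // The scribes: specialized code that checks its assumption before running.
  function compile() {
    return (a, b) =>
      Number.isInteger(a) && Number.isInteger(b) ? a + b : BAILOUT;
  }

  function call(a, b) {
    if (compiled) {
      const result = compiled(a, b);
      if (result !== BAILOUT) return result;
      // Assumption invalidated: discard the fast code and start profiling again.
      compiled = null;
      profile.calls = 0;
      profile.bailouts++;
    }
    const value = interpret(a, b);
    if (profile.calls >= hotThreshold && profile.onlyIntegers) compiled = compile();
    return value;
  }

  return { call, profile, isCompiled: () => compiled !== null };
}

const tieredAdd = createTieredAdd();
tieredAdd.call(1, 2);
tieredAdd.call(3, 4);
tieredAdd.call(5, 6); // third integer call: the function is now "hot"
const fastResult = tieredAdd.call(7, 8);       // runs the specialized version
const bailedResult = tieredAdd.call("a", 1);   // guard fails: bailout
```

The last call still produces the right answer because the interpreter takes over after the bailout. Correctness never depends on the optimized tier, only speed does.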

To infinity and beyond

Today’s high-performance JavaScript engines have evolved far beyond the relatively simple interpreters that shipped with Netscape Navigator and Internet Explorer in the 1990s. And that evolution continues. New features are incrementally added to the language. Common coding patterns are optimized. WebAssembly is maturing. A richer standard module library is being developed. As developers, we can expect modern JavaScript engines to deliver fast and efficient execution as long as we keep our bundle sizes in check and try to make sure our performance-critical code is not overly dynamic.

Progressive Web Apps: The Case for PWAs. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

Now that you know what a progressive web app is, you’re probably wondering if your organization would benefit from one. To determine if it makes sense for your organization, ask yourself two questions:
  1. Does your organization have a website? If so, you would probably benefit from a progressive web app. This may sound flippant, but it’s true: nearly every website should be a progressive web app, because they represent best practices for the web.
  2. Does your organization make money on your website via ecommerce, advertising, or some other method? If so, you definitely need a progressive web app, because progressive web apps can have a significant impact on revenue.
This doesn’t mean that your site needs to have every possible feature of progressive web apps. You may have no need to provide offline functionality, push notifications, or even the ability for people to install your website to their homescreen. You may only want the bare minimum: a secure site, a service worker to speed up the site, and a manifest file, all things that benefit every website.
Of course, you may decide that your personal website or side project doesn’t warrant the extra effort to make it into a progressive web app. That’s understandable, and in the long run, even personal websites will gain progressive web app features when the underlying content management systems add support for them. For example, both Magento and WordPress have already announced their plans to bring progressive web apps to their respective platforms. Expect other platforms to follow suit.
But if you’re running any kind of website that makes money for your organization, then it would behoove you to start planning for how to convert your website to a progressive web app. Companies that have deployed progressive web apps have seen increases in conversion, user engagement, sales, and advertising revenue. For example, Pinterest saw core engagement increase by 60 percent and user-generated ad revenue increase by 44 percent (Fig 2.1). West Elm saw a 15 percent increase in average time spent on their site and a 9 percent lift in revenue per visit.
Comparing old mobile web to the progressive web version of Pinterest, the time spent that was greater than 5 minutes increased by 40%, the user-generated ad revenue increased by 44%, ad clickthroughs increased by 50%, and core engagement metrics improved by 60%. Even comparing to the native app, most of these same metrics increased between 2-5%.
Fig 2.1: Addy Osmani, an engineering manager for Google, wrote a case study about Pinterest’s progressive web app, comparing it to both their old mobile website and their native app.
The success stories for progressive web apps are so abundant that my company, Cloud Four, started a website called PWA Stats to keep track of them (Fig 2.2). There’s a good chance that we’ve collected a case study from an organization similar to yours that you can use to convince your coworkers that building a progressive web app makes sense.
A screenshot of the PWA Stats homepage, showing case studies for Uber, Trivago, Petlove, and the Grand Velas Riviera Maya resort.
Fig 2.2: PWAstats.com collects statistics and stories documenting the impact of progressive web apps.
And convincing them may be necessary. Despite the clear benefits of progressive web apps, many businesses still haven’t converted—often because they simply don’t know about PWAs yet. (So if you start building one now, you may get a jump on your competition!) But there is also a lot of confusion about what progressive web apps are capable of, where they can be used, and how they relate to native apps. This confusion creates fear, uncertainty, and doubt (FUD) that slow the adoption of progressive web apps. If you advocate for progressive web apps in your organization, you’ll likely find some confusion and possibly even encounter some resistance. So let’s equip you with arguments to cut through the FUD and convince your colleagues.

Native apps and PWAs can coexist

If your organization already has a native app, stakeholders may balk at the idea of also having a progressive web app—especially since the main selling point of PWAs is to enable native app features and functionality. It’s tempting to view progressive web apps as competition to native apps—much of the press coverage has adopted this storyline. But the reality is that progressive web apps make sense irrespective of whether a company has a native app. Set aside the “native versus web” debate, and focus on the experience you provide customers who interact with your organization via the web. Progressive web apps simply make sense on their own merits: they can help you reach more customers, secure your site, generate revenue, provide more reliable experiences, and notify users of updates—all as a complement to your native app.

Reach more customers

Not all of your current customers—and none of your potential customers—have your native app installed. Even your average customer is unlikely to have your app installed, and those customers who do have your app may still visit your site on a desktop computer. Providing a better experience on the website itself will increase the chances that current and future customers will read your content or buy your products (or even download your native app!). A progressive web app can provide that better experience. Despite what the tech press might have you believe, the mobile web is growing faster than native apps. comScore compared the top one thousand apps to the top one thousand mobile web properties and found that “mobile web audiences are almost 3x the size and growing 2x as fast as app audiences”. And while it’s true that people spend more time in their favorite apps than they do on the web, you may have trouble convincing people to install your app in the first place. Over half of smartphone users in the United States don’t download any apps in a typical month. Having a native app in an app store doesn’t guarantee that people will install it. It costs a lot to advertise an app and convince people to try it. According to app marketing company Liftoff, the average cost to get someone to install an app is $4.12, and that shoots up to $8.21 per install if you want someone to create an account in your app. If you’re lucky enough to get someone to install your app, the next hurdle is convincing them to continue to use it. When analyst Andrew Chen analyzed user retention data from 125 million mobile phones, he found that “the average app loses 77% of its DAUs [daily active users] within the first 3 days after the install. Within 30 days, it’s lost 90% of DAUs. Within 90 days, it’s over 95%” (Fig 2.3).
Chart: The average retention curve for Android apps drops precipitously within the first three days and continues to drop more slowly to near 0 over the next 90 days.
Fig 2.3: App loyalty remains a big issue for native apps. The average app loses over 95 percent of its daily active users within 90 days.
Progressive web apps don’t have those same challenges. They’re as easy for people to discover as your website is, because they are your website. And the features of a progressive web app are available immediately. There’s no need to jump through the hoops of visiting an app store and downloading the app. Installation is fast: it happens in the background during the first site visit, and can literally be as simple as adding an icon to the home screen. As Alex Russell wrote in a 2017 Medium post:
The friction of PWA installation is much lower. Our internal metrics at Google show that for similar volume of prompting for PWA banners and native app banners — the closest thing to an apples-to-apples comparison we can find — PWA banners convert 5–6x more often. More than half of users who chose to install a native app from these banners fail to complete installing the app whereas PWA installation is near-instant.
In short, a large and growing percentage of your customers interact with you on the web. Progressive web apps can lead to more revenue and engagement from more customers.

Secure your website

If you’re collecting credit cards or private information, providing a secure website for your web visitors is a must. But even if your website doesn’t handle sensitive data, it still makes sense to use HTTPS and provide a secure experience. Even seemingly innocuous web traffic can provide signals that can identify individuals and potentially compromise them. That’s not to mention the concerns raised by revelations of government snooping. It used to be that running a secure server was costly, confusing, and (seemingly) slower. Things have changed. SSL/TLS certificates used to cost hundreds of dollars, but now certificate provider Let’s Encrypt gives them out for free. Many hosting providers have integrated with certificate providers so you can set up HTTPS with a single click. And it turns out that HTTPS wasn’t as slow as we thought it was. Websites on HTTPS can also move to a new version of HTTP called HTTP/2. The biggest benefit is that HTTP/2 is significantly faster than HTTP/1. For many hosting providers and content delivery networks (CDNs), the moment you move to HTTPS, you get HTTP/2 with no additional work. If that wasn’t enough incentive to move to HTTPS, browser makers are using a carrot-and-stick approach for pushing websites to make the change. For the stick, Chrome has started warning users when they enter data on a site that isn’t running HTTPS. By the time you read this, Google plans to label all HTTP pages with a “Not secure” warning (Fig 2.4). Other browsers will likely follow suit and start to flag sites that aren’t encrypted to make sure users are aware that their data could be intercepted.
The eventual treatment of all HTTP pages in Chrome will be to show a red yield icon with the words 'Not secure'.
Fig 2.4: Google has announced its intention to label any website that isn’t running HTTPS as not secure. Different warning styles will be rolled out over time, until the label reaches the final state shown here.
For the HTTPS carrot, browsers are starting to require HTTPS to use new features. If you want to utilize the latest and greatest web tech, you’ll need to be running HTTPS. In fact, some features that used to work on nonsecure HTTP that are considered to contain sensitive data—for example, geolocation—are being restricted to HTTPS now. On second thought, perhaps this is a bit of a stick as well. A carrot stick? With all that in mind, it makes sense to set up a secure website for your visitors. You’ll avoid scary nonsecure warnings. You’ll get access to new browser features. You’ll gain speed benefits from HTTP/2. And: you’ll be setting yourself up for a progressive web app. In order to use service workers, the core technology for progressive web apps, your website must be on HTTPS. So if you want to reap the rewards of all the PWA goodness, you need to do the work to make sure your foundation is secure.

Generate more revenue

There are numerous studies that show a connection between the speed of a website and the amount of time and money people are willing to spend on it. DoubleClick found that “53% of mobile site visits are abandoned if pages take longer than 3 seconds to load.” Walmart found that for every 100 milliseconds of improvement to page load time, there was up to a one percent increase in incremental revenue. Providing a fast web experience makes a big difference to the bottom line. Unfortunately, the average load time for mobile websites is nineteen seconds on 3G connections. That’s where a progressive web app can help. Progressive web apps use service workers to provide an exceptionally fast experience. Service workers allow developers to explicitly define what files the browser should store in its local cache and under what circumstances the browser should check for updates to the cached files. Files that are stored in the local cache can be accessed much more quickly than files that are retrieved from the network. When someone requests a new page from a progressive web app, most of the files needed to render that page are already stored on the local device. This means that the page can load nearly instantaneously because all the browser needs to download is the incremental information needed for that page. In many ways, this is the same thing that makes native apps so fast. When someone installs a native app, they download the files necessary to run the app ahead of time. After that occurs, the native app only has to retrieve any new data. Service workers allow the web to do something similar. The impact of progressive web apps on performance can be astounding. For example, Tinder cut load times from 11.91 seconds to 4.69 seconds with their progressive web app—and it’s 90 percent smaller than their native Android app. 
Hotel chain Treebo launched a progressive web app and saw a fourfold increase in conversion rates year-over-year; conversion rates for repeat users saw a threefold increase, and their median interactive time on mobile dropped to 1.5 seconds.
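To make the caching idea concrete, here is a minimal sketch of the kind of cache-first lookup a service worker might perform. The `cacheFirst` helper, the `site-v1` cache name, and the wiring shown in the trailing comment are illustrative assumptions, not code from any of the sites mentioned. The cache and the network call are passed in as arguments so the logic can be read and run outside a browser.

```javascript
// Hypothetical helper: answer from the local cache when possible,
// otherwise go to the network and keep a copy for next time.
// `cache` needs match(request) and put(request, response), like the
// browser's Cache interface; `fetchFn` stands in for fetch().
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) {
    return cached; // served locally, no network round trip
  }
  const response = await fetchFn(request);
  // A response body can only be read once, so store a clone.
  await cache.put(request, response.clone());
  return response;
}

// In a service worker file, the wiring might look like this
// ('site-v1' is an assumed cache name):
//
// self.addEventListener('fetch', (event) => {
//   event.respondWith(
//     caches.open('site-v1').then((cache) =>
//       cacheFirst(event.request, cache, fetch)
//     )
//   );
// });
```

A production worker would likely be more selective, for instance caching only successful GET responses and deciding when cached files should be refreshed.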

Ensure network reliability

Mobile networks are flaky. One moment you’re on a fast LTE connection, and the next you’re slogging along at 2G speeds—or simply offline. We’ve all experienced situations like this. But our websites are still primarily built with an assumption that networks are reliable. With progressive web apps, you can create an app that continues to work when someone is offline. In fact, the technology used to create an offline experience is the same technology used to make web pages fast: service workers. Remember, service workers allow us to explicitly tell the browser what to cache locally. We can expand what is stored locally—not only the assets needed to render the app, but also the content of pages—so that people can continue to view pages offline (Fig 2.5).
Three screens from the housing.com site show how the design adapts to show when it is offline and that it can continue to show saved results even when offline.
Fig 2.5: The header in housing.com’s progressive web app changes from purple (left) to gray when offline (middle). Content the user has previously viewed or favorited is available offline (right), which is important for housing.com’s home market in India, where network connectivity can be slow and unreliable.
Using a service worker, we can even precache the shell of our application behind the scenes. This means that when someone visits a progressive web app for the first time, the whole application could be downloaded, stored in the cache, and ready for offline use without requiring the person to take any action to initiate it. For more on when precaching makes sense, see Chapter 5.

Keep users engaged

Push notifications are perhaps the best way to keep people engaged with an application. They prompt someone to return to an app with tantalizing nuggets of new information, from breaking news alerts to chat messages. So why limit push notifications to those who install a native application? For instance, if you have a chat or social media application, wouldn’t it be nice to notify people of new messages (Fig 2.6)?
Two screens: On the left, a list of system notifications including one from the Twitter website. On the right, the notification opened on the Twitter site to a funny tweet about WiFi passwords in a bar.
Fig 2.6: Twitter’s progressive web app, Twitter Lite, sends the same notifications that its native app sends. They appear alongside other app notifications (left). Selecting one takes you directly to the referenced tweet in Twitter Lite (right).
Progressive web apps (specifically our friend the service worker) make push notifications possible for any website to use. Notifications aren’t required for something to be a progressive web app, but they are often effective at increasing re-engagement and revenue. We’ll talk more about push notifications in Chapter 6. For now, it can be helpful to know that progressive web apps can send push notifications, just like a native app, which may help you make the case to your company. Whether you have a native app or not, a progressive web app is probably right for you. Every step toward a progressive web app is a step toward a better website. Websites should be secure. They should be fast. They would be better if they were available offline and able to send notifications when necessary. For your customers who don’t have or use your native app, providing them with a better website experience is an excellent move for your business. It’s really that simple.

Designing for Interaction Modes. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

We humans have developed ways of coping with digital interfaces. We have tactics. We accept shortcomings. We make do. But why is it still so hard (on most of the internet) to avoid uphill struggles? Often, for example, a quality reading experience is only fully available via a hack, using Safari’s reader view or a browser plug-in. I use Instapaper to send articles to my Kindle—a device that’s devoted to reading mode—because reading is not just about getting the job done. The experience itself is also important. The best experiences result from designers matching the way the computer behaves with the way our users are thinking, feeling, and interacting. This is what user experience design is all about. And yet, because of pressures, competing priorities, and industry trends, interaction modes are often an afterthought.

Prioritizing interaction modes

A while back I created a persona for Cambridge University Press named Rachel Research Gatherer. The Rachel bit, I now understand, was irrelevant (unhelpful, even). But naming the research gatherer mode helped my team focus on what was needed to support the gathering of scholarly articles and books. The precise ordering and arrangement of citation data, author biographies, metrics, and publication metadata was all organized around this central thought: The user is trying to gather research. This focused our feature set and allowed for deprioritizing functionality related to other, non-essential modes (e.g., reading online—our research gatherers were only interested in saving PDFs to read later). In fact, the best personas I’ve seen have always included the interaction mode (or dominant behavior) in the title, which encourages a focus on software that supports that way of interacting. Thinking about roles or demographic attributes just isn’t as helpful. Presuming that lots of research gatherers are going to show up, you want them to converse with software that is appropriately trained and has a research gathering mode that can easily be found and switched on. The leap to be made is one from understanding a human behavior to designing its matching counterpart: an appropriate computer behavior. I’ve come to think of interaction modes as aspects of the persona you want your digital product or service to have. They can codify a specific set of behaviors, morals, and manners.

Moving between modes

In most cases, designers have to account for multiple possible interaction modes and, crucially, the shifts between them. Some shifts can be explicitly triggered by the user. Take Audible’s driving mode, which helps users stay safe by minimizing potential distractions: it filters out all but three controls and makes the entire upper part of the screen a giant play/pause button.
Mobile phone showing Audible's driving mode.
Fig. 1: Audible’s driving mode.
Driving mode is activated by tapping on a tiny icon, but modes could equally be switched on via a link. If a link took you to “lost or stolen card” on your bank’s website, you might welcome a mode that deals with stressful situations. This might involve an appropriately short quantity of text and guidance—hopefully a quick-fix option (e.g., “freeze my card”) and directions to the nearest human support. Modes can also shift in response to implied needs. The National Trust—an organization that maintains historic and natural sites across England, Wales, and Northern Ireland—has an app whose visit mode focuses on local events and information relevant to users’ geolocation. This mode is offered to the user when they approach a National Trust property. It’s a safe bet that they’re going to prefer this mode, but they’re offered the choice to activate it anyway. It’s good manners. There are also times when there is no need to ask. Let’s consider another familiar human-computer interaction: evaluating, a mode in which the human tries to assess the quality or fitness of something, as one might do when comparison shopping for a new laptop. The computer (if trained appropriately) helps by surfacing the right metadata, summary info, reviews, and so on. A bit like research gathering, it’s a mode that might lead to reading mode, in a move from “Shall I read this?” to “I’m reading this!” A well-presented article will start with different content and functional elements than it continues with. The top of this page is all about supporting evaluation mode; then those elements fall away when the user indicates that they’ve shifted into reading mode. They show this intent by, say, scrolling slowly down the page or clicking a “read more” button.
One box containing a series of smaller boxes to represent content, broken up to show the layout. Another box containing smaller boxes representing unbroken narrative text.
Fig. 2: Evaluating mode might include the author bio, date, number of comments, classification tags, lead image, and the first paragraph with a “read more” button. Reading mode might include just beautiful text and the occasional supporting image.
The object, in this case an article, looks different, sometimes very different, when supporting different modes. And the user might move between these modes without even registering the shift.

Interaction modes are computer behaviors

A simple but important distinction to make when thinking about modes: Interaction modes are something the software does, not something the user does; “reading” is just a shorthand for “reading mode.” This distinction is critical, because it allows us to be so much more precise about the behavior of the components in a design system. We don’t always need to rely on individual interpretation of personae or journey maps, or remember an agreed set of design principles. We can, instead, bake our values into our modes. We can, for example, name components according to the mode they’re intended to support. The point is not just to create more purposeful and consistent designs (though these are great things to aim for). Interaction modes also offer us a design tool that can help tame our technology, giving it manners that work in a variety of contexts. And in our world of agentive AI, chatbottery, and algorithms, getting a grip on this conduct is becoming increasingly important.

Two moral questions

As time goes on, we have more and more powerful controls available to us in fast digital mediums. There’s an increasing need to recognize that poor usability is not the only factor to watch out for. We need to be working design ethics into our decision making. It could start with a simple moral question for design teams: how much are you going to help your users interact in the way that they would prefer? If they’re reading, how many ads (or other distractions) are you going to throw in their faces? Even though users might prefer ads to paying for content, there are better and worse ways to show people ads. Deliberately designing for a reading mode will give you a better shot at reconciling this conflict in a good way, allowing you to create the best reading space possible within the constraints. Compromises would become more deliberate and (hopefully) less damaging. A second, less obvious, moral question is this: when should we use design to encourage a more appropriate way of interacting than the user’s default? In my article about meta-moments, I looked at some ways to slow the user’s experience (using roadblocks, speed bumps, or diversions in the design) when thoughtfulness is needed. The user could be agreeing to give a third party access to their data. Or they might be remortgaging their house. On these kinds of occasions, it’s right to encourage a slower, more attentive mode. One way to approach this question is to capture the interaction modes that your users want or need, on a chart like this:
A chart. More detail in the following footnote link.
Fig. 3 detailed description
To start off, this could just be an expert-led evaluation or a group forced ranking exercise along the following lines:
  1. Generate a list of interaction modes. Words ending in “-ing” can be useful.
  2. Score for how much thoughtfulness is currently required.
  3. Score for how much thoughtfulness is currently encouraged.
  4. Get charting.
Ideally, you want all your modes to fall close to a diagonal line that stretches from top right to bottom left of your chart. The amount of thoughtfulness we encourage (or leave space for) in each mode would then be roughly in proportion to the amount of thought required.
A chart. More detail in the following footnote link.
Fig. 4 detailed description
This ethical line makes it look a little like there is only one way to get things right: precisely x amount of thought required = x amount of thought encouraged. It’s hard (and maybe impossible) to measure such things precisely, but that shouldn’t put us off, considering the relative needs in play. If nothing else, the visualization reminds us that there are many more ways to get design wrong than to get it right. If you find the interaction modes you’re currently supporting don’t fall on the ethical line, you’ll want to move them with your team’s next design effort.
A chart. More detail in the following footnote link.
Fig. 5 detailed description
Ultimately, deciding where to move interaction modes requires a degree of honest reflection and the willingness to shift any annoying or abusive experiences toward assisting or awakening ones.
A chart. More detail in the following footnote link.
Fig. 6 detailed description
While most designers want to assist and awaken (and avoid abusing or annoying) our users, even a small team will have disagreements about the right path to take. And powerful drivers outside the team, from business models to technology trends, heavily influence design decisions. These factors need our greatest focus if we are to resist following zombie patterns (“our competitors have introduced sexy new feature X so we’d better do it too”) or the tendency to pander (often in the name of UX) to short-term interests. You might be rightly proud of being able to offer a current account opening experience that only takes minutes, but are you sure that this is right for everyone you’re offering it to? Don’t some folks need some extra guidance around how to configure things? Don’t people need to understand the commitments they’re making? Adding this kind of consideration to the design process can pull against the push for speed and help designers resist the deployment of persuasion techniques in inappropriate contexts.

Where to draw the line

But how can we know which techniques are inappropriate? It can be hard to make a call on this. For example, testimonials and reviews can help reassure the user and build trust. They could also discourage independent research, but they’re hardly an abusive play.
A chart. More detail in the following footnote link.
Fig. 7 detailed description
Someone might want to place a pop-up ad directly in the middle of your user’s reading experience, arguing that it is encouraging thoughtfulness about a product that would be in the user’s long-term interests to know about in that context. And they might be right. For me, this is where good research comes in. We need to know the contexts for which more thoughtful engagement is appropriate to help our users achieve their long-term goals. We have to test our designs for whether they deliver comprehension and fair outcomes over a long timeframe. Only then can we know which of our nudges fall on the ethical line.

Getting over your squeamishness

You might feel squeamish about defining your users’ best interests for them. How do we dare to presume? Shouldn’t we just lay the facts and choices out there, and let people make all the decisions themselves? I think there are two solid reasons for getting over this squeamishness. First, design decisions have moral consequences whether you intend them or not. As Tristan Harris puts it, “If you control the menu, you control the choices.” It is better, therefore, to make your decisions with some deliberateness and transparency. Second, we are not as individual as we like to think we are. There are common misconceptions that lead to poor choices. To support choosing the right mortgage, for example, designers might reasonably seek out the things that typically trip people up (and are important to know). We need to convey the mechanics of the product: how the interest gets calculated, where fees and charges might come into play, and so on. We do this so we can be sure that the user knows their commitments and that they have the best possible chance of selecting something that meets their long-term needs.

Bringing dark patterns into the light

To help prioritize these considerations, you might add an understanding or clarifying mode to your chart. Just adding it to the chart will help get designing for comprehension on the agenda. Making space for this conversation will help force dark patterns into the light. Where things are less clear-cut, we might at least acknowledge the need for further research to help add in richer consideration of users and their needs.

Taming Data with JavaScript. 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

I love data. I also love JavaScript. Yet, data and client-side JavaScript are often considered mutually exclusive. The industry typically sees data processing and aggregation as a back-end function, while JavaScript is just for displaying the pre-aggregated data. Bandwidth and processing time are seen as huge bottlenecks for dealing with data on the client side. And, for the most part, I agree. But there are situations where processing data in the browser makes perfect sense. In those use cases, how can we be successful?

Think about the data

Working with data in JavaScript requires both complete data and an understanding of the tools available, so that you can work without making unnecessary server calls. It helps to draw a distinction between transactional data and summarized data. Transactional data is the raw, record-level data. This is the low-level detail that, by itself, is nearly impossible to analyze. On the other side of the spectrum you have your summarized data. This is the data that can be presented in a meaningful and thoughtful manner. We’ll call this our composed data. Most important to developers are the data structures that reside between our transactional details and our fully composed data. This is our “sweet spot.” These datasets are aggregated but contain more than what we need for the final presentation. They are multidimensional in that they have two or more different dimensions (and multiple measures) that provide flexibility for how the data can be presented. These datasets allow your end users to shape the data and extract information for further analysis. They are small and performant, but offer enough detail to allow for insights that you, as the author, may not have anticipated. Getting your data into perfect form so you can avoid any and all manipulation in the front end doesn’t need to be the goal. Instead, get the data reduced to a multidimensional dataset. Define several key dimensions (e.g., people, products, places, and time) and measures (e.g., sum, count, average, minimum, and maximum) that your client would be interested in. Finally, present the data on the page with form elements that can slice the data in a way that allows for deeper analysis. Creating datasets is a delicate balance. You’ll want to have enough data to make your analytics meaningful without putting too much stress on the client machine. This means coming up with clear, concise requirements. Depending on how wide your dataset is, you might need to include a lot of different dimensions and metrics.
A few things to keep in mind:
  • Is the variety of content an edge case or something that will be used frequently? Go with the 80/20 rule: 80% of users generally need 20% of what’s available.
  • Is each dimension finite? Dimensions should always have a predetermined set of values. For example, an ever-increasing product inventory might be too overwhelming, whereas product categories might work nicely.
  • When possible, aggregate the data—dates especially. If you can get away with aggregating by years, do it. If you need to go down to quarters or months, you can, but avoid anything deeper.
  • Less is more. A dimension that has fewer values is better for performance. For instance, take a dataset with 200 rows. If you add another dimension that has four possible values, the most it will grow is 200 * 4 = 800 rows. If you add a dimension that has 50 values, it’ll grow 200 * 50 = 10,000 rows. This will be compounded with each dimension you add.
  • In multidimensional datasets, avoid summarizing measures that need to be recalculated every time the dataset changes. For instance, if you plan to show averages, you should include the total and the count. Calculate averages dynamically. This way, if you are summarizing the data, you can recalculate averages using the summarized values.
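The last point can be sketched in a few lines. This is only an illustration with made-up rows and hypothetical names (`rows`, `rollUp`, `withAverage`): each row carries a total and a count, the roll-up sums both, and the average is derived only at display time.

```javascript
// Hypothetical summarized rows: each carries a total and a count
// rather than a precomputed average.
const rows = [
  { year: 2005, category: 'A', total: 120, count: 4 },
  { year: 2005, category: 'B', total: 30, count: 2 },
  { year: 2006, category: 'A', total: 90, count: 3 },
];

// Roll the rows up by one dimension, summing totals and counts.
function rollUp(data, dimension) {
  const groups = new Map();
  for (const row of data) {
    const key = row[dimension];
    const group = groups.get(key) || { [dimension]: key, total: 0, count: 0 };
    group.total += row.total;
    group.count += row.count;
    groups.set(key, group);
  }
  return [...groups.values()];
}

// Derive the average from the summed values, guarding against empty groups.
function withAverage(group) {
  const average = group.count === 0 ? 0 : group.total / group.count;
  return { ...group, average };
}

const byYear = rollUp(rows, 'year').map(withAverage);
console.log(byYear);
```

For 2005 this gives 150 / 6 = 25. Had the rows stored only averages (30 for category A and 15 for category B), averaging those would give 22.5, which is why the total and the count are kept separately.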
Make sure you understand the data you’re working with before attempting any of the above. You could make some wrong assumptions that lead to misinformed decisions. Data quality is always a top priority. This applies to the data you are both querying and manufacturing. Never take a dataset and make assumptions about a dimension or a measure. Don’t be afraid to ask for data dictionaries or other documentation about the data to help you understand what you are looking at. Data analysis is not something that you guess. There could be business rules applied, or data could be filtered out beforehand. If you don’t have this information in front of you, you can easily end up composing datasets and visualizations that are meaningless or—even worse—completely misleading. The following code example will help explain this further. Full code for this example can be found on GitHub.

Our use case

For our example we will use BuzzFeed’s dataset from “Where U.S. Refugees Come From—and Go—in Charts.” We’ll build a small app that shows us the number of refugees arriving in a selected state for a selected year. Specifically, we will show one of the following depending on the user’s request:
  • total arrivals for a state in a given year;
  • total arrivals for all years for a given state;
  • and total arrivals for all states in a given year.
The UI for selecting your state and year would be a simple form:
Our UI for our data input
The code will:
  1. Send a request for the data.
  2. Convert the results to JSON.
  3. Process the data.
  4. Log any errors to the console. (Note: To ensure that step 3 does not execute until after the complete dataset is retrieved, we use the then method and do all of our data processing within that block.)
  5. Display results back to the user.
We do not want to pass excessively large datasets over the wire to browsers for two main reasons: bandwidth and CPU considerations. Instead, we’ll aggregate the data on the server with Node.js. Source data:
[{"year":2005,"origin":"Afghanistan","dest_state":"Alabama","dest_city":"Mobile","arrivals":0},
{"year":2006,"origin":"Afghanistan","dest_state":"Alabama","dest_city":"Mobile","arrivals":0},
... ]
Multidimensional Data:
[{"year": 2005, "state": "Alabama","total": 1386}, 
 {"year": 2005, "state": "Alaska", "total": 989}, 
... ]
Transactional Details show several items with Year, Origin, Destination, City, and Arrivals. This is filtered through semi-aggregate data: By Year, By State, and Total. In the final column, we see a table with the fully composed data resulting from running the Transactional Details through the semi-aggregate data.

How to get your data structure into place

AJAX and the Fetch API

There are a number of ways to retrieve data from an external source with JavaScript. Historically you would use an XHR request. XHR is widely supported but is also fairly complex and requires several different methods. There are also libraries like Axios or jQuery’s AJAX API. These can reduce complexity and provide cross-browser support, and they might be an option if you are already using them, but we want to opt for native solutions whenever possible. Lastly, there is the more recent Fetch API. This is less widely supported, but it is straightforward and chainable. Note that a transpiler (e.g., Babel) converts newer syntax, such as arrow functions, to a more widely supported equivalent, but fetch() itself is a browser API, so older browsers need a polyfill. For our use case, we’ll use the Fetch API to pull the data into our application:
window.fetchData = window.fetchData || {};
  fetch('./data/aggregate.json')
  .then(response => {
      // when the fetch executes we will convert the response
      // to json format and pass it to .then()
      return response.json();
  }).then(jsonData => {
      // take the resulting dataset and assign to a global object
      window.fetchData.jsonData = jsonData;
  }).catch(err => {
      console.log("Fetch process failed", err);
  });
This code is a snippet from the main.js in the GitHub repo.
The fetch() method sends a request for the data, and we convert the results to JSON. To ensure that the next statement doesn’t execute until after the complete dataset is retrieved, we use the then() method and do all our data processing within that block. Lastly, we console.log() any errors.
Our goal here is to identify the key dimensions we need for reporting (year and state) before we aggregate the number of arrivals for those dimensions, removing country of origin and destination city. You can refer to the Node.js script /preprocess/index.js from the GitHub repo for more details on how we accomplished this. It generates the aggregate.json file loaded by fetch() above.
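The repo’s preprocessing script is not reproduced here. As a rough sketch of the idea only, assuming source rows shaped like the sample shown earlier and a hypothetical helper named `aggregateByYearAndState`, the arrivals could be rolled up along these lines:

```javascript
// Hypothetical helper, not the repo's script: roll transactional rows up
// to one row per year and state, dropping origin and destination city.
function aggregateByYearAndState(rows) {
  const totals = new Map();
  for (const row of rows) {
    const key = `${row.year}|${row.dest_state}`;
    const entry = totals.get(key) ||
      { year: row.year, state: row.dest_state, total: 0 };
    entry.total += row.arrivals;
    totals.set(key, entry);
  }
  // A Map preserves insertion order, so rows come out in first-seen order.
  return Array.from(totals.values());
}

// Made-up rows in the same shape as the source data.
const transactionalSample = [
  { year: 2005, origin: "A", dest_state: "Alabama", dest_city: "Mobile", arrivals: 2 },
  { year: 2005, origin: "B", dest_state: "Alabama", dest_city: "Mobile", arrivals: 3 },
  { year: 2006, origin: "A", dest_state: "Alabama", dest_city: "Mobile", arrivals: 1 },
];

const aggregatedSample = aggregateByYearAndState(transactionalSample);
```

In a Node.js script, the result could then be serialized with JSON.stringify() and written to a file for the client to fetch.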

Multidimensional data

The goal of multidimensional formatting is flexibility: data detailed enough that the user doesn’t need to send a query back to the server every time they want to answer a different question, but summarized so that your application isn’t churning through the entire dataset with every new slice of data. You need to anticipate the questions and provide data that formulates the answers. Clients want to be able to do some analysis without feeling constrained or completely overwhelmed. As with most APIs, we’ll be working with JSON data. JSON is a standard that is used by most APIs to send data to applications as objects consisting of name and value pairs. Before we get back to our use case, let’s look at a sample multidimensional dataset:
const ds = [{
  "year": 2005,
  "state": "Alabama",
  "total": 1386,
  "priorYear": 1201
}, {
  "year": 2005,
  "state": "Alaska",
  "total": 989,
  "priorYear": 1541
}, {
  "year": 2006,
  "state": "Alabama",
  "total": 989,
  "priorYear": 1386
}];
With your dataset properly aggregated, we can use JavaScript to further analyze it. Let’s take a look at some of JavaScript’s native array methods for composing data.

How to work effectively with your data via JavaScript

Array.filter()

The filter() method of the Array prototype (Array.prototype.filter()) takes a function that tests every item in the array, returning another array containing only the values that passed the test. It allows you to create meaningful subsets of the data based on select dropdown or text filters. Provided you included meaningful, discrete dimensions for your multidimensional dataset, your user will be able to gain insight by viewing individual slices of data.
ds.filter(d => d.state === "Alabama");

// Result
[{
  state: "Alabama",
  total: 1386,
  year: 2005,
  priorYear: 1201
},{
  state: "Alabama",
  total: 989,
  year: 2006,
  priorYear: 1386
}]

Array.map()

The map() method of the Array prototype (Array.prototype.map()) takes a function and runs every array item through it, returning a new array with an equal number of elements. Mapping data gives you the ability to create related datasets. One use case for this is to map ambiguous data to more meaningful, descriptive data. Another is to take metrics and perform calculations on them to allow for more in-depth analysis. Use case #1—map data to more meaningful data:
ds.map(d => (d.state === "Alaska") ? "Continental US" : "Contiguous US");

// Result
[
  "Contiguous US", 
  "Continental US", 
  "Contiguous US"
]
Use case #2—map data to calculated results:
ds.map(d => Math.round(((d.priorYear - d.total) / d.total) * 100));

// Result
[-13, 56, 40]

Array.reduce()

The reduce() method of the Array prototype (Array.prototype.reduce()) takes a function and runs every array item through it, returning an aggregated result. It’s most commonly used to do math, like to add or multiply every number in an array, although it can also be used to concatenate strings or do many other things. I have always found this one tricky; it’s best learned through example. When presenting data, you want to make sure it is summarized in a way that gives insight to your users. Even though you have done some general-level summarizing of the data server-side, this is where you allow for further aggregation based on the specific needs of the consumer. For our app we want to add up the total for every entry and show the aggregated result. We’ll do this by using reduce() to iterate through every record and add the current value to the accumulator. The final result will be the sum of all values (total) for the array.
ds.reduce((accumulator, currentValue) => 
accumulator + currentValue.total, 0);

// Result
3364
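reduce() is not limited to producing a single number. As a sketch, using rows shaped like the sample dataset (the names `sampleRows` and `totalsByState` are only illustrative), the accumulator can be an object that collects one total per state:

```javascript
const sampleRows = [
  { year: 2005, state: "Alabama", total: 1386 },
  { year: 2005, state: "Alaska", total: 989 },
  { year: 2006, state: "Alabama", total: 989 },
];

// Start from an empty object and add each row's total to its state's bucket.
const totalsByState = sampleRows.reduce((acc, row) => {
  acc[row.state] = (acc[row.state] || 0) + row.total;
  return acc;
}, {});

// totalsByState is { Alabama: 2375, Alaska: 989 }
```

The second argument to reduce() sets the accumulator's starting value, which is why it is 0 when summing and {} when grouping.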

Applying these functions to our use case

Once we have our data, we will assign an event to the “Get the Data” button that will present the appropriate subset of our data. Remember that we have several hundred items in our JSON data. The code for binding data via our button is in our main.js:
 document.getElementById("submitBtn").onclick =
  function(e){
      e.preventDefault();
      let state = document.getElementById("stateInput").value || "All"
      let year = document.getElementById("yearInput").value || "All"
      let subset = window.fetchData.filterData(year, state);
      if (subset.length == 0  )
        subset.push({'state': 'N/A', 'year': 'N/A', 'total': 'N/A'})
      document.getElementById("output").innerHTML =
      `<table class="table">
        <thead>
          <tr>
            <th scope="col">State</th>
            <th scope="col">Year</th>
            <th scope="col">Arrivals</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>${subset[0].state}</td>
            <td>${subset[0].year}</td>
            <td>${subset[0].total}</td>
          </tr>
        </tbody>
      </table>`
  }
The final output once our code is applied
If you leave either the state or year blank, that field will default to “All.” The following code is available in /js/main.js. You’ll want to look at the filterData() function, which is where we keep the lion’s share of the functionality for aggregation and filtering.
// with our data returned from our fetch call, we are going to 
// filter the data on the values entered in the text boxes
fetchData.filterData = function(yr, state) {
  // if "All" is entered for the year, we will filter on state 
  // and reduce the years to get a total of all years
  if (yr === "All") {
    let total = this.jsonData.filter(
      // return all the data where state
      // is equal to the input box
      dState => dState.state === state)
        .reduce((accumulator, currentValue) => {
          // aggregate the totals for every row that has 
          // the matched value
          return accumulator + currentValue.total;
        }, 0);

    return [{'year': 'All', 'state': state, 'total': total}];
  }

  ...

  // if a specific year and state are supplied, simply
  // return the filtered subset for year and state based 
  // on the supplied values by chaining the two function
  // calls together 
  let subset = this.jsonData.filter(dYr => dYr.year === yr)
    .filter(dSt => dSt.state === state);

  return subset; 
};

// code that displays the data in the HTML table follows this. See main.js.
When a state or a year is blank, it will default to “All” and we will filter down our dataset to that particular dimension, and summarize the metric for all rows in that dimension. When both a year and a state are entered, we simply filter on the values. We now have a working example where we:
  • Start with a raw, transactional dataset;
  • Create a semi-aggregated, multidimensional dataset;
  • And dynamically build a fully composed result.
Note that once the data is pulled down by the client, we can manipulate it in a number of different ways without making subsequent calls to the server. This is especially useful because a user who loses connectivity does not lose the ability to manipulate the data, which matters if you are creating a progressive web app (PWA) that needs to be available offline. (If you are not sure if your web app should be a PWA, this article can help.) Once you get a firm handle on these three methods, you can create just about any analysis that you want on a dataset. Map a dimension in your dataset to a broader category and summarize using reduce. Combined with a library like D3, you can map this data into charts and graphs to allow a fully customizable data visualization.
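As a sketch of the map-then-summarize idea mentioned above, assuming a hypothetical `regionOf` lookup (the grouping is illustrative and not part of the example app), a dimension can be mapped to a broader category, filtered, and then reduced to one figure:

```javascript
const stateRows = [
  { year: 2005, state: "Alabama", total: 1386 },
  { year: 2005, state: "Alaska", total: 989 },
  { year: 2006, state: "Alabama", total: 989 },
];

// Hypothetical category lookup: a broader dimension derived from state.
const regionOf = state =>
  state === "Alaska" || state === "Hawaii" ? "Non-contiguous" : "Contiguous";

// map(): attach the broader category to a copy of every row.
const rowsWithRegion = stateRows.map(row => ({
  ...row,
  region: regionOf(row.state),
}));

// filter() + reduce(): summarize one category across all years.
const contiguousTotal = rowsWithRegion
  .filter(row => row.region === "Contiguous")
  .reduce((sum, row) => sum + row.total, 0);

// 1386 + 989 = 2375
```

Because map() returns new objects here, the original rows stay untouched and can be sliced again for a different question.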

Conclusion

This article gives a better sense of what is possible in JavaScript when working with data. As I mentioned, client-side JavaScript is in no way a substitute for translating and transforming data on the server, where the heavy lifting should be done. But by the same token, it also shouldn’t be completely ruled out when datasets are treated properly.
In late 2016, Gartner predicted that 30 percent of web browsing sessions would be done without a screen by 2020. Earlier the same year, Comscore had predicted that half of all searches would be voice searches by 2020. Though there’s recent evidence to suggest that the 2020 picture may be more complicated than these broad-strokes projections imply, we’re already seeing the impact that voice search, artificial intelligence, and smart software agents like Alexa and Google Assistant are making on the way information is found and consumed on the web. In addition to the indexing function that traditional search engines perform, smart agents and AI-powered search algorithms are now bringing into the mainstream two additional modes of accessing information: aggregation and inference. As a result, design efforts that focus on creating visually effective pages are no longer sufficient to ensure the integrity or accuracy of content published on the web. Rather, by focusing on providing access to information in a structured, systematic way that is legible to both humans and machines, content publishers can ensure that their content is both accessible and accurate in these new contexts, whether or not they’re producing chatbots or tapping into AI directly. In this article, we’ll look at the forms and impact of structured content, and we’ll close with a set of resources that can help you get started with a structured content approach to information design.

The role of structured content

In their recent book, Designing Connected Content, Carrie Hane and Mike Atherton define structured content as content that is “planned, developed, and connected outside an interface so that it’s ready for any interface.” A structured content design approach frames content resources—like articles, recipes, product descriptions, how-tos, profiles, etc.—not as pages to be found and read, but as packages composed of small chunks of content data that all relate to one another in meaningful ways. In a structured content design process, the relationships between content chunks are explicitly defined and described. This makes both the content chunks and the relationships between them legible to algorithms. Algorithms can then interpret a content package as the “page” I’m looking for—or remix and adapt that same content to give me a list of instructions, the number of stars on a review, the amount of time left until an office closes, and any number of other concise answers to specific questions. Structured content is already a mainstay of many types of information on the web. Recipe listings, for instance, have been based on structured content for years. When I search, for example, “bouillabaisse recipe” on Google, I’m provided with a standard list of links to recipes, as well as an overview of recipe steps, an image, and a set of tags describing one example recipe:
Google search results page for a bouillabaisse recipe including an image, numbered directions, and tags.
A “featured snippet” for allrecipes.com on the Google results page.
Google Structured Data Testing tool showing the markup for a bouillabaisse recipe website on the left half of the screen and the structured data attributes and values for structured content on the right half of the screen.
The same allrecipes.com page viewed in Google’s Structured Data Testing Tool. The pane on the right shows the machine-readable values.
This “featured snippet” view is possible because the content publisher, allrecipes.com, has broken this recipe into the smallest meaningful chunks appropriate for this subject matter and audience, and then expressed information about those chunks and the relationships between them in a machine-readable way. In this example, allrecipes.com has used both semantic HTML and linked data to make this content not merely a page, but also legible, accessible data that can be accurately interpreted, adapted, and remixed by algorithms and smart agents. Let’s look at each of these elements in turn to see how they work together across indexing, aggregation, and inference contexts.

Software agent search and semantic HTML

Semantic HTML is markup that communicates information about the meaningful relationships between document elements, as opposed to simply describing how they should look on screen. Semantic elements such as heading tags and list tags, for instance, indicate that the text they enclose is a heading (<h1>) for the set of list items (<li>) in the ordered list (<ol>) that follows.
A combined HTML code editor and preview window showing markup and results for heading, ordered list, and list item HTML tags.
HTML structured in this way is both presentational and semantic because people know what headings and lists look like and mean, and algorithms can recognize them as elements with defined, interpretable relationships. HTML markup that focuses only on the presentational aspects of a “page” may look perfectly fine to a human reader but be completely illegible to an algorithm. Take, for example, the City of Boston website, redesigned a few years ago in collaboration with top-tier design and development partners. If I want to find information about how to pay a parking ticket, a link from the home page takes me directly to the “How to Pay a Parking Ticket” screen (scrolled to show detail):
The City of Boston website's “How to Pay a Parking Ticket” page, showing a tabbed view of ways to pay and instructions for the first of those ways, paying online.
As a human reading this page, I easily understand what my options are for paying: I can pay online, in person, by mail, or over the phone. If I ask Google Assistant how to pay a parking ticket in Boston, however, things get a bit confusing:
Google Assistant app on iPhone with the results of a “how do I pay a parking ticket in Boston” query, showing results only weakly related to the intended content.
None of the links provided in the Google Assistant results take me directly to the “How to Pay a Parking Ticket” page, nor do the descriptions clearly let me know I’m on the right track. (I didn’t ask about requesting a hearing.) This is because the content on the City of Boston parking ticket page is styled to communicate content relationships visually to human readers but is not structured semantically in a way that also communicates those relationships to inquisitive algorithms. The City of Seattle’s “Pay My Ticket” page, though it lacks the polished visual style of Boston’s site, also communicates parking ticket payment options clearly to human visitors:
The City of Seattle website‘s “Pay My Ticket” page, showing four methods to pay a parking ticket in a simple, all-text layout.
The equivalent Google Assistant search, however, offers a much more helpful result than we see with Boston. In this case, the Google Assistant result links directly to the “Pay My Ticket” page and also lists several ways I can pay my ticket: online, by mail, and in person.
Google Assistant app on iPhone with the results of a “how do I pay a parking ticket in Seattle” query, showing nearly the same results as on the desktop web page referenced above.
Despite the visual simplicity of the City of Seattle parking ticket page, it more effectively ensures the integrity of its content across contexts because it’s composed of structured content that is marked up semantically. “Pay My Ticket” is a level-one heading (<h1>), and each of the options below it is a level-two heading (<h2>), which indicates that it is subordinate to the level-one element.
The City of Seattle website’s “Pay My Ticket” page, with the HTML heading elements outlined and labeled for illustration.
These elements, when designed well, communicate information hierarchy and relationships visually to readers, and semantically to algorithms. This structure allows Google Assistant to reasonably surmise that the text in these <h2> headings represents payment options under the <h1> heading “Pay My Ticket.” While this use of semantic HTML offers distinct advantages over the “page display” styling we saw on the City of Boston’s site, the Seattle page also shows a weakness that is typical of manual approaches to semantic HTML. You’ll notice that, in the Google Assistant results, the “Pay by Phone” option we saw on the web page was not listed. If we look at the markup of this page, we can see that while the three options found by Google Assistant are wrapped in both <strong> and <h2> tags, “Pay by Phone” is only marked up with an <h2>. This irregularity in semantic structure may be what’s causing Google Assistant to omit this option from its results.
The City of Seattle website’s 'Pay My Ticket' page, with two HTML heading elements outlined and labeled for illustration, and an open inspector panel, where we can see that the headings look the same to viewers but are marked up differently in the code.
Although each of these elements would look the same to a sighted human creating this page, the machine interpreting it reads a difference. While WYSIWYG text entry fields can theoretically support semantic HTML, in practice they all too often fall prey to the idiosyncrasies of even the most well-intentioned content authors. By making meaningful content structure a core element of a site’s content management system, organizations can create semantically correct HTML for every element, every time. This is also the foundation that makes it possible to capitalize on the rich relationship descriptions afforded by linked data.
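To illustrate how heading levels expose hierarchy to a program, here is a rough sketch. It is not how Google Assistant works. It simply pulls <h1> and <h2> text out of a made-up markup string with a regular expression (a real crawler would use a proper HTML parser) and nests the level-two headings under the level-one heading. The markup and the `outlineOf` name are assumptions for illustration.

```javascript
// Made-up markup: three options use headings, one is only styled bold.
const markup = `
  <h1>Pay My Ticket</h1>
  <h2>Pay Online</h2><p>...</p>
  <h2>Pay by Mail</h2><p>...</p>
  <p><strong>Pay by Phone</strong></p>
`;

// Hypothetical helper: build a two-level outline from h1 and h2 elements.
function outlineOf(html) {
  const outline = [];
  const headingPattern = /<h([12])[^>]*>(.*?)<\/h\1>/gi;
  for (const match of html.matchAll(headingPattern)) {
    const level = Number(match[1]);
    // Strip any inline tags inside the heading and trim whitespace.
    const text = match[2].replace(/<[^>]+>/g, "").trim();
    if (level === 1) {
      outline.push({ title: text, options: [] });
    } else if (outline.length > 0) {
      outline[outline.length - 1].options.push(text);
    }
  }
  return outline;
}

const pageOutline = outlineOf(markup);
// The bold-only "Pay by Phone" line may look like a heading on screen,
// but it never reaches the outline.
```

The sketch finds the options marked up as headings and silently skips the one that is only styled to look like a heading, which is the kind of gap that consistent, system-generated markup avoids.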

Linked data and content aggregation

In addition to finding and excerpting information, such as recipe steps or parking ticket payment options, search and software agent algorithms also now aggregate content from multiple sources by using linked data. In its most basic form, linked data is “a set of best practices for connecting structured data on the web.” Linked data extends the basic capabilities of semantic HTML by describing not only what kind of thing a page element is (“Pay My Ticket” is an <h1>), but also the real-world concept that thing represents: this <h1> represents a “pay action,” which inherits the structural characteristics of “trade actions” (the exchange of goods and services for money) and “actions” (activities carried out by an agent upon an object). Linked data creates a richer, more nuanced description of the relationship between page elements, and it provides the structural and conceptual information that algorithms need to meaningfully bring data together from disparate sources. Say, for example, that I want to gather more information about two recommendations I’ve been given for orthopedic surgeons. A search for a first recommendation, Scott Ruhlman, MD, brings up a set of links as well as a Knowledge Graph info box containing a photo, location, hours, phone number, and reviews from the web.
Google search results page for Scott Ruhlman, MD, showing a list of standard links and an info box with an image, a map, ratings, an address, and reviews information.
If we run Dr. Ruhlman’s Swedish Hospital profile page through Google’s Structured Data Testing Tool, we can see that content about him is structured as small, discrete elements, each of which is marked up with descriptive types and attributes that communicate both the meaning of those attributes’ values and the way they fit together as a whole—all in a machine-readable format.
Google Structured Data Testing tool, showing the markup for Dr. Ruhlman's profile page on the left half of the screen, and the structured data attributes and values for the structured content on that page on the right half of the screen.
In this example, Dr. Ruhlman’s profile is marked up with microdata based on the schema.org vocabulary. Schema.org is a collaborative effort backed by Google, Yahoo, Bing, and Yandex that aims to create a common language for digital resources on the web. This structured content foundation provides the semantic base on which additional content relationships can be built. The Knowledge Graph info box, for instance, includes Google reviews, which are not part of Dr. Ruhlman’s profile, but which have been aggregated into this overview. The overview also includes an interactive map, made possible because Dr. Ruhlman’s office location is machine-readable.
Google search results info box for Dr. Ruhlman, showing a photo; a map; ratings; an address; reviews; buttons to ask a question, leave a review, and add a photo; and other people searched for.
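The profile discussed above uses microdata. As an alternative illustration, here is a minimal sketch of the same kind of description expressed as JSON-LD and built as a JavaScript object. The `Physician`, `PostalAddress`, and `OpeningHoursSpecification` types come from the schema.org vocabulary; every value (name, address, phone, hours) is a placeholder rather than data from any real profile, and `closingTimeOn` is a hypothetical helper showing how a program might read hours from such data.

```javascript
// Placeholder profile using schema.org types; none of these values are real.
const physicianProfile = {
  "@context": "https://schema.org",
  "@type": "Physician",
  name: "Dr. Example Person",
  telephone: "+1-555-555-0100",
  address: {
    "@type": "PostalAddress",
    addressLocality: "Example City",
    addressRegion: "WA",
  },
  openingHoursSpecification: [
    {
      "@type": "OpeningHoursSpecification",
      dayOfWeek: "Tuesday",
      opens: "08:30",
      closes: "17:00",
    },
  ],
};

// Hypothetical helper: look up the closing time for a given day, if listed.
function closingTimeOn(entity, day) {
  const specs = entity.openingHoursSpecification || [];
  const spec = specs.find(s => s.dayOfWeek === day);
  return spec ? spec.closes : null;
}

// Serialized form, suitable for a <script type="application/ld+json"> element.
const physicianJsonLd = JSON.stringify(physicianProfile, null, 2);

const tuesdayClose = closingTimeOn(physicianProfile, "Tuesday"); // "17:00"
```

Because each value sits under a named, typed property, a program can answer a narrow question such as a closing time without having to interpret the page layout.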
The search for a second recommendation, Stacey Donion, MD, provides a very different experience. Like the City of Boston site above, Dr. Donion’s profile on the Kaiser Permanente website is perfectly intelligible to a sighted human reader. But because its markup is entirely presentational, its content is virtually invisible to software agents.
Google search results page for Dr. Donion, showing a list of standard links for Dr. Donion, and a 'Did you mean: Dr Stacy Donlon MD' link at the top. There is a Google info box, as with the previous search results page example. But in this case the box does not display information about the doctor we searched for, Dr. Donion, but rather for 'Kaiser Permanente Orthopedics: Morris Joseph MD.'
In this example, we can see that Google is able to find plenty of links to Dr. Donion in its standard index results, but it isn’t able to “understand” the information about those sources well enough to present an aggregated result. In this case, the Knowledge Graph knows Dr. Donion is a Kaiser Permanente physician, but it pulls in the wrong location and the wrong physician’s name in its attempt to build a Knowledge Graph display. You’ll also notice that while Dr. Stacey Donion is an exact match in all of the listed search results, which are numerous enough to fill the first results page, we’re shown a “did you mean” link for a different doctor. Stacy Donlon, MD, is a neurologist who practices at MultiCare Neuroscience Center, which is not affiliated with Kaiser Permanente. MultiCare does, however, provide semantic and linked data-rich profiles for their physicians.

Voice queries and content inference

The increasing prevalence of voice as a mode of access to information makes providing structured, machine-intelligible content all the more important. Voice and smart software agents are not just freeing users from their keyboards, they’re changing user behavior. According to LSA Insider, there are several important differences between voice queries and typed queries. Voice queries tend to be:
  • longer;
  • more likely to ask who, what, and where;
  • more conversational;
  • and more specific.
In order to tailor results to these more specifically formulated queries, software agents have begun inferring intent and then using the linked data at their disposal to assemble a targeted, concise response. If I ask Google Assistant what time Dr. Ruhlman’s office closes, for instance, it responds, “Dr. Ruhlman’s office closes at 5 p.m.,” and displays this result:
Google Assistant app on iPhone with the results of a “what time does dr. ruhlman office close” query. The results displayed include a card with “8:30AM–5:00PM” and the label, “Dr. Ruhlman Scott MD, Tuesday hours,” as well as links to call the office, search on Google, get directions, and visit a website. Additionally, there are four buttons labeled with the words “directions,” “phone number,” and “address,” and a thumbs-up emoji.
These results are not only aggregated from disparate sources, but are interpreted and remixed to provide a customized response to my specific question. Getting directions, placing a phone call, and accessing Dr. Ruhlman’s profile page on swedish.org are all at the tips of my fingers. When I ask Google Assistant what time Dr. Donion’s office closes, the result is not only less helpful but actually points me in the wrong direction. Instead of a targeted selection of focused actions to follow up on my query, I’m presented with the hours of operation and contact information for MultiCare Neuroscience Center.
Google Assistant app on iPhone with the results of a “what time does Doc Dr Stacey donion office close” query. The results displayed include a card with “8AM–5PM” and the label “MultiCare Neuroscience Center, Monday hours,” as well as links to call the office, search on Google, get directions, or visit a website.
MultiCare Neuroscience Center, you’ll recall, is where Dr. Donlon (the neurologist Google thinks I may be looking for, not the orthopedic surgeon I’m actually looking for) practices. Dr. Donlon’s profile page, much like Dr. Ruhlman’s, is semantically structured and marked up with linked data. To be fair, subsequent trials of this search did produce the generic (and partially incorrect) practice location for Dr. Donion (“Kaiser Permanente Orthopedics: Morris Joseph MD”). It is possible that through repeated exposure to the search term “Dr. Stacey Donion,” Google Assistant fine-tuned the responses it provided. The initial result, however, suggests that smart agents may be at least partially susceptible to the same availability heuristic that affects humans, wherein the information that is easiest to recall often seems the most correct. There’s not enough evidence in this small sample to support a broad claim that algorithms have “cognitive” bias, but even when we allow for potentially confounding variables, we can see the compounding problems we risk by ignoring structured content. “Donlon,” for example, may well be a more common name than “Donion” and may be easily mistyped on a QWERTY keyboard. Regardless, the Kaiser Permanente result we’re given above for Dr. Donion is for the wrong physician. Furthermore, in the Google Assistant voice search, the interaction format doesn’t verify whether we meant Dr. Donlon; it just provides us with her facility’s contact information. In these cases, providing clear, machine-readable content can only work to our advantage.

The business case for structured content design

In 2012, content strategist Karen McGrane wrote that “you don’t get to decide which platform or device your customers use to access your content: they do.” This statement was intended to help designers, strategists, and businesses prepare for the imminent rise of mobile. It continues to ring true for the era of linked data. With the growing prevalence of smart assistants and voice-based queries, an organization’s website is less and less likely to be a potential visitor’s first encounter with rich content. In many cases—such as finding location information, hours, phone numbers, and ratings—this pre-visit engagement may be a user’s only interaction with an information source. These kinds of quick interactions, however, are only one small piece of a much larger issue: linked data is increasingly key to maintaining the integrity of content online. The organizations I’ve used as examples, like the hospitals, government agencies, and colleges I’ve consulted with for years, don’t measure the success of their communications efforts in page views or ad clicks. Success for them means connecting patients, constituents, and community members with services and accurate information about the organization, wherever that information might be found. This communication-based definition of success readily applies to virtually any type of organization working to further its business goals on the web. The model of building pages and then expecting users to discover and parse those pages to answer questions, though time-tested in the pre-voice era, is quickly becoming insufficient for effective communication. It precludes organizations from participating in emergent patterns of information seeking and discovery. And—as we saw in the case of searching for information about physicians—it may lead software agents to make inferences based on insufficient or erroneous information, potentially routing customers to competitors who communicate more effectively. 
By communicating clearly in a digital context that now includes aggregation and inference, organizations are more effectively able to speak to their users where users actually are, be it on a website, a search engine results page, or a voice-controlled digital assistant. They are also able to maintain greater control over the accuracy of their messages by ensuring that the correct content can be found and communicated across contexts.

Getting started: who and how

Design practices that build bridges between user needs and technology requirements to meet business goals are crucial to making this vision a reality. Information architects, content strategists, developers, and experience designers all have a role to play in designing and delivering effective structured content solutions. Practitioners from across the design community have shared a wealth of resources in recent years on creating content systems that work for humans and algorithms alike. To learn more about implementing a structured content approach for your organization, these books and articles are a great place to start:

UX in the Age of Personalization . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

If you listened to episode 180 of The Big Web Show, you heard two key themes: 1) personalization is now woven into much of the fabric of our digital technology, and 2) designers need to be much more involved in its creation and deployment. In my previous article we took a broad look at the first topic: the practice of harvesting user data to personalize web content, including the rewards (this website gets me!) and risks (creepy!). In this piece, we will take a more detailed look at the UX practitioner’s emerging role in personalization design: from influencing technology selection, to data modeling, to page-level implementation. And it’s high time we did.

A call to arms

Just as UX people took up the torch around content strategy years ago, there is a watershed moment quickly approaching for personalization strategy. Simply put, the technology in this space is far outpacing the design practice. For example, while “personalized” emails have been around forever (“Dear COOLIN, …”), it’s now estimated that some 45% of organizations [PDF] have attempted to personalize their homepage. If that scares you, it should: the same report indicated that fewer than a third think it’s actually working.
A bar chart showing the most commonly personalized experiences (in order of highest ranking to lowest): Email content at 71%, Home page at 45%, Landing pages at 37%, Interior pages at 28%, Product detail pages at 27%, Blog at 20%, Navigation at 18%, Search at 17%, Pricing at 14%, and App screens at 13%.
While good old “mail merge” personalization has been around forever, more organizations are now personalizing their website content. Source: Researchscape International survey of 300 marketing professionals from five countries, conducted February 22 to March 28, 2018.
As Jeff MacIntyre points out, “personalization failures are typically design failures.” Indeed, many personalization programs are still driven primarily out of marketing and IT departments, a holdover from the legacy of the inbound, “creepy” targeted ad. Fixing that model will require the same paradigm shift we’ve used to tackle other challenges in our field: intentionally moving design “upstream,” in this case to technology selection, data collection, and page-level implementation. That’s where you come in. In fact, if you’re anything like me, you’ve been doing this, quietly, already. Here are just a few examples of UX-specific tasks I’ve completed on recent design projects that had personalization aspects:
  • aligning personalization to the core content strategy;
  • working with the marketing team to understand goals and objectives;
  • identifying user segments (personas) that may benefit from personalized content;
  • drafting personalization use cases;
  • assisting the technical team with product selection;
  • helping to define the user data model, including first- and third-party sources;
  • wireframing personalized components in the information architecture;
  • taking inventory of existing content to repurpose for personalization;
  • writing or editing new personalized copy;
  • working with the design team to create personalized images;
  • developing a personalization editorial calendar and governance model;
  • helping to set up and monitor results from a personalization pilot;
  • partnering with the analytics team to make iterative improvements;
  • being a voice for the personalization program’s ethical standards;
  • and monitoring customer feedback to make sure people aren’t freaking the f* out.
Sound familiar? Many of these are simply variants on the same, user-centered tactics you’ve relied on for years. The difference now is that personalization creates a “third dimension” of complexity relative to audience and content. We’ll define that complexity further in two parts: technical design and information design. (We should note again that the focus of this article is personalizing web content, although many of the same principles also apply to email and native applications.)

Part 1: Personalization technical design

Influencing technology decisions

When clients or internal stakeholders come to you with a desire to “do personalization,” the first thing to ask is what that means. As you’ve likely noticed, the technology landscape has now matured to the point where you can “personalize” a digital experience based on just about anything, from basic geolocation to complex machine learning algorithms. What’s more, such features are increasingly baked into your own CMS or readily available from third-party plugins (see chart below). So defining what personalization is, and isn’t, is a critical first step. To accomplish this, I suggest asking two questions: 1) What data can you ethically collect on your users? and 2) Which tactics best complement this data? Some capabilities may already exist in your current systems; some you may need to build into your future technology roadmap. The following is by no means an exhaustive list, but it highlights a few of the popular tactics out there today and the tools that support them:
Tactic: Geolocation
Definition: Personalizing based on the physical location of the user, via a geolocation-enabled device or a web browser IP address (which can triangulate your position based on nearby wifi devices).
Examples: If I’m in Washington, DC, show me promotions for DC. If I’m in Paris, show me promotions for Paris, in French.
Sample Tools: MaxMind, HTML5 API

Tactic: Quizzes and Profile Info
Definition: A simple, cost-effective way to gather first-party user data by asking basic questions to help assign someone to a segment. Often done as a layover “intercept” when the user arrives, which can then be modified based on a cookied profile. Generally must be exceptionally brief to be effective.
Examples: Are you interested in our service for home use or business use? Are you in the market to buy or sell a house?

Tactic: Campaign Source
Definition: One of the most popular methods of personalization, it directs a user to a customized landing page based on incoming campaign data. Can be used for everything from passing a unique discount code to personalizing content on the entire site.
Examples: Customize landing page based on incoming email campaigns, social media campaigns, and paid search campaigns.

Tactic: Clicks or Pages Viewed
Definition: Slightly more advanced approach to personalizing based on behavior; common on ecommerce.
Examples: Products you previously viewed; suggested content you’ve recently been looking at.
Sample Tools: Dynamic Yield, Optimizely

Tactic: SIC and NAICS Codes
Definition: Standard Industrial Classification (SIC) and North American Industry Classification System (NAICS) codes classify industries using standardized numeric codes (four digits for SIC, up to six for NAICS), e.g., SIC Manufacturing 2000–3999. Helpful for determining who is visiting you from a business location, based on incoming IP address.
Examples: Show me a different message if I work in the fashion industry vs. hog farming.
Sample Tools: Marketo, Oracle (BlueKai), Demandbase

Tactic: Geofencing
Definition: Contextual personalization within a “virtual perimeter.” Establishes a fixed geographical boundary based on your device location, typically through RFID or GPS. Your device can then take an action when you enter or leave the location.
Examples: Show me my boarding pass when I’m at the airport. Remind me about unused gift cards when I enter the store.
Sample Tools: Simpli.fi, Thinknear, Google Geofencing API

Tactic: Behavioral Profiling
Definition: Add a user to a segment based on similar users who fall into that segment. Often combined with machine learning to identify new segments that humans wouldn’t be able to predict.
Examples: Sitecore pattern cards, e.g., impulse purchaser, buys in bulk, bargain hunter, expedites shipping.

Tactic: Machine Learning
Definition: Identify patterns across large sets of data (often across channels) to better predict what a user will want. In theory, improves over time as algorithms “learn” from thousands of interactions. (Obvious downside: your site will need to support thousands of interactions.)
Examples: Azure Machine Learning Studio, BloomReach (Hippo), Sitecore (xConnect, Cortex), Adobe Sensei.
As you can see, the best tactic(s) can vary dramatically based on your audience and how they interact with you. For example, if you’re a high-volume, B2C ecommerce site, you may have enough click-stream data to support useful personalized product recommendations. Conversely, if you’re a B2B business with a qualified lead model and fewer unique visitors, you may be better served by third-party data to help you tailor your message based on industry type (NAICS code) or geography. To help illustrate this idea, let’s do a quick mapping of tactics relative to visitor volume and session time:
A quadrant chart with Number of Visitors for the Y-Axis and Session Time for the X-Axis. In the top left quadrant (titled Advanced Segmentation) lie Geo-Fencing and Clicks or Pages Viewed. Directly between the top left and top right quadrants lies Behavioral Profiling. In the top right quadrant (titled Big Data 1-to-1) lies Machine Learning. In the bottom left quadrant (titled Basic Segmentation) lie Campaign Source, SIC/NAICS Codes, and Geo-Location. And finally, in the bottom right quadrant (titled Basic Self Selection) lies Quizzes and Profile Info.
To find your personalization “sweet spot,” consider your audience in terms of volume (number of visits) and average attention span (time on site).
The good news here is that you needn’t have a massive data platform in place; you can begin to build audience profiles simply by asking users to self-identify via quizzes or profile info. But in either scenario, your goal is the same: help guide the technology decision toward a personalization approach that provides actual value to your audience, not “because we can.”
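If it helps to make the quadrant mapping concrete, here is a minimal sketch in TypeScript. The quadrant names come from the chart above; the type names, the function name, and especially the numeric cutoffs are hypothetical placeholders that you would replace with baselines from your own analytics.

```typescript
// Quadrant names come from the chart above. The numeric cutoffs are
// hypothetical placeholders, not figures from this article.
type Quadrant =
  | "Basic Segmentation"
  | "Basic Self Selection"
  | "Advanced Segmentation"
  | "Big Data 1-to-1";

interface TrafficProfile {
  monthlyVisitors: number;   // volume (number of visits)
  avgSessionSeconds: number; // attention span (time on site)
}

interface Cutoffs {
  highVolumeVisitors: number;
  longSessionSeconds: number;
}

// Assumed defaults for illustration only.
const ASSUMED_CUTOFFS: Cutoffs = { highVolumeVisitors: 100000, longSessionSeconds: 180 };

function personalizationQuadrant(
  profile: TrafficProfile,
  cutoffs: Cutoffs = ASSUMED_CUTOFFS
): Quadrant {
  const highVolume = profile.monthlyVisitors >= cutoffs.highVolumeVisitors;
  const longSession = profile.avgSessionSeconds >= cutoffs.longSessionSeconds;
  if (highVolume) {
    // Top row of the chart: many visitors.
    return longSession ? "Big Data 1-to-1" : "Advanced Segmentation";
  }
  // Bottom row of the chart: fewer visitors.
  return longSession ? "Basic Self Selection" : "Basic Segmentation";
}
```

A low-traffic site with long sessions lands in "Basic Self Selection," which is the quizzes-and-profile-info corner of the chart.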

Part 2: Personalization information design

Personalization deliverables

Once you have a sense of the technical possibilities, it’s time to determine how the personalized experience will look. Let’s pretend we’re designing for a venture several of you inquired about in my previous article: Reindeer Hugs International. As the name implies, this is a nonprofit that provides hugs to reindeer. RHI recently set new business goals and wants to personalize the website to help achieve them.
The very reputable-looking logo of Reindeer Hugs International. It seems legit.
Seems reputable.
To address this goal, we propose four UX-specific deliverables:
  1. segments worksheet;
  2. campaigns worksheet;
  3. personalization wireframes;
  4. and personalization copy deck.
Following the technical model we discussed earlier, the first thing we do is define our audience based on existing site interaction patterns. We discover that RHI doesn’t get a ton of organic traffic, but they do have a reasonably active set of authenticated users (existing members) as well as some paid social media campaigns. Working with the marketing team, we propose personalizing the site for three high-potential segments, as follows:

Segments worksheet

Segment: Current Members
How to Identify: Logged in or made guest contribution (track via cookie)
Personalization Goal: Improve engagement with current members by 10%
Messaging Strategy: You’re a hugging rock star, but you can hug it out even more.

Segment: Non-member Males
How to Identify: Inbound Facebook and Instagram campaigns
Personalization Goal: Improve conversion with non-member males age 25–34 by 5%
Messaging Strategy: Make reindeer hugging manly again.

Segment: Non-member Parents
How to Identify: Inbound Facebook and Instagram campaigns
Personalization Goal: Improve conversion with non-member parents age 31–49 by 5%
Messaging Strategy: Reindeer hugging is great for the kids.

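
The “How to Identify” rules in the worksheet could be sketched as a small function. This is only an illustration in TypeScript: the `VisitContext` shape, the `utmCampaign` values ("rhi-males", "rhi-parents"), and the "Default" fallback are my assumptions, since the worksheet says only that both non-member segments arrive via Facebook and Instagram campaigns.

```typescript
type Segment = "Current Members" | "Non-member Males" | "Non-member Parents" | "Default";

// Hypothetical shape for what the site knows about the current visit.
interface VisitContext {
  isLoggedIn: boolean;
  hasContributionCookie: boolean; // set after a guest contribution
  utmSource?: string;             // e.g. "facebook" or "instagram"
  utmCampaign?: string;           // assumed campaign names, see below
}

const SOCIAL_SOURCES: string[] = ["facebook", "instagram"];

function identifySegment(visit: VisitContext): Segment {
  // Members win over campaign data: a logged-in user who clicks an ad is still a member.
  if (visit.isLoggedIn || visit.hasContributionCookie) {
    return "Current Members";
  }
  const source = visit.utmSource === undefined ? "" : visit.utmSource.toLowerCase();
  const fromSocial = SOCIAL_SOURCES.indexOf(source) !== -1;
  // The campaign names are made up for this sketch.
  if (fromSocial && visit.utmCampaign === "rhi-males") {
    return "Non-member Males";
  }
  if (fromSocial && visit.utmCampaign === "rhi-parents") {
    return "Non-member Parents";
  }
  // No match: the site shows its default, non-personalized content.
  return "Default";
}
```

Writing the rules down this way tends to surface ordering questions early, such as what happens when a current member arrives through a paid campaign.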
Next, let’s determine the specific value we could add for these segments when they come to the site. To do this, we’ll revisit a model that we looked at previously for the four personalization content types. This will help us organize the collective content or “campaign” we show each segment based on a specific personalization goal:
The four contrasting tasks at hand: Alert, Make Easier, Cross-Sell, and Enrich
A Personalization Content Model showing four flavors of personalized content.
For example, current members who are logged in might benefit from a “Make Easier” campaign of links to members-only content. Conversely, each of our three segments could benefit from a personalized “Cross-Sell” campaign to help generate awareness. Let’s capture our ideas like this:

Campaigns worksheet

Segment: Current Members
Alert: Geolocation Banner. Hugs needed in your area (displays to any user with location data).
Make Easier: Links for members who are logged in, such as to profile information, a member directory, and reindeer friends catalog.
Cross-Sell: Capital Campaign. Generate awareness by audience (minimum three distinct messages).
Enrich: Current Member Blog. Invest in creating original, hug-provoking content to further our brand.

Segment: Non-member Males Age 25–34
Make Easier: Non-Member CTA. In the non-member experience, this will be replaced by a CTA.
Enrich: Thought Leadership. Demonstrate that we are the definitive source for reindeer hugs.

Segment: Non-member Parents Age 28–39

Personalization wireframes

Now let’s decide where on the site we want to run these personalized campaigns. This isn’t too dissimilar from the work you already do around templates and components, with the addition that we can now have personalized zones. You can think of these as blocks where the CMS (or third-party plugin) will be running a series of calculations to determine the user segment in real-time (or based on a previously cached profile). To get the most coverage, these are typically dropped in at the template level. Here are examples for our home page template and interior page template:
Two separate wireframes with corresponding colored boxes showing which portions of the page relate to each type of personalization.
Showing component-level “zoning” on homepage and landing page templates. The colors correspond to the personalization content type.
Everything in white is the non-personalized, or “static,” content, which never changes, regardless of who you are. The personalized zones themselves (color-coded based on our content model) will also have an underlying default or canonical content set that appears if the system doesn’t get a personalized match. (Note: this is also the version of the content that is typically indexed by search engines.)

As you can see, an important rule of thumb is to personalize around the main content, not the entire page. There are a variety of reasons for this, including the risk of getting the audience wrong, effects on search indexing, and what’s known as the infinite content problem, i.e., can you realistically create content for every single audience on every single component? (Hint: no.)

OK, we’re getting close! Finally, let’s look at what specifically we want the system to show in these slots. Based on our campaigns worksheet, we know how many permutations of content we need. We sit down with the creative team to design our targeted messages, including the copy, images, and calls to action. Here’s what the capital campaign (the blue zone) might look like for our three audiences:

Personalization copy deck

Reindeer Hugs International: Capital Campaign (Cross-Sell)
Element: Message A: Current Member
Definition:
  Headline: Take Your Hugs to the Next Level
  Copy: You’re a hugging expert. But did you know you could hug two reindeers at once?
  Primary CTA: Sign up for our Two-for-One Hugs
  Secondary CTA: Learn More
Asset:
  Image: A young woman hugging a very handsome reindeer.
  Source: Current-Member.jpg
  Full-size render: 900x450
  Thumbnail render: 300x200

Element: Message B: Real Men Hug
Definition:
  Headline: Real Men Hug Reindeer
  Copy: Are you a real man?
  Primary CTA: Prove It
  Secondary CTA: [None]
Asset:
  Image: A bearded man hugging another handsome reindeer.
  Source: Man-Hug.jpg
  Full-size render: 900x450
  Thumbnail render: 300x200

Element: Message C: Parents with Young Kids
Definition:
  Headline: Looking for a fun activity to do with the kids?
  Copy: Reindeer hugs are 100% kid-friendly and 200% environmentally-friendly.
  Primary CTA: Shop Our Family Plan
  Secondary CTA: Learn More
Asset:
  Image: A young child happily hugging a cute, unthreatening reindeer.
  Source: Parents-Kids.jpg
  Full-size render: 900x450
  Thumbnail render: 300x200

That’s a pretty good start. We would want to follow a similar approach to detail our other three content campaigns, including alerts (e.g., hugs needed in your area), make easier (e.g., member shortcuts), and enrichment content (e.g., blog articles on latest reindeer fashions). When all the campaigns are up and running, we might expect the homepage to look something like this when seen by two different audiences, simultaneously, in real-time, in different browser sessions:
Two more detailed wireframes that show what the home page might look like. On the left, one block has member links and info and another section has a members-only blog post. On the right, one block has a CTA on benefits that members get and a more general blog post.
Wireframes illustrating the anticipated homepage delivery to two distinct audiences: Current Member (left) and Non-Member Male 25–34 (right). If the system did not get an audience match, a default or non-personalized set of content would be shown.
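
The default-or-match behavior of a personalized zone can be sketched in a few lines of TypeScript. The headlines and CTAs below are taken from the copy deck; the `PersonalizedZone` shape, the function name, and the default copy are hypothetical, and a real CMS or plugin will have its own model for this.

```typescript
interface ZoneContent {
  headline: string;
  primaryCta: string;
}

// Hypothetical model of one personalized zone on a template.
interface PersonalizedZone {
  defaultContent: ZoneContent; // canonical version, shown when there is no match
  variants: { [audience: string]: ZoneContent };
}

function resolveZoneContent(zone: PersonalizedZone, audience: string | null): ZoneContent {
  if (audience !== null && Object.prototype.hasOwnProperty.call(zone.variants, audience)) {
    const match = zone.variants[audience];
    if (match !== undefined) {
      return match;
    }
  }
  // No audience, or no variant written for it: fall back to the canonical content.
  return zone.defaultContent;
}

const capitalCampaign: PersonalizedZone = {
  // The default copy is invented for this sketch; it is not in the copy deck.
  defaultContent: { headline: "Every Reindeer Deserves a Hug", primaryCta: "Learn More" },
  variants: {
    "Current Member": {
      headline: "Take Your Hugs to the Next Level",
      primaryCta: "Sign up for our Two-for-One Hugs",
    },
    "Real Men Hug": { headline: "Real Men Hug Reindeer", primaryCta: "Prove It" },
    "Parents with Young Kids": {
      headline: "Looking for a fun activity to do with the kids?",
      primaryCta: "Shop Our Family Plan",
    },
  },
};
```

Keeping the default as a required field is a deliberate choice here: it means nobody can ship a zone that goes blank when the audience is unknown.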

Part 3: Advanced personalization techniques

Digital Experience Platforms

Of course, all of that work was fairly manual. If you are lucky enough to be working with an advanced DMP (Data Management Platform) or integrated DXP (Digital Experience Platform), then you have even more possibilities at your disposal. For example, machine learning and behavioral profiling can help you discover segments over time that you might never have dreamed of (the study we referenced earlier showed that 26% of marketing programs have tried some form of algorithmic one-to-one approach; 68% still use rules-based targeting to segments). This can be enhanced via parametric scoring, where acting on multiple data inputs can help you create blends of audience types (in our example, a thirty-three-year-old dad might get 60 percent Parent and 40 percent Real Man … or whatever). Likewise, on the content side, content scoring can help you deliver more nuanced content. (For example, we might tag an article with 20 percent Reindeer Advocacy and 80 percent Hug Best Practices.) Platforms like Sitecore can even illustrate these metrics, like in this example of a pattern card:
Examples of a hexagonally shaped behavior diagram with the following personality traits at each corner clockwise from the top left: research, impulse purchase, returns merchandise, expedites shipping, bargain hunting, and buys in bulk.
The diagram at left shows how a particular user scores (some combination of research and returns merchandise). This most closely correlates to the “Neurotic Shopper” card, so we might show this user content on our free-returns policy. Source: The Berndt Group.
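
To give a feel for how a visitor’s scores might be matched to a pattern card, here is a rough TypeScript sketch that picks the card with the smallest squared distance from the visitor’s trait scores. The trait names come from the diagram; the scores, the two extra cards, and the distance approach itself are assumptions for illustration, not a description of how Sitecore or any other platform actually computes a match.

```typescript
type Traits = { [trait: string]: number };

interface PatternCard {
  name: string;
  traits: Traits;
}

// The six corners of the hexagon diagram.
const TRAIT_NAMES: string[] = [
  "research",
  "impulsePurchase",
  "returnsMerchandise",
  "expeditesShipping",
  "bargainHunting",
  "buysInBulk",
];

// Missing traits count as a score of zero.
function score(traits: Traits, name: string): number {
  const value = traits[name];
  return value === undefined ? 0 : value;
}

function squaredDistance(a: Traits, b: Traits): number {
  let total = 0;
  for (const name of TRAIT_NAMES) {
    const diff = score(a, name) - score(b, name);
    total += diff * diff;
  }
  return total;
}

// Returns the card closest to the visitor, or null if there are no cards.
function closestCard(visitor: Traits, cards: PatternCard[]): PatternCard | null {
  let best: PatternCard | null = null;
  let bestDistance = Infinity;
  for (const card of cards) {
    const distance = squaredDistance(visitor, card.traits);
    if (distance < bestDistance) {
      best = card;
      bestDistance = distance;
    }
  }
  return best;
}

// Hypothetical cards and scores on a 0 to 10 scale.
const cards: PatternCard[] = [
  { name: "Neurotic Shopper", traits: { research: 9, returnsMerchandise: 8 } },
  { name: "Impulse Purchaser", traits: { impulsePurchase: 9, expeditesShipping: 7 } },
  { name: "Bargain Hunter", traits: { bargainHunting: 9, buysInBulk: 6 } },
];
```

A visitor who scores high on research and returns merchandise would land on the “Neurotic Shopper” card, which is the cue to show the free-returns content.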

Cult of the complex

While all of that is super cool, even the most tech-savvy among us will benefit from starting out “simple,” lest you fall prey to the cult of the complex. The manual process of identifying your target audience and use cases, for example, is foundational to building an extensible personalization program, regardless of your tech stack. At a minimum, this approach will help you get buy-in from your team and organization vs. just telling everyone the site will be personalized in a “black box” somewhere. And even with the best-in-class products, I have yet to find seamless “one-click” personalization, where the system somehow magically does everything from finding audiences to pumping out content, all in real time. We’ll get there one day, perhaps. But, in the meantime, it’s up to you.
Kip Williams, professor of psychological sciences at Purdue University, conducted a fascinating experiment called “cyberball.” In his experiment, a test subject and two other participants played a computer game of catch. At a predetermined time, the test subject was excluded from the game, forcing them to only observe as the clock ran down.
From the cyberball game, three outlined figures playing catch. Player 1 is mid-throw to Player 3.
The experiment showed increases in self-reported levels of anger and sadness, as well as lowered levels of the four needs. The digital version of the experiment created results that matched the results of the original physical one, meaning that these feelings occurred regardless of context. After the game was concluded, the test subject was told that the other participants were robots, not other human participants. Interestingly, the reveal of automated competitors did not lessen the negative feelings reported. In fact, it increased feelings of anger, while also decreasing participants’ sense of willpower and/or self-regulation. In other words: people who feel they are rejected by a digital system will feel hurt and have their sense of autonomy reduced, even when they believe there isn’t another human directly responsible.

So, what does this have to do with browsers?

Every adjustment to the appearance and behavior of the features browsers let you manipulate is a roll of the dice, gambling on the delight of some at the expense of alienating others. When using a browser to navigate the web, there’s a lot of sameness, until there isn’t. Most of the time we’re hopping from page to page and site to site, clicking links, pressing buttons, watching videos, filling out forms, writing messages, etc. But every once in a while we stumble across something new and novel that makes us pause to figure out what’s going on. Every website and web app is its own self-contained experience, with its own ideas of how things should look and behave. Some are closer to others, but each one requires learning how to operate the interface to a certain degree. Some browsers can also have parts of their functionality and appearance altered, meaning that as with websites, there can be unexpected discrepancies. We’ll unpack some of the nuance behind some of these features, and more importantly, why most of them are better off left alone.

Scroll-to-top

All the major desktop browsers allow you to hit the Home key on the keyboard to jump to the top of the page. Some scrollbar implementations allow you to click on the top of the scrollbar area to do the same. Some browsers allow you to type Command+Up (macOS) / Ctrl+Up (Windows), as well. People who use assistive technology like screen readers can use things like banner landmarks to navigate the same way (provided they are correctly declared in the site’s HTML).

However, not every device has an easily discoverable way to invoke this functionality: many laptops don’t have a Home key on their keyboard. The tap-the-clock-to-jump-to-the-top functionality on iOS is difficult to discover, and can be surprising and frustrating if accidentally activated. You need specialized browser extensions to recreate screen reader landmark navigation techniques.

One commonly implemented UI solution for longer pages is the scroll-to-top button. It’s often fixed to the bottom-right corner of the screen. Activating this control will take the user to the top of the page, regardless of how far down they’ve scrolled. If your site features a large amount of content per page, it may be worth investigating this UI pattern. Try looking at analytics and/or conducting user tests to see where and how often this feature is used. One caveat: if it’s used too often, it might be worth taking a long, hard look at your information architecture and content strategy. Three things I like about the scroll-to-top pattern are:
  • Its functionality is pretty obvious (especially if properly labeled).
  • Provided it is designed well, it can provide a decent-sized touch target in a thumb-friendly area. For motor control considerations, its touch target can be superior to narrow scroll or status bars, which can make for frustratingly small targets to hit.
  • It does not alter or remove existing scroll behavior, augmenting it instead. If somebody is used to one way of scrolling to the top, you’re not overriding it or interrupting it.
If you’re implementing this sort of functionality, I have four requests to help make the experience work for everyone (I find the Smooth Scroll library to be a helpful starting place):
  • Honor user requests for reduced motion. The dramatic scrolling effect of whipping from the bottom of the page to the top may be a vestibular trigger, a situation where the system that controls your body’s sense of physical position and orientation in the world is disrupted, causing things like headaches, nausea, vertigo, migraines, and hearing loss.
  • Ensure keyboard focus is moved to the top of the document, mirroring what occurs visually. Applying this practice will improve all users’ experiences. Otherwise, hitting Tab after scrolling to the top would send the user down to the first interactive element that follows where the focus had been before they activated the scroll button.
  • Ensure the button does not make other content unusable by obscuring it. Be sure to account for when the browser is in a zoomed-in state, not just in its default state.
  • Be mindful of other fixed-position elements. I’ve seen my fair share of websites that also have a chatbot or floating action button competing to live in the same space.
A red chat icon overlaps with a corner of the scroll to top icon, obscuring a portion of the arrow.
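
The first two requests can be sketched in TypeScript. This is a minimal illustration, not the Smooth Scroll library’s API: the interfaces and function names are my own, written as small structural types so the logic can be read (and tested) without a browser. In a real page, `window` and any `HTMLElement` should satisfy them.

```typescript
// Minimal structural types; in a browser, `window` and an HTMLElement fit these.
interface ScrollableWindow {
  matchMedia(query: string): { matches: boolean };
  scrollTo(options: { top: number; behavior: "auto" | "smooth" }): void;
}

interface FocusTarget {
  setAttribute(name: string, value: string): void;
  focus(options?: { preventScroll?: boolean }): void;
}

// Jump instantly when the person has asked for reduced motion; animate otherwise.
function scrollBehaviorFor(prefersReducedMotion: boolean): "auto" | "smooth" {
  return prefersReducedMotion ? "auto" : "smooth";
}

function scrollToTop(win: ScrollableWindow, topOfDocument: FocusTarget): void {
  const reduceMotion = win.matchMedia("(prefers-reduced-motion: reduce)").matches;
  win.scrollTo({ top: 0, behavior: scrollBehaviorFor(reduceMotion) });

  // tabindex="-1" lets script focus an element that is not normally focusable,
  // without adding it to the Tab order.
  topOfDocument.setAttribute("tabindex", "-1");
  // preventScroll keeps the focus call from cutting the smooth scroll short.
  topOfDocument.focus({ preventScroll: true });
}
```

On a page, you might call `scrollToTop(window, target)` from the button’s click handler, where `target` is whichever element sits at the top of your document. Which element that should be depends on your markup, so treat that choice as yours to make and test with a keyboard and a screen reader.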

Scrollbars

If you’re old enough to remember, it was once considered fashionable to style your website scrollbars. Internet Explorer allowed this customization via a series of vendor-specific properties. At best, they looked great! If the designer and developer were both skilled and detail-oriented, you’d get something that looked like a natural extension of the rest of the website.

However, the stakes for a quality design were pretty high: scrollbars are part of an application’s interface, not a website’s. In inclusive design, it’s part of what we call external consistency. External consistency is the idea that an object’s functionality is informed and reinforced by similar implementations elsewhere. It’s why you can flip a wall switch in most houses and be guaranteed the lights come on instead of flushing the toilet. While scrollbars have some minor visual differences between operating systems (and operating system versions), they’re consistent externally in function. Scrollbars are also consistent internally, in that every window and program on the OS that requires scrolling has the same scrollbar treatment.

If you customize your website’s scrollbar colors, then for less technologically literate people, yet another aspect of the interface has changed without warning or instruction on how to change it back. If the user is already confused about how things on the screen work, it’s one less familiar thing for them to cling to as stable and reliable.

You might be rolling your eyes reading this, but I’d ask you to check out this incredible article by Jennifer Morrow instead. In it, she describes conducting a guerrilla user test at a mall, only to have the session completely derailed when she discovers someone who has never used a computer before. What she discovers is as important as it is shocking.
The gist of it is that some people (even those who have used a computer before) don’t understand the nuance of the various “layers” you navigate through to operate a computer: the hardware, the OS, the browser installed on the OS, the website the browser is displaying, the website’s modals and disclosure statements, etc. To them, the experience is flat. We should not expect these users to juggle this kind of cognitive overhead. These kinds of abstractions are crafted to be analogous to real-world objects, specifically so people can get what they want from a digital system without having to be programmers. Adding unnecessary complexity weakens these metaphors and gives users one less reference point to rely on.

Remember the cyberball experiment. When a user is already in a distressed emotional state, our poorly-designed custom scrollbar might be the death-by-a-thousand-paper-cuts moment where they give up on trying to get what they want and reject the system entirely.

While Morrow’s article was written in 2011, it’s just as relevant now as it was then. More and more people are using the internet globally, and more and more services integral to living daily life are getting digitized. It’s up to us as responsible designers and developers to be sure we make everyone, regardless of device, circumstance, or ability, feel welcome.

In addition to unnecessarily abandoning external consistency, there is the issue of custom scrollbar styling potentially not having sufficient color contrast. The too-light colors can create a situation where a person experiencing low-vision conditions won’t be able to perceive, and therefore operate, a website’s scrolling mechanism.

This article won’t even begin to unpack the issues involved with custom implementations of scrollbars, where instead of theming the OS’s native scrollbars with CSS, one instead replaces them with a JavaScript solution. Trust me when I say I have yet to see one implemented in a way that could successfully and reliably recreate all features and functionality across all devices, OSes, browsers, and browsing modes.

In my opinion? Don’t alter the default appearance of an OS’s scrollbars. Use that time to work on something else instead, say, checking for and fixing color contrast problems.

Scrolling

The main concern about altering scrolling behavior is one of consent: it’s taking an externally consistent, system-wide behavior and suddenly altering it without permission. The term scrolljacking has been coined to describe this practice. It is not to be confused with scrollytelling, a more considerate treatment of scrolling behavior that honors the OS’s scrolling settings. Altering the scrolling behavior on your website or web app can fly in the face of someone’s specific, expressed preferences. For some people, it’s simply an annoyance. For people with motor control concerns, it could make moving through a site difficult. In some extreme cases, the unannounced discrepancy between the amount of scrolling and the distance traveled can also be a vestibular trigger. Another consideration is whether your modified scrolling behavior accidentally locks out people who don’t use mice, touch, or trackpads to scroll. All in all, I think Robin Rendle said it best:
Scrolljacking, as I shall now refer to it both sarcastically and honestly, is a failure of the web designer’s first objective; it attacks a standardised pattern and greedily assumes control over the user’s input.

Highlighting

Another OS feature we’re permitted to style in the browser is highlighted text. Much like scrollbars, this is an interface element that is shared by all apps on the OS, not just the browser. Breaking the external consistency of the OS’s highlighting color has a lot of the same concerns as styled scrollbars, namely altering the expected behavior of something that functions reliably everywhere else. It’s potentially disorienting and alienating, and may deny someone’s expressed preferences. Some people highlight text as they read. If your custom highlight style has a low contrast ratio between the highlighted text color and the highlighted text’s background color, the person reading your website or web app may be unable to perceive the text they’re highlighting. The effect will cause the text to seemingly disappear as they try to read. Other people just may not care for your aesthetic sensibilities. Both macOS and Windows allow you to specify a custom highlight color. In a scenario where someone has deliberately set a preference other than the system default, a styled highlight color may override their stated specifications. For me, the potential risks far outweigh the vanity of a bespoke highlight style—better to just leave it be.
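
If you do end up auditing a custom highlight (or scrollbar) color, the contrast ratio mentioned above is easy to compute. Here is a small TypeScript sketch based on the relative luminance and contrast ratio definitions in the Web Content Accessibility Guidelines; the function names and the sample colors are my own.

```typescript
type Rgb = [number, number, number]; // 0 to 255 per channel

// Convert an sRGB channel to its linear value, per the WCAG definition.
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(color: Rgb): number {
  const [r, g, b] = color;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Ranges from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground: Rgb, background: Rgb): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA asks for at least 4.5:1 for normal-size text.
function meetsAAForBodyText(foreground: Rgb, background: Rgb): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}

// Hypothetical highlight style: light gray text on a white highlight.
const highlightText: Rgb = [200, 200, 200];
const highlightBackground: Rgb = [255, 255, 255];
const readable = meetsAAForBodyText(highlightText, highlightBackground); // false
```

A check like this only tells you about one color pair. It can’t tell you whether you’ve overridden a highlight color someone chose on purpose, which is the bigger concern here.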

Text resizing

Lots of people change text size to suit their needs. And that’s a good thing. We want people to be able to read our content and act upon it, regardless of whatever circumstances they may be experiencing. For the problem of too-small text, some designers turn to text resizing widgets, a custom UI pattern that lets a person cycle through a number of preset CSS font-size values. Commonly found in places with heavy text content, text resizing widgets are often paired with complex, multicolumn designs. News sites are a common example. Before I dive into my concerns with text resizing widgets, I want to ask: if you find that your site needs a specialized widget to manage your text size, why not just take the simpler route and increase your base text size? Like many accessibility concerns, a request for a larger font size isn’t necessarily indicative of a permanent disability condition. It’s often circumstantial, such as a situation where you’re showing a website on your office’s crappy projector. Browsers allow users to change their preferred default font size, resizing text across websites accordingly. Browsers excel at handling this setting when you write CSS that takes advantage of unitless line-height values and relative font-size units. Some designers may feel that granting this liberty to users somehow detracts from their intended branding. Good designers understand that there’s more to branding than just how something looks. It’s about implementing the initial design in the browser, then working with the browser’s capabilities to best serve the person using it. Even if things like the font size are adjusted, a strong brand will still shine through with the ease of your user flows, quality of your typography and palette, strength of your copywriting, etc. Unfortunately, custom browser text resizing widgets lack a universal approach. 
If you rely on browser text settings, it just works—consistently, with the same controls, gestures, and keyboard shortcuts, for every page on every website, even in less-than-ideal conditions. You don’t have to write and maintain extra code, test for regressions, or write copy instructing the user on where to find your site’s text resizing widget and how to use it. Behavioral consistency is incredibly important. Browser text resizing is applied to all text on the page proportionately every time the setting is changed. These settings are also retained for the next time you visit. Not every custom text resizing widget does this, nor will it resize all content to the degree stipulated by the Web Content Accessibility Guidelines.
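As a rough illustration of why relative units matter, the TypeScript sketch below models how a font size and a unitless line height resolve to pixels. It is a simplification, not how any browser is implemented: it assumes the root font size is left at whatever the reader chose in their browser settings, and the `FontSize` type and function names are made up for this example.

```typescript
// A font size expressed either in fixed pixels or in root-relative rems.
type FontSize = { value: number; unit: "px" | "rem" };

// Resolve a font size to pixels given the reader's preferred default size.
function resolveFontSizePx(size: FontSize, userDefaultPx: number): number {
  // px ignores the reader's preference; rem scales with it.
  return size.unit === "rem" ? size.value * userDefaultPx : size.value;
}

// A unitless line-height is a multiplier of the element's own font size,
// so it grows together with the text.
function resolveLineHeightPx(fontSizePx: number, unitlessLineHeight: number): number {
  return fontSizePx * unitlessLineHeight;
}

const relativeBody: FontSize = { value: 1.25, unit: "rem" };
const fixedBody: FontSize = { value: 18, unit: "px" };

// Same styles, two readers: one on a 16px default, one who raised it to 24px.
const relativeAtDefault = resolveFontSizePx(relativeBody, 16);
const relativeAtLarger = resolveFontSizePx(relativeBody, 24);
const fixedAtLarger = resolveFontSizePx(fixedBody, 24);
const lineHeightAtLarger = resolveLineHeightPx(relativeAtLarger, 1.5);
```

The rem-based size follows the reader’s setting, while the pixel-based size stays put no matter what they asked for.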

High-contrast themes

When I say high-contrast themes, I’m not talking about things like a dark mode. I’m talking about a response to people reporting that they need to change your website or web app’s colors to be more visually accessible to them. Much like text resizing controls, themes that are designed to provide higher contrast color values are perplexing: if you’re taking the time to make one, why not just fix the insufficient contrast values in your regular CSS? Effectively managing themes in CSS is a complicated, resource-intensive affair, even under ideal situations. Most site-provided high-contrast themes are static in that the designer or developer made decisions about which color values to use, which can be a problem. Too much contrast has been known to be a trigger for things like migraines, as well as potentially making it difficult to focus for users with some forms of attention-deficit hyperactivity disorder (ADHD). The contrast conundrum leads us to a difficult thing to come to terms with when it comes to accessibility: what works for one person may actually inhibit another. Because of this, it’s important to make things open and interoperable. Leave ultimate control up to the end user so they may decide how to best interact with content. If you are going to follow through on providing this kind of feature, some advice: model it after the Windows High Contrast mode. It’s a specialized Windows feature that allows a person to force a high-contrast color palette onto all aspects of the OS’s UI, including anything the browser displays. It offers four themes out of the box but also allows a user to suit their individual needs by specifying their own colors. Your high contrast mode feature should do the same. Offer a range of themes with different palettes, and let the user pick colors that work best for them. That way, if your offerings fail, people still have the ability to self-select.

Moving focus

Keyboard focus is how people who rely on input such as keyboards, switch controls, voice inputs, eye tracking, and other forms of assistive technology navigate and operate digital interfaces. While you can do things like use the autofocus attribute to move keyboard focus to the first input on a page after it loads, it is not recommended. For people experiencing low- and no-vision conditions, it is equivalent to being abruptly and instantaneously moved to a new location. It’s a confusing and disorienting experience—there’s a reason why there’s a trope in sci-fi movies of people vomiting after being teleported for the first time. For people with motor control concerns, moving focus without their permission means they may be transported to a place where they didn’t intend to go. Digging themselves out of this location becomes annoying at best and effort-intensive at worst. Websites without heading elements or document landmarks to serve as navigational aids can worsen this effect. This is all about consent. Moving focus is fine so long as a person deliberately initiates an action that requires it (shifting focus to an opened modal, for example). I don’t come to your house and force you to click on things, so don’t move my keyboard focus unless I specifically ask you to. Let the browser handle keyboard focus. Provided you use semantic markup, browsers do this well. Some tips:

The clipboard and browser history

The clipboard is sacred space. Don’t prevent people from copying things to it, and don’t append extra content to what they copy. The same goes for browser history and back and forward buttons. Don’t mess around with time travel, and just let the browser do its job.

Wrapping up

In the game part of cyberball, the fun comes from being able to participate with others, passing the ball back and forth. With the web, fun comes from being able to navigate through it. In both situations, fun stops when people get locked out, forced to watch passively from the sidelines. Fortunately, the web doesn’t have to be one long cyberball experiment. While altering the powerful, assistive technology-friendly features of browsers can enhance the experience for some users, it carries a great risk of alienating others if changes are made with ignorance about exactly how much will be affected. Remember that this is all in the service of what ultimately matters: creating robust experiences that allow people to successfully use your website or web app regardless of their ability or circumstance. Sometimes the best strategy is to let things be.

Designing for Conversions . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

What makes creative successful? Creative work often lives in the land of feeling—we can say we like something, point to how happy the client is, or talk about how delighted users will be, but can’t objectively measure feelings. Measuring the success of creative work doesn’t have to stop with feeling. In fact, we can assign it numbers, do math with it, and track improvement to show clients objectively how well our creative is working for them. David Ogilvy once said, “If it doesn’t sell, it isn’t creative.” While success may not be a tangible metric for us, it is for our clients. They have hard numbers to meet, and as designers, we owe it to them to think about how our work can meet those goals. We can track sales, sure, but websites are ripe with other opportunities for measuring improvements. Designing for conversions will not only make you a more effective designer or copywriter, it will make you much more valuable to your clients, and that’s something we should all seek out.

Wait—what’s a conversion?

Before designing for conversions, let’s establish a baseline for what, exactly, we’re talking about. A conversion is an action taken by the user that accomplishes a business goal. If your site sells things, a conversion would be a sale. If you collect user information to achieve your business goals, like lead aggregation, it would be a form submission. Conversions can also be things like newsletter sign-ups or even hits on a page containing important information that you need users to read. You need some tangible action to measure the success of your site—that’s your conversion. Through analytics, you know how many people are coming to your site. You can use this to measure what percentage of users are converting. This number is your conversion rate, and it’s the single greatest metric for measuring the success of a creative change. In your analytics, you can set up goals and conversion funnels to track this for you (more on conversion funnels shortly). It doesn’t matter how slick that new form looks or how clever that headline is—if the conversion rate drops, it’s not a success. In fact, once you start measuring success by conversion rate, you’ll be surprised to see how even the cleverest designs applied in the wrong places can fail to achieve your goals. Conversions aren’t always a one-step process. Many of us have multi-step forms or long check-out processes where it can be very useful to track how far a user gets. It’s possible to set up multiple goals along the way so your analytics can give you this data. This is called a conversion funnel. Ideally, you’ll coordinate with the rest of your organization to get data beyond the website as well. For instance, changing button copy may lead to increased form submissions but a drop in conversions from lead to sale afterward. In this case, the button copy update probably confused users rather than selling them on the product. A good conversion funnel will safeguard against false positives like that. 
It’s also important to track the bounce rate, which is the percentage of users that hit a page and leave without converting or navigating to other pages. A higher bounce rate is an indication that there’s a mismatch between the user’s expectations when landing on your site and what they find once landing there. Bounce rate is really a part of the conversion funnel, and reducing bounce rate can be just as important as improving conversion rate.
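The arithmetic behind these metrics is simple enough to sketch. The TypeScript below computes a conversion rate, a bounce rate, and a per-step funnel report. All the numbers and step names are invented for illustration; in practice your analytics tool reports these for you.

```typescript
interface FunnelStep {
  name: string;
  users: number; // how many users reached this step
}

interface FunnelRow {
  name: string;
  stepRate: number; // share of the previous step that continued
  overallRate: number; // share of everyone who entered the funnel
}

// Safe ratio: returns 0 instead of dividing by zero.
function rate(part: number, whole: number): number {
  return whole === 0 ? 0 : part / whole;
}

function funnelReport(steps: FunnelStep[]): FunnelRow[] {
  const rows: FunnelRow[] = [];
  let entered: number | undefined;
  let previous: number | undefined;
  for (const step of steps) {
    if (entered === undefined) entered = step.users;
    rows.push({
      name: step.name,
      stepRate: previous === undefined ? 1 : rate(step.users, previous),
      overallRate: rate(step.users, entered),
    });
    previous = step.users;
  }
  return rows;
}

// Hypothetical month of traffic for a lead-generation landing page.
const visits = 2000;
const bounces = 900; // left without converting or visiting another page
const formSubmissions = 100;

const conversionRate = rate(formSubmissions, visits); // 0.05, or 5%
const bounceRate = rate(bounces, visits); // 0.45, or 45%

const report = funnelReport([
  { name: "Landed on page", users: visits },
  { name: "Started form", users: 400 },
  { name: "Submitted form", users: formSubmissions },
  { name: "Became a sale", users: 25 },
]);
```

Reading the report row by row shows where users drop off, including the lead-to-sale step that happens after the website has done its part.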

Great. So how do we do that?

When I was first getting started in conversion-driven design, it honestly felt a little weird. It feels shady to focus obsessively on getting the user to complete an action. But this focus is in no way about tricking the user into doing something they don’t want to do—that’s a bad business model. As Gerry McGovern has commented, if business goals don’t align with customer goals, your business has no future. So if we’re not tricking users, what are we doing? Users come to your site with a problem, and they’re looking for a solution. The goal is to find users whose problems will be solved by choosing your product. With that in mind, improving the conversion rate doesn’t mean tricking users into doing something—it means showing the right users how to solve their problem. That means making two things clear: that your product will solve the user’s problem, and what the user must do to proceed. The first of these two points is the value proposition. This is how the user determines whether your product can solve his or her problem. It can be a simple description of the benefits, customer testimonials, or just a statement about what the product will do for the user. A page is not limited to one value proposition—it’s good to have several. (Hint: the page’s headline should almost always be a value proposition!) The user should be able to determine quickly why your product will be helpful in solving their problem. Once the value of your product has been made clear, you need to direct the user to convert with a call to action. A call to action tells the user what they must do to solve their problem—which, in your case, means to convert. Most buttons and links should be calls to action, but a bit of copy directly following a value proposition is a good place too. Users should never have to look around to find out what the next step is—it should be easy to spot and clear in its intention. Also, ease of access is a big success factor here. 
My team’s testing found that replacing a Request Information button (that pointed to a form page) with an actual form on every page significantly boosted the conversion rate. If you’re also trying to get information from a user, consider a big form at the top of the page so users can’t miss it. When they scroll down the page and are ready to convert, they remember the form and have no question as to what they have to do. So improving conversion rate (and, to some degree, decreasing bounce rate) is largely about adding clarity around the value proposition and call to action. There are other factors as well, like decreasing friction in the conversion process and improving performance, but these two things are where the magic happens, and conversion problems are usually problems with one of them.

So, value propositions…how do I do those?

The number one thing to remember when crafting a value proposition is that you’re not selling a product; you’re selling a solution. Value propositions begin with the user’s problem and focus on that. Users don’t care about the history of your company, how many awards you’ve won, or what clever puns you’ve come up with. They care about whether your product will solve their problem. If they don’t get the impression that it can do that, they will leave and go to a competitor. In my work with landing pages for career schools, we initially included pictures of people in graduation gowns and caps. We assumed that the most exciting part of going back to school was graduating. Data showed us that we were wrong. Our testing showed that photos of people doing the jobs they would be training for performed much better. In short, our assumption was that showing the product (the school) was more important than showing the benefit (a new career). The problem users were trying to solve wasn’t a diploma; it was a career, and focusing on the user showed a significant improvement in conversion rate. We had some clients that insisted on using their branding on the landing pages, including one school that wanted to use an eagle as their hero image because their main website had eagles everywhere. This absolutely bombed in conversions. No matter how strong or consistent your branding is, it will not outperform talking about users and their problems. Websites that get paid for clicks have mastered writing headlines this way. Clickbait headlines get a groan from copywriters (especially since they often use their powers for evil and not good), but there are some important lessons we can learn from them. Take this headline, for instance:
Get an Associate’s degree in nursing
Just like in the example above with the college graduates, we’re selling the product, not the benefit. This doesn’t necessarily show that we understand the user’s problem, and it does nothing to get them excited about our program. Compare that headline to this one:
Is your job stuck in a rut? Get trained for a new career in nursing in only 18 months!
In this case, we lead with the user’s problem. That immediately gets users’ attention. We then skip to a benefit: a quick turnaround. No time is wasted talking about the product; we save that for the body copy. The headline focuses entirely on the user. In your sign-up or check-out process, always lead with the information the user is most interested in. In our case, letting the user first select their school campus and area of study showed a significant improvement over leading with contact information. Similarly, put the less-exciting content last. In our testing, users were least excited about sharing their telephone number. Moving that field to be the last one in the form decreased form abandonment and improved conversions. As designers, be cognizant of what your copywriters are doing. If the headline is the primary value proposition (as it should be), make sure the headline is the focal point of your design. Ensure the messaging behind your design is in line with the messaging in the content. If there’s a disagreement in what the user’s problem is or how your product will solve that problem, the conversion rate will suffer. Once the value proposition has been clearly defined and stated, it’s time to focus on the call to action.

What about the call to action?

For conversion-driven sites, a good call to action is the most important component. If a user is ready to convert and has insufficient direction on how to do so, you lose a sale at 90 percent completion. It needs to be abundantly clear to the user how to proceed, and that’s where the call to action steps in. When crafting a call to action, don’t be shy. Buttons should be large, forms should be hard to miss, and language should be imperative. A call to action should be one of the first things the user notices on the page, even if he or she won’t be thinking about it again until after doing some research on the page. Having the next step right in front of the user vastly increases the chance of conversion, so users need to know that it’s there waiting. That said, a call to action should never get in the way of a value proposition. I see this all the time: a modal window shows as soon as I get to a site, asking me to subscribe to their mailing list before I have an inkling of the value the site can give me. I dismiss these without looking, and that call to action is completely missed. Make it clear how to convert, and make it easy, but don’t ask for a conversion before the user is ready. For situations like the one above, a better strategy might be asking me to subscribe as I exit the site; marketing to visitors who are leaving has been shown to be effective. In my former team’s tests, there were some design choices that could improve calls to action. For instance, picking a bright color that stood out from the rest of the site for the submit button did show an improvement in conversions, and reducing clutter around the call to action improved conversion rates by 232%. But most of the gains here were in either layout or copy; don’t get so caught up in minor design changes that you ignore more significant changes like these. Ease of access is another huge factor to consider. 
As mentioned above, when my team was getting started, we had a Request Information link in the main navigation and a button somewhere on the page that would lead the user to the form. The single biggest positive change we saw involved putting a form at the top of every page. For longer forms, we would break this form up into two or three steps, but having that first step in sight was a huge improvement, even if one click doesn’t seem like a lot of effort. Another important element is headings. Form headings should ask the user to do something. It’s one thing to label a form “Request Information”; it’s another to ask them to “Request Information Now.” Simply adding action words, like “now” or “today,” can change a description into an imperative action and improve conversion rates. With submit buttons, always take the opportunity to communicate value. The worst thing you can put on a submit button is the word “Submit.” We found that switching this button copy out with “Request Information” showed a significant improvement. Think about the implied direction of the interaction. “Submit” implies the user is giving something to us; “Request Information” implies we’re giving something to the user. The user is already apprehensive about handing over their information—communicate to them that they’re getting something out of the deal. Changing phrasing to be more personal to the user can also be very effective. One study showed that writing button copy in first person—for instance, “Create My Account” versus “Create Your Account”—showed a significant boost in conversions, boosting click-through rates by 90%. Users today are fearful that their information will be used for nefarious purposes. Make it a point to reassure them that their data is safe. Our testing showed that the best way to do this is to add a link to the privacy policy (“Your information is secure!”) with a little lock icon right next to the submit button. 
Users will often skip right over a small text link, so that lock icon is essential—so essential, in fact, that it may be more important than the privacy policy itself. I’m somewhat ashamed to admit this, but I once forgot to create a page for the privacy policy linked to from a landing page, so that little lock icon linked out to a 404. I expected a small boost in conversions when I finally uploaded the privacy policy, but nope—nobody noticed. Reassurance is a powerful thing.

Measure, measure, measure

One of the worst things you can do is push out a creative change, assume it’s great, and move on to the next task. A/B testing is ideal and will allow you to test a creative change directly against the old creative, eliminating other variables like time, media coverage, and anything else you might not be thinking of. Creative changes should be applied methodically and scientifically—just because two or three changes together show an improvement in conversion rate doesn’t mean that one of them wouldn’t perform better alone. Measuring tangible things like conversion rate not only helps your client or business, but can also give new purpose to your designs and creative decisions. It’s a lot easier to push for your creative decisions when you have hard data to back up why they’re the best choice for the client or project. Having this data on hand will give you more authority in dealing with clients or marketing folks, which is good for your creative and your career. If my time in the design world has taught me anything, it’s that, in the realm of creativity, certainty can be hard to come by. So, perhaps most importantly, objective measures of success give you, and your client, the reassurance that you’re doing the right thing.
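For a sense of what “hard data” from an A/B test can look like, here is a small TypeScript sketch that compares two variants using a two-proportion z-test. To be clear, this is one common statistical approach that I am assuming for illustration, not something the testing described above is known to have used, and the visitor counts are made up. Most A/B testing tools run an equivalent check for you.

```typescript
interface Variant {
  visitors: number;
  conversions: number;
}

// Relative lift of the variation's conversion rate over the control's.
function relativeLift(control: Variant, variation: Variant): number {
  const controlRate = control.conversions / control.visitors;
  const variationRate = variation.conversions / variation.visitors;
  return (variationRate - controlRate) / controlRate;
}

// z-score for the difference between two conversion rates, using the
// pooled rate to estimate the standard error.
function twoProportionZ(control: Variant, variation: Variant): number {
  const controlRate = control.conversions / control.visitors;
  const variationRate = variation.conversions / variation.visitors;
  const pooledRate =
    (control.conversions + variation.conversions) /
    (control.visitors + variation.visitors);
  const standardError = Math.sqrt(
    pooledRate * (1 - pooledRate) * (1 / control.visitors + 1 / variation.visitors)
  );
  return (variationRate - controlRate) / standardError;
}

// Hypothetical test: old button copy versus new button copy.
const oldCreative: Variant = { visitors: 5000, conversions: 250 }; // 5.0%
const newCreative: Variant = { visitors: 5000, conversions: 310 }; // 6.2%

const lift = relativeLift(oldCreative, newCreative); // about 0.24, a 24% lift
const zScore = twoProportionZ(oldCreative, newCreative);

// |z| above roughly 1.96 corresponds to 95% confidence (two-sided).
const likelyRealDifference = Math.abs(zScore) > 1.96;
```

The point of the extra step is that a lift on its own can be noise; the z-score gives you a rough sense of whether the sample was large enough to trust it.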

Semantics to Screen Readers . 17 Apr 2019 13:51:16. A List Apart: The Full Feed.

As a child of the ’90s, one of my favorite movie quotes is from Harriet the Spy: “there are as many ways to live as there are people in this world, and each one deserves a closer look.” Likewise, there are as many ways to browse the web as there are people online. We each bring unique context to our web experience based on our values, technologies, environments, minds, and bodies. Assistive technologies (ATs), which are hardware and software that help us perceive and interact with digital content, come in diverse forms. ATs can use a whole host of user input, ranging from clicks and keystrokes to minor muscle movements. ATs may also present digital content in a variety of forms, such as Braille displays, color-shifted views, and decluttered user interfaces (UIs). One more commonly known type of AT is the screen reader. Programs such as JAWS, Narrator, NVDA, and VoiceOver can take digital content and present it to users through voice output, may display this output visually on the user’s screen, and can have Braille display and/or screen magnification capabilities built in. If you make websites, you may have tested your sites with a screen reader. But how do these and other assistive programs actually access your content? What information do they use? We’ll take a detailed step-by-step view of how the process works. (For simplicity we’ll continue to reference “browsers” and “screen readers” throughout this article. These are essentially shorthands for “browsers and other applications,” and “screen readers and other assistive technologies,” respectively.)

The semantics-to-screen-readers pipeline

Accessibility application programming interfaces (APIs) create a useful link between user applications and the assistive technologies that wish to interact with them. Accessibility APIs facilitate communicating accessibility information about user interfaces (UIs) to the ATs. The API expects information to be structured in a certain way, so that whether a button is properly marked up in web content or is sitting inside a native app taskbar, a button is a button is a button as far as ATs are concerned. That said, screen readers and other ATs can do some app-specific handling if they wish. On the web specifically, there are some browser and screen reader combinations where accessibility API information is supplemented by access to DOM structures. For this article, we’ll focus specifically on accessibility APIs as a link between web content and the screen reader.
Here’s the breakdown of how web content reaches screen readers via accessibility APIs:
The web developer uses host language markup (HTML, SVG, etc.), and potentially roles, states, and properties from the ARIA suite where needed to provide the semantics of their content. Semantic markup communicates what type an element is, what content it contains, what state it’s in, etc.
The browser rendering engine (alternatively referred to as a “user agent”) takes this information and maps it into an accessibility API. Different accessibility APIs are available on different operating systems, so a browser that is available on multiple platforms should support multiple accessibility APIs. Accessibility API mappings are maintained on a lower level than web platform APIs, so web developers don’t directly interact with accessibility APIs.
The accessibility API includes a collection of interfaces that browsers and other apps can plumb into, and generally acts as an intermediary between the browser and the screen reader. Accessibility APIs provide interfaces for representing the structure, relationships, semantics, and state of digital content, as well as means to surface dynamic changes to said content. Accessibility APIs also allow screen readers to retrieve and interact with content via the API. Again, web developers don’t interact with these APIs directly; the rendering engine handles translating web content into information useful to accessibility APIs.

Examples of accessibility APIs

The screen reader uses client-side methods from these accessibility APIs to retrieve and handle information exposed by the browser. In browsers where direct access to the Document Object Model (DOM) is permitted, some screen readers may also take additional information from the DOM tree. A screen reader can also interact with apps that use differing accessibility APIs. No matter where they get their information, screen readers can dream up any interaction modes they want to provide to their users (I’ve provided links to screen reader commands at the end of this article). Testing by site creators can help identify content that feels awkward in a particular navigation mode, such as multiple links with the same text (“Learn more”), as one example.

Example of this pipeline: surfacing a button element to screen reader users

Let’s suppose for a moment that a screen reader wants to understand what object is next in the accessibility tree (which I’ll explain further in the next section), so it can surface that object to the user as they navigate to it. The flow will go a little something like this:
Diagram illustrating the steps involved in presenting the next object in a document: the client (screen reader) makes a call to the accessibility API, which passes along the request to the provider (browser), which checks the content in the web document, which sends the information back up the chain. Detailed list follows.
  1. The screen reader requests information from the API about the next accessible object, relative to the current object.
  2. The API (as an intermediary) passes along this request to the browser.
  3. At some point, the browser references DOM and style information, and discovers that the relevant element is a non-hidden button: <button>Do a thing</button>.
  4. The browser maps this HTML button into the format the API expects, such as an accessible object with various properties: Name: Do a thing, Role: Button.
  5. The API returns this information from the browser to the screen reader.
  6. The screen reader can then surface this object to the user, perhaps stating “Button, Do a thing.”
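The six steps above can be modeled in a few lines of TypeScript. This is a toy sketch only: real accessibility APIs are platform-specific and far richer, and every type and function name here (`WebElement`, `createBrowser`, `announceNext`, and so on) is invented for the example.

```typescript
// Simplified stand-ins for web content and for what the API expects.
interface WebElement {
  tag: string;
  text: string;
  hidden: boolean;
}

interface AccessibleObject {
  role: string;
  name: string;
}

interface NextResult {
  index: number;
  object: AccessibleObject;
}

// Provider (browser): steps 3 and 4. It checks the content and maps the
// next non-hidden button into an accessible object.
function createBrowser(elements: WebElement[]) {
  return {
    next(currentIndex: number): NextResult | undefined {
      for (let i = currentIndex + 1; i < elements.length; i++) {
        const element = elements[i];
        if (element === undefined || element.hidden) continue;
        if (element.tag === "button") {
          return { index: i, object: { role: "Button", name: element.text } };
        }
      }
      return undefined;
    },
  };
}

// Accessibility API: steps 2 and 5. It only relays requests and results.
function createAccessibilityApi(browser: ReturnType<typeof createBrowser>) {
  return {
    getNextObject(currentIndex: number): NextResult | undefined {
      return browser.next(currentIndex);
    },
  };
}

// Client (screen reader): steps 1 and 6. It asks for the next object and
// decides how to present it.
function announceNext(
  api: ReturnType<typeof createAccessibilityApi>,
  currentIndex: number
): string | undefined {
  const result = api.getNextObject(currentIndex);
  return result ? `${result.object.role}, ${result.object.name}` : undefined;
}

const pageElements: WebElement[] = [
  { tag: "button", text: "Hidden thing", hidden: true },
  { tag: "button", text: "Do a thing", hidden: false },
];
const pageApi = createAccessibilityApi(createBrowser(pageElements));
const announcement = announceNext(pageApi, -1); // start before the first element
```

Note that the screen reader never touches `pageElements` directly; everything it learns arrives through the API in the role-and-name shape it expects.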
Suppose that the screen reader user would now like to “click” this button. Here’s how their action flows all the way back to web content:
Diagram illustrating the steps involved in routing a screen reader click to web content: a user issues a “primary action” command to a client (screen reader), which passes the command to the accessibility API, which passes the command along to the provider (browser), which passes the command as a click event to the web document. Detailed list follows.
  1. The user provides a particular screen reader command, such as a keystroke or gesture.
  2. The screen reader calls a method into the API to invoke the button.
  3. The API forwards this interaction to the browser.
  4. How a browser may respond to incoming interactions depends on the context, but in this case the browser can raise this as a “click” event through web APIs. The browser should give no indication that the click came from an assistive technology, as doing so would violate the user’s right to privacy.
  5. The web developer has registered a JavaScript event listener for clicks; their callback function is now executed as if the user clicked with a mouse.
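Here is the reverse trip as a toy TypeScript model. As before, the names are hypothetical and nothing here is a real browser or accessibility API; the sketch only illustrates that the developer’s listener receives the same click regardless of where it originated.

```typescript
type ButtonClick = { type: "click" };
type ClickListener = (click: ButtonClick) => void;

// Stand-in for a button in the web document (step 5 lives here).
function createButtonElement() {
  const listeners: ClickListener[] = [];
  return {
    addClickListener(listener: ClickListener): void {
      listeners.push(listener);
    },
    dispatchClick(): void {
      // The event carries no hint about what triggered it.
      for (const listener of listeners) listener({ type: "click" });
    },
  };
}

// Provider (browser): step 4. Mouse clicks and API invocations both end up
// as the same click event.
function createBrowser(button: ReturnType<typeof createButtonElement>) {
  return {
    handleMouseClick(): void {
      button.dispatchClick();
    },
    handleApiInvoke(): void {
      button.dispatchClick();
    },
  };
}

// Accessibility API: step 3. It forwards the interaction to the browser.
function createAccessibilityApi(browser: ReturnType<typeof createBrowser>) {
  return {
    invoke(): void {
      browser.handleApiInvoke();
    },
  };
}

// Client (screen reader): steps 1 and 2. A user command becomes an API call.
function onPrimaryActionCommand(api: ReturnType<typeof createAccessibilityApi>): void {
  api.invoke();
}

const receivedClicks: ButtonClick[] = [];
const doAThingButton = createButtonElement();
doAThingButton.addClickListener((click) => {
  receivedClicks.push(click);
});

const pageBrowser = createBrowser(doAThingButton);
onPrimaryActionCommand(createAccessibilityApi(pageBrowser)); // screen reader user
pageBrowser.handleMouseClick(); // mouse user
```

After both interactions, `receivedClicks` holds two events that look exactly alike, which is the privacy property described in step 4.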
Now that we have a general sense of the pipeline, let’s go into a little more detail on the accessibility tree.

The accessibility tree

Screenshot: Dev Tools in Microsoft Edge showing the DOM tree and accessibility tree side by side; there are more nodes in the DOM tree
The accessibility tree is a hierarchical representation of elements in a UI or document, as computed for an accessibility API. In modern browsers, the accessibility tree for a given document is a separate, parallel structure to the DOM tree. “Parallel” does not necessarily mean there is a 1:1 match between the nodes of these two trees. Some elements may be excluded from the accessibility tree, for example if they are hidden or are not semantically useful (think non-focusable wrapper divs without any semantics added by a web developer). This idea of a hierarchical structure is somewhat of an abstraction. The definition of what exactly an accessibility tree is in practice has been debated and partially defined in multiple places, so implementations may differ in various ways. For example, it’s not actually necessary to generate accessible objects for every element in the DOM whenever the DOM tree is constructed. As a performance consideration, a browser could choose to deal with only a subset of objects and their relationships at a time (that is, however much is necessary to fulfill the requests coming from ATs). The rendering engine could make these computations during all user sessions, or only do so when assistive technologies are actively running. Generally speaking, modern web browsers wait until after style computation to build up any accessible objects. Browsers wait in part because generated content (such as ::before and ::after) can contain text that can participate in calculation of the accessible object’s name. CSS styles can also impact accessible objects in various other ways: text styling can come through as attributes on accessible text ranges, and display property values can impact the computation of line text ranges. These are just a few ways in which style can impact accessibility semantics. Browsers may also use different structures as the basis for accessible object computation. 
One rendering engine may walk the DOM tree and cross-reference style computations to build up parallel tree structures; another engine may use only the nodes that are available in a style tree in order to build up their accessibility tree. User agent participants in the standards community are currently thinking through how we can better document our implementation details, and whether it might make sense to standardize more of these details further down the road. Let’s now focus on the branches of this tree, and explore how individual accessibility objects are computed.

Building up accessible objects

From API to API, an accessible object will generally include a few things:
  • Role, or the type of accessible object (for example, Button). The role tells a user how they can expect to interact with the control. It is typically presented when screen reader focus moves onto the accessible object, and it can be used to provide various other functionalities, such as skipping around content via one type of object.
  • Name, if specified. The name is an (ideally short) identifier that better helps the user identify and understand the purpose of an accessible object. The name is often presented when screen reader focus moves to the object (more on this later), can be used as an identifier when presenting a list of available objects, and can be used as a hook for functionalities such as voice commands.
  • Description and/or help text, if specified. We’ll use “Description” as a shorthand. The Description can be considered supplemental to the Name; it’s not the main identifier but can provide further information about the accessible object. Sometimes this is presented when moving focus to the accessible object, sometimes not; this variation depends on both the screen reader’s user experience design and the user’s chosen verbosity settings.
  • Properties and methods surfacing additional semantics. For simplicity’s sake, we won’t go through all of these. For your awareness, properties can include details like layout information or available interactions (such as invoking the element or modifying its value).
Let’s walk through an example using markup for a simple mood tracker. We’ll use simplified property names and values, because these can differ between accessibility APIs.
<form>
  <label for="mood">On a scale of 1–10, what is your mood today?</label>
  <input id="mood" type="range"
       min="1" max="10" value="5"
       aria-describedby="helperText" />
  <p id="helperText">Some helpful pointers about how to rate your mood.</p>
  <!-- Using a div with button role for the purposes of showing how the accessibility tree is created. Please use the button element! -->
  <div tabindex="0" role="button">Log Mood</div>
</form>
First up is our form element. This form doesn’t have any attributes that would give it an accessible Name, and a form landmark without a Name isn’t very useful when jumping between landmarks. Therefore, HTML mapping standards specify that it should be mapped as a group. Here’s the beginning of our tree:
  • Role: Group
Next up is the label. This one doesn’t have an accessible Name either, so we’ll just nest it as an object of role “Label” underneath the form:
  • Role: Group
    • Role: Label
Let’s add the range input, which will map into various APIs as a “Slider.” Due to the relationship created by the for attribute on the label and id attribute on the input, this slider will take its Name from the label contents. The aria-describedby attribute is another id reference and points to a paragraph with some text content, which will be used for the slider’s Description. The slider object’s properties will also store “labelledby” and “describedby” relationships pointing to these other elements. And it will specify the current, minimum, and maximum values of the slider. If one of these range values were not available, ARIA standards specify what should be the default value. Our updated tree:
  • Role: Group
    • Role: Label
    • Role: Slider
      Name: On a scale of 1–10, what is your mood today?
      Description: Some helpful pointers about how to rate your mood.
      LabelledBy: mood
      DescribedBy: helperText
      ValueNow: 5
      ValueMin: 1
      ValueMax: 10
The paragraph will be added as a simple paragraph object (“Text” or “Group” in some APIs):
  • Role: Group
    • Role: Label
    • Role: Slider
      Name: On a scale of 1–10, what is your mood today?
      Description: Some helpful pointers about how to rate your mood.
      LabelledBy: mood
      DescribedBy: helperText
      ValueNow: 5
      ValueMin: 1
      ValueMax: 10
    • Role: Paragraph
The final element is an example of when role semantics are added via the ARIA role attribute. This div will map as a Button with the name “Log Mood,” as buttons can take their name from their children. This button will also be surfaced as “invokable” to screen readers and other ATs; special types of buttons could provide expand/collapse functionality (buttons with the aria-expanded attribute), or toggle functionality (buttons with the aria-pressed attribute). Here’s our tree now:
  • Role: Group
    • Role: Label
    • Role: Slider
      Name: On a scale of 1–10, what is your mood today?
      Description: Some helpful pointers about how to rate your mood.
      LabelledBy: mood
      DescribedBy: helperText
      ValueNow: 5
      ValueMin: 1
      ValueMax: 10
    • Role: Paragraph
    • Role: Button
      Name: Log Mood
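To make the walkthrough concrete, here is a minimal Python sketch of the finished tree as a plain data structure. The `AccessibleObject` class and the `render` helper are hypothetical names invented for this illustration; real accessibility APIs expose far more properties, and under different names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessibleObject:
    """Simplified, hypothetical stand-in for one node of an accessibility tree."""
    role: str
    name: Optional[str] = None
    description: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)
    children: List["AccessibleObject"] = field(default_factory=list)


def render(node: AccessibleObject, depth: int = 0) -> List[str]:
    """Return one outline line per node, indented two spaces per level."""
    parts = [f"Role: {node.role}"]
    if node.name:
        parts.append(f"Name: {node.name}")
    if node.description:
        parts.append(f"Description: {node.description}")
    parts.extend(f"{key}: {value}" for key, value in node.properties.items())
    lines = ["  " * depth + "• " + " | ".join(parts)]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The mood tracker form, using the simplified property names from the walkthrough.
mood_tracker = AccessibleObject(
    role="Group",
    children=[
        AccessibleObject(role="Label"),
        AccessibleObject(
            role="Slider",
            name="On a scale of 1–10, what is your mood today?",
            description="Some helpful pointers about how to rate your mood.",
            properties={
                "LabelledBy": "mood",
                "DescribedBy": "helperText",
                "ValueNow": "5",
                "ValueMin": "1",
                "ValueMax": "10",
            },
        ),
        AccessibleObject(role="Paragraph"),
        AccessibleObject(role="Button", name="Log Mood"),
    ],
)

print("\n".join(render(mood_tracker)))
```

Keeping Name and Description as first-class fields, with everything else in a generic properties map, mirrors the way the list above separates the main identifiers from additional semantics.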

On choosing host language semantics

Our sample markup mentions that it is preferred to use the HTML-native button element rather than a div with a role of “button.” Our buttonified div can be operated as a button via accessibility APIs, as the ARIA attribute is doing what it should—conveying semantics. But there’s a lot you can get for free when you choose native elements. In the case of button, that includes focus handling, user input handling, form submission, and basic styling. Aaron Gustafson has what he refers to as an “exhaustive treatise” on buttons in particular, but generally speaking it’s great to let the web platform do the heavy lifting of semantics and interaction for us when we can. ARIA roles, states, and properties are still a great tool to have in your toolbelt. Some good use cases for these are
  • providing further semantics and relationships that are not naturally expressed in the host language;
  • supplementing semantics in markup we perhaps don’t have complete control over;
  • patching potential cross-browser inconsistencies;
  • and making custom elements perceivable and operable to users of assistive technologies.

Notes on inclusion or exclusion in the tree

Standards define some rules around when user agents should exclude elements from the accessibility tree. Excluded elements can include those hidden by CSS, or the aria-hidden or hidden attributes; their children would be excluded as well. Children of particular roles (like checkbox) can also be excluded from the tree, unless they meet special exceptions. The full rules can be found in the “Accessibility Tree” section of the ARIA specification. That being said, there are still some differences between implementers, some of which include more divs and spans in the tree than others do.
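As a rough illustration of these inclusion rules, the following Python sketch prunes a DOM-like tree. It assumes only two simplified rules (hidden subtrees are dropped, and wrappers without semantics are skipped while their children are kept); the `DomNode` and `AxNode` classes and the `hidden` flag are hypothetical, and the real rules in the ARIA specification are more detailed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomNode:
    tag: str
    role: Optional[str] = None  # assumption: the role was already resolved from markup
    hidden: bool = False        # stands in for CSS hiding, hidden, or aria-hidden
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class AxNode:
    role: str
    children: List["AxNode"] = field(default_factory=list)


def build_ax_nodes(node: DomNode) -> List[AxNode]:
    """Return the accessible nodes that one DOM node contributes to the tree."""
    if node.hidden:
        return []  # a hidden element drops out along with all of its children
    child_nodes: List[AxNode] = []
    for child in node.children:
        child_nodes.extend(build_ax_nodes(child))
    if node.role is None:
        return child_nodes  # wrapper without semantics: keep only its descendants
    return [AxNode(node.role, child_nodes)]


page = DomNode("form", role="group", children=[
    DomNode("div", children=[            # non-semantic wrapper
        DomNode("label", role="label"),
        DomNode("input", role="textbox"),
    ]),
    DomNode("p", role="paragraph", hidden=True),
])

tree = build_ax_nodes(page)[0]
print([child.role for child in tree.children])  # prints ['label', 'textbox']
```

The wrapper div disappears while its label and input are promoted to the group, which is one reason the accessibility tree can have fewer nodes than the DOM tree.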

Notes on name and description computation

How names and descriptions are computed can be a bit confusing. Some elements have special rules, and some ARIA roles allow name computation from the element’s contents, whereas others do not. Name and description computation could probably be its own article, so we won’t get into all the details here (refer to “Further reading and resources” for some links). Some short pointers:
  • aria-label, aria-labelledby, and aria-describedby take precedence over other means of calculating name and description.
  • If you expect a particular HTML attribute to be used for the name, check the name computation rules for HTML elements. In your scenario, it may be used for the full description instead.
  • Generated content (::before and ::after) can participate in the accessible name when said name is taken from the element’s contents. That being said, web developers should not rely on pseudo-elements for non-decorative content, as this content could be lost when a stylesheet fails to load or user styles are applied to the page.
When in doubt, reach out to the community! Tag questions on social media with “#accessibility.” “#a11y” is a common shorthand; the “11” stands for “11 middle letters in the word ‘accessibility.’” If you find an inconsistency in a particular browser, file a bug! Bug tracker links are provided in “Further reading and resources.”
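The precedence pointers above can be sketched in Python. This is a heavily simplified illustration, not the full name computation algorithm: the `compute_name` function, its parameters, and the short list of roles that allow naming from contents are all assumptions made for this example.

```python
from typing import Dict, Optional

# Assumption for this sketch: a small sample of roles that may be named from contents.
NAME_FROM_CONTENT_ROLES = {"button", "link", "heading"}


def compute_name(
    attrs: Dict[str, str],
    role: str,
    text_content: str = "",
    label_text: Optional[str] = None,
    referenced_text: Optional[Dict[str, str]] = None,
) -> str:
    """Pick an accessible name using a simplified precedence order."""
    referenced_text = referenced_text or {}

    # 1. aria-labelledby: join the text of each referenced id, in order.
    ids = attrs.get("aria-labelledby", "").split()
    labelled = " ".join(referenced_text[i] for i in ids if i in referenced_text)
    if labelled.strip():
        return labelled.strip()

    # 2. aria-label on the element itself.
    if attrs.get("aria-label", "").strip():
        return attrs["aria-label"].strip()

    # 3. Host-language labelling, such as an associated label element.
    if label_text and label_text.strip():
        return label_text.strip()

    # 4. The element's own contents, only for roles that allow it.
    if role in NAME_FROM_CONTENT_ROLES and text_content.strip():
        return text_content.strip()

    # 5. The title attribute as a last resort.
    return attrs.get("title", "").strip()


print(compute_name({}, "button", text_content="Log Mood"))
print(compute_name({"aria-label": "Save entry"}, "button", text_content="Log Mood"))
```

The first call falls through to the button's contents, while the second shows aria-label winning over those same contents.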

Not just accessible objects

Besides a hierarchical structure of objects, accessibility APIs also offer interfaces that allow ATs to interact with text. ATs can retrieve content text ranges, text selections, and a variety of text attributes that they can build experiences on top of. For example, if someone writes an email and uses color alone to highlight their added comments, the person reading the email could increase the verbosity of speech output in their screen reader to know when they’re encountering phrases with that styling. However, it would be better for the email author to include very brief text labels in this scenario. The big takeaway here for web developers is to keep in mind that the accessible name of an element may not always be surfaced in every navigation mode in every screen reader. So if your aria-label text isn’t being read out in a particular mode, the screen reader may be primarily using text interfaces and only conditionally stopping on objects. It may be worth your while to consider using text content—even if visually hidden—instead of text via an ARIA attribute. Read more thoughts on aria-label and aria-labelledby.

Accessibility API events

It is the responsibility of browsers to surface changes to content, structure, and user input. Browsers do this by sending the accessibility API notifications about various events, which screen readers can subscribe to; again, for performance reasons, browsers could choose to send notifications only when ATs are active. Let’s suppose that a screen reader wants to surface changes to a live region (an element with role="alert" or aria-live):
Diagram showing a client (screen reader), which is already subscribed to live region events and can request more info about the live region, which receives a notification from the accessibility API, which gets a notification that a live region has changed from the provider (browser), which has a live region changed by the web document
Diagram illustrating the steps involved in announcing a live region via a screen reader; detailed list follows
  1. The screen reader subscribes to event notifications; it could subscribe to notifications of all types, or just certain types as categorized by the accessibility API. Let’s assume in our example that the screen reader is at least listening to live region change events.
  2. In the web content, the web developer changes the text content of a live region.
  3. The browser (provider) recognizes this as a live region change event, and sends the accessibility API a notification.
  4. The API passes this notification along to the screen reader.
  5. The screen reader can then use metadata from the notification to look up the relevant accessible objects via the accessibility API, and can surface the changes to the user.
ATs aren’t required to do anything with the information they retrieve. This can make it a bit trickier as a web developer to figure out why a screen reader isn’t announcing a change: it may be that notifications aren’t being raised (for example, because a browser is not sending notifications for a live region dynamically inserted into web content), or the AT is not subscribed or responding to that type of event.
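The five steps above follow a publish and subscribe pattern, which can be modeled in a few lines of Python. Every class, method, and event name here is hypothetical and chosen for illustration; platform accessibility APIs have their own event types and lookup mechanisms.

```python
from typing import Callable, Dict, List

Notification = Dict[str, str]
Handler = Callable[[Notification], None]

LIVE_REGION_CHANGED = "live-region-changed"  # hypothetical event type name


class AccessibilityApi:
    """Toy broker between a provider (browser) and clients (ATs)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}
        self._object_text: Dict[str, str] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def set_object_text(self, object_id: str, text: str) -> None:
        self._object_text[object_id] = text

    def get_object_text(self, object_id: str) -> str:
        return self._object_text.get(object_id, "")

    def notify(self, event_type: str, notification: Notification) -> int:
        """Pass the notification to each subscriber; return how many received it."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(notification)
        return len(handlers)


class Browser:
    def __init__(self, api: AccessibilityApi) -> None:
        self.api = api

    def change_live_region(self, region_id: str, text: str) -> int:
        # Steps 2 and 3: content changes, and the provider raises a notification.
        self.api.set_object_text(region_id, text)
        return self.api.notify(LIVE_REGION_CHANGED, {"object_id": region_id})


class ScreenReader:
    def __init__(self, api: AccessibilityApi) -> None:
        self.api = api
        self.announcements: List[str] = []
        # Step 1: subscribe to live region change events.
        api.subscribe(LIVE_REGION_CHANGED, self.on_live_region_changed)

    def on_live_region_changed(self, notification: Notification) -> None:
        # Steps 4 and 5: receive the notification, look up the object, surface it.
        text = self.api.get_object_text(notification["object_id"])
        self.announcements.append(text)


api = AccessibilityApi()
browser = Browser(api)
unheard = browser.change_live_region("status", "Draft saved")  # nobody is listening yet
reader = ScreenReader(api)
heard = browser.change_live_region("status", "Mood logged")
print(unheard, heard, reader.announcements)  # prints: 0 1 ['Mood logged']
```

The first change reaches no one because no client was subscribed at the time, which is one of the failure modes described above.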

Testing with screen readers and dev tools

While conformance checkers can help catch some basic accessibility issues, it’s ideal to walk through your content manually using a variety of contexts, such as
  • using a keyboard only;
  • with various OS accessibility settings turned on;
  • and at different zoom levels and text sizes, and so on.
As you do this, keep in mind the Web Content Accessibility Guidelines (WCAG 2.1), which give general guidelines around expectations for inclusive web content. If you can test with users after your own manual test passes, all the better! Robust accessibility testing could probably be its own series of articles. In this one, we’ll go over some tips for testing with screen readers, and catching accessibility errors as they are mapped into the accessibility API in a more general sense.

Screen reader testing

Screen readers exist in many forms: some are pre-installed on the operating system and others are separate applications that in some cases are free to download. The WebAIM screen reader user survey provides a list of commonly used screen reader and browser combinations among survey participants. The “Further reading and resources” section at the end of this article includes full screen reader user docs, and Deque University has a great set of screen reader command cheat sheets that you can refer to. Some actions you might take to test your content:
  • Read the next/previous item.
  • Read the next/previous line.
  • Read continuously from a particular point.
  • Jump by headings, landmarks, and links.
  • Tab around focusable elements only.
  • Get a summary of all elements of a particular type within the page.
  • Search the page for specific content.
  • Use table-specific commands to interact with your tables.
  • Jump around by form field; are field instructions discoverable in this navigational mode?
  • Use keyboard commands to interact with all interactive elements. Are your JavaScript-driven interactions still operable with screen readers (which can intercept key input in certain modes)? WAI-ARIA Authoring Practices 1.1 includes notes on expected keyboard interactions for various widgets.
  • Try out anything that creates a content change or results in navigating elsewhere. Would it be obvious, via screen reader output, that a change occurred?

Tracking down the source of unexpected behavior

If a screen reader does not announce something as you’d expect, here are a few different checks you can run:
  • Does this reproduce with the same screen reader in multiple browsers on this OS? It may be an issue with the screen reader or your expectation may not match the screen reader’s user experience design. For example, a screen reader may choose to not expose the accessible name of a static, non-interactive element. Checking the user docs or filing a screen reader issue with a simple test case would be a great place to start.
  • Does this reproduce with multiple screen readers in the same browser, but not in other browsers on this OS? The browser in question may have an issue, there may be compatibility differences between browsers (such as a browser doing extra helpful but non-standard computations), or a screen reader’s support for a specific accessibility API may vary. Filing a browser issue with a simple test case would be a great place to start; if it’s not a browser bug, the developer can route it to the right place or make a code suggestion.
  • Does this reproduce with multiple screen readers in multiple browsers? There may be something you can adjust in your code, or your expectations may differ from standards and common practices.
  • How do this element’s accessibility properties and structure show up in browser dev tools?

Inspecting accessibility trees and properties in dev tools

Major modern browsers provide dev tools to help you observe the structure of the accessibility tree as well as a given element’s accessibility properties. By observing which accessible objects are generated for your elements and which properties are exposed on a given element, you may be able to pinpoint issues that are occurring either in front-end code or in how the browser is mapping your content into the accessibility API. Let’s suppose that we are testing this piece of code in Microsoft Edge with a screen reader:
<div class="form-row">
  <label>Favorite color</label>
  <input id="myTextInput" type="text" />
</div>
We’re navigating the page by form field, and when we land on this text field, the screen reader just tells us this is an “edit” control—it doesn’t mention a name for this element. Let’s check the tools for the element’s accessible name.
1. Inspect the element to bring up the dev tools.
Screenshot showing the Microsoft Edge dev tools inspecting an input element
The Microsoft Edge dev tools, with an input element highlighted in the DOM tree
2. Bring up the accessibility tree for this page by clicking the accessibility tree button (a circle with two arrows) or pressing Ctrl+Shift+A (Windows).
Screenshot showing the Microsoft Edge tools inspecting an input element with the Accessibility Tree panel open
The accessibility tree button activated in the Microsoft Edge dev tools
Reviewing the accessibility tree is an extra step for this particular flow but can be helpful to do. When the Accessibility Tree pane comes up, we notice there’s a tree node that just says “textbox:,” with nothing after the colon. That suggests there’s not a name for this element. (Also notice that the div around our form input didn’t make it into the accessibility tree; it was not semantically useful.)
3. Open the Accessibility Properties pane, which is a sibling of the Styles pane. If we scroll down to the Name property—aha! It’s blank. No name is provided to the accessibility API. (Side note: some other accessibility properties are filtered out of this list by default; toggle the filter button—which looks like a funnel—in the pane to get the full list.)
Screenshot showing the Microsoft Edge tools inspecting an input element with the Accessibility Properties pane open
The Accessibility Properties pane open in Microsoft Edge dev tools, in the same area as the Styles pane
4. Check the code. We realize that we didn’t associate the label with the text field; that is one strategy for providing an accessible name for a text input. We add for="myTextInput" to the label:
<div class="form-row">
  <label for="myTextInput">Favorite color</label>
  <input id="myTextInput" type="text" />
</div>
And now the field has a name:
Screenshot showing the Microsoft Edge tools inspecting an input element with the Accessibility Tree panel open, where the input's Name attribute now has a value
The accessible Name property set to the value of “Favorite color” inside Microsoft Edge dev tools
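The same kind of mistake can be caught before opening dev tools. Below is a minimal Python sketch that uses the standard library’s `html.parser` to flag form fields with no obvious source for an accessible name. The `LabelAudit` class and `unlabelled_fields` function are hypothetical, and the check is deliberately narrow: it only knows about label elements (via for or by wrapping), aria-label, and aria-labelledby, so it is no substitute for inspecting the accessibility tree.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class LabelAudit(HTMLParser):
    """Collects label targets and form fields while parsing markup."""

    def __init__(self) -> None:
        super().__init__()
        self.label_targets: Set[str] = set()
        # Each entry: (field id or None, whether the field has its own name source)
        self.fields: List[Tuple[Optional[str], bool]] = []
        self._open_labels = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self._open_labels += 1
            if attributes.get("for"):
                self.label_targets.add(attributes["for"])
        elif tag in ("input", "select", "textarea"):
            if attributes.get("type") == "hidden":
                return  # hidden inputs are not presented to users
            has_own_name = bool(
                self._open_labels > 0  # wrapped by a label element
                or attributes.get("aria-label")
                or attributes.get("aria-labelledby")
            )
            self.fields.append((attributes.get("id"), has_own_name))

    def handle_endtag(self, tag):
        if tag == "label" and self._open_labels > 0:
            self._open_labels -= 1


def unlabelled_fields(markup: str) -> List[str]:
    """Return ids of fields that no label or ARIA naming attribute covers."""
    audit = LabelAudit()
    audit.feed(markup)
    audit.close()
    missing = []
    for field_id, has_own_name in audit.fields:
        if has_own_name or (field_id and field_id in audit.label_targets):
            continue
        missing.append(field_id or "(no id)")
    return missing


before = '<div class="form-row"><label>Favorite color</label><input id="myTextInput" type="text" /></div>'
after = '<div class="form-row"><label for="myTextInput">Favorite color</label><input id="myTextInput" type="text" /></div>'
print(unlabelled_fields(before))  # prints ['myTextInput']
print(unlabelled_fields(after))   # prints []
```

Label targets are resolved only after the whole document is parsed, so a label that appears after its field in the markup still counts.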
In another use case, we have a breadcrumb component, where the current page link is marked with aria-current="page":
<nav class="breadcrumb" aria-label="Breadcrumb">
  <ol>
    <li>
      <a href="/cat/">Category</a>
    </li>
    <li>
      <a href="/cat/sub/">Sub-Category</a>
    </li>
    <li>
      <a aria-current="page" href="/cat/sub/page/">Page</a>
    </li>
  </ol>
</nav>
When navigating onto the current page link, however, we don’t get any indication that this is the current page. We’re not exactly sure how this maps into accessibility properties, so we can reference a specification like Core Accessibility API Mappings 1.2 (Core-AAM). Under the “State and Property Mapping” table, we find mappings for “aria-current with non-false allowed value.” We can check for these listed properties in the Accessibility Properties pane. Microsoft Edge, at the time of writing, maps into UIA (UI Automation), so when we check AriaProperties, we find that yes, “current=page” is included within this property value.
Screenshot showing the Microsoft Edge tools inspecting the current page link with the Accessibility Properties pane open, where the link's AriaProperties attribute has a value that includes current=page
The AriaProperties property including “current=page” inside Microsoft Edge dev tools
Now we know that the value is presented correctly to the accessibility API, but the particular screen reader is not using the information. As a side note, Microsoft Edge’s current dev tools expose these accessibility API properties quite literally. Other browsers’ dev tools may simplify property names and values to make them easier to read, particularly if they support more than one accessibility API. The important bit is to find if there’s a property with roughly the name you expect and whether its value is what you expect. You can also use this method of checking through the property names and values if mapping specs, like Core-AAM, are a bit intimidating!

Advanced accessibility tools

While browser dev tools can tell us a lot about the accessibility semantics of our markup, they don’t generally include representations of text ranges or event notifications. On Windows, the Windows SDK includes advanced tools that can help debug these parts of MSAA or UIA mappings: Inspect and AccEvent (Accessible Event Watcher). Using these tools presumes knowledge of the Windows accessibility APIs, so if this is too granular for you and you’re stuck on an issue, please reach out to the relevant browser team! There is also an Accessibility Inspector in Xcode on macOS, with which you can inspect web content in Safari. This tool can be accessed by going to Xcode > Open Developer Tool > Accessibility Inspector.

Diversity of experience

Equipped with an accessibility tree, detailed object information, event notifications, and methods for interacting with accessible objects, screen readers can craft a browsing experience tailored to their audiences. In this article, we’ve used the term “screen readers” as a proxy for a whole host of tools that may use accessibility APIs to provide the best user experience possible. Assistive technologies can use the APIs to augment presentation or support varying types of user input. Examples of other ATs include screen magnifiers, cognitive support tools, speech command programs, and some brilliant new app that hasn’t been dreamed up yet. Further, assistive technologies of the same “type” may differ in how they present information, and users who share the same tool may further adjust settings to their liking. As web developers, we don’t necessarily need to make sure that each instance surfaces information identically, because each user’s preferences will not be exactly the same. Our aim is to ensure that no matter how a user chooses to explore our sites, content is perceivable, operable, understandable, and robust. By testing with a variety of assistive technologies—including but not limited to screen readers—we can help create a better web for all the many people who use it.

Further reading and resources

As I write this, the world is sending its thoughts and prayers to our Muslim cousins. The Christchurch act of terrorism has once again reminded the world that white supremacy’s rise is very real, that its perpetrators are no longer on the fringes of society, but centered in our holiest places of worship. People are begging us to not share videos of the mass murder or the hateful manifesto that the white supremacist terrorist wrote. That’s what he wants: for his proverbial message of hate to be spread to the ends of the earth.

We live in a time where you can stream a mass murder and hate crime from the comfort of your home. Children can access these videos, too.

As I work through the pure pain, unsurprised, observing the toll on Muslim communities (as a non-Muslim, who matters least in this event), I think of the imperative role that our industry plays in this story.

At time of writing, YouTube has failed to ban and to remove this video. If you search for the video (which I strongly advise against), it still comes up with a mere content warning; the same content warning that appears for casually risqué content. You can bypass the warning and watch people get murdered. Even when the video gets flagged and taken down, new ones get uploaded.

Human moderators have to relive watching this trauma over and over again for unlivable wages. News outlets are embedding the video into their articles and publishing the hateful manifesto. Why? What does this accomplish?

I was taught in journalism class that media (photos, video, infographics, etc.) should be additive (a progressive enhancement, if you will) and provide something to the story for the reader that words cannot.

Is it necessary to show murder for our dear readers to understand the cruelty and finality of it? Do readers gain something more from watching fellow humans have their lives stolen from them? What psychological damage are we inflicting upon millions of people, and for what?

Who benefits?

The mass shooter(s) who had a message to accompany their mass murder. News outlets are thirsty for perverse clicks to garner more ad revenue. We, by way of our platforms, give agency and credence to these acts of violence, then pilfer profits from them. Tech is a money-making accomplice to these hate crimes.

Christchurch is just one example in an endless array where the tools and products we create are used as a vehicle for harm and for hate.

Facebook and the Cambridge Analytica scandal played a critical role in the outcome of the 2016 presidential election. The concept of “race realism,” which is essentially a term that white supremacists use to codify their false racist pseudo-science, was actively tested on Facebook’s platform to see how the term would sit with people who are ignorantly sitting on the fringes of white supremacy. Full-blown white supremacists don’t need this soft language. This is how radicalization works.

The strategies articulated in the above article are not new. Racist propaganda predates social media platforms. What we have to be mindful of is that we’re building smarter tools with power we don’t yet fully understand: you can now have an AI-generated human face. Our technology is accelerating at a frightening rate, a rate faster than our reflective understanding of its impact.

Combine the time-tested methods of spreading white supremacy, the power to manipulate perception through technology, and the magnitude and reach that has become democratized and anonymized.

We’re staring at our own reflection in the Black Mirror.

The right to speak versus the right to survive

Tech has proven time and time again that it voraciously protects First Amendment rights above all else. (I will also take this opportunity to remind you that the First Amendment of the United States Constitution offers protection to the people from the government abolishing free speech, not from private money-making corporations.)

Evelyn Beatrice Hall writes in The Friends of Voltaire, “I disapprove of what you say, but I will defend to the death your right to say it.” Fundamentally, Hall’s quote expresses that we must protect, possibly above all other freedoms, the freedom to say whatever we want to say. (Fun fact: The quote is often misattributed to Voltaire, but Hall actually wrote it to explain Voltaire’s ideologies.)

And the logical anchor here is sound: We must grant everyone else the same rights that we would like for ourselves. Former 99u editor Sean Blanda wrote a thoughtful piece on the “Other Side,” where he posits that we lack tolerance for people who don’t think like us, but that we must because we might one day be on the other side. I agree in theory.

But, what happens when a portion of the rights we grant to one group (let’s say, free speech to white supremacists) means the active oppression of another group’s right (let’s say, every person of color’s right to live)?

James Baldwin expresses this idea with a clause, “We can disagree and still love each other unless your disagreement is rooted in my oppression and denial of my humanity and right to exist.”

It would seem that we have a moral quandary where two sets of rights cannot coexist. Do we protect the privilege for all users to say what they want, or do we protect all users from hate? Because of this perceived moral quandary, tech has often opted out of this conversation altogether. Platforms like Twitter and Facebook, two of the biggest offenders, continue to allow hate speech to ensue with irregular to no regulation.

When explicitly asked about his platform as a free-speech platform and its consequence to privacy and safety, Twitter CEO Jack Dorsey said,

“So we believe that we can only serve the public conversation, we can only stand for freedom of expression if people feel safe to express themselves in the first place. We can only do that if they feel that they are not being silenced.”

Dorsey and Twitter are most concerned about protecting expression and about not silencing people. In his mind, if he allows people to say whatever they want on his platform, he has succeeded. When asked about why he’s failed to implement AI to filter abuse like, say, Instagram had implemented, he said that he’s most concerned about being able to explain why the AI flagged something as abusive. Again, Dorsey protects the freedom of speech (and thus, the perpetrators of abuse) before the victims of abuse.

But he’s inconsistent about it. In a study by George Washington University comparing white nationalist and ISIS social media usage, Twitter’s freedom of speech was not granted to ISIS. Twitter suspended 1,100 accounts related to ISIS whereas it suspended only seven accounts related to Nazis, white nationalism, and white supremacy, despite those accounts having more than seven times the followers, and tweeting 25 times more than the ISIS accounts. Twitter here made a moral judgment that the fewer, less active, and less influential ISIS accounts were somehow not welcome on their platform, whereas the prolific and burgeoning Nazi and white supremacy accounts were.

So, Twitter has shown that it won’t protect free speech at all costs or for all users. We can only conclude that Twitter is either intentionally protecting white supremacy or simply doesn’t think it’s very dangerous. Regardless of which it is (I think I know), the outcome does not change the fact that white supremacy is running rampant on its platforms and many others.

Let’s brainwash ourselves for a moment and pretend like Twitter does want to support freedom of speech equitably and stays neutral and fair to complete this logical exercise: Going back to the dichotomy of rights example I provided earlier, where either the right to free speech or the right to safety and survival prevail, the rights and the power will fall into the hands of the dominant group or ideology.

In case you are somehow unaware, the dominating ideology, whether you’re a flagrant white supremacist or not, is white supremacy. White supremacy was baked into founding principles of the United States, the country where the majority of these platforms were founded and exist. (I am not suggesting that white supremacy doesn’t exist globally, as it does, evidenced most recently by the terrorist attack in Christchurch. I’m centering the conversation intentionally around the United States as it is my lived experience and where most of these companies operate.)

Facebook attempted to educate its team on white supremacy in order to address how to regulate free speech. A laugh-cry excerpt:

“White nationalism and calling for an exclusively white state is not a violation for our policy unless it explicitly excludes other PCs [protected characteristics].”

White nationalism is a softened synonym for white supremacy so that racists-lite can feel more comfortable with their transition into hate. White nationalism (a.k.a. white supremacy) by definition explicitly seeks to eradicate all people of color. So, Facebook should see white nationalist speech as exclusionary, and therefore a violation of their policies.

Regardless of what tech leaders like Dorsey or Facebook CEO Zuckerberg say or what mediocre and uninspired condolences they might offer, inaction is an action.

Companies that use terms and conditions or acceptable use policies to defend their inaction around hate speech are enabling and perpetuating white supremacy. Policies are written by humans to protect that group of humans’ ideals. The message they use might be that they are protecting free speech, but hate speech is a form of free speech. So effectively, they are protecting hate speech. Well, as long as it’s for white supremacy and not the Islamic State.

Whether the motivation is fear (losing loyal Nazi customers and their sympathizers) or hate (because their CEO is a white supremacist), it does not change the impact: Hate speech is tolerated, enabled, and amplified by way of their platforms.

“That wasn’t our intent”

Product creators might be thinking, Hey, look, I don’t intentionally create a platform for hate. The way these features were used was never our intent.

Intent does not erase impact.

We cannot absolve ourselves of culpability merely because we failed to conceive such evil use cases when we built our platforms. While we very well might not have created these platforms with the explicit intent to help Nazis or imagined they would be used to spread their hate, the reality is that our platforms are being used in this way.

As product creators, it is our responsibility to protect the safety of our users by stopping those that intend to or already cause them harm. Better yet, we ought to think of this before we build the platforms to prevent this in the first place.

The question to answer isn’t, “Have I made a place where people have the freedom to express themselves?” Instead we have to ask, “Have I made a place where everyone has the safety to exist?” If you have created a place where a dominant group can embroil and embolden hate against another group, you have failed to create a safe place. The foundations of hateful speech (beyond the psychological trauma of it) lead to events like Christchurch.

We must protect safety over speech.

The Domino Effect

This week, Slack banned 28 hate groups. What is most notable, to me, is that the groups did not break any part of Slack’s Acceptable Use Policy. Slack issued a statement:

The use of Slack by hate groups runs counter to everything we believe in at Slack and is not welcome on our platform… Using Slack to encourage or incite hatred and violence against groups or individuals because of who they are is antithetical to our values and the very purpose of Slack.

That’s it.

It is not illegal for tech companies like Slack to ban groups from using their proprietary software, because they are private companies that can refuse users who do not align with their vision. Think of it as the “no shirt, no shoes, no service” model, but for tech.

Slack simply decided that supporting the workplace collaboration of Nazis around efficient ways to evangelize white supremacy was probably not in line with their company directives around inclusion. I imagine Slack also considered how their employees of color most ill-affected by white supremacy would feel working for a company that supported it, actively or not.

What makes the Slack example so notable is that they acted swiftly and on their own accord. Slack chose the safety of all their users over the speech of some.

When caught with their enablement of white supremacy, some companies will only budge under pressure from activist groups, users, and employees.

PayPal finally banned hate groups after Charlottesville and after the Southern Poverty Law Center (SPLC) explicitly called them out for enabling hate. The SPLC had been raising the issue for three years, and PayPal had ignored them the entire time.

Unfortunately, taking these “stances” against something as clearly and viscerally wrong as white supremacy is rare for companies to do. The tech industry tolerates this inaction through unspoken agreements.

If Facebook doesn’t do anything about racist political propaganda, YouTube doesn’t do anything about PewDiePie, and Twitter doesn’t do anything about disproportionate abuse against Black women, it says to the smaller players in the industry that they don’t have to either.

The tech industry reacts to its peers. When there is disruption, as was the case with Airbnb, who screened and rejected any guests who they believed to be partaking in the Unite the Right Charlottesville rally, companies follow suit. GoDaddy cancelled Daily Stormer’s domain registration and Google did the same when they attempted migration.

If one company, like Slack or Airbnb, decides to do something about the role it’s going to play, it creates a perverse kind of FOMO for the rest: fear of missing out on doing the right thing and standing on the right side of history.

Don’t have FOMO, do something

The type of activism at those companies all started with one individual. If you want to be part of the solution, I’ve gathered some places to start. The list is not exhaustive, and, as with all things, I recommend researching beyond this abridged summary.

  1. Understand how white supremacy impacts you as an individual.
    Now, if you are a person of color, queer, disabled, or trans, it’s likely that you know this very intimately.

    If you are not any of those things, then you, as a majority person, need to understand how white supremacy protects you and works in your favor. It’s not easy work, it is uncomfortable and unfamiliar, but you have the most powerful tools to fix tech. The resources are aplenty, but my favorite abridged list:

    1. Seeing White podcast
    2. Ijeoma Oluo’s So you want to talk about race
    3. Reni Eddo-Lodge’s Why I’m no longer talking to white people about race (Very key read for UK folks)
    4. Robin DiAngelo’s White Fragility
  2. See where your company stands: Read your company’s policies like acceptable use and privacy policies and find your CEO’s stance on safety and free speech.
    While these policies are baseline (and in the Slack example, sort of irrelevant), it’s important to know your company’s track record. As an employee, your actions and decisions either uphold the ideologies behind the company or they don’t. Ask yourself if the company’s ideologies are worth upholding and whether they align with your own. Education will help you to flag if something contradicts those policies, or if the policies themselves allow for unethical activity.
  3. Examine everything you do critically on an ongoing basis.
    You may feel your role is small or that your company is immune—maybe you are responsible for the maintenance of one small algorithm. But consider how that algorithm or similar ones can be exploited. Some key questions I ask myself:
    1. Who benefits from this? Who is harmed?
    2. How could this be used for harm?
    3. Who does this exclude? Who is missing?
    4. What does this protect? For whom? Does it do so equitably?
  4. See something? Say something.
    If you believe that your company is creating something that is or can be used for harm, it is your responsibility to say something. Now, I’m not naïve to the fact that there is inherent risk in this. You might fear ostracization or termination. You need to protect yourself first. But you also need to do something.
    1. Find someone who you trust who might be at less risk. Maybe if you’re a nonbinary person of color, find a white cis man who is willing to speak up. Maybe if you’re a white man who is new to the company, find a white man who has more seniority or tenure. But also, consider how you have so much more relative privilege compared to most other people and that you might be the safest option.
    2. Unionize. Find peers who might feel the same way and write a collective statement.
    3. Get someone influential outside of the company (if knowledge is public) to say something.
  5. Listen to concerns, no matter how small, particularly if they’re coming from the most endangered groups.
    If your user or peer feels unsafe, you need to understand why. People often feel like small things can be overlooked, as their initial impact might be less, but it is in the smallest cracks that hate can grow. Allowing one insensitive comment about race is still allowing hate speech. If someone, particularly someone in a marginalized group, brings up a concern, you need to do your due diligence to listen to it and to understand its impact.

I cannot emphasize this last point enough.

What I say today is not new. Versions of this article have been written before. Women of color like me have voiced similar concerns not only in writing, but in design reviews, in closed door meetings to key stakeholders, in Slack DMs. We’ve blown our whistles.

But here is the power of white supremacy.

White supremacy is so ingrained in every single aspect of how this nation was built, how our corporations function, and who is in control. If you are not convinced of this, you are not paying attention or intentionally ignoring the truth.

Queer, Muslim, disabled, trans women and nonbinary folks of color — the marginalized groups most impacted by this — are the ones who are voicing these concerns most vociferously. Speaking up requires us to enter the spotlight and step outside of safety—we take a risk and are not heard.

The silencing of our voices is one of many effective tools of white supremacy. Our silencing lives within every microaggression, each time we’re talked over, or not invited to partake in key decisions.

In tech, I feel I am a canary in a coal mine. I have sung my song to warn the miners of the toxicity. My sensitivity to it is heightened, because of my existence.

But the miners look at me and tell me that my lived experience is false. It does not align with their narrative as humans. They don’t understand why I sing.

If the people at the highest echelons of the tech industry—the white, male CEOs in power—fail to listen to its most marginalized people—the queer, disabled, trans, people of color—the fate of the canaries will too become the fate of the miners.

<![CDATA[ Nothing Fails Like Success ]]> . 17 abr 2019 13:51:16.A List Apart: The Full Feed.

A family buys a house they can’t afford. They can’t make their monthly mortgage payments, so they borrow money from the Mob. Now they’re in debt to the bank and the Mob, live in fear of losing their home, and must do whatever their creditors tell them to do.

Welcome to the internet, 2019.

Buying something you can’t afford, and borrowing from organizations that don’t have your (or your customers’) best interest at heart, is the business plan of most internet startups. It’s why our digital services and social networks in 2019 are a garbage fire of lies, distortions, hate speech, tribalism, privacy violations, snake oil, dangerous idiocy, deflected responsibility, and whole new categories of unpunished ethical breaches and crimes.

From optimistically conceived origins and mission statements about making the world a better place, too many websites and startups have become the leading edge of bias and trauma, especially for marginalized and at-risk groups.

Why (almost) everything sucks

Twitter, for instance, needs a lot of views for advertising to pay at the massive scale its investors demand. A lot of views means you can’t be too picky about what people share. If it’s misogynists or racists inspiring others who share their heinous beliefs to bring back the 1930s, hey, it’s measurable. If a powerful elected official’s out-of-control tweeting reduces churn and increases views, not only can you pay your investors, you can even take home a bonus. Maybe it can pay for that next meditation retreat.

You can cloak this basic economic trade-off in fifty layers of bullshit—say you believe in freedom of speech, or that the antidote to bad speech is more speech—but the fact is, hate speech is profitable. It’s killing our society and our planet, but it’s profitable. And the remaining makers of Twitter—the ones whose consciences didn’t send them packing years ago—no longer have a choice. The guy from the Mob is on his way over, and the vig is due.

Not to single out Twitter, but this is clearly the root cause of its seeming indifference to the destruction hate speech is doing to society…and will ultimately do to the platform. (But by then Jack will be able to afford to meditate full-time.)

Other companies do other evil things to pay their vig. When you owe the Mob, you have no choice. Like sell our data. Or lie about medical research.

There are internet companies (like Basecamp, or like Automattic, makers of WordPress.com, where I work) that charge money for their products and services, and use that money to grow their business. I wish more internet companies could follow that model, but it’s hard to retrofit a legitimate business model to a product that started its life as free.

And there are even some high-end news publications, such as The New York Times, The Washington Post, and The Guardian, that survive on a combination of advertising and flexible paywalls. But these options are not available to most digital publications and businesses.

Return with me to those Halcyon days…

Websites and internet startups used to be you and your friends making cool stuff for your other friends, and maybe building new friendships and even small communities in the process. (Even in 2019, that’s still how some websites and startups begin—as labors of love, fashioned by idealists in their spare time.)

Because they are labors of love; because we’ve spent 25 years training people to believe that websites, and news, and apps, and services should be free; because, when we begin a project, we can scarcely believe anyone will ever notice or care about it—for these reasons and more, the things we make digitally, especially on the web, are offered free of charge. We labor on, excited by positive feedback, and delighted to discover that, if we keep at it, our little community will grow.

Most such labors of love disappear after a year or two, as the creators drift out of touch with each other, get “real” jobs, fall in love, start families, or simply lose interest due to lack of attention from the public or the frustrations of spending weekends and holidays grinding away at an underappreciated site or app while their non-internet friends spend those same hours either having fun or earning money.

Along came money

But some of these startup projects catch on. And when they do, a certain class of investor smells ROI. And the naive cofounders, who never expected their product or service to really get anywhere, can suddenly envision themselves rich and Zuckerberg-famous. Or maybe they like the idea of quitting their day job, believing in themselves, and really going for it. After all, that is an empowering and righteous vision.

Maybe they believe that by taking the initial investment, they can do more good—that their product, if developed further, can actually help people. This is often the motivation behind agreeing to an initial investment deal, especially in categories like healthcare.

Or maybe the founders are problem solvers. Existing products or services in a given category have a big weakness. The problem solvers are sure that their idea is better. With enough capital, and a slightly bigger team, they can show the world how to do it right. Most inventions that have moved humankind forward followed exactly this path. It should lead to a better world (and it sometimes does). It shouldn’t produce privacy breaches and fake medicine and election-influencing bots and all the other plagues of our emerging digital civilization. So why does it?

Content wants to be paid

Primarily it is because these businesses have no business model. They were made and given away free. Now investors come along who can pay the founders, buy them an office, give them the money to staff up, and even help with PR and advertising to help them grow faster.

Now there are salaries and insurance and taxes and office space and travel and lecture tours and sales booths at SXSW, but there is still no charge for the product.

And the investor seeks a big return.

And when the initial investment is no longer enough to get the free-product company to scale to the big leagues, that’s when the really big investors come in with the really big bucks. And the company is suddenly famous overnight, and “everybody” is using the product, and it’s still free, and the investors are still expecting a giant payday.

Like I said—a house you can’t afford, so you go into debt to the bank and the Mob.

The money trap

Here it would be easy to blame capitalism, or at least untrammeled, under-regulated capitalism, which has often been a source of human suffering—not that capitalism, properly regulated, can’t also be a force for innovation which ameliorates suffering. That’s the dilemma for our society, and where you come down on free markets versus governmental regulation of businesses should be an intellectual decision, but these days it is a label, and we hate our neighbors for coming down a few degrees to the left or right of us. But I digress and oversimplify, and this isn’t a complaint about late stage capitalism per se, although it may smell like one.

No, the reason small companies created by idealists too frequently turn into consumer-defrauding forces for evil has to do with the amount of profit each new phase of investor expects to receive, and how quickly they expect to receive it, and the fact that the products and services are still free. And you know what they say about free products.

Nothing fails like success

A friend who’s a serial entrepreneur has started maybe a dozen internet businesses over the span of his career. They’ve all met a need in the marketplace. As a consequence, they’ve all found customers, and they’ve all made a profit. Yet his investors are rarely happy.

“Most of my startups have the decency to fail in the first year,” one investor told him. My friend’s business was taking in several million dollars a year and was slowly growing in staff and customers. It was profitable. Just not obscenely so.

And internet investors don’t want a modest return on their investment. They want an obscene profit right away, or a brutal loss, which they can write off their taxes. Making them a hundred million for the ten million they lent you is good. Losing their ten million is also good—they pay a lower tax bill that way, or they use the loss to fold a company, or they make a profit on the furniture while writing off the business as a loss…whatever rich people can legally do under our tax system, which is quite a lot.

What these folks don’t want is to lend you ten million dollars and get twelve million back.

You and I might go, “Wow! I just made two million dollars just for being privileged enough to have money to lend somebody else.” And that’s why you and I will never have ten million dollars to lend anybody. Because we would be grateful for it. And we would see a free two million dollars as a life-changing gift from God. But investors don’t think this way.

We didn’t start the fire, but we roasted our weenies in it

As much as we pretend to be a religious nation, our society worships these investors and their profits, worships companies that turn these profits, worships above all the myth of overnight success, which we use to motivate the hundreds of thousands of workers who will work nights and weekends for the owners in hopes of cashing in when the stock goes big.

Most times, even if the stock does go big, the owner has found a way to devalue it by the time it does. Owners have brilliant advisers they pay to figure out how to do those things. You and I don’t.

A Christmas memory

I remember visiting San Francisco years ago and scoring an invitation to Twitter’s Christmas party through a friend who worked there at the time. Twitter was, at the time, an app that worked via SMS and also via a website. Period.

Some third-party companies, starting with my friends at Iconfactory, had built iPhone apps for people who wanted to navigate Twitter via their newfangled iPhones instead of the web. Twitter itself hadn’t publicly addressed mobile and might not even have been thinking about it.

Although Twitter was transitioning from a fun cult thing—used by bloggers who attended SXSW Interactive in 2007—to an emerging cultural phenomenon, it was still quite basic in its interface and limited in its abilities. Which was not a bad thing. There is art in constraint, value in doing one thing well. As an outsider, if I’d thought about it, I would have guessed that Twitter’s entire team consisted of no more than 10 or 12 wild-eyed, sleep-deprived true believers.

Imagine my surprise, then, when I showed up at the Christmas party and discovered I’d be sharing dinner with hundreds of designers, developers, salespeople, and executives instead of the handful I’d naively anticipated meeting. (By now, of course, Twitter employs many thousands. It’s still not clear to an outsider why so many workers are needed.)

But one thing is clear: somebody has to pay for it all.

Freemium isn’t free

Employees, let alone thousands of them, on inflated Silicon Valley engineer salaries, aren’t free. Health insurance and parking and meals and HR and travel and expense accounts and meetups and software and hardware and office space and amenities aren’t free. Paying for all that while striving to repay investors tenfold means making a buck any way you can.

Since the product was born free and a paywall isn’t feasible, Twitter must rely on that old standby: advertising. Advertising may not generate enough revenue to keep your hometown newspaper (or most podcasts and content sites) in business, but at Twitter’s scale, it pays.

It pays because Twitter has so many active users. And what keeps those users coming back? Too often, it’s the dopamine of relentless tribalism—folks whose political beliefs match and reinforce mine in a constant unwinnable war of words with folks whose beliefs differ.

Of course, half the antagonists in a given brawl may be bots, paid for in secret by an organization that wants to make it appear that most citizens are against Net Neutrality, or that most Americans oppose even the most basic gun laws, or that our elected officials work for lizard people. The whole system is broken and dangerous, but it’s also addictive, and we can’t look away. From our naive belief that content wants to be free, and our inability to create businesses that pay for themselves, we are turning our era’s greatest inventions into engines of doom and despair.

Your turn

So here we are. Now what do we do about it?

It’s too late for current internet businesses (victims of their own success) that are mortgaged to the hilt in investor gelt. But could the next generation of internet startups learn from older, stable companies like Basecamp, and design products that pay for themselves via customer income—products that profit slowly and sustainably, allowing them to scale up in a similarly slow, sustainable fashion?

The self-payment model may not work for apps and sites that are designed as modest amusements or communities, but maybe those kinds of startups don’t need to make a buck—maybe they can simply be labors of love, like the websites we loved in the 1990s and early 2000s.

Along those same lines, can the IndieWeb, and products of IndieWeb thinking like Micro.blog, save us? Might they at least provide an alternative to the toxic aspects of our current social web, and restore the ownership of our data and content? And before you answer, RTFM.

On an individual and small collective basis, the IndieWeb already works. But does an IndieWeb approach scale to the general public? If it doesn’t scale yet, can we, who envision and design and build, create a new generation of tools that will help give birth to a flourishing, independent web? One that is as accessible to ordinary internet users as Twitter and Facebook and Instagram? Tantek Çelik thinks so, and he’s been right about the web for nearly 30 years. (For more about what Tantek thinks, listen to our conversation in Episode № 186 of The Big Web Show.)

Are these approaches mere whistling against a hurricane? Are most web and internet users content with how things are? What do you think? Share your thoughts on your personal website (dust yours off!) or (irony ahoy!) on your indie or mainstream social networks of choice using hashtag #LetsFixThis. I can’t wait to see what you have to say.

<![CDATA[ Responsible JavaScript: Part I ]]> . 17 abr 2019 13:51:16.A List Apart: The Full Feed.

By the numbers, JavaScript is a performance liability. If the trend persists, the median page will be shipping at least 400 KB of it before too long, and that’s merely what’s transferred. Like other text-based resources, JavaScript is almost always served compressed—but that might be the only thing we’re getting consistently right in its delivery.

Unfortunately, while reducing resource transfer time is a big part of that whole performance thing, compression has no effect on how long browsers take to process a script once it arrives in its entirety. If a server sends 400 KB of compressed JavaScript, the actual amount browsers have to process after decompression is north of a megabyte. How well devices cope with these heavy workloads depends, well, on the device. Much has been written about how adept various devices are at processing lots of JavaScript, but the truth is, the amount of time it takes to process even a trivial amount of it varies greatly between devices.
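To make the gap between transferred bytes and processed bytes concrete, here is a minimal Node.js sketch. The `makeSampleScript` helper is hypothetical and produces far more repetitive code than a real bundle would, so the ratio it prints is exaggerated. Treat it as an illustration of the idea, not a benchmark.

```javascript
// Node.js sketch: compare the bytes sent over the wire with the bytes
// the browser still has to process after decompression.
const zlib = require("zlib");

// Hypothetical stand-in for a real bundle: many small, similar functions.
function makeSampleScript(count) {
  const parts = [];
  for (let i = 0; i < count; i++) {
    parts.push(`function handler${i}(event) { return event.target.value + ${i}; }`);
  }
  return parts.join("\n");
}

// Returns both sizes in bytes for a given source string.
function compareSizes(source) {
  const uncompressed = Buffer.byteLength(source, "utf8");
  const compressed = zlib.gzipSync(source).length;
  return { uncompressed, compressed };
}

const sizes = compareSizes(makeSampleScript(2000));
console.log(`transferred: ${sizes.compressed} bytes`);
console.log(`to process:  ${sizes.uncompressed} bytes`);
```

Pointing `compareSizes` at the contents of one of your own bundles should give a more honest picture of what your users' devices actually have to chew through.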

Take, for example, this throwaway project of mine, which serves around 23 KB of uncompressed JavaScript. On a mid-2017 MacBook Pro, Chrome chews through this comparably tiny payload in about 25 ms. On a Nokia 2 Android phone, however, that figure balloons to around 190 ms. That’s not an insignificant amount of time, but in either case, the page gets interactive reasonably fast.

Now for the big question: how do you think that little Nokia 2 does on an average page? It chokes. Even on a fast connection, browsing the web on it is an exercise in patience as JavaScript-laden web pages brick it for considerable stretches of time.

Figure 1. A performance timeline overview of a Nokia 2 Android phone browsing on a page where excessive JavaScript monopolizes the main thread.

While devices and the networks they navigate the web on are largely improving, trends suggest we’re eating those gains by shipping ever more JavaScript. We need to use JavaScript responsibly. That begins with understanding what we’re building as well as how we’re building it.

The mindset of “sites” versus “apps”

Nomenclature can be strange in that we sometimes loosely identify things with terms that are inaccurate, yet their meanings are implicitly understood by everyone. Sometimes we overload the term “bee” to also mean “wasp”, even though the differences between bees and wasps are substantial. Those differences can motivate you to deal with each one differently. For instance, we’ll want to destroy a wasp nest, but because bees are highly beneficial and vulnerable insects, we may opt to relocate them.

We can be just as fast and loose in interchanging the terms “website” and “web app”. The differences between them are less clear than those between yellowjackets and honeybees, but conflating them can bring about painful outcomes. The pain comes in the affordances we allow ourselves when something is merely a “website” versus a fully-featured “web app.” If you’re making an informational website for a business, you’re less likely to lean on a powerful framework to manage changes in the DOM or implement client-side routing—at least, I hope. Using tools so ill-suited for the task would not only be a detriment to the people who use that site but would arguably also be less productive.

When we build a web app, though, look out. We’re installing packages which usher in hundreds—if not thousands—of dependencies, some of which we’re not sure are even safe. We’re also writing complicated configurations for module bundlers. In this frenzied, yet ubiquitous, sort of dev environment, it takes knowledge and vigilance to ensure what gets built is fast and accessible. If you doubt this, run npm ls --prod in your project’s root directory and see if you recognize everything in that list. Even if you do, that doesn’t account for third party scripts—of which I’m sure your site has at least a few.
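If you'd rather have a number than a wall of text, the same command can print JSON (`npm ls --prod --json`), and a few lines of JavaScript can count what's in it. This is a rough sketch: `countPackages` and `sampleTree` are hypothetical names, the sample tree is hand-written with made-up package names, and I'm assuming the nested `dependencies` shape that npm prints.

```javascript
// Counts unique name@version pairs in a dependency tree shaped like the
// output of `npm ls --prod --json`.
function countPackages(tree, seen = new Set()) {
  const deps = tree.dependencies || {};
  for (const [name, info] of Object.entries(deps)) {
    seen.add(`${name}@${info.version}`);
    countPackages(info, seen); // walk nested dependencies too
  }
  return seen.size;
}

// Hypothetical, hand-written sample with made-up package names.
const sampleTree = {
  name: "my-app",
  version: "1.0.0",
  dependencies: {
    "some-framework": {
      version: "2.0.0",
      dependencies: {
        "tiny-helper": {
          version: "1.4.0",
          dependencies: { "tinier-helper": { version: "4.0.0" } }
        },
        "object-util": { version: "4.1.1" }
      }
    },
    "some-framework-dom": {
      version: "2.0.0",
      dependencies: {
        "tiny-helper": { version: "1.4.0" }, // duplicate, counted once
        "object-util": { version: "4.1.1" }  // duplicate, counted once
      }
    }
  }
};

console.log(countPackages(sampleTree));
```

To try it on a real project, save the command's output to a file and pass the parsed JSON to `countPackages`. Depending on your npm version, you may need to add `--all` to get the full tree rather than only the top level.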

What we tend to forget is that the environment websites and web apps occupy is one and the same. Both are subject to the same environmental pressures that the large gradient of networks and devices impose. Those constraints don’t suddenly vanish when we decide to call what we build “apps”, nor do our users’ phones gain magical new powers when we do so.

It’s our responsibility to evaluate who uses what we make, and accept that the conditions under which they access the internet can be different than what we’ve assumed. We need to know the purpose we’re trying to serve, and only then can we build something that admirably serves that purpose—even if it isn’t exciting to build.

That means reassessing our reliance on JavaScript and how the use of it—particularly to the exclusion of HTML and CSS—can tempt us to adopt unsustainable patterns which harm performance and accessibility.

Don’t let frameworks force you into unsustainable patterns

I’ve been witness to some strange discoveries in codebases when working with teams that depend on frameworks to help them be highly productive. One characteristic common among many of them is that poor accessibility and performance patterns often result. Take the React component below, for example:

import React, { Component } from "react";
import { validateEmail } from "helpers/validation";

class SignupForm extends Component {
  constructor (props) {
    super(props);

    this.handleSubmit = this.handleSubmit.bind(this);
    this.updateEmail = this.updateEmail.bind(this);
    this.state = { email: "" };
  }

  updateEmail (event) {
    this.setState({
      email: event.target.value
    });
  }

  handleSubmit () {
    // If the email checks out, submit
    if (validateEmail(this.state.email)) {
      // ...
    }
  }

  render () {
    return (
      <div>
        <span className="email-label">Enter your email:</span>
        <input type="text" id="email" onChange={this.updateEmail} />
        <button onClick={this.handleSubmit}>Sign Up</button>
      </div>
    );
  }
}

There are some notable accessibility issues here:

  1. A form that doesn’t use a <form> element is not a form. Indeed, you could paper over this by specifying role="form" in the parent <div>, but if you’re building a form—and this sure looks like one—use a <form> element with the proper action and method attributes. The action attribute is crucial, as it ensures the form will still do something in the absence of JavaScript—provided the component is server-rendered, of course.
  2. <span> is not a substitute for a <label> element, which provides accessibility benefits <span>s don’t.
  3. If we intend to do something on the client side prior to submitting a form, then we should move the action bound to the <button> element's onClick handler to the <form> element’s onSubmit handler.
  4. Incidentally, why use JavaScript to validate an email address when HTML5 offers form validation controls in almost every browser back to IE 10? There’s an opportunity here to rely on the browser and use an appropriate input type, as well as the required attribute—but be aware that getting this to work right with screen readers takes a little know-how.
  5. While not an accessibility issue, once validation is left to the browser this component no longer needs state or lifecycle methods, which means it can be refactored into a stateless functional component, which uses considerably less JavaScript than a full-fledged React component.

Knowing these things, we can refactor this component:

import React from "react";

const SignupForm = props => {
  const handleSubmit = event => {
    // Needed in case we're sending data to the server XHR-style
    // (but will still work if server-rendered with JS disabled).
    event.preventDefault();

    // Carry on...
  };
  
  return (
    <form method="POST" action="/signup" onSubmit={handleSubmit}>
      <label htmlFor="email" className="email-label">Enter your email:</label>
      <input type="email" id="email" name="email" required />
      <button>Sign Up</button>
    </form>
  );
};

Not only is this component now more accessible, but it also uses less JavaScript. In a world that’s drowning in JavaScript, deleting lines of it should feel downright therapeutic. The browser gives us so much for free, and we should try to take advantage of that as often as possible.

This is not to say that inaccessible patterns occur only when frameworks are used, but rather that a sole preference for JavaScript will eventually surface gaps in our understanding of HTML and CSS. These knowledge gaps will often result in mistakes we may not even be aware of. Frameworks can be useful tools that increase our productivity, but continuing education in core web technologies is essential to creating usable experiences, no matter what tools we choose to use.

Rely on the web platform and you’ll go far, fast

While we’re on the subject of frameworks, it must be said that the web platform is a formidable framework of its own. As the previous section showed, we’re better off when we can rely on established markup patterns and browser features. The alternative is to reinvent them, and invite all the pain such endeavors all but guarantee us, or worse: merely assume that the author of every JavaScript package we install has solved the problem comprehensively and thoughtfully.

SINGLE PAGE APPLICATIONS

One of the tradeoffs developers are quick to make is to adopt the single page application (SPA) model, even if it’s not a fit for the project. Yes, you do gain better perceived performance with the client-side routing of an SPA, but what do you lose? The browser’s own navigation functionality—albeit synchronous—provides a slew of benefits. For one, history is managed according to a complex specification. Users without JavaScript—be it by their own choice or not—won’t lose access altogether. For SPAs to remain available when JavaScript is not, server-side rendering suddenly becomes a thing you have to consider.

Two series of screenshots. On the left, we have a blank screen for several seconds until the app appears after 5.24s. On the right, the basic components appear at 4ms and the site is fully usable at 5.16s.
Figure 2. A comparison of an example app loading on a slow connection. The app on the left depends entirely upon JavaScript to render a page. The app on the right renders a response on the server, but then uses client-side hydration to attach components to the existing server-rendered markup.

Accessibility is also harmed if a client-side router fails to let people know what content on the page has changed. This can leave those reliant on assistive technology to suss out what changes have occurred on the page, which can be an arduous task.
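
One common way to soften this problem is to keep a visually hidden live region on the page and update its text whenever the router finishes a navigation. The sketch below assumes as much; the function names, the visually-hidden class, and the message wording are all hypothetical and not tied to any particular router. The doc parameter is normally the global document and is passed in only so the sketch can be tried outside a browser.

```javascript
// Creates a polite live region. Screen readers announce its text when it changes.
// `doc` is normally the global `document`.
function createRouteAnnouncer(doc) {
  const region = doc.createElement("div");
  region.setAttribute("aria-live", "polite");
  region.setAttribute("aria-atomic", "true");
  // Hypothetical class name: style it so the region is hidden visually
  // but still exposed to assistive technology.
  region.className = "visually-hidden";
  doc.body.appendChild(region);

  // Call the returned function after each client-side navigation.
  return function announce(pageTitle) {
    region.textContent = `Navigated to ${pageTitle}`;
  };
}

// In a browser (assumed usage):
//   const announce = createRouteAnnouncer(document);
//   announce("Writing");
```

Even with an announcer like this, focus management after navigation still needs separate attention, which is one more thing the browser's own navigation handles for us.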

Then there’s our old nemesis: overhead. Some client-side routers are very small, but when you start with React, a compatible router, and possibly even a state management library, you’re accepting that there’s a certain amount of code you can never optimize away (approximately 135 KB in this case). Carefully consider what you’re building and whether a client-side router is worth the tradeoffs you’ll inevitably make. Typically, you’re better off without one.

If you’re concerned about the perceived navigation performance, you could lean on rel=prefetch to speculatively fetch documents on the same origin. This has a dramatic effect on improving perceived loading performance of pages, as the document is immediately available in the cache. Because prefetches are done at a low priority, they’re also less likely to contend with critical resources for bandwidth.
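
In static markup, a prefetch hint is a single link element with rel="prefetch" in the document head. When the list of likely next pages is only known at runtime, the same hint can be added with a few lines of script. The following is a minimal sketch under that assumption; isSameOrigin and prefetchDocument are hypothetical helper names, and doc is normally the global document.

```javascript
// Returns true when `href` resolves to the same origin as `pageUrl`.
function isSameOrigin(href, pageUrl) {
  return new URL(href, pageUrl).origin === new URL(pageUrl).origin;
}

// Appends <link rel="prefetch" href="..."> to the document head for a
// same-origin URL and reports whether a hint was added.
function prefetchDocument(doc, href, pageUrl) {
  if (!isSameOrigin(href, pageUrl)) return false;
  const link = doc.createElement("link");
  link.rel = "prefetch";
  link.href = href;
  doc.head.appendChild(link);
  return true;
}

// In a browser (assumed usage):
//   prefetchDocument(document, "/writing/", location.href);
```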

Screenshot showing a list of assets loaded on a webpage. 'writing/' is labeled as prefetched on initial navigation. This asset is then loaded in 2ms when actually requested by the user.
Figure 3. The HTML for the writing/ URL is prefetched on the initial page. When the writing/ URL is requested by the user, the HTML for it is loaded instantaneously from the browser cache.

The primary drawback of link prefetching is that it can be wasteful. Quicklink, a tiny link prefetching script from Google, mitigates this somewhat by checking whether the current client is on a slow connection or has data saver mode enabled, and by avoiding cross-origin links by default.
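
A guard of that kind can be sketched with the Network Information API, which exposes saveData and effectiveType on navigator.connection in browsers that support it. The code below is an illustration of the idea and not Quicklink's actual implementation; shouldPrefetch is a hypothetical name, and treating only "slow-2g" and "2g" as slow is an assumption.

```javascript
// `connection` is normally `navigator.connection`, which not every browser exposes.
function shouldPrefetch(connection) {
  // No connection information available: assume prefetching is acceptable.
  if (!connection) return true;
  // The user has asked the browser to reduce data usage.
  if (connection.saveData) return false;
  // Assumed threshold for what counts as a slow connection.
  const slowTypes = ["slow-2g", "2g"];
  return !slowTypes.includes(connection.effectiveType);
}

// In a browser (assumed usage):
//   if (shouldPrefetch(navigator.connection)) { /* add prefetch hints */ }
```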

Service workers are also hugely beneficial to perceived performance for returning users, whether we use client-side routing or not, provided you know the ropes. When we precache routes with a service worker, we get many of the same benefits as link prefetching, but with a much greater degree of control over requests and responses. Whether you think of your site as an “app” or not, adding a service worker to it is perhaps one of the most responsible uses of JavaScript that exists today.
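
As a rough illustration of precaching, the sketch below stores a short list of routes when the service worker installs, then answers later requests from that cache before falling back to the network. The cache name, the route list, and the helper names precache and cacheFirst are assumptions made for the example. A production worker would also need cache versioning and cleanup, which are left out here.

```javascript
const CACHE_NAME = "precache-v1"; // hypothetical cache name
const PRECACHE_URLS = ["/", "/writing/"]; // hypothetical routes

// Fetches each URL and stores its response in the named cache.
async function precache(cacheStorage, urls) {
  const cache = await cacheStorage.open(CACHE_NAME);
  await cache.addAll(urls);
}

// Answers from the cache when possible, otherwise falls back to the network.
async function cacheFirst(cacheStorage, request, fetchFn) {
  const cached = await cacheStorage.match(request);
  return cached || fetchFn(request);
}

// Only wire up the events when this file runs inside a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("install", event => {
    event.waitUntil(precache(caches, PRECACHE_URLS));
  });

  self.addEventListener("fetch", event => {
    event.respondWith(cacheFirst(caches, event.request, request => fetch(request)));
  });
}
```

Keeping the caching logic in plain functions that receive the cache storage as an argument is a design choice for this sketch: it makes the strategy easy to exercise without a browser.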

JAVASCRIPT ISN’T THE SOLUTION TO YOUR LAYOUT WOES

If we’re installing a package to solve a layout problem, we should proceed with caution and ask, “What am I trying to accomplish?” CSS is designed to do this job, and requires no abstractions to use effectively. Most layout issues JavaScript packages attempt to solve, like box placement, alignment, and sizing, managing text overflow, and even entire layout systems, are solvable with CSS today. Modern layout engines like Flexbox and Grid are supported well enough that we shouldn’t need to start a project with any layout framework. CSS is the framework. When we have feature queries, progressively enhancing layouts to adopt new layout engines is suddenly not so hard.

/* Your mobile-first, non-CSS grid styles go here */

/* The @supports rule below is ignored by browsers that don't
   support CSS grid, _or_ don't support @supports. */
@supports (display: grid) {
  /* Larger screen layout */
  @media (min-width: 40em) {
    /* Your progressively enhanced grid layout styles go here */
  }
}

Using JavaScript solutions for layout and presentation problems is not new. It was something we did when we lied to ourselves in 2009 that every website had to look in IE6 exactly as it did in the more capable browsers of that time. If we’re still developing websites to look the same in every browser in 2019, we should reassess our development goals. There will always be some browser we’ll have to support that can’t do everything those modern, evergreen browsers can. Total visual parity on all platforms is not only a pursuit made in vain, it’s the principal foe of progressive enhancement.

I’m not here to kill JavaScript

Make no mistake, I have no ill will toward JavaScript. It’s given me a career and—if I’m being honest with myself—a source of enjoyment for over a decade. Like any long-term relationship, I learn more about it the more time I spend with it. It’s a mature, feature-rich language that only gets more capable and elegant with every passing year.

Yet, there are times when I feel like JavaScript and I are at odds. I am critical of JavaScript. Or maybe more accurately, I’m critical of how we’ve developed a tendency to view it as a first resort to building for the web. As I pick apart yet another bundle not unlike a tangled ball of Christmas tree lights, it’s become clear that the web is drunk on JavaScript. We reach for it for almost everything, even when the occasion doesn’t call for it. Sometimes I wonder how vicious the hangover will be.

In a series of articles to follow, I’ll offer more practical advice for stemming the encroaching tide of excessive JavaScript and for wrangling it so that what we build for the web is usable, or at least more so, for everyone everywhere. Some of the advice will be preventative. Some will be mitigating “hair of the dog” measures. In either case, the outcomes will hopefully be the same. I believe that we all love the web and want to do right by it, but I want us to think about how to make it more resilient and inclusive for all.

