Streaming performances are missing something crucial: the crowd. We’re going to need that. So today, I’m releasing a spec for the product we need: Virtual Stadium. If you build it, we will use it.
Any good performer (musician, athlete, or public speaker) knows the crowd is an essential part of the performance. Did they laugh? Did they cheer? Are they strangely silent or restless? Your performance changes because of how the crowd reacts. And it’s not just the performers. The audience reacts to itself. We cheer because we’re part of a cheering crowd. We grumble because we heard somebody shout at the umpire. We lean forward and feel the tension in the people around us.
That’s missing from streaming performances. And if we’re ever going to restart sporting events, concerts, and speeches, we’re going to need it.
How the current experience falls short
Right now, virtual events have two modes.
There is the collaborative meeting. In this mode, you treat everyone in the audience as equal participants, as I do when I run a workshop by videoconference. I can see all their faces; they’re free to interact with me. This scales up to about 40 people, then it breaks down.
Then there is the webinar or streaming model, in which performers perform and the audience watches. Audience participation tends to be limited to virtual hand-raising and chat. This is a broadcast model, in which the crowd is mostly invisible.
We need a model more like an actual performance, in which the performers perform, the audience reacts, and everyone can see the audience’s reaction. Call it Stadium. Here’s a spec. I encourage Zoom or one of its competitors to build it.
The audience experience
A crucial part of Stadium is that the audience is on camera, with video and audio.
As an audience member, when you connect to a Stadium performance, you point the camera on your phone, tablet, or PC at yourself. It might be you and your family sitting on the couch, or you sitting at a desk watching. Just as at an actual performance, you are aware that others can see you. If you cheer or laugh, you know others can hear you.
You may dress up in team gear, post a banner that says “Yankees Suck” or “Play Free Bird,” or even paint your face to show your support for the performer, just as you would at a game or other performance.
Everyone watching the event has access to the “primary feed,” which is produced by the event. The primary feed is like the broadcast of a game, concert, or rally — it includes views selected by the producer, as well as some selected shots of people in the crowd as they react.
If you are watching an event taking place in a large venue, you can pick a seat. You could be front-row center or off to the side or in the end zone. Ticket prices might vary based on your seat, or you might have a superticket that allows you to switch your viewpoint at any moment and basically curate your own feed.
If you’re attending the event, you can choose to be part of a group of friends or colleagues who are watching the event at the same time. If you are part of a group, your audio will include the sounds of the other people in your group, and your video will include windows showing the people in your group. So you can say “Allen, did you see that catch!” or “Sarah, we should implement this marketing idea tomorrow!” Naturally, you can also communicate by text.
In addition to shouting and waving, you can communicate by tapping reaction buttons, much as you would do watching a Facebook live stream. You can see others’ reactions as well. The interface will prioritize showing reactions from your group over those of the general crowd.
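Here is a minimal sketch of how that prioritization could work, assuming each reaction carries a user name, an icon, and a timestamp. The `Reaction` type and the `prioritize` function are hypothetical names invented for illustration; nothing here is part of an existing product.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    user: str         # who tapped the button (hypothetical field)
    icon: str         # e.g. "cheer" or "laugh" (hypothetical field)
    timestamp: float  # seconds since the event started


def prioritize(reactions, my_group, limit):
    """Return up to `limit` reactions, group members first, newest first."""
    ordered = sorted(
        reactions,
        # False sorts before True, so reactions from my group come first;
        # within each bucket, larger timestamps (newer reactions) come first.
        key=lambda r: (r.user not in my_group, -r.timestamp),
    )
    return ordered[:limit]


if __name__ == "__main__":
    feed = [
        Reaction("stranger1", "cheer", 10.0),
        Reaction("Allen", "laugh", 8.0),
        Reaction("stranger2", "cheer", 12.0),
        Reaction("Sarah", "cheer", 9.0),
    ]
    for r in prioritize(feed, my_group={"Allen", "Sarah"}, limit=3):
        print(r.user, r.icon)
```

With the sample feed, Sarah and Allen appear ahead of the strangers even though the strangers reacted more recently.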
The audio you hear is a blend of the performance and the audience. (The mix is determined by the producer.) So if the crowd is cheering, you hear cheers. If it is laughing, you hear laughter. For practical reasons, the actual crowd audio might be blended from a random collection of 200 or 500 audience members; it’s not practical to blend 10,000 audio streams.
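As a rough sketch of that blend, assume audio arrives as frames of float samples between -1.0 and 1.0 and that the producer's mix is a single `crowd_gain` value between 0 and 1. The function names and the simple averaging approach are illustrative assumptions, not a description of how any real product mixes audio.

```python
import random


def pick_crowd_sample(audience_ids, sample_size, rng):
    """Pick a random subset of audience members whose audio will be mixed."""
    k = min(sample_size, len(audience_ids))
    return rng.sample(audience_ids, k)


def blend_frame(performance, crowd_frames, crowd_gain):
    """Mix one frame of performance audio with the averaged crowd audio.

    performance:  list of float samples for the performance
    crowd_frames: list of frames, one per sampled audience member,
                  each the same length as `performance`
    crowd_gain:   0.0 = performance only, 1.0 = crowd only (producer's choice)
    """
    n = len(performance)
    if crowd_frames:
        # Average the sampled streams so that no single voice dominates.
        crowd = [sum(f[i] for f in crowd_frames) / len(crowd_frames) for i in range(n)]
    else:
        crowd = [0.0] * n
    mixed = [(1.0 - crowd_gain) * p + crowd_gain * c for p, c in zip(performance, crowd)]
    # Clamp to the valid sample range to avoid clipping artifacts.
    return [max(-1.0, min(1.0, s)) for s in mixed]


if __name__ == "__main__":
    rng = random.Random(42)
    audience = list(range(10_000))
    sampled = pick_crowd_sample(audience, 200, rng)
    print(len(sampled), "audience streams selected for the mix")
```

Re-running `pick_crowd_sample` every few seconds would give the constantly shifting crowd sound, while the cost of mixing stays tied to the sample size rather than to the full audience.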
Naturally, you can opt out of being on audio and video in the audience. It’s not required. But why wouldn’t you want to be a participant instead of just a consumer?
The performer experience
As a performer, I find one of the biggest challenges in virtual events is not knowing how the crowd is feeling. Stadium fixes that.
Performers will hear the blended crowd noise and be able to see rotating shots of random audience members on video monitors. There could even be multiple monitors showing views of people in different locations. (What are the people in the bleachers doing and saying? The people in the expensive seats up front? The people who paid for backstage passes?)
You could even select an audience member from a console and do questions and answers with that person. I could see coaches talking to audience members during breaks, or public speakers fielding questions and giving answers at appropriate points in speeches.
The producer experience
Real events have producers. So do virtual events. The producer’s main job is to curate the feed.
So producers would select among multiple cameras to show the action, including cameras that are moving around the speaking space. They will intersperse those views with video of people watching remotely. They’ll control the audio mix, deciding just how much audio from the crowd to blend with the audio from the performance and, in sporting events, from the commentators.
They may turn the crowd reaction icons in the feed on or off at different points in the action.
They’ll also communicate with the performers. Some of this may be through text on screens in front of the performers, and some by manipulating the crowd views that the performers see on monitors.
Producers will also moderate the Q&A, selecting audience members to participate based on questions they want to ask.
Real concerts and events have security. So will events on Stadium.
As we’ve learned, Zoombombing — people flooding events with unwelcome or obscene video and audio — is a real problem.
In Stadium, a security staffer will monitor audience activity. Any audience member can tag another as offensive — security staffers will review and potentially boot them out of the performance. And they’ll monitor audience members on large screen arrays, looking and listening for offensive conduct.
Expectations of audience conduct would vary based on the type of event. At a sporting event, there will be plenty of cheering and shouting. At a concert, you’d expect people to be respectful during the actual music and keep most of their cheering to the moments between songs. At a speech, audience shouting would be considered offensive during most of the performance.
I would expect AI to help here. An AI can flag audience members who are shouting words (as opposed to cheering), screaming profanities, or showing things that look a bit too much like normally concealed body parts; the security staffer can make a judgment and toss the offender. (Unless, of course, it’s a speech or concert specifically designated for nudists.)
Because the crowd audio will be blended — and because it will be constantly shifting to different samples of hundreds of audience members — no audience member will be able to easily flood the full audio feed with offensive material. (If there’s one asshole in the audience at a football game, they don’t tend to ruin the experience for the whole crowd; Stadium would be similar.)
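To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the mix is an equal-weight average of a uniformly random sample, which is my simplification; the `exposure` function is a hypothetical helper. A real mixer would also need loudness limits, since one person shouting is louder than one person clapping.

```python
def exposure(audience_size, sample_size):
    """Estimate how much one audience member can affect the crowd mix.

    Returns (chance_in_mix, share_when_in_mix):
      chance_in_mix     - probability a given member is in a random sample
      share_when_in_mix - that member's fraction of an equal-weight mix
    """
    k = min(sample_size, audience_size)
    chance_in_mix = k / audience_size
    share_when_in_mix = 1 / k
    return chance_in_mix, share_when_in_mix


if __name__ == "__main__":
    chance, share = exposure(audience_size=10_000, sample_size=500)
    print(f"chance of being in the mix: {chance:.1%}")
    print(f"share of the mix when selected: {share:.2%}")
```

For 10,000 attendees and a 500-person sample, any one person is in the mix 5% of the time, and even then contributes 0.2% of the crowd audio under these assumptions.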
If audience members are paying for the performance, the risk of losing their ticket gives them a disincentive to misbehave.
I would expect events to charge admission for most Stadium performances, as the experience would come as close as possible to replicating attendance at a live performance. I’d also expect pricing tiers, including different seating locations, the ability to curate your own show, backstage views, or even custom food catering delivery in your virtual at-home luxury suite. Of course, some folks will run free events on Stadium as well — rallies for which the payoff is different, such as creating enthusiasm for a candidate, or generating charity contributions.
The vendors of Stadium would charge as any other technology provider would. I’d expect prices in the thousands or tens of thousands of dollars per performance, depending on the size of the audience, the features available, and the number of camera feeds and monitors in the performance space.
Producers would charge for their talent, capabilities, and time, as they do now.
We’ll get used to watching from home
Even after venues and crowds become common again, I would expect Stadium to become a regular part of live performances. People watch sports from home now, even though there is a live audience. I’d expect Stadium to restore some of the value that events have in the short term, then create a revenue stream to supplement ticket sales for actual seats in the long term.
But no matter what happens, Stadium will always have one advantage over the actual experience of being in the room with a performer.
The refrigerator and pantry, filled with your favorite foods, will be close by. And the bathrooms will be cleaner, more pleasant, and far less crowded.
One response to “Virtual Stadium — the Zoom live event product we need now”
The article is a nice wishlist, but it does not consider one aspect: broadcast latency. Fan reactions will always be delayed relative to the on-ground events unless there is a separate sub-second-latency live stream for users of the app.