Streaming Video Over the Internet
How a transmission works:
A strong alternating current is run along a wire called an antenna. This creates an alternating electric and magnetic field that gives rise to an electromagnetic wave radiating in all directions. The wavelength - and therefore the frequency - of the signal is determined by the rate of alternation.
The receiver is a passive wire that, when struck by the electromagnetic wave, produces an alternating current that mirrors the original signal. The circuitry of the receiver enables tuning, or resonance, with a given wavelength.
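The relationship between the rate of alternation and the wavelength is simply lambda = c / f. A quick illustration (the frequencies chosen here are only examples):

    # Wavelength from frequency: lambda = c / f
    C = 299_792_458  # speed of light in m/s

    def wavelength_m(frequency_hz: float) -> float:
        """Return the free-space wavelength in metres for a given frequency."""
        return C / frequency_hz

    print(wavelength_m(100e6))  # ~3 m for a 100 MHz (FM-band) carrier
    print(wavelength_m(1e6))    # ~300 m for a 1 MHz (AM-band) carrier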
Modulation:
Transmission frequencies do not fall within the frequency range of the program material itself. The carrier frequencies are assigned by the FCC. In amplitude modulation (AM), the program material is converted into variations of the carrier signal's amplitude. Frequency modulation (FM) converts the program material into variations of the carrier signal's frequency.
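A minimal numpy sketch of the two schemes; the carrier frequency, deviation, and sample rate here are illustrative values, not actual broadcast parameters:

    import numpy as np

    fs = 48_000                                # sample rate for this illustration
    t = np.arange(fs) / fs                     # one second of time
    program = np.sin(2 * np.pi * 1_000 * t)    # a 1 kHz "program" tone
    fc = 10_000                                # illustrative carrier frequency

    # AM: the program varies the carrier's amplitude
    am = (1 + 0.5 * program) * np.sin(2 * np.pi * fc * t)

    # FM: the program varies the carrier's instantaneous frequency
    deviation = 2_000                          # peak frequency deviation in Hz
    phase = 2 * np.pi * np.cumsum(fc + deviation * program) / fs
    fm = np.sin(phase)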
Video Specifications
Video uses an additive color system, mixing Red, Green, and Blue
Broadcast NTSC
In 1945, the FCC allocated 13 VHF television channels. The National Television System Committee (NTSC) created the standards for the signals. The standard dictates a frame rate of 30 frames per second, scanned in an interlaced fashion. This divides each frame into two fields, and therefore each second into 60 fields. The NTSC standard has a visible resolution of 484 scanlines, with a horizontal resolution of about 400 lines. The standard was originally devised for black-and-white television, and the broadcast bandwidth was set at 4.5 MHz. To adapt the system for color transmission, standardized in 1953, a chroma signal was modulated onto the luminance signal as a subcarrier. The combined signals are known as composite video.
The phase of the chroma signal determines
hue, while the amplitude determines saturation. The chroma information is modulated
onto the subcarrier at two phases: the I, or in-phase value at 0 degrees; and
Q, or quadrature value at 90 degrees. The luminance signal is called the Y signal.
NTSC color video is referred to as a YIQ signal.
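The standard RGB-to-YIQ conversion can be written as a 3x3 matrix; a small sketch, with the coefficients rounded to three places:

    import numpy as np

    # Standard NTSC RGB -> YIQ conversion matrix (coefficients rounded)
    RGB_TO_YIQ = np.array([
        [0.299,  0.587,  0.114],   # Y: luminance
        [0.596, -0.274, -0.322],   # I: in-phase chroma
        [0.211, -0.523,  0.312],   # Q: quadrature chroma
    ])

    def rgb_to_yiq(rgb):
        """Convert an RGB triple (values 0..1) to YIQ."""
        return RGB_TO_YIQ @ np.asarray(rgb)

    print(rgb_to_yiq([1.0, 1.0, 1.0]))  # pure white -> Y ~= 1, I and Q ~= 0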
NTSC is used in North America, Central
America, Japan, and parts of the South Pacific and South America.
Other Broadcast Formats
PAL (Phase Alternating Line)
PAL offers 25 interlaced frames per second with 625 scanlines. It was developed after NTSC and has greater bandwidth for chroma modulation and therefore better color resolution. PAL is used in Great Britain, West Germany, and The Netherlands.
SECAM (Séquentiel Couleur à Mémoire)
SECAM has the same frame rate and resolution as PAL, except that FM is used to encode the chroma. SECAM is used in France, the former Eastern Bloc countries, and the Middle East.
HDTV (High Definition Television)
High Definition Television (HDTV) is part of the new digital television standard recommended by the ATSC and adopted by the Federal Communications Commission (FCC). In total, the new digital television standard is composed of 18 individual specifications for a variety of resolutions, refresh rates, aspect ratios and scanning methods (progressive versus interlaced). Of these 18 Digital Television (DTV) specifications, 12 are known as SDTV or Standard Definition Television while 6 are truly HDTV formats.
HDTV enjoys a number of advantages over traditional analog television. Chief among these is the higher resolution of HDTV. Additionally, the standard sets a wider aspect ratio for televisions, similar to the wide screens used in movie theaters. All the digital television formats, both SDTV and HDTV, benefit from AC-3 digital sound. They also benefit from the nature of digital signals, which eliminates snow, color bleeding, and other picture anomalies, resulting in an image with excellent rendition of detail and color depth without distortion.
The digital television standard incorporates 18 formats. The 12 standard definition flavors of DTV make use of "low" resolutions essentially equal to those available with analog television using the best source material. There is a resolution of 640 pixels wide by 480 pixels tall available in progressive scan formats with refresh rates of 60, 30, and 24 Hz (hertz - cycles per second). The 640 by 480 resolution is also available in a 60 Hz interlaced format. All four of the 640 by 480 standards are available only in the 4:3 "square" aspect ratio used by the old NTSC analog television standard.
A slightly higher resolution of 704 pixels wide by 480 pixels tall provides aspect ratios of both 4 by 3 and 16 by 9. These formats are available in 60, 30, and 24 Hz refresh rates with progressive scanning and in 60 Hz with interlaced scanning. All eight of the possible 704 by 480 formats are standard definition.
True high definition signals are available in two resolutions and in six varieties. All HDTV formats use the wide 16:9 aspect ratio. The first three HDTV formats provide resolutions of 1,280 pixels across by 720 pixels high all in progressive scan formats with refresh rates of 60, 30, and 24 Hz. The highest resolution HDTV format creates an image using 1,920 pixels left to right and 1,080 pixels top to bottom. This ultra-high resolution format is available in 30 and 24 Hz refresh rates using progressive scanning and 60 Hz using interlaced scanning. There are also plans to introduce a 60 Hz progressive scanning format of the 1,920 by 1,080 HDTV format when the technology becomes available to sufficiently compress and deliver such a signal.
All 18 digital television formats have a number of elements in common. They all transmit signals in a digital format. Additionally, all the formats make use of AC-3 digital sound for audio compression and a modification of the MPEG-2 format for video compression. The primary differences between standard definition digital televisions and high definition models are the resolution and aspect ratio, although there are standard definition formats that incorporate the wide 16 by 9 ratio.
HDTV signals at 60 Hz require around 19 megabits per second to carry all the information necessary to recreate the video. This 19 Mbit/s rate is achieved by compressing the raw signal through a special high-definition subset of the MPEG-2 video compression scheme and Dolby's AC-3 audio compression scheme. In fact, the uncompressed HDTV data is about sixty times larger than the compressed 19 Mbit/s signal.
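A back-of-the-envelope check of that ratio. The sampling assumptions here (1920x1080 at 30 frames/sec, roughly 16 bits per pixel after 4:2:2 chroma subsampling) are ours for illustration, not part of the ATSC specification:

    width, height, fps, bits_per_pixel = 1920, 1080, 30, 16
    uncompressed_bps = width * height * fps * bits_per_pixel
    compressed_bps = 19e6                      # the ~19 Mbit/s payload

    print(uncompressed_bps / 1e6)              # ~995 Mbit/s uncompressed
    print(uncompressed_bps / compressed_bps)   # ~52:1 compression ratio

With deeper sampling (say 24 bits per pixel) the ratio climbs toward 80:1, so the "about sixty times" figure above falls within this range depending on the assumptions made.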
A 19 Mbit/s HDTV signal fits into the standard 6 MHz frequency band used for uncompressed, analog NTSC television signals. Each HDTV broadcaster has been allotted a frequency band for its HDTV broadcasts. Using digital compression technology, that same bandwidth can carry a single high-definition signal or multiple standard-resolution signals. Since a 640 by 480 standard-definition digital television signal is not as large as a 1,920 by 1,080 signal, several of the smaller signals can be carried in place of the single high-definition one.
Format Comparison:
Current TV standard: 525 x 700, 30 fps, 60 fields/sec, 4:3 (w:h)
HDTV standard: 720 x 1280 or 1080 x 1920, 30 fps, 60 fields/sec, 1.78:1 (w:h)
VHS quality: 320 x 240, 30 fps
Broadcast quality: 720 x 480, 30 fps
Component Video (S-Video)
Broadcast signals suffer from the
conversion to RF carrier waves, and are susceptible to RF interference. These
RF signals are received by an antenna and are converted back to composite video
by the tuner or receiver. They are further decoded into RGB signals for the
CRT. Each stage in the process can degrade the signal.
Excluding the RF stage of video results
in a much better signal. Even so, composite video is still several steps away
from the RGB signal needed by the CRT. There is still interference from modulating
the chroma onto the luminance signal.
A solution used in video production work is component video, which separates the luminance and chrominance signals. The chroma channel retains the hue and saturation information in one component, called C. The luminance component is still Y, although it is recorded at a higher frequency, making it possible to exceed 400-line resolution. The Y/C signal is used by Hi8 camcorders and S-VHS equipment.
Signal Sync
The video waveform is actually three signals at once: luminance, chrominance, and sync. A voltage range describes the luminance from black (minimum) to white (maximum). The system-independent calibration for these levels is the IRE scale. In NTSC, one volt peak-to-peak corresponds to 140 IRE. The black level, or pedestal level, is 7.5 IRE, and the white level is 100 IRE. Blanking is controlled directly from the video waveform; that is, the guns are turned off any time the level drops below 7.5 IRE. Timing of all video equipment takes the form of sync pulses corresponding to -40 IRE.
Another component of the video waveform is the color burst. It takes the form of nine consecutive cycles with absolute peaks of 20 IRE and -20 IRE. It serves as a color sync signal and communicates proper hue to the video monitor. Vertical sync keeps the picture from flipping; horizontal sync keeps the image from being skewed; color sync ensures that the proper color is displayed. Unlike audio, sync must be maintained for video signals to work together.
A signal-to-noise ratio of 50 dB (better than 300:1) is considered good. At 40 dB, snow, or video noise, becomes noticeable.
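Video signal-to-noise figures are conventionally quoted as voltage ratios, so the dB-to-ratio conversion uses 20 times the base-10 logarithm. A quick sketch:

    def snr_db_to_ratio(db: float) -> float:
        """Convert a signal-to-noise figure in dB to a voltage ratio."""
        return 10 ** (db / 20)

    print(snr_db_to_ratio(50))  # ~316:1, considered good
    print(snr_db_to_ratio(40))  # ~100:1, where snow becomes noticeable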
Video Formats
Betacam SP
Developed by Sony, this is perhaps the most popular component format for both field acquisition and post production today. Betacam uses cassettes and transports similar to the old Betamax home video format, but the similarities end there. Tape speed is six times higher, and luminance and chrominance are recorded on two separate tracks. The two colour difference signals are compressed in time by two and recorded sequentially on a single track.
M-II
M-II (read "em-two") was developed by Matsushita for Japan's national broadcasting company, NHK. Today M-II is one of the most popular broadcast-quality component formats, with quality similar to Betacam SP. Large users of M-II include NHK, of course, and NBC. The recording technique is similar to Betacam SP but uses some enhancements that compensate for the lower tape speed.
U-Matic
Another format by Sony, U-Matic has three different versions (LB, HB, and SP), which differ in the subcarrier frequencies used for luminance and chrominance recording. U-Matic LB (Low Band) has been around since the early 70s and is one of the oldest cassette video formats. HB (High Band) has an increased chroma subcarrier frequency, which improves colour resolution. In the SP variant, both the chroma and luma subcarrier frequencies have been increased.
U-Matic SP (in common lingo "3/4" after the tape width in inches) is still a popular production format for those not wealthy enough to use Beta SP or similar. Although U-Matic doesn't appear much better than Super VHS on paper, the higher colour resolution and much better signal-to-noise ratio make the picture subjectively far more enjoyable. The U-Matic tape transport is also much faster in changing modes, which makes editing less frustrating.
LB and HB U-Matic tapes are often used for archiving because of the relatively low tape costs and low recording density, which makes the tapes robust against aging.
DV/DVCPRO
DV (formerly DVC) is backed by manufacturers such as Sony, Philips, Thomson, Hitachi, Matsushita (Panasonic), and others. It was the first digital recording format within reach of the consumer market. DV uses 5:1 compression based on the DCT. Depending on the image contents, the encoder adaptively decides whether to compress picture fields separately or to combine two fields into a single compression block. As such, DV coding can be thought of as something halfway between Motion JPEG and MPEG.
DVCPRO is a professional variant of DV by Panasonic. The only major difference is doubled tape speed, which is needed for better drop-out tolerance and general recording robustness. It is also capable of 4x normal-speed playback. This doesn't mean run-of-the-mill fast-forward with picture, but accelerated transfer of all the information into, for example, a non-linear editing system.
As for the picture quality, all these variants are nearly broadcast quality, DV being available at nearly consumer prices. For newsgathering and other similar uses, the quality is certainly enough, especially considering that typical postproduction will be done digitally, which will not degrade the quality any further. Compression is mild enough to keep artifacts away in all but problem scenes. The quantization will be visible if you try something like chroma keying, however.
Video Equipment
Different quality equipment yields different quality analog video
Video Connections
Traditional TV antenna connections are 300-ohm. The cable coming into your home is 75-ohm coax terminated with an F-connector.
All cables used to carry video signals
are coaxial to shield against RF.
On consumer products, RCA plugs are used for both audio and video connections.
Professional video equipment more often employs twist-locking BNC connectors for video and XLRs for audio.
Digital Video
Analog signals from a video source
are converted to digital values via an ADC. The digital information is converted
back to analog by a DAC to be viewed on a monitor.
The major issue in digital video is the disk space it takes up, along with the corollary issues of transmission, throughput, and display. A video image that is 640 x 480 with 24-bit color at 30 fps represents a little over 26 MB per second, not including audio. Given that, a 1 GB disk can hold only about 38 seconds of digital video.
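The arithmetic behind those figures:

    # Uncompressed digital video data rate, as in the figures above.
    width, height, bytes_per_pixel, fps = 640, 480, 3, 30   # 24-bit colour

    bytes_per_second = width * height * bytes_per_pixel * fps
    print(bytes_per_second / 2**20)          # ~26.4 MB per second

    disk_bytes = 2**30                       # a 1 GB disk
    print(disk_bytes / bytes_per_second)     # ~38.8 seconds of video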
Compromise is required to incorporate digital video in multimedia. Frame size can be reduced, as is often done in games and other applications to provide menu and control space on screen. Color depth can be sacrificed: 24-bit video can be dithered down to 8-bit. Finally, the frame rate can be reduced, although below 16 fps flicker becomes very noticeable.
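A hypothetical example of how far those compromises cut the data rate; the specific reduced values chosen here are only illustrative:

    # Quarter-size frame, 8-bit (1 byte) colour, half the frame rate.
    width, height, bytes_per_pixel, fps = 320, 240, 1, 15

    bytes_per_second = width * height * bytes_per_pixel * fps
    print(bytes_per_second / 2**20)   # ~1.1 MB per second, about 24x less than full rate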
FireWire is a computer serial bus more accurately known as IEEE 1394. It is bi-directional (meaning it can carry signals in both directions). Although it was not developed specifically for digital cameras, it was designed with applications like video in mind.
FireWire can transfer video at four times real time for rapid transfer to a computer or edit station, and it is the only computer bus connector provided on DV-format camcorders. A DV-recorded signal can also be transferred via analog composite or S-video connectors as well as digitally via FireWire.
FireWire supports 63 devices on a single bus (SCSI supports 7, SCSI Wide supports 15) and allows busses to be bridged (joined together) to give a theoretical maximum of thousands of devices. It uses a thin, easy-to-handle cable that can stretch further between devices than SCSI, which supports a maximum "chain" length of only about 7 meters (23 feet).
Data Transfer Rate
Port | Megabytes per second
Serial | 0.01
Parallel | 0.115
USB | 1.5
SCSI-1 | 5
SCSI-2 | 10
Ultra SCSI | 20
FireWire | 12.5-50
Wide Ultra SCSI | 40
Video compression
Most compression schemes are lossy,
and many are adaptive. That means they can be implemented so that the compression
algorithm can be optimized for an image or series of images.
One approach is to average color
areas so that redundancy compression will be effective. This is called quantization
of the image. It can be applied in increasing steps.
Another scheme is to use motion compression.
In scenes where the action is limited, significant compression is achieved by
only redrawing the pixels that change from one frame to the next. The utility
of this method is limited when the scene changes or the camera zooms or pans.
In a related scheme, rapid motion scenes, which the eye perceives as a blur,
can be more heavily compressed, and when the action slows down, a less lossy
codec can be used.
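A toy sketch of the frame-differencing idea; the threshold and the "skip" marker are arbitrary choices for illustration, not any particular codec's behaviour:

    import numpy as np

    def delta_frame(prev, curr, threshold=8):
        """Toy interframe step: keep only pixels that changed noticeably
        since the previous frame; unchanged pixels are marked as 'skip'."""
        changed = np.abs(curr.astype(int) - prev.astype(int)) > threshold
        delta = np.where(changed, curr, 0)   # 0 here stands for "unchanged"
        return delta, changed.mean()         # fraction of pixels that changed

    # On a mostly static scene the changed fraction is small, so very little
    # data needs to be stored or sent to describe the new frame.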
Most codecs are actually a combination of these approaches. Some standardization has taken place, as in the case of MPEG.
Compression techniques take advantage of redundancy or coherence in images
Compression techniques may apply to color bands individually or together, and may be either fixed or adaptive.
Codecs:
Cinepak:
Cinepak is a vector quantization based codec developed specifically to deliver 24-bit video in quarter screen (320 X 240 pixel) windows from files restricted to single-spin CD-ROM data rates. Vector quantization stores information about differences between frames of video by quantifying the magnitude and direction of a pixel's movement. Cinepak's decompressor then uses a CLUT (color lookup table) to recreate the color of each pixel in a frame.
Motion JPEG:
JPEG is a well-established still-image compression codec that removes the redundancies in individual frames. Motion-JPEG is an adaptation of the still-image standard to video. Motion-JPEG uses the same algorithms as JPEG to create I-frames (compressed intraframes), and then successive frames are compressed by holding the compression parameters constant, to keep up with the video data stream in real-time. There are various, non-compatible versions of Motion JPEG from different manufacturers. Using fast codec acceleration hardware available from several manufacturers, I-frames are coded and decoded symmetrically in less than one-thirtieth of a second.
MPEG:
MPEG was designed for digital video. MPEG uses the same algorithms as JPEG to create one I-frame, then removes the redundancy from successive frames by predicting them from the I-frame and encoding only the difference from its predictions. This is called interframe compression. The MPEG committee created two standards: MPEG-1, which can play back from a single-speed CD-ROM (150 KB per second) at 352 x 240 at 30 fps, and MPEG-2, with enough data (1.2 MB/second) to encode studio-quality video at 704 x 480 at 30 frames per second.
SMPTE Time Code
SMPTE time code is the universal
reference for synchronizing audio and video.
SMPTE readouts are in this form:
HOURS:MINUTES:SECONDS:FRAME
This reference allows each frame
to be identified and relocated. SMPTE also specifies frame rates in use around
the world: 24 fps for film, 25 fps for non-NTSC video, 30 fps for NTSC black
and white video, and 30 fps drop frame for NTSC color video.
Because of issues arising from modulating the chroma onto the luminance signal, the actual frame rate of NTSC color video is 29.97 fps. This means the nominal 30 fps count drifts by 108 frames per hour, or 3.6 seconds. Drop-frame timecode compensates for this problem by omitting two frame numbers per minute, except every tenth minute.
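A sketch of the resulting drop-frame counting, assuming the standard skip rule described above (frame numbers ;00 and ;01 are skipped at the start of every minute except each tenth minute):

    def frames_to_dropframe(frame_number):
        """Convert a frame count to NTSC drop-frame timecode (HH:MM:SS;FF)."""
        fps = 30                                     # nominal frame rate
        frames_per_min = 60 * fps - 2                # 1798: a minute that drops 2 numbers
        frames_per_10min = 10 * 60 * fps - 9 * 2     # 17982: only 9 of 10 minutes drop
        tens, rem = divmod(frame_number, frames_per_10min)
        skipped = 2 * 9 * tens
        if rem > 2:
            skipped += 2 * ((rem - 2) // frames_per_min)
        n = frame_number + skipped
        ff = n % fps
        ss = (n // fps) % 60
        mm = (n // (fps * 60)) % 60
        hh = n // (fps * 3600)
        return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

    print(frames_to_dropframe(1800))   # 00:01:00;02 -- numbers ;00 and ;01 are skipped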
A pulse wave carries the reference to SMPTE time code. These pulses can be recorded onto videotape using a time code generator, in a process called striping a tape. The pulse waves fluctuate between 2400 Hz and 4800 Hz.
Digital SMPTE
A SMPTE reader translates the 2400 Hz pulse into a digital 0 and the 4800 Hz pulse into a digital 1, and converts the bit stream into meaningful information. This yields 80 bits of data per frame, which can even carry user data such as the recording date, reel number, and so on.
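A toy illustration of that mapping. Real readers decode the biphase waveform itself; the list of tone frequencies here is just a stand-in for the idea:

    def pulses_to_bits(frequencies):
        """Map each tone period to a bit: 2400 Hz reads as 0, 4800 Hz as 1."""
        return [1 if f >= 4800 else 0 for f in frequencies]

    bits = pulses_to_bits([2400, 4800, 4800, 2400])
    print(bits)   # [0, 1, 1, 0]; a full SMPTE frame carries 80 such bits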
There are two ways of recording this
data to tape: LTC and VITC.
LTC (Linear Time Code) resides on a linear track, such as the audio track of a videotape. LTC can't be read at transport speeds much higher or lower than playback speed.
VITC (Vertical Interval Time Code) is stored in the vertical blanking interval of the video signal itself. The advantage of this method is that the time code can be read at any playback speed, even freeze-frame. One disadvantage is that VITC can't be striped onto a tape; it must be recorded when the video is recorded.
DVD
DVD is essentially a bigger, faster CD that can hold cinema-like video, better-than-CD audio, and computer data. DVD aims to encompass home entertainment, computers, and business information with a single digital format, eventually replacing audio CD, videotape, laserdisc, CD-ROM, and video game cartridges. DVD has widespread support from all major electronics companies, all major computer hardware companies, and all major movie and music studios. With this unprecedented support, DVD became the most successful consumer electronics product of all time within less than three years of its introduction.
It's important to understand the difference between the physical formats (such as DVD-ROM or DVD-R) and the application formats (such as DVD-Video or DVD-Audio). DVD-ROM is the base format that holds data. DVD-Video (often simply called DVD) defines how video programs such as movies are stored on disc and played in a DVD-Video player or a DVD computer. The difference is similar to that between CD-ROM and Audio CD. DVD-ROM includes the recordable variations DVD-R/RW, DVD-RAM, and DVD+R/RW. The application formats include DVD-Video, DVD-Video Recording, DVD-Audio, DVD-Audio Recording, and DVD Stream Recording. There are also special application formats for game consoles such as the Sony PlayStation 2.
Audio/Video Specifications:
Data Transfer Rate: | Variable-speed data transfer at an average rate of 4.69 megabits/second for image and sound
Image Compression: | MPEG-2 digital image compression
Audio: | Dolby AC-3 (5.1 ch) and LPCM for NTSC; MPEG Audio and LPCM for PAL/SECAM (a maximum of 8 audio channels and 32 subtitle channels can be stored)
Running Time (movies): | 133 min./side (at an average data rate of 4.69 megabits/second for image and sound, including 3 audio channels and 4 subtitle channels)
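As a sanity check on the running-time figure, assuming the roughly 4.7 GB capacity of a single-layer DVD side:

    capacity_bits = 4.7e9 * 8      # ~4.7 GB single-layer side, in bits
    avg_rate_bps = 4.69e6          # average image + sound rate from the table

    seconds = capacity_bits / avg_rate_bps
    print(seconds / 60)            # ~133 minutes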
Streaming Video Over the Internet
Streaming video across the Internet has its own specific issues that must be addressed. The Internet was not designed for real-time streaming. It is a shared medium and uses a best-effort delivery mechanism, the Internet Protocol (IP), to deliver content. There is no dedicated path between the source and the sink; IP breaks content up into self-contained packets, and these packets are routed independently. Limited bandwidth, latency, noise, packet loss, retransmission, and out-of-order packet delivery are all problems that can affect real-time streaming over the Internet.
All Internet streaming technologies get around this by buffering a certain amount of content before actually starting to play. The buffer irons out the natural traffic variations inherent in the Internet. Many seconds' worth of content can be buffered, and in excess of 30 seconds is not uncommon. Note that after the initial buffering, the streamed broadcast plays while more content is being downloaded. This is an improvement over earlier technologies, where the whole file had to be downloaded before playing could commence.
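A toy simulation of that startup buffering; the five-second target and the arrival rates are made-up numbers, not values from any particular player:

    import random

    buffer_s = 0.0
    playing = False
    startup_target = 5.0                     # seconds of content to buffer before playing

    for second in range(60):
        arrived = random.uniform(0.5, 1.5)   # content received this second (varies)
        buffer_s += arrived
        if not playing and buffer_s >= startup_target:
            playing = True                   # enough buffered: playback starts
        if playing:
            buffer_s -= 1.0                  # one second of content is consumed
            if buffer_s < 0:
                playing = False              # buffer underrun: playback stalls
                buffer_s = 0.0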
Over the last couple of years, cable and DSL access has been increasing, making bandwidths between 128 kb/s and 512 kb/s available to end users. At these bit rates, near-VHS-quality rich media can be achieved through modern compression techniques and sophisticated codec technology.
Streaming Codecs
There are a variety of compression systems in use today. The Moving Picture Experts Group (MPEG) has three open (ISO/IEC) standards that can be used for streaming.
MPEG-1 was originally developed in 1988 for VHS-quality video on CD-ROM and has its optimal bit rate at about 1.5 Mb/s for quarter-screen TV (352x240) at 30 frames/sec. MPEG-1 is mainly considered a storage format; however, it does offer excellent streaming quality for the bit rate it supports.
MPEG-2 was ratified in 1996. It was designed for use in digital TV broadcasting and is best known for DVD encoding. Its target bit rate is between 4 and 9 Mb/sec, but it can be used for HDTV at resolutions up to 1920x1080 pixels at 30 frames/sec, which can require average bit rates of up to 80 Mb/sec. As an Internet streaming technology it is probably not useful, since it uses bit rates higher than almost anyone has access to.
MPEG-4 was ratified in 1999 and is a new standard specifically developed to address Web and mobile delivery. Its optimal bit rate is between 385 and 768 Kb/sec. There is still active work continuing on this standard, but a number of groups are putting heavy research and development effort behind making MPEG-4 the standard on the Internet. Codecs are currently available from Microsoft and Apple.
Despite the open MPEG standards, most people use one of the big three proprietary formats: RealMedia, QuickTime, and Windows Media. All three have specific advantages that have allowed them to gain ground in the market, mainly because they are free and support the Real Time Streaming Protocol (RTSP).
RealMedia
A very popular player which is very widely distributed and available for all major OS platforms. RealNetworks claim over 70% of the Internet streaming market, with the player installed on over 90% of home PCs.
RealSystem 8 supports over 40 media formats. SureStream is an automatic multi-bit-rate technology that adjusts the streamed data rate to suit the client's connectivity. In practical terms this means that a single encoding will suit all users, from dial-up to corporate LAN. Also supported is Synchronized Multimedia Integration Language (SMIL), which allows mixed multimedia content to be delivered in a synchronized way.
QuickTime
Originally developed in 1991, QuickTime (QT) now claims more than 100 million copies distributed world-wide. QuickTime's major advantages are its maturity and the large number of codecs available for it. It features an open plug-in architecture that allows third-party codecs to be added. MPEG-1 and MPEG-4 codecs are currently available.
The plug-in feature has allowed over 200 digital media formats to be supported with companies such as Sorenson Labs producing very impressive codecs. As with RealPlayer, SMIL is available and RTSP is also supported.
Windows Media Player
Windows Media Player is the newcomer to the streaming world. Because of this, there are fewer codecs available for it. There is an MPEG-4 codec and Microsoft's proprietary but very good ASF codec. Microsoft have put some work into their RTSP implementation, and it is considered more efficient than others. SMIL is supported, but only at a basic level.
QuickTime VR
Panoramas:
QTVR Panoramas are in essence the view from a single point in space out to a surrounding environment. From the central observation point, called a node, a viewer can look in any direction and may zoom in or out from a particular view by changing the zoom angle of their view. Panoramas can be created in a number of ways. The most common method is to capture a series of source images around a single point of rotation and digitize them into source Pict files. These source Picts are then stitched together using the QTVR tools to create a single panorama Pict image that represents a cylindrical view from the point of rotation. Alternatively, many panoramic cameras create a single panoramic image directly on film, and these may also be converted into a QTVR panorama.
The source images for panoramas can also be created via a 3D rendering or CAD application. Most rendering programs can export a 360 degree panoramic Pict from any point in the scene. Even if a 3D program cannot do this, all rendering applications can generate a series of individual Pict files around a given point. These can be exported and then stitched together with the QTVR tools to create the panoramic Pict. Finally, QTVR panoramas support the creation of hot spot areas, which function as an invisible yet detectable mask on the final panorama. Developers can use these hot spots to link panoramas to other QTVR panoramas, QTVR objects, QTVR scenes, or other media such as graphics, text, videos, and sounds via an authoring environment. Alternatively, the same hot spots can be used to reference World Wide Web sites when the QTVR panorama is included on a web page.
Objects:
Where panoramas are represented by a 360 degree view from a single point in space, QTVR objects are essentially the reverse: the view from multiple points in space onto a single point, or object. Where QTVR panoramas are composed from a single cylindrical Pict image, objects are composed from a number of individual views that have been captured or rendered. The individual views are not stitched together, but instead are placed in an ordered sequence that enables a user to shift rapidly from view to view. Depending on the number of views that have been captured and assembled, the object movie can be manipulated to provide a full or partial range of vertical and horizontal motion. In addition, animation frames can be associated with any particular horizontal and vertical view. When the object movie is created, these animation frames can be specified to play back only once, loop indefinitely, or play as a palindrome (play the first frame to last frame, then play in reverse from last frame to first frame).
Manipulating an object in space, however, is just one metaphor for how object movies are used. A different approach to QTVR objects is represented in the absolute referenced type of object movie. While typical object movies rotate through the available horizontal and vertical views based on clicking and dragging a mouse, absolute referenced object movies go directly to specific frames based on the position of the mouse within the object movie's frame window. Thus, if a user clicks in the center of the window, one view is shown, while if they click near one or another edge of the movie, a completely different frame can be shown. This type of object movie is ideal for representing a static object with various effects tied to the position of the mouse. Like panoramas, objects also support hot spots for linking to other objects or media, and can support zooming into or out from a given view.
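A minimal sketch of how an absolute-referenced mapping might work; the function and its parameters are hypothetical illustrations, not part of the QuickTime VR API:

    def view_for_mouse(x, y, window_w, window_h, columns, rows):
        """Map a mouse position inside the movie window to a (row, column)
        view index, the way an absolute-referenced object movie picks its
        frame from the pointer position."""
        col = min(int(x / window_w * columns), columns - 1)
        row = min(int(y / window_h * rows), rows - 1)
        return row, col

    # A click in the centre of a 320x240 window with 36 horizontal views:
    print(view_for_mouse(160, 120, 320, 240, columns=36, rows=1))  # -> (0, 18)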
One additional capability of object movies, though, is their ability to directly support QuickTime audio and sprite tracks which are associated with specific views. Thus, an object movie that is panned can play a different audio track at each view in the pan, while an absolute referenced object movie can play a different audio track based upon the relative position of a mouse click within the viewing frame.