DESIGN AND PERFORMANCE
Figure 7.13 8 × 8 block after FDCT, quant, rescale, IDCT
7.2.4 Wavelet Transform
The DWT was chosen for MPEG-4 still texture coding because it can outperform block-based transforms for still image coding (although the Intra prediction and transform in H.264 perform well for still images). A number of algorithms have been proposed for the efficient coding and decoding of the DWT [23–25]. One issue related to software and hardware implementations of the DWT is that it requires substantially more memory than block transforms, since the transform operates on a complete image or a large section of an image (rather than a relatively small block of samples).
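To make the memory point concrete, the following minimal sketch applies one level of a 2-D DWT to an image held as a list of rows. It assumes the simple Haar (average/difference) filter pair as a stand-in; this is not the filter used for MPEG-4 still texture coding, and the function name `haar_dwt_2d` is hypothetical. In this straightforward implementation the row pass fills a full-size intermediate array before the column pass starts, so storage grows with the image size rather than with a fixed block size such as 8 × 8.

```python
from typing import List

Image = List[List[float]]


def haar_dwt_2d(img: Image) -> Image:
    """One level of a 2-D Haar DWT (average/difference form).

    Returns an array of the same size with the low-low (approximation)
    subband in the top-left quadrant. Width and height must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image dimensions must be even")
    half_w, half_h = w // 2, h // 2

    # Row pass: writes a full-size intermediate buffer, because the
    # column pass below reads every row of it.
    tmp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(half_w):
            a, b = img[y][2 * x], img[y][2 * x + 1]
            tmp[y][x] = (a + b) / 2           # low-pass (average)
            tmp[y][half_w + x] = (a - b) / 2  # high-pass (difference)

    # Column pass over the whole intermediate array.
    out = [[0.0] * w for _ in range(h)]
    for x in range(w):
        for y in range(half_h):
            a, b = tmp[2 * y][x], tmp[2 * y + 1][x]
            out[y][x] = (a + b) / 2
            out[half_h + y][x] = (a - b) / 2
    return out
```

Further decomposition levels would be applied to the top-left quadrant of the whole transformed image, which is one reason a DWT implementation tends to buffer a complete image or a large tile.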
7.2.5 Quantise/Rescale
Scalar quantisation and rescaling (Chapter 3) can be implemented by division and/or multiplication by constant parameters (controlled by a quantisation parameter or quantiser step size). In general, multiplication is an expensive computation and some gains may be achieved by integrating the quantisation and rescaling multiplications with the forward and inverse transforms respectively. In H.264, the specification of the quantiser is combined with that of the transform in order to facilitate this combination (see Chapter 6).
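As an illustration of implementing quantisation with multiplication rather than division, the sketch below precomputes an integer reciprocal of the step size once and then quantises each coefficient with a multiply, an add and a right shift. This is similar in spirit to the integer scaling H.264 combines with its transform, but the names `make_quantiser` and `QBITS` and the choice of round-to-nearest are assumptions made for this example, not values taken from either standard.

```python
QBITS = 16  # fixed-point precision of the reciprocal (assumed value)


def make_quantiser(qstep: int):
    """Return (quantise, rescale) functions for an integer step size qstep.

    The reciprocal 2**QBITS / qstep is computed once, so quantising a
    coefficient needs only a multiply, an add and a shift (no division).
    """
    mult = round((1 << QBITS) / qstep)   # integer approximation of 1/qstep
    offset = 1 << (QBITS - 1)            # adds 0.5 before truncation: round to nearest

    def quantise(coeff: int) -> int:
        sign = -1 if coeff < 0 else 1
        return sign * ((abs(coeff) * mult + offset) >> QBITS)

    def rescale(level: int) -> int:
        return level * qstep             # rescaling is a single multiplication

    return quantise, rescale
```

For example, with `qstep = 10` a coefficient of 57 quantises to level 6, which rescales to 60.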
7.2.6 Entropy Coding
7.2.6.1 Variable-Length Encoding
In Chapter 3 we introduced the concept of entropy coding using variable-length codes (VLCs). In MPEG-4 Visual and H.264, the VLC required to encode each data symbol is defined by the standard. During encoding each data symbol is replaced by the appropriate VLC, determined by (a) the context (e.g. whether the data symbol is a header value, transform coefficient, motion vector component, etc.) and (b) the value of the data symbol. Chapter 3 presented some examples of pre-defined VLC tables from MPEG-4 Visual.
Table 7.1 Variable-length encoding example

| Input VLC value, V | Input VLC length, L | R before output: value | R before output: size (bits) | R after output: value | R after output: size (bits) | Output |
|---|---|---|---|---|---|---|
| – | – | – | 0 | – | 0 | – |
| 101 | 3 | 101 | 3 | 101 | 3 | – |
| 11100 | 5 | 11100101 | 8 | – | 0 | **11100101** |
| 100 | 3 | 100 | 3 | 100 | 3 | – |
| 101 | 3 | 101100 | 6 | 101100 | 6 | – |
| 101 | 3 | 101101100 | 9 | 1 | 1 | **01101100** |
| 11100 | 5 | 111001 | 6 | 111001 | 6 | – |
| 1101 | 4 | 1101111001 | 10 | 11 | 2 | **01111001** |
| . . . etc. | | | | | | |

Figure 7.14 Variable length encoding flowchart
VLCs (by definition) contain variable numbers of bits, but in many practical transport situations it is necessary to map a series of VLCs produced by the encoder to a stream of bytes or words. A mechanism for carrying this out is shown in Figure 7.14. An output register, R, collects encoded VLCs until enough data are present to write out one or more bytes to the stream. When a new data symbol is encoded, the value V of the VLC is concatenated with the previous contents of R (with the new VLC occupying the most significant bits), and a count of the number of bits held in R is incremented by L (the length of the new VLC in bits). If R contains S or more bytes (where S is the number of bytes to be written to the stream at a time), the S least significant bytes of R are written to the stream, the contents of R are right-shifted by S bytes and the bit count is reduced by 8S.
Example
A series of VLCs (from Table 3.12, Chapter 3) are encoded using the above method. S = 1, i.e. 1 byte is written to the stream at a time. Table 7.1 shows the variable-length encoding process at each stage with each output byte highlighted in bold type.
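The register-based packing procedure can be sketched as follows, under a few assumptions: VLCs are given as (value, length) integer pairs, the hypothetical function `pack_vlcs` returns the output bytes together with the final contents of R, and when S > 1 the least significant byte is written first (the text does not specify the byte order). The check is repeated in a loop so that a long VLC (for example an escape code) that leaves more than S bytes in R is flushed completely.

```python
from typing import Iterable, List, Tuple


def pack_vlcs(codes: Iterable[Tuple[int, int]], s: int = 1) -> Tuple[List[int], int, int]:
    """Pack (value, length) VLCs into bytes using an output register R.

    Each new VLC is placed above the bits already held in R (at the most
    significant end). Whenever R holds at least s bytes, its s least
    significant bytes are written out and R is shifted right by s bytes.
    Returns (output_bytes, remaining_R, remaining_bit_count).
    """
    r = 0        # output register R
    nbits = 0    # number of valid bits currently held in R
    out: List[int] = []
    for value, length in codes:
        r |= value << nbits          # concatenate: new VLC occupies the MSBs
        nbits += length
        while nbits >= 8 * s:        # S or more bytes available
            for _ in range(s):       # write the S least significant bytes
                out.append(r & 0xFF)
                r >>= 8
            nbits -= 8 * s
    return out, r, nbits
```

Feeding it the input VLCs of Table 7.1 (101, 11100, 100, 101, 101, 11100, 1101) with S = 1 produces the output bytes 11100101, 01101100 and 01111001 and leaves R = 11 with a size of 2 bits, as in the last row of the table.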
Figure 7.15 shows a basic architecture for carrying out the VLE process. A new data symbol and context indication (table selection) are passed to a look-up unit that returns the value V and length L of the codeword. A packer unit concatenates sequences of VLCs and outputs S bytes at a time (in a similar way to the above example).
Figure 7.15 Variable length encoding architecture
Figure 7.16 Flowchart for decoding one VLC
Issues to consider when designing a variable length encoder include computational efficiency and look-up table size. In software, VLE can be processor-intensive because of the large number of bit-level operations required to pack and shift the codes. Look-up table design can be problematic because of the large size and irregular structure of VLC tables. For example, the MPEG-4 Visual TCOEF table (see Chapter 3) is indexed by the three parameters Run (number of preceding zero coefficients), Level (nonzero coefficient level) and Last (final nonzero coefficient in a block). There are only 102 valid VLCs but over 16 000 valid combinations of Run, Level and Last, each corresponding to a VLC of up to 13 bits or a 20-bit ‘Escape’ code, and so this table may require a significant amount of storage. In the H.264 Variable Length Coding scheme, many symbols are represented by ‘universal’ Exp-Golomb codes that can be calculated from the data symbol value (avoiding the need for large VLC look-up tables) (see Chapter 6).
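The point that Exp-Golomb codes can be calculated rather than looked up can be shown with a short sketch. The unsigned code for an index `code_num` consists of M zeros, a 1 and M information bits, where M = floor(log2(code_num + 1)); equivalently, it is the binary form of code_num + 1 preceded by one fewer zeros than it has bits. The function names are hypothetical, and the sketch assumes the symbol has already been mapped to `code_num`; `se_code_num` shows the H.264 mapping for signed values (0, 1, −1, 2, −2, … map to 0, 1, 2, 3, 4, …).

```python
from typing import Tuple


def ue_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for code_num, as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]            # e.g. code_num 3 -> '100'
    return "0" * (len(binary) - 1) + binary   # prefix with M zeros


def ue_decode(bits: str) -> Tuple[int, int]:
    """Decode one codeword from the front of bits; return (code_num, bits_used)."""
    m = bits.find("1")                        # number of leading zeros, M
    if m < 0 or len(bits) < 2 * m + 1:
        raise ValueError("incomplete Exp-Golomb codeword")
    info = bits[m + 1 : 2 * m + 1]            # the M information bits
    code_num = (1 << m) - 1 + (int(info, 2) if m else 0)
    return code_num, 2 * m + 1


def se_code_num(k: int) -> int:
    """Map a signed value k to code_num: positive -> odd, non-positive -> even."""
    return 2 * k - 1 if k > 0 else -2 * k
```

No table storage is needed in either direction: `ue_encode(3)` gives `00100`, and decoding that string returns code_num 3 after 5 bits.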
7.2.6.2 Variable-length Decoding
Decoding VLCs involves ‘scanning’ or parsing a received bitstream for valid codewords, extracting these codewords and decoding the appropriate syntax elements. As with the encoding process, it is necessary for the decoder to know the current context in order to select the correct codeword table. Figure 7.16 illustrates a simple method of decoding one VLC. The decoder reads successive bits of the input bitstream until a valid VLC is detected (the usual case) or an invalid VLC is detected (i.e. a code that is not valid within the current context). For example, a code starting with nine or more zeros is not a valid VLC if the decoder is expecting an MPEG-4 Transform Coefficient. The decoder returns the appropriate syntax element if a valid VLC is found, or an error indication if an invalid VLC is detected.
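The bit-at-a-time method of Figure 7.16 can be sketched as below. The table `EXAMPLE_TABLE` is a small hypothetical prefix-free code invented for illustration (it is not an MPEG-4 or H.264 table), and `make_vlc_decoder` is likewise a hypothetical name. Any proper prefix of a codeword corresponds to the "incomplete" branch; a bit pattern that is neither a codeword nor such a prefix is reported as invalid, with `None` serving as the error indication.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical prefix-free VLC table (codeword -> decoded value), used only
# for illustration; it is not a table from MPEG-4 Visual or H.264.
EXAMPLE_TABLE: Dict[str, int] = {"1": 0, "010": 1, "011": 2, "0010": 3, "0011": 4}


def make_vlc_decoder(table: Dict[str, int]) -> Callable[[Iterator[str]], Optional[int]]:
    """Build a decoder that reads one VLC at a time, bit by bit.

    The returned function yields the decoded value, or None as the error
    indication if an invalid code is detected or the input ends mid-code.
    """
    # All proper prefixes of codewords: reaching one means 'incomplete'.
    prefixes = {code[:i] for code in table for i in range(1, len(code))}

    def decode_one(bits: Iterator[str]) -> Optional[int]:
        current = ""
        for bit in bits:              # read 1 bit
            current += bit
            if current in table:      # valid VLC detected
                return table[current]
            if current not in prefixes:
                return None           # invalid VLC detected
        return None                   # bitstream ended inside a codeword

    return decode_one
```

With this table, the bitstream `01010011000` decodes as 1 (`010`), 0 (`1`) and 4 (`0011`), after which `000` is flagged as invalid because no codeword starts with three zeros, analogous to the MPEG-4 transform coefficient case above.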
VLC decoding can be computationally intensive, memory intensive or both. One method of implementing the decoder is as a Finite State Machine. The decoder starts at an initial state and moves through successive states based on the value of each bit. Eventually, the decoder reaches a state that corresponds to (a) a complete, valid VLC or (b) an invalid VLC. The