Richardson I.E.H.264 and MPEG-4 video compression.2003.pdf

TEXTURE CODING


Rectangular Temporal Scalability (Section 5.5.2);

Rectangular Spatial Scalability (Section 5.5.1);

Object-based Spatial Scalability (Section 5.5.1).

5.5.6 The Fine Granular Scalability Profile

The FGS profile includes Simple and Advanced Simple objects plus the FGS object which includes these tools:

B-VOP, Interlace and Alternate Quantiser tools;

FGS Spatial Scalability;

FGS Temporal Scalability.

FGS ‘Spatial Scalability’ uses the encoding and decoding techniques described in Section 5.5.3 to encode each frame as a base layer and an FGS enhancement layer. FGS ‘Temporal Scalability’ combines FGS (Section 5.5.3) with temporal scalability (Section 5.5.2). An enhancement-layer frame is encoded using forward or bidirectional prediction from base layer frame(s) only. The DCT coefficients of the enhancement-layer frame are encoded in bitplanes using the FGS technique.

5.6 TEXTURE CODING

The applications targeted by the developers of MPEG-4 include scenarios where it is necessary to transmit still texture (i.e. still images). Whilst block transforms such as the DCT are widely considered to be the best practical solution for motion-compensated video coding, the Discrete Wavelet Transform (DWT) is particularly effective for coding still images (see Chapter 3) and MPEG-4 Visual uses the DWT as the basis for tools to compress still texture. Applications include the coding of rectangular texture objects (such as complete image frames), coding of arbitrary-shaped texture regions and coding of texture to be mapped onto animated 2D or 3D meshes (see Section 5.8).

The basic structure of a still texture encoder is shown in Figure 5.68. A 2D DWT is applied to the texture object, producing a DC component (low-frequency subband) and a number of AC (high-frequency) subbands (see Chapter 3). The DC subband is quantised, predictively encoded (using a form of DPCM) and entropy encoded using an arithmetic encoder. The AC subbands are quantised and reordered (‘scanned’), zero-tree encoded and entropy encoded.
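
The split into a DC subband and AC subbands can be illustrated with a short sketch. The code below is an illustration only: it uses the two-tap Haar filter as a stand-in because it is the simplest wavelet, not the filter adopted by MPEG-4 Visual, and the function names `haar_1d` and `haar_2d` are hypothetical.

```python
def haar_1d(samples):
    """One level of Haar analysis on an even-length list.

    Returns (low, high): pairwise averages and pairwise differences.
    """
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high


def _filter_columns(half):
    """Apply haar_1d down every column of a 2D list and return two 2D lists."""
    lows, highs = [], []
    for column in zip(*half):
        low, high = haar_1d(list(column))
        lows.append(low)
        highs.append(high)
    # Each entry above holds one filtered column, so transpose back to rows.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """One level of a separable 2D Haar transform (even width and height).

    Returns four subbands. The first (low-pass in both directions) plays the
    role of the DC subband; the other three are high-frequency AC subbands.
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    low_low, low_high = _filter_columns(row_low)
    high_low, high_high = _filter_columns(row_high)
    return low_low, low_high, high_low, high_high
```

Applying `haar_2d` again to the first returned subband gives a further decomposition level, which is how a multi-level subband pyramid is built up.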

Discrete Wavelet Transform

The DWT adopted for MPEG-4 Still Texture coding is the Daubechies (9,3)-tap biorthogonal filter [6]. This is essentially a matched pair of filters, one low pass (with nine filter coefficients or ‘taps’) and one high pass (with three filter taps).

Quantisation

The DC subband is quantised using a scalar quantiser (see Chapter 3). The AC subbands may be quantised in one of three ways:

Figure 5.67 Tools and objects for texture coding [objects: Scalable Texture, Advanced Scalable Texture; tools: Scalable still texture, Scalable shape coding, Texture error resilience, Wavelet tiling]

Figure 5.68 Wavelet still texture encoder block diagram [still texture, DWT; DC subband: Quant, Predictive coding, Arithmetic encoder; AC subbands: Quant and Scanning, Zero-tree coding, Arithmetic encoder; output: coded bitstream]

1. Scalar quantisation using a single quantiser (‘mode 1’), prior to reordering and zero-tree encoding.

2. ‘Bilevel’ quantisation (‘mode 3’) after reordering. The reordered coefficients are coded one bitplane at a time (see Section 5.5.3 for a discussion of bitplanes) using zero-tree encoding. The coded bitstream can be truncated at any point to provide highly scalable decoding (in a similar way to FGS, see previous section).

3. ‘Multilevel’ quantisation (‘mode 2’) prior to reordering and zero-tree encoding. A series of quantisers is applied, from coarse to fine, with the output of each quantiser forming one of a series of layers (a type of scalable coding).
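
The bitplane idea behind the bilevel mode can be sketched as follows. This is a simplified illustration, not the MPEG-4 syntax: it handles non-negative coefficient magnitudes only and omits sign handling, zero-tree coding and arithmetic coding. The function names are hypothetical.

```python
def to_bitplanes(magnitudes, num_planes):
    """Split non-negative integers into bitplanes, most significant plane first."""
    return [
        [(m >> plane) & 1 for m in magnitudes]
        for plane in range(num_planes - 1, -1, -1)
    ]


def from_bitplanes(planes, num_planes):
    """Rebuild magnitudes from the planes received so far.

    Planes that were not received (a truncated bitstream) count as zero, so
    fewer planes give a coarser approximation of every coefficient.
    """
    values = [0] * len(planes[0]) if planes else []
    for index, plane in enumerate(planes):
        shift = num_planes - 1 - index
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values
```

Decoding only the first two of four planes of `[13, 2, 7, 0]` gives `[12, 0, 4, 0]`, and decoding all four planes restores the original values, which shows how truncation trades accuracy for bitrate.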

Reordering

The coefficients of the AC subbands are scanned or reordered in one of two ways:

1. Tree-order. A ‘parent’ coefficient in the lowest subband is coded first, followed by its ‘child’ coefficients in the next higher subband, and so on. This enables the EZW coding (see below) to exploit the correlation between parent and child coefficients. The first three trees to be coded in a set of coefficients are shown in Figure 5.69.

Figure 5.69 Tree-order scanning [labels: DC, 1st tree, 2nd tree, 3rd tree]

Figure 5.70 Band-by-band scanning [labels: DC, 1st band, 2nd band, 3rd band]

2. Band-by-band order. All the coefficients in the first AC subband are coded, followed by all the coefficients in the next subband, and so on (Figure 5.70). This scanning method tends to reduce coding efficiency but has the advantage that it supports a form of spatial scalability since a decoder can extract a reduced-resolution image by decoding a limited number of subbands.
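
The two scanning orders can be compared with a small sketch. The assumptions here are not taken from the standard: a single subband orientation is modelled as a list `levels` of 2D arrays (coarsest first, each level twice the width and height of the previous one), each parent at `(r, c)` has four children at `(2r, 2c)` to `(2r+1, 2c+1)`, and within a tree the coefficients are visited one generation at a time.

```python
def band_by_band_order(levels):
    """Visit every coefficient of one subband (raster order) before the next."""
    order = []
    for level, band in enumerate(levels):
        for r, row in enumerate(band):
            for c in range(len(row)):
                order.append((level, r, c))
    return order


def tree_order(levels):
    """Visit each root in the coarsest subband followed by all its descendants."""
    order = []
    for r0, row in enumerate(levels[0]):
        for c0 in range(len(row)):
            generation = [(r0, c0)]
            for level in range(len(levels)):
                order.extend((level, r, c) for r, c in generation)
                # Assumed parent-child rule: four children at the next finer level.
                generation = [
                    (2 * r + dr, 2 * c + dc)
                    for r, c in generation
                    for dr in (0, 1)
                    for dc in (0, 1)
                ]
    return order
```

Both functions visit exactly the same set of positions; only the order differs. Tree order keeps a parent next to its children, while band-by-band order lets a decoder stop after any complete subband.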

DC Subband Coding

The coefficients in the DC subband are encoded using DPCM. Each coefficient is spatially predicted from neighbouring, previously-encoded coefficients.
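
A minimal sketch of spatial DPCM on a quantised DC subband is given below. The predictor is an assumption chosen for illustration (it picks the left or upper neighbour according to the local gradient, with missing neighbours treated as zero); the text above does not specify the exact rule, and the function names are hypothetical.

```python
def predict(band, r, c):
    """Predict the coefficient at (r, c) from previously coded neighbours."""
    left = band[r][c - 1] if c > 0 else 0
    top = band[r - 1][c] if r > 0 else 0
    top_left = band[r - 1][c - 1] if r > 0 and c > 0 else 0
    # Little vertical change in the previous column suggests the upper
    # neighbour is the better predictor; otherwise use the left neighbour.
    if abs(left - top_left) < abs(top_left - top):
        return top
    return left


def dpcm_encode(band):
    """Replace each coefficient by its prediction residual (raster order)."""
    return [
        [band[r][c] - predict(band, r, c) for c in range(len(band[0]))]
        for r in range(len(band))
    ]


def dpcm_decode(residuals):
    """Invert dpcm_encode by rebuilding coefficients in the same raster order."""
    band = [[0] * len(residuals[0]) for _ in residuals]
    for r in range(len(residuals)):
        for c in range(len(residuals[0])):
            band[r][c] = residuals[r][c] + predict(band, r, c)
    return band
```

Because the decoder forms each prediction from coefficients it has already reconstructed, the round trip is exact for integer (already quantised) inputs, and the residuals are typically smaller in magnitude than the coefficients themselves.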

Table 5.9 Zero-tree coding symbols

Symbol | Meaning
ZeroTree Root (ZTR) | The current coefficient and all subsequent coefficients in the tree (or band) are zero. No further data is coded for this tree (or band).
Value + ZeroTree Root (VZTR) | The current coefficient is nonzero but all subsequent coefficients are zero. No further data is coded for this tree/band.
Value (VAL) | The current coefficient is nonzero and one or more subsequent coefficients are nonzero. Further data must be coded.
Isolated Zero (IZ) | The current coefficient is zero but one or more subsequent coefficients are nonzero. Further data must be coded.

AC Subband Coding

Coding of coefficients in the AC subbands is based on EZW (Embedded Zerotree Wavelet coding). The coefficients of each tree (or each subband if band-by-band scanning is used) are encoded starting with the first coefficient (the ‘root’ of the tree if tree-order scanning is used) and each coefficient is coded as one of the four symbols listed in Table 5.9.
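
The choice between the four symbols of Table 5.9 can be written as a small decision function. This is a sketch of the symbol definitions only; the function name `classify` is hypothetical, and treating a coefficient with no subsequent coefficients as the root of an (empty) zero tree is a simplifying assumption.

```python
def classify(coefficient, subsequent):
    """Return the zero-tree symbol for a coefficient.

    `subsequent` holds the remaining coefficients of the same tree (or band).
    """
    rest_is_zero = all(value == 0 for value in subsequent)
    if coefficient == 0:
        return "ZTR" if rest_is_zero else "IZ"
    return "VZTR" if rest_is_zero else "VAL"
```

For example, `classify(0, [0, 0, 0])` gives `"ZTR"` and `classify(5, [0, 3])` gives `"VAL"`. The two symbols ending in ZTR are where the compression gain comes from, since a single symbol stands for an entire run of zero coefficients.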

Entropy Coding

The symbols produced by the DC and AC subband encoding processes are entropy coded using a context-based arithmetic encoder. Arithmetic coding is described in Chapter 3 and the principle of context-based arithmetic coding is discussed in Section 5.4.1 and Chapter 6.

5.6.1 The Scalable Texture Profile

The Scalable Texture Profile contains just one object which in turn contains one tool, Scalable Texture. This tool supports the coding process described in the preceding section, for rectangular video objects only. By selecting the scanning mode and quantiser method it is possible to achieve several types of scalable coding.

(a) Single quantiser, tree-ordered scanning: no scalability.

(b) Band-by-band scanning: spatial scalability (by decoding a subset of the bands).

(c) Bilevel quantiser: bitplane-based scalability, similar to FGS.

(d) Multilevel quantiser: ‘quality’ scalability, with one layer per quantiser.

5.6.2 The Advanced Scalable Texture Profile

The Advanced Scalable Texture profile contains the Advanced Scalable Texture object which adds extra tools to Scalable Texture. Wavelet tiling enables an image to be divided into several nonoverlapping sub-images or ‘tiles’, each coded using the wavelet texture coding process described above. This tool is particularly useful for CODECs with limited memory, since the wavelet transform and other processing steps can be applied to a subset of the image at a time. The shape coding tool adds object-based capabilities to the still texture coding process by adapting the DWT to deal with arbitrary-shaped texture objects. Using the error resilience tool, the coded texture is partitioned into packets (‘texture packets’). The bitstream is processed in Texture Units (TUs), each containing a DC subband, a complete coded tree structure (tree-order scanning) or a complete coded subband (band-by-band scanning). A texture packet contains one or more coded TUs. This packetising approach helps to minimise the effect of a transmission error by localising it to one decoded TU.

Figure 5.71 Tools and objects for studio coding [objects: Simple Studio, Core Studio; tool labels: I-VOP, Studio Slice, Studio DPCM, Studio Binary and Gray Shape, Interlace, P-VOP, Frame/Field, Studio Sprite]

5.7 CODING STUDIO-QUALITY VIDEO

Before broadcasting digital video to the consumer it is necessary to code (or transcode) the material into a compressed format. In order to maximise the quality of the video delivered to the consumer it is important to maintain high quality during capture, editing and distribution between studios. The Simple Studio and Core Studio profiles of MPEG-4 Visual are designed to support coding of video at a very high quality for the studio environment. Important considerations include maintaining high fidelity (with near-lossless or lossless coding), support for 4:4:4 and 4:2:2 colour sampling formats and ease of transcoding (conversion) to/from legacy formats such as MPEG-2.

5.7.1 The Simple Studio Profile

The Simple Studio object is intended for use in the capture, storage and editing of high quality video. It supports only I-VOPs (i.e. no temporal prediction) and the coding process is modified in a number of ways.

Source format: The Simple Studio profile supports coding of video sampled in 4:2:0, 4:2:2 and 4:4:4 YCbCr formats (see Chapter 2 for details of these sampling modes) with progressive or interlaced scanning. The modified macroblock structures for 4:2:2 and 4:4:4 video are shown in Figure 5.72.

Figure 5.72 Modified macroblock structures (4:2:2 and 4:4:4 video) [4:2:2 macroblock structure (8 blocks): Y blocks 0-3, Cb blocks 4 and 6, Cr blocks 5 and 7; 4:4:4 macroblock structure (12 blocks): Y blocks 0-3, Cb blocks 4, 6, 8 and 10, Cr blocks 5, 7, 9 and 11]

Figure 5.73 Example slice structure [numbered slices 0, 1, 2, ... 9, etc.]
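
The block counts in these macroblock structures follow directly from the chroma sampling format, as the short sketch below shows. It assumes the usual 16 × 16 luma macroblock divided into 8 × 8 blocks; the function name `blocks_per_macroblock` is hypothetical.

```python
def blocks_per_macroblock(sampling_format):
    """Number of 8x8 blocks in one macroblock for a given YCbCr sampling format."""
    luma_blocks = 4  # a 16x16 luma region holds four 8x8 blocks
    # 8x8 blocks per chroma component covering the same 16x16 picture area.
    chroma_blocks_per_component = {"4:2:0": 1, "4:2:2": 2, "4:4:4": 4}
    return luma_blocks + 2 * chroma_blocks_per_component[sampling_format]
```

This gives 6 blocks for 4:2:0, 8 blocks for 4:2:2 and 12 blocks for 4:4:4, matching the structures in Figure 5.72.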

Transform and quantisation: The precision of the DCT and IDCT is extended by three fractional bits. Together with modifications to the forward and inverse quantisation processes, this enables fully lossless DCT-based encoding and decoding. In some cases, lossless DCT coding of intra data may result in a coded frame that is larger than the original, and for this reason the encoder may optionally use DPCM to code the frame data instead of the DCT (see Chapter 3).

Shape coding: Binary shape information is coded using PCM rather than arithmetic coding (in order to simplify the encoding and decoding process). Alpha (grey) shape may be coded with an extended resolution of up to 12 bits.

Slices: Coded data are arranged in slices in a similar way to MPEG-2 coded video [7]. Each slice includes a start code and a series of coded macroblocks and the slices are arranged in raster order to cover the coded picture (see for example Figure 5.73). This structure is adopted to simplify transcoding to/from an MPEG-2 coded representation.

VOL headers: Additional data fields are added to the VOL header, mimicking those in an MPEG-2 picture header in order to simplify MPEG-2 transcoding.