Wiki

Wiki Display

Subsetting

Definition

The extraction of a multi-dimensional rectangular array of pixels from a single data granule, where consecutive pixels are extracted from each array dimension. For each dimension, the size of the pixel array is characterized by the starting pixel location and the number of pixels to extract.
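
As an illustration of this definition, the following is a minimal Python sketch that extracts a rectangular block from a granule held as nested lists, given a starting index and a count for each dimension. The function name `subset_array` and the sample `granule` are hypothetical and are not taken from any of the tools named on this page.

```python
def subset_array(data, starts, counts):
    """Extract a rectangular block of consecutive elements from nested lists.

    starts[i] is the first index to keep in dimension i,
    counts[i] is the number of consecutive elements to keep in dimension i.
    """
    if len(starts) != len(counts):
        raise ValueError("starts and counts need one entry per dimension")
    if not starts:
        # No dimensions left to slice: return the element itself.
        return data
    start, count = starts[0], counts[0]
    if start < 0 or count < 0 or start + count > len(data):
        raise IndexError("requested block falls outside the array")
    # Slice this dimension, then recurse into the remaining dimensions.
    return [
        subset_array(item, starts[1:], counts[1:])
        for item in data[start:start + count]
    ]


# Hypothetical 3 x 4 granule of pixel values.
granule = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
]

# Start at row 1, column 1 and take 2 pixels in each dimension.
block = subset_array(granule, starts=(1, 1), counts=(2, 2))
print(block)  # [[11, 12], [21, 22]]
```

The same start-and-count description extends to any number of dimensions, because each recursive call consumes one entry of `starts` and `counts`.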

Delivering a subset of data from a set of structured data granules, selected by time, space, variable type, and data characteristics.

Comments

Data subsetting can be achieved through several tools that support OPeNDAP (THREDDS, Hyrax, ERDDAP). Specialized data APIs also support subsetting for order fulfillment (Common Access).
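
OPeNDAP servers typically accept an index-range constraint of the form `variable[start:stride:stop]` appended to the dataset URL, where the stop index is inclusive. The sketch below converts a start-and-count description into such a constraint. The server URL and the variable name `sst` are hypothetical, and the exact response suffix and any required URL encoding depend on the server, so treat this as an assumption-laden illustration and not a reference for any particular service.

```python
def dap_constraint(variable, starts, counts):
    """Build an index-range constraint such as 'sst[0:1:0][10:1:14]'.

    Each dimension is given as a start index and a number of elements;
    the stop index in the constraint is inclusive, hence start + count - 1.
    """
    if len(starts) != len(counts):
        raise ValueError("starts and counts need one entry per dimension")
    parts = []
    for start, count in zip(starts, counts):
        if start < 0 or count < 1:
            raise ValueError("start must be >= 0 and count must be >= 1")
        stop = start + count - 1
        parts.append(f"[{start}:1:{stop}]")  # stride of 1: consecutive elements
    return variable + "".join(parts)


# Hypothetical dataset URL; a real server and path would go here.
base_url = "https://example.org/opendap/sst.nc"

# One time step, then 5 x 5 grid cells starting at indices (10, 20).
constraint = dap_constraint("sst", starts=(0, 10, 20), counts=(1, 5, 5))
url = f"{base_url}.ascii?{constraint}"
print(url)
```

Because the constraint is evaluated on the server, only the requested block travels over the network.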

In research communities (for example, earth sciences, astronomy, business, and government), subsetting is the process of retrieving just the parts of large files that are of interest for a specific purpose. This usually occurs in a client-server setting, where the parts of interest are extracted on the server before the data is sent to the client over a network. The main purpose of subsetting is to save network bandwidth and storage space on the client computer.

Subsetting may be used for the following purposes:[1]

restrict or divide the time range
select cross sections of data
select particular kinds of time series
exclude particular observations
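
Several of these operations can be combined in a single filter. The following is a minimal Python sketch, assuming observations are stored as dictionaries; the field names (`time`, `station`, `sst`, `quality`), the sample values, and the function `subset_records` are all hypothetical.

```python
from datetime import datetime

# Hypothetical observations; values are made up for illustration.
records = [
    {"time": datetime(2020, 1, 1), "station": "A", "sst": 14.2, "quality": "good"},
    {"time": datetime(2020, 6, 1), "station": "A", "sst": 18.9, "quality": "bad"},
    {"time": datetime(2020, 6, 1), "station": "B", "sst": 17.5, "quality": "good"},
    {"time": datetime(2021, 1, 1), "station": "B", "sst": 13.8, "quality": "good"},
]


def subset_records(records, start, end, variables, exclude=lambda rec: False):
    """Keep records with start <= time < end, drop excluded ones,
    and return only the requested variables from each record."""
    return [
        {name: rec[name] for name in variables}
        for rec in records
        if start <= rec["time"] < end and not exclude(rec)
    ]


subset = subset_records(
    records,
    start=datetime(2020, 1, 1),   # restrict the time range to the year 2020
    end=datetime(2021, 1, 1),
    variables=("time", "sst"),    # select only the variables of interest
    exclude=lambda rec: rec["quality"] == "bad",  # exclude flagged observations
)
print(len(subset))  # 2
```

The half-open time interval (start included, end excluded) makes it easy to divide a long record into adjacent, non-overlapping periods.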

Subsetting within Programs

Subsetting can also be done within statistical software, which can speed up the process when needed. The many different types of objects that can be subset, however, can make this challenging.

Types of objects that can be subset include (for example, in R):

Atomic Vectors
Lists
Matrices and Arrays
Data Frames
S3 Objects
S4 Objects

Source

ECS Project

NCEI/DAB

NESDIS Data Management Lexicon and Related Terms

Category

4. Data Stewardship words
