Digital Preservation

Analog-to-digital: The process in which a continuous analog signal is quantized and converted to a series of binary integers. Also known as digitization.
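
As a rough illustration of quantization, the sketch below samples one cycle of a sine wave and maps each sample to an integer code. The input range of -1.0 to 1.0, the 3-bit depth, and the function name `quantize` are assumptions chosen for the example, not part of any particular digitization standard.

```python
import math

def quantize(samples, bits):
    """Map analog values in [-1.0, 1.0] to unsigned integer codes of the given bit depth."""
    levels = 2 ** bits
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the assumed input range
        # Scale to 0..levels-1 and round to the nearest level.
        codes.append(int((x + 1.0) / 2.0 * (levels - 1) + 0.5))
    return codes

# Sample one cycle of a sine wave at 8 points, then quantize to 3 bits (8 levels).
analog = [math.sin(2 * math.pi * n / 8) for n in range(8)]
digital = quantize(analog, bits=3)
binary = [format(code, "03b") for code in digital]  # each code as a 3-digit binary string
print(binary)
```

A higher bit depth gives more levels, so each stored code sits closer to the original analog value.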

Bit: Shorthand for binary digit, which has two possible values, “0” or “1.” Eight bits means 8 binary digits. There are 256 possible combinations of 8 binary digits, and therefore a color depth of 8 bits represents 256 (2x2x2x2x2x2x2x2) possible colors. Because each pixel of a video picture contains 3 samples (Y’, R-Y’, B-Y’), an 8-bit system can represent about 16.7 million (256 x 256 x 256) possible colors.
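
The arithmetic in this entry can be checked with a few lines of Python. The function name `values_for_bits` is only illustrative.

```python
def values_for_bits(bits):
    """Number of distinct values that a given number of binary digits can represent."""
    return 2 ** bits

per_sample = values_for_bits(8)  # 256 values for one 8-bit sample
per_pixel = per_sample ** 3      # three 8-bit samples per pixel: 16,777,216 combinations

print(per_sample, per_pixel)
```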

Bit rate: The amount of data transported in a given amount of time, usually expressed in megabits (millions of bits) per second (Mbps). Bit rate is one way to describe the amount of compression used on a video signal.

Bit error rate (BER): The proportion of bits that have errors in playback, and one possible indicator of deterioration. Playback is never perfect, and there are many possible causes of error, such as noise, dirt and dust, and dropout. In the binary world of digital data a bit is either correct or incorrect. Because a bit has only two states, the challenge is to identify whether a given bit is correct. To make this possible, the data is coded by adding redundant bits. All systems build in redundancy and error correction mechanisms. Information about bit error rates can refer to the bit error rate prior to error correction or to the residual errors after error correction.
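
A minimal sketch of both ideas, assuming the original data is available for comparison: `bit_error_rate` counts differing bits between what was sent and what was read back, and `even_parity_bit` shows the simplest kind of redundant bit. Both function names are illustrative, and real storage systems typically use much stronger error-correcting codes than a single parity bit.

```python
def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of bits that differ between two equal-length byte strings."""
    if len(sent) != len(received):
        raise ValueError("inputs must have the same length")
    if not sent:
        return 0.0
    # XOR leaves a 1 wherever the two bytes disagree; count those 1s.
    wrong_bits = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return wrong_bits / (len(sent) * 8)

def even_parity_bit(byte: int) -> int:
    """Redundant bit that makes the total number of 1s (data plus parity) even."""
    return bin(byte).count("1") % 2

sent = bytes([0b10110010, 0b01101100])
received = bytes([0b10110011, 0b01101100])  # last bit of the first byte flipped

ber = bit_error_rate(sent, received)        # 1 wrong bit out of 16
stored_parity = even_parity_bit(sent[0])    # parity computed when the data was written
error_detected = even_parity_bit(received[0]) != stored_parity
print(ber, error_detected)
```

A single parity bit can only detect an odd number of flipped bits and cannot say which bit is wrong, which is why it serves here only as an illustration of redundancy.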

Bitmap: An image made up of a given number of pixels, each with a specific color value, laid out in a grid. Ideal for reproducing photographic representations, because a sufficient quality and quantity of pixels can give the appearance of a continuous tone image.

Bitstream: The sequence of 1s and 0s passed among computers and input/output devices, typically containing a text message or audiovisual content.

Born Digital: A digital object that has never had an analog form. Born-digital objects differ from documents, movies, and photographs that have been scanned or converted to a digital format.

Byte: A multi-digit binary number is called a word. A word of 8 binary digits, or bits, is called a byte. The amount of data that can be moved over time is expressed as MBps (megabytes per second) or KBps (kilobytes per second). A kilobyte of memory contains 1024 bytes, a megabyte contains 1024 kilobytes, and a gigabyte contains 1024 megabytes. These concepts are essential to understanding issues relating to the storage and format choices of digital materials, as well as the terminology surrounding the measurement of errors.

Checksum: A small set of data computed through an algorithm with the intent of detecting errors in data files or blocks introduced through storage or transfer. The checksum data accompanies or is otherwise associated with the data files or blocks, and is used to help ensure data integrity.
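
The sketch below computes two common kinds of checksum with Python's standard library, a CRC-32 value and a SHA-256 digest, and shows that a one-character change in the data changes both. The sample strings are made up for the example.

```python
import hashlib
import zlib

data = b"digital preservation"
damaged = b"digital preservatiom"  # one character altered

crc_original = zlib.crc32(data)                # 32-bit cyclic redundancy check
sha_original = hashlib.sha256(data).hexdigest()  # 64 hexadecimal characters

crc_changed = zlib.crc32(damaged) != crc_original
sha_changed = hashlib.sha256(damaged).hexdigest() != sha_original
print(crc_changed, sha_changed)
```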

Codec: A compression/decompression (sometimes coder/decoder) algorithm or scheme that reduces the volume of bits necessary to store a data object such as an image file (compression) but that allows the reconstruction of the compressed data into a usable format for display, processing, etc. (decompression). There are many different codecs, and they are often used to minimize file transfer time in order to optimize images or data for Web use.

Compression: Use of a method to reduce the size of digital files or streams of data, also referred to as data reduction. Compression is used to either save data storage space or better enable movement over networks or transmission lines. There are many different techniques to compress data, but all fall into one of two overall categories:

  • Lossless: Any data removed can be reconstructed.
  • Lossy: Some or all of the data removed is discarded and gone forever.
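
The difference between the two categories can be sketched as follows. The lossless half uses Python's standard `zlib` module; the lossy half is a deliberately crude stand-in (discarding the low bits of each byte) and not a real codec. The function name `drop_low_bits` and the sample data are assumptions for illustration.

```python
import zlib

# Lossless: the original bytes come back exactly after decompression.
original = b"ABABABABAB" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (toy example): keep only the top bits of each byte.
def drop_low_bits(data: bytes, keep_bits: int = 4) -> bytes:
    """Zero the low (8 - keep_bits) bits of every byte; that detail cannot be recovered."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)

samples = bytes([0x12, 0x7F, 0xFF])
reduced = drop_low_bits(samples)  # low 4 bits of each sample are gone for good
print(reduced != samples)
```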

DAM: Digital Asset Management. A system that enables the management of digital objects, such as image files, from ingest to archiving and supports continued retrieval. Off-the-shelf DAM software may offer templates and other devices or strategies to facilitate ingest, metadata capture, and searching. May also be called media asset management (MAM).

Digital Archiving: This term is used very differently in different sectors. The library and archiving communities often use it interchangeably with digital preservation. Computing professionals tend to use digital archiving to mean the process of backup and ongoing maintenance, as opposed to strategies for long-term digital preservation.

Digital preservation: The specific problems and methods of preserving digital, as opposed to analog, assets because of their vulnerability to format obsolescence and media decay. Various strategies have been developed to respond to this, including documentation, the gathering of preservation metadata, the use of open standards, redundant storage, refreshing, migration, emulation, technology preservation, re-creation, and digital archaeology.

  • Short-term preservation – Access to digital materials either for a defined period of time while use is predicted, but not extending beyond the foreseeable future, and/or until they become inaccessible because of changes in technology.
  • Medium-term preservation – Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely.
  • Long-term preservation – Continued access to digital materials, or at least to the information contained in them, indefinitely.

Emulation: A digital preservation strategy that uses current software to simulate original or obsolete computer environments. May either restore full functionality to archival data or provide a simple viewing mechanism.

Encapsulation: A strategy for digital media preservation that groups a digital object with all other entities that are necessary to provide access to that object. In encapsulation, physical or logical structures called “containers” or “wrappers” provide information about the relationships between all data and software application components. Encapsulation aims to overcome the issue of obsolete file formats by including details on how to interpret the original information.

Fixity Check: A method for ensuring the integrity of a file and verifying that it has not been altered or corrupted. During transfer, an archive may run a fixity check to ensure a transmitted file has not been altered en route. Within the archive, fixity checking is used to ensure that digital files have not been altered or corrupted over time. It is most often accomplished by computing a checksum such as MD5, SHA-1, or SHA-256 for a file and comparing it to a stored value.
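
A minimal sketch of a fixity check using Python's standard `hashlib` module. The function names `file_checksum` and `fixity_ok`, the file name `example.bin`, and the choice of SHA-256 are assumptions for the example; an archive would normally read the stored value from its own records rather than compute it moments earlier.

```python
import hashlib
import tempfile
from pathlib import Path

def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a file's checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_ok(path, stored_value, algorithm="sha256"):
    """Return True if the file's current checksum matches the stored value."""
    return file_checksum(path, algorithm) == stored_value.lower()

# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.bin"      # hypothetical file name
    path.write_bytes(b"archival master")
    stored = file_checksum(path)          # value recorded at ingest
    intact = fixity_ok(path, stored)      # file unchanged since ingest
    path.write_bytes(b"archival mastes")  # simulate corruption
    altered = fixity_ok(path, stored)     # checksum no longer matches

print(intact, altered)
```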

FTP (File Transfer Protocol): A method for uploading files to and downloading files from Web sites and other computers connected to the Internet. FTP is not designed for viewing file contents, only for transferring them efficiently. Standard FTP does not encrypt data in transit; secure transfer requires a variant such as FTPS or a separate protocol such as SFTP.

Gamma correction: A process used with video and computer graphics images to correct brightness and internal micro-contrast within the image, allowing a change of ratio between the brightest red component of an image and the weakest red.

GB: A gigabyte (GB) is a unit of measurement in computing equal to one thousand million (10^9) bytes. Because computers work on the binary system, rather than a gigabyte being 10^3 megabytes (1000 MB), the term gigabyte can also mean 2^10 megabytes (1024 MiB); this binary unit is more precisely called a gibibyte (GiB).
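
The gap between the decimal and binary meanings can be worked out directly. The 500 GB figure below is just an example capacity.

```python
GB = 10 ** 9   # decimal gigabyte: 1,000,000,000 bytes
GiB = 2 ** 30  # binary gigabyte (gibibyte): 1,073,741,824 bytes

difference = GiB - GB  # bytes by which the binary unit is larger

# A capacity advertised in decimal gigabytes, expressed in binary units.
advertised_bytes = 500 * GB
in_gib = advertised_bytes / GiB  # roughly 465.66

print(difference, round(in_gib, 2))
```

This is why a storage device labeled in decimal gigabytes can appear smaller when software reports its size in binary units.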

Go-tos: Programming instructions that tell a computer to skip from one line of code to another. Go-tos are a typical hallmark of procedural, as opposed to object-oriented, programming.

HTML (HyperText Markup Language): The code used to generate hypertext documents on the World Wide Web through the use of tags and attributes. The “hyper” of the title means that users can jump quickly to other files on the Internet by clicking on linked text or images.

Migrate: A digital preservation strategy that involves transferring data from a format or standard that is in danger of becoming obsolete to a current format or standard. Other interchangeable (if not precisely synonymous) terms include converting, copying, refreshing, reformatting, and transferring.

Mirroring: Duplicating a file, typically a Web site, in another location so as to distribute access to or safeguard the original work.

Network: An arrangement of devices such as servers, computers, and printers joined by transmission paths by which programs make requests of one another. Local area networks (LAN), metropolitan area networks (MAN), wide area networks (WAN), and the Internet are all examples of networks.

Open source: An approach to writing software in which the original authors make source code freely available for modification and improvement by any programmer who wishes to collaborate on the project. The best-known example of open source software is the Linux operating system.

Operating system: The base-level software on which applications like word processors or Internet browsers run. Also known as software “platform.” Prominent operating systems include Linux, UNIX, Macintosh, and Windows platforms.

Protocol: A specified, agreed-upon format that determines how computers send and receive data to and from each other on a network. For example, e-mail obeys one protocol (SMTP) while Web pages obey another (HTTP).

Technology preservation: A digital preservation strategy that involves preserving the complete technical environment, such as software, drivers, operating systems, fonts, passwords, and settings, necessary to facilitate access to archived data as well as its functionality, appearance, and behavior. An alternative approach is emulation.

Unique Identifier (UID): An identification marking, tag, database entry, or file name which guarantees that the same identification is not used elsewhere. A UID is immensely important in archives, libraries, or anywhere else where an item must be located unambiguously.
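
One common way to generate identifiers that do not collide in practice is a random UUID, sketched here with Python's standard `uuid` module. This is an illustration under the assumption that random UUIDs suit the collection; their uniqueness is a matter of overwhelming probability rather than a strict guarantee, and many archives assign identifiers from a central registry instead.

```python
import uuid

# Generate 1000 random (version 4) UUIDs and confirm none repeat.
ids = [str(uuid.uuid4()) for _ in range(1000)]
all_unique = len(set(ids)) == len(ids)

print(ids[0], all_unique)
```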