# Protect from Prying Eyes: Encryption in Oracle 10g - Part 1

**Part 1 | Part 2**

## What is Encryption?

Simply put, encryption is the art of disguising data. Suppose the combination to my wall safe is 37529. Because I am not very good at remembering numbers, I keep on forgetting it. So I come up an idea — I write it down on a piece of paper and tape it on the safe! There it is — it’s there when I need it, and I don’t have to worry about forgetting it. What a great idea!

But, there is a slight problem. If a burglar happens to see it, there is no need for him to try to crack the code — it’s all there in plain sight! So, I change the strategy a little bit — I decided to disguise the combination or alter it in such a way that only I know how it was altered. For instance, I may decide to reverse the digits; the number becomes 92573, which is on the piece of paper. The burglar will see the number 92573, which is not the safe combination. He will not be able to guess the correct one since he does not know how it was altered in the first place.

Yet over time, this method is rendered useless. After a few burglars visit my house, they all learn that I simply reverse the digits. Alas! I have to figure out a new encryption scheme. The modified scheme I come up with is to add a number to the actual value — say, 9. So my safe combination 37529 becomes 37529 + 9 = 37538, which goes to the piece of paper. To decipher the true meaning, the burglar has to know to two things:

- The logic used to perform the encryption (i.e., adding a number to the actual value)
- The number that is added (i.e., 9, in this case)

If I suspect that the burglars have somehow learned the first, I can
simply change the number 9 to something else. It’s important to
understand the differences between the means to achieve the encryption
and the value used to scramble original data. The first concept, the
logic used to scramble is known as an ** algorithm**, and the value used to scramble the actual data is known as the encryption

**. We can change one, keep the other constant, and get a different encrypted value. For instance, I can keep the algorithm the same, but change the key to arrive at a different encrypted value.**

*key*In the case of my safe combination, the algorithm is very simple — I merely add the key to the original data to come up with the encrypted value. However, this simplistic algorithm has no place in reality; encryption algorithms are actually far more complex. It’s not the purpose of this article to go into the details of encryption algorithms; that would fall into the realm of academia. As users, we do not need to delve into the details, but to learn the basics to use in practice. Because the roots of this algorithm are common knowledge, it’s not practical to hide the algorithm itself. But by using a secret key, the encrypted value can remain safe — decipherable only by using the right key.

Since the “key” is literally the *key* to decipher the data, my
encrypted value is safe as the key itself. For instance, in this case,
the burglar knows the algorithm (i.e., that I used a single-digit
number to arrive at the encrypted value). But *what* is the
number? Well, he doesn’t have to work hard — there are only ten single
digit numbers — from zero to nine. He has to make up to ten guesses to
get the right number, something he can definitely accomplish in the
short time span. But what if I used a two-digit number? In that case,
the burglar would have to make 100 guesses — a more difficult feat. I
could frustrate the burglar more with a three-digit key, which forces
him to make up to 1,000 guesses. The point is, while keeping the
algorithm the same, encryption can be made more secure by increasing
the number of digits in the key. For real-life encryption, the same
concept holds true — increasing the size of the key will force the
intruder to make more guesses, and make the code more difficult to
crack. In digital encryption, since computers are used to encrypt and
decrypt, making 1000 guesses, even 1 million guesses is trivial and can
be done in matter of minutes; hence, keys must be very long to offer a
real barrier to guessing.

In summary, you need the following for encryption

- An Algorithm
- A Key

## Types of Encryption

In the encryption case we’ve been talking about, note a very important behavior: the key used to decrypt is the *same* as the one used to encrypt. This type of encryption is known as ** symmetric **key
encryption, since the same key is used in both cases. Symmetric key
encryption makes the algorithms easier to implement; however, there is
a fundamental security risk inherent in this model. Since the same key
must be used to decrypt the data that is encrypted, the key must be
protected along with the data. What if the intruder gets the key? He
will be able to decrypt every encrypted value, seriously compromising
the security of the system. In addition to this, another inherent
problem is key storage itself. And if the key is lost, the encrypted
data can never be decrypted.

To address these issues, another kind of encryption exists, in which
two different keys are used to encrypt and decrypt data. Since the keys
are different, this scheme is known as * asymmetric* encryption. In this method, a set of two keys are generated simultaneously. One key, known as the “

*public*key” is given out to the others. This key is used to encrypt the data. While decrypting, the user uses the other key, known as “

*private*key” to decrypt the value. The public key is made well known to the world and is not made a secret. The private key can be stored securely since it is used only during the decryption process. Since the private key is not used during encryption, this not required to be given out to the sender of the data. This obviously increases the security of the key.

However, note that the public and private keys, although not the same are mathematically connected – they have to be. This makes the guessing of the private key a little bit easier for someone who knows the public key. To reduce the risk, the key must be longer, usually 1024 bits, unlike the typical 128-bit key sizes in symmetric encryption schemes. A larger key also means higher computational processing requirements on the server, which translates to longer elapsed time. This makes asymmetric encryption schemes a little undesirable.

## When to Use What Type

For data-in-motion-type encryptions, in which the data is encrypted to be sent across wire to reduce chances of eavesdropping, the asymmetric encryption is usually used, due to two primary reasons. First, the single key use in symmetric key encryption is not practical; and second, the length of time for which the encryption key must be in use is very short, so much smaller encryption keys can be used. Even if the intruder cracks the key, the data is already transmitted, reducing chances of compromise. In case of encryption of data-at-rest, the key is held for a much longer duration, so a longer key is necessary. To balance the need to reduce key length and reduce risk of exposure, asymmetric encryption schemes are used in data-at-rest.

## Block and Streaming Encryption

Now that we have seen how encryption schemes are commonly used, an
interesting question is raised: How does an algorithm work in data
input? For the simple example of the safe combination, I have used an
algorithm that adds a number to the *entire *input data. Note
that I stress the word, “entire,” which means the entire data must be
available to the algorithm before it can be encrypted. In case of
streaming data, such as one flowing between two servers over the
network or from the Web server to the browser, the data is not
available in its entirety. To encrypt the data that is completely
available, the algorithm breaks it into chunks (of typically eight
bytes) and then applies encryption; but streaming data cannot be broken
into chunks. Hence, a different strategy is employed for streaming
data. The other type, in which the entire data is available, is known
as ** block** encryption.

## Encryption Algorithms

Now that you understand how encryption works, let’s focus on one important part of encryption: the *algorithm*.

One of the first algorithms to be used was the **Digital Encryption Standard (DES)**,
which breaks up data into blocks of 64 bits and encrypts them using a
64-bit key. Of the 64 bits, only 56 are used. A smaller key length
makes this a fast encryption approach; however, with modern computers,
an intruder can easily break the 56-bit key.

To address the security risks, a modified form of DES was introduced — **Triple DES**
or DES3. This makes three passes of the input data through the DES
algorithm, so it is reasonably secure for most applications. Another
form of DES3 uses three keys to further secure the encryption.
Three-key DES3 and two-key DES3 use keys of 156 and 112 bits
respectively.

DES3 was considered adequate for most applications for a significant
amount of time, but advancement in computing resources diminished the
safety of the algorithm. Additionally, the compute-intensive algorithm
proved to be too strenuous on the server housing the database. This led
to the development of **Advanced Encryption Standard (AES)**.

AES uses one of the three key lengths — 128, 192, and 256, depending on application. The more the key length, the longer the computation cycle, but also, the lesser the risk of an intruder breaking it.

Another type of encryption is **RC4**, which is a streaming encryption algorithm.

## Padding

Let’s revisit a point that was made in the previous discussion about
block encryption: an algorithm breaks the input data into blocks of
eight bytes (typically) and encrypts one block at a time. What if the
length of the data is *not divisible by eight*? Then the last
block of bytes will be less than eight bytes, but the algorithm will
not be able to operate on this block because it is smaller in size.

To address this, a value must be added to the data to make the length exactly divisible by eight; this process is known as * padding*.
The value is removed when the data is decrypted. In most cases, a
simple padding with a known value such as zero is acceptable; this
process is known as zero-padding. However, padding by zero also makes
the data somewhat prone to discovery by an intruder, since he can now
guess what zero looks like in encrypted format. Therefore, padding by
zero is not considered cryptographically secure. Instead, a value that
is less vulnerable to break-ins is added by a standard known as

*Public Key Cryptography Standard #5*, or PKCS#5.

Of course, if you know that the data is in lengths of multiples of eight, then there is no need to pad.

## Chaining

Continuing along the previous line of discussion, input data is broken into chunks and then encrypted. Since data is separated, wouldn’t it be nice to somehow de-link the encrypted blocks so that even if a single block is cracked, the rest will be intact?

This is when** chaining** comes in: it specifies how these
chunks of data are linked (or de-linked) during encryption. In the most
common format, a block is XOR-ed with the encrypted value of the
previous block. The resultant value is then passed to the encryption
algorithm. The same is repeated for all the blocks. This scheme is
known as

*. Since the first block has no previous block, a value known as an initialization vector (IV) is used to XOR.*

**Cyclic Block Chaining**In another form of chaining, each block is independently encrypted, known as **Electronic Code Book** chaining method. Other methods are **Cipher Feedback** and **Output Feedback**.

## Hashing

So far, we have talked about a concept in which the input data is masked to protect it from prying eyes. However, this does not protect the data from being modified by an intruder. For instance, assume that we are transmitting the salary of the CIO and a disgruntled employee decided the change the salary from 10 million to 10 thousand by chopping off a few zeroes at the end. How does the receiving end know that this data has been altered? Or, how can we know that the data has not been altered?

To address the issue, a concept called **checksum **is used. It
has been used extensively in the past to ensure filesystem integrity,
and it has been in use for some time. The idea is to calculate a
checksum value by applying some algorithm to an input value. Then, upon
receiving the input value, the receiver also applies the same algorithm
and gets a checksum. He compares the checksum calculated with what came
with the data; if they match, the data has not been tampered with. If
they differ, then the data has been compromised. This checksum is often
the hash value of the input value.

Consider the implications of the checksum concept — the input value itself is not masked, because the intention is not to prevent people from seeing it. The intent is to prevent people from modifying it. Of course, data can be encrypted as well; but the purpose of hashing is not to prevent exposure.

The second important concept to remember is that this hash value is calculated from the input value, but the input value *cannot *be
calculated from the hash value. This is important to understand as it
has use in other applications. For instance, password management
functions can store hash values in tables. When a user wants to
validate passwords, the function can hash the user’s entry and compare
the hash to the hash, instead of comparing plain text. This way, the
actual password is never displayed and is never compromised. Oracle
password management functions the same way — it uses a one-way hash
value in the password column of the USER$ table.

## MAC

Hashing provides an important function — validating the integrity of a piece of data. However, consider this scenario: An intruder might come to know the hashing algorithm used and then use it to calculate the hash value of the input data after changing it. The receiver gets the tampered input data as well as the changed hash value. Since the values match, the receiver will not be able to determine that the data has been compromised.

To prevent this situation, the concept known as **Message Authentication Code** (MAC) was developed, which uses a “key” to calculate the hash value (known as **MAC**
value in this instance). The same key is used by the receiver in
calculating MAC value. Since key is not likely to be known by the
intruder, a changed MAC value will be different and data compromise
will be unearthed.

## Implementing Encryption

With the conceptual foundation in place, let’s delve into the actual task of implementing an encryption system.

In order to implement an encryption system, you do not have to
reinvent the wheel and create the algorithms; they are already
available in supplied packages. Oracle provides a package called **dbms_obfuscation_toolkit** (DOTK) that contains several procedures and functions to allow you to achieve encryption. In Oracle 10*g*, a new package named **dbms_crypto** offers similar but improved functionality. Even though the older package is still available in Oracle 10*g*, it’s recommended that the dbms_crypto be used for all encryption related matters for the following reasons:

- Crypto provided AES encryption methods; DOTK does not. AES is now the
*de facto*standard in many circumstances and is preferred, because of its increased security and speed. - Crypto provides automatic padding. In DOTK, you have to provide padding of data smaller than the block length.
- Crypto provides PKCS#5 padding, which is cryptographically more secure than user-supplied padding (such as padding with zeroes).

And of course, DOTK may be deprecated soon.

Please bear in mind that Crypto (and DOTK, as well) offer only *symmetric*
encryption only (i.e., the same key is used to encrypt and decrypt).
Asymmetric encryption, in which a public-private key pair is used, is
not available in either package. To implement such functionality, you
will need to use Oracle Advanced Security, an extra-cost option for the
database. For most common applications, symmetric key is adequate. We
will focus on Crypto for the remainder of this article; OAS will be
covered in a future article.

## Encryption

From our earlier discussion on defining encryption, you know that to encrypt data, you need the following:

- An encryption algorithm
- An encryption key
- A padding method
- A chaining method

What you choose to use for the key depends on your choice of algorithm. Once you have identified the algorithm to use (say, AES128) you have to use a key that is 128 bits long. Crypto provides constants in the package to indicate different types of algorithms. Table 1 shows these constants with the algorithms and the key lengths they need.

Constant Name |
Description |
Effective Key Length |

ENCRYPT_DES |
Data Encryption Standard. Faster but not secure. | 56 |

ENCRYPT_3DES_2KEY |
Modified Triple Data Encryption Standard. Operates on a block 3 times with 2 keys. | 112 |

ENCRYPT_3DES |
Triple Data Encryption Standard. Operates on a block 3 times but with one key. | 156 |

ENCRYPT_AES128 |
Advanced Encryption Standard | 128 |

ENCRYPT_AES192 |
Advanced Encryption Standard | 192 |

ENCRYPT_AES256 |
Advanced Encryption Standard. | 256 |

ENCRYPT_RC4 |
This is the only stream cipher. |

*Table 1: Encryption algorithms.*

Once you choose your algorithm, you have to decide on what key to use. For example, if you have chosen AES128 as the algorithm, you have to choose a 128-bit key, and this key has to be of a RAW data type.

Suppose we decide to use the AES scheme with 128-bit keys along with CBC chaining and PKCS#5 padding. The following piece of code performs the encryption of a data value “ConfidentialData” with the key as “1234567890123456.”

1 declare

2 l_key varchar2(2000) := '1234567890123456';

3 l_in_val varchar2(2000) := 'ConfidentialData';

4 l_mod number := dbms_crypto.ENCRYPT_AES128

5 + dbms_crypto.CHAIN_CBC

6 + dbms_crypto.PAD_PKCS5;

7 l_enc raw (2000);

8 begin

9 l_enc := dbms_crypto.encrypt

10 (

11 UTL_I18N.STRING_TO_RAW (l_in_val, 'AL32UTF8'),

12 l_mod,

13 UTL_I18N.STRING_TO_RAW (l_key, 'AL32UTF8')

14 );

15 dbms_output.put_line ('Encrypted='||l_enc);

16* end;

SQL> /

Encrypted=C0777257DFBF8BA9A4C1F724F921C43C70D0C0A94E2950BBB6BA2FE78695A6FC

PL/SQL procedure successfully completed.

SQL>

Let’s analyze the above piece of code line by line.

Line |
Description |

2 | The key is defined here. As you can see the key is exactly 16 characters, which it must be for AES to work. |

3 | The input value, which needs to encrypted. |

4 — 6 | These lines need some more explanation; hence it has been provided at the end of this table. |

7 | We define a variable to hold the encrypted value. |

9 | The function encrypt is called. |

11 | The function expects the input value to be in RAW, not varchar2 as is the reality. In this line, we have converted it to raw using the function strings_to_raw in the supplied package utl_i18n. |

13 | As with the input value, the function also expects the key to be RAW as well. Hence we convert it here. |

15 | Finally, we display the encrypted value. However, in reality you will not display the value as it is meaningless; you will probably use it for something else, such as store it. |

The encrypted value, also in RAW, is displayed as a hexadecimal string. This is the basic workings of the encrypt function. Now, let’s take a look at the lines we omitted from the previous discussion.

Remember, to encrypt you need the following components:

- An input value
- A key
- An algorithm
- A padding scheme
- A chaining scheme

In the previously cited piece of code, the input value and the key were provided in lines three and two respectively — but where are the rest of the necessary components?

The rest are all crowded into lines four through six. These options are specified in the package dbms_crypto as named constants. For instance, the AES 128-bit algorithm is specified by the constant ENCRYPT_AES128. Table 1 lists these encryption constants. Similarly, the CBC chaining method is specified by CHAIN_CBC and the PKCS#5 padding is specified by PAD_PKCS5. All these three constants have been specified as having been added together in the input parameter MOD in the function encrypt(). This is exactly how dbms_crypto expects to get the specifics of algorithm — padding and chaining. The combination of these three is known as a modifier to the function.

The following is a list of padding modifiers available:

Constant |
Description |

PAD_NONE |
No padding is done. If length of the data is exactly same as the block size or a multiple thereof, this option is used. |

PAD_ZERO |
The data is padded with zeroes to make it a multiple of block size. |

PAD_PKCS5 |
PKCS#5 based padding is done. This is the safest and the recommended approach. |

The following options are available for chaining:

Constant |
Description |

CHAIN_ECB | Electronic Code Book standard |

CHAIN_CBC | Cyclic Buffer Chaining method |

CHAIN_CFB | Cypher Feedback method |

CHAIN_OFB | Output Feedback |

If you wanted to use no padding and Electronic Code Book chaining, you could have called the ENCRYPT function as:

4 l_mod number := dbms_crypto.ENCRYPT_AES128

5 + dbms_crypto.CHAIN_ECB

6 + dbms_crypto.PAD_NONE;

Similarly you can specify any combination of algorithm, padding and chaining to suit your needs.

*Coming next in part*** 2:** Decryption, key management, key storage, and more.

--

**Arup Nanda** is the
Lead DBA at Starwood Hotels & Resorts. He has been an Oracle DBA
for more than 11 years, touching all aspects of database management —
from modeling to performance tuning to disaster recovery. He is the
co-author of the book *Oracle Privacy Security Auditing* (2003, Rampant Tech Press).

Contributors : Arup Nanda

Last modified 2006-01-06 02:03 PM