Monday, March 2, 2015

Hashing - Summarization at its Extreme

Summarization is often the most boring part of a language course, as it basically requires you to sift through dozens, or maybe hundreds or even thousands, of pages to come up with a concise representation of the whole thing, often constrained by the number of words.

Interestingly, an equivalent concept--hashing--has become a core part of modern secure information systems: it is an essential element of integrity, and hence of security itself.

Hashing is essentially an attempt to summarize an arbitrarily large amount of data into a fixed-length sequence of bits. While human summarization strives to preserve essential information, hashing strives to preserve the uniqueness of data.

The fundamental property of a hash (i.e. the result of hashing) is:

For two identical pieces of data, their hashes under a given hashing technique are identical.

The "hashing technique" usually boils down to a hash function. Any function that accepts arbitrarily long content as input and produces a fixed-length output, while adhering to the above rule, can be considered a hash function.

For example, a function f, which adds up the alphabet positions of the characters in its input text (a = 1, b = 2, ..., z = 26) and takes the sum modulo 1000 (so that the output is always an integer in the range 0 to 999), is a simple hash function:

f(a) = 1
f(b) = 2
f(aa) = 1 + 1 = 2
f(ab) = 1 + 2 = 3
f(cryptography) = 3 + 18 + 25 + 16 + 20 + 15 + 7 + 
                  18 + 1 + 16 + 8 + 25 = 172
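The function f above can be sketched in a few lines of Python. This is only an illustration: lowercasing the input and ignoring non-letter characters are assumptions made here, since the description above only covers lowercase letters.

```python
def f(text: str) -> int:
    """Toy hash: sum of alphabet positions (a=1 ... z=26), modulo 1000."""
    total = 0
    for ch in text.lower():
        # Assumption: anything that is not a letter a-z is simply skipped.
        if "a" <= ch <= "z":
            total += ord(ch) - ord("a") + 1
    return total % 1000


for word in ("a", "b", "aa", "ab", "cryptography"):
    print(word, f(word))
# a 1
# b 2
# aa 2
# ab 3
# cryptography 172
```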

You may have noticed that aa and b result in the same value over f; this is not a problem because the contract of a hash function never denies the possibility of two different inputs being mapped to the same hash value.

Still, a hash function should ideally produce a distinct hash for each distinct input, to minimize the risk of false positives (two unequal inputs being interpreted as equal, based on their hash values). Although perfect uniqueness is theoretically impossible (an infinite input space cannot be uniquely mapped to a finite output space), advanced hash functions can make false positives extremely unlikely.
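To get a feel for how easily a weak hash function produces false positives, here is a rough sketch that searches for the first pair of distinct inputs sharing a hash value. The names toy_hash and first_collision are made up for this illustration; toy_hash is the position-sum function from the example above, restricted to lowercase letters.

```python
from itertools import product
from string import ascii_lowercase


def toy_hash(text):
    # Sum of alphabet positions (a=1 ... z=26), modulo 1000.
    return sum(ord(ch) - ord("a") + 1 for ch in text) % 1000


def first_collision(inputs, hash_fn):
    """Return (earlier_input, later_input, hash) for the first collision, or None."""
    seen = {}  # hash value -> first input that produced it
    for item in inputs:
        h = hash_fn(item)
        if h in seen and seen[h] != item:
            return seen[h], item, h
        seen[h] = item
    return None


# Try all two-letter words in order: aa, ab, ac, ..., zz.
two_letter_words = ("".join(p) for p in product(ascii_lowercase, repeat=2))
print(first_collision(two_letter_words, toy_hash))
# ('ab', 'ba', 3)
```

Since addition does not care about order, any rearrangement of the same letters collides under this function, which is one reason it would be a poor choice for anything security-related.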

Fair enough, but how does hashing help security at all?

We can use the fact that a good hash can concisely and almost uniquely represent an arbitrarily long input, in order to generate a small "digest" of a large volume of data (say, a disk image). Now, if we share this data with someone else along with its digest, the recipient can calculate the hash of the received copy and compare it with the said digest.

If the digests do not match, it's an indication that the data has been modified, either accidentally or maliciously, during transfer; had the data arrived unmodified, the hashes would have been identical, thanks to the fundamental rule of hashing.

But what if the digests match? Well, we cannot say anything for sure. Maybe some ingenious guy has changed the data such that the hash remains unchanged, or the data is still in its pristine form. The "unlikeliness" of the first situation (and hence the certainty of the second) is what decides the "strength" of the hash function.
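The digest comparison described above might look like the following sketch. SHA-256 from Python's standard hashlib module is used here purely as an example of a widely used hash function, and the byte strings are placeholders standing in for a real file; none of these specifics come from the discussion above.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Sender side: compute the digest and publish it alongside the data.
original = b"contents of a disk image (placeholder bytes)"
published_digest = sha256_digest(original)

# Recipient side: recompute the digest of whatever actually arrived.
received_intact = original
received_tampered = original + b"!"

print(sha256_digest(received_intact) == published_digest)    # True
print(sha256_digest(received_tampered) == published_digest)  # False
```

A mismatch proves the data changed. A match only says that a change is as unlikely as a collision in the chosen hash function.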

Stay tuned for more on hashing!

Sunday, October 5, 2014

readme.txt: Getting Started

No doubt, security is a vast subject; fortunately, however, the whole thing is based on a few simple concepts. Understanding these simple facts (and the core concepts, of course!) is vital for properly understanding and appreciating security.

The Building Blocks

The whole mansion is built on the founding concepts of

  • confidentiality,
  • integrity, and
  • availability,

often collectively referred to as CIA.

Confidentiality refers to the simple fact that things should be seen only by those who are supposed to see them. If you are sending a letter to your girlfriend or boyfriend, you probably won't expect his or her parents, or the mailman, or virtually anyone else, to read it.

Confidentiality is different from secrecy, where the mere existence of some fact or datum is not known to anyone other than the intended party. In fact, that's the whole thing about a secret; if you tell me "Hey, I have a secret!", it would no longer be a secret; it would only be a confidential thing (since the actual fact is still known only by you), somewhat different from the popular terminology we use.

Integrity means that something can be modified only by someone who is supposed to do so; in other words, people cannot tamper with stuff that is none of their business. For example, you probably won't want to see someone else altering things written in your personal diary (let alone reading it).

This may lead you to think that integrity is a follow-up of confidentiality. However, this is not always true. For example, think about a top-security letter being sent from one country to another; even if a saboteur is unable to open the letter and see its content, he would still be able to wreak havoc by switching it with a different letter while in transit; hence integrity would be compromised, although the confidentiality of the original message was unaffected.

Availability is all about allowing the intended people to access whatever they are supposed to access, without restrictions; in other words, preventing the unjustifiable withholding of resources. At the office, if someone has invaded your desk and won't let you sit there, it can be termed a breach of availability.

As with confidentiality, availability may be breached without affecting the other two; for example, someone may sever your home's phone line, thereby making phone communication unavailable to you, rather than tap it (breaching confidentiality) or install some scrambling device on it (breaching integrity).

When it comes to security jargon, these concepts are defined as:

  • confidentiality: preventing unauthorized access to resources,
  • integrity: preventing unauthorized modification of resources, and
  • availability: preventing the withholding of resources from authorized access.

A plethora of other concepts like reliability, access control, authentication, authorization and non-repudiation spring out from these basics; however, these three, CIA, are the fundamental concepts behind any security concept, mechanism, system, or even breach.