Tuesday, July 30, 2013

Hexadecimal, Base64 and Binary Encoding in .NET

Many times a .NET developer encounters a scenario that calls for a specialized encoding for data, such as Base-64 encoding, Hexadecimal (Base-16 or "Hex") encoding, or Binary (Base-2) encoding. The .NET Framework has some built-in ways to make this possible, but they are not exposed as true "encodings", and some of them perform poorly. For instance, the System.Convert class contains methods for converting to and from base-64, but they are not provided in the form of an Encoding. Hex output is usually produced with a format string passed to a "ToString()" call (for example "X2"), and binary (base-2) output with Convert.ToString(value, 2); both work one value at a time and allocate a new string on every call, which is slow. While the performance hit may not matter for some applications, it will be noticed in larger, resource-intensive applications.
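
For reference, here is a minimal sketch of the built-in approaches mentioned above. The class and method names (BuiltInEncodingDemo, ToBase64, ToHex, ToBinary) are made up for this illustration; only the System.Convert, ToString and StringBuilder calls are framework APIs.

```csharp
using System;
using System.Text;

// Hypothetical helper class, used only to illustrate the built-in options.
public static class BuiltInEncodingDemo
{
    // Base-64 through System.Convert: a static helper, not an Encoding.
    public static string ToBase64(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Hex through a format string: one ToString call (and one string) per byte.
    public static string ToHex(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 2);
        foreach (byte b in data)
            sb.Append(b.ToString("X2"));
        return sb.ToString();
    }

    // Base-2 through Convert.ToString(value, 2), left-padded to 8 digits per byte.
    public static string ToBinary(byte[] data)
    {
        var sb = new StringBuilder(data.Length * 8);
        foreach (byte b in data)
            sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        return sb.ToString();
    }
}
```

For the three bytes 0x4D, 0x61, 0x6E (ASCII "Man"), these helpers return "TWFu", "4D616E" and "010011010110000101101110" respectively.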

Wouldn't it be nice to have an actual Encoding class to do this for you? Many consider the thought but then shy away when it comes time to create an efficient and accurate algorithm for the encoding. Well, let me put your mind at ease and save you some Tylenol. I'll show you how to do this properly in a reusable set of classes.

I'm not going to spend much time explaining exactly how the algorithms work, as that would be very time-consuming, but I assure you they are fast and accurate. Custom base-64 encoding has a reputation as a daunting task because it is hard to develop an algorithm that properly validates base-64 formatting. When developing these algorithms, I studied the .NET Framework's source code to understand how Microsoft handles these encoding routines internally for best performance and accuracy. It is sad that they didn't implement public Encoding-derived classes, but the following code will give you just that.

To get started, we are going to make each of these Encoding classes derive from the abstract System.Text.Encoding base class.
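
Before the real classes, here is a deliberately trivial sketch of what deriving from Encoding involves. The base class declares six abstract members, and the inherited GetString(...) and GetBytes(...) helpers are built on top of them. The class name OneToOneEncoding and its byte-equals-char mapping are assumptions made purely for illustration; argument validation is left out to keep the shape visible.

```csharp
using System;
using System.Text;

// Hypothetical example: maps every byte to the char with the same code (0-255).
public sealed class OneToOneEncoding : Encoding
{
    public override int GetByteCount(char[] chars, int index, int count)
    {
        return count; // one byte per char
    }

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        for (int i = 0; i < charCount; i++)
        {
            char c = chars[charIndex + i];
            if (c > 0xFF)
                throw new FormatException("Character out of range.");
            bytes[byteIndex + i] = (byte)c;
        }
        return charCount; // number of bytes written
    }

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        return count; // one char per byte
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        for (int i = 0; i < byteCount; i++)
            chars[charIndex + i] = (char)bytes[byteIndex + i];
        return byteCount; // number of chars written
    }

    public override int GetMaxByteCount(int charCount)
    {
        return charCount;
    }

    public override int GetMaxCharCount(int byteCount)
    {
        return byteCount;
    }
}
```

With these six overrides in place, new OneToOneEncoding().GetString(new byte[] { 72, 105 }) returns "Hi", and GetBytes("Hi") returns the same two bytes. The classes in this post follow the same pattern: the "chars" side is the encoded text and the "bytes" side is the raw data.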

First, let's create the Base64 encoding class as follows:

using System;
using System.Diagnostics;
using System.Globalization;
using System.Security;

namespace System.Text
{
    public sealed class Base64Encoding : Encoding
    {
        const int MAX_CHAR_COUNT = int.MaxValue / 4 * 3 - 2;

        static byte[] char2val = new byte[128]
        {
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 
            0xFF, 0xFF, 0xFF,   62, 0xFF, 0xFF, 0xFF,   63,
            52, 53, 54, 55, 56, 57, 58, 59,   
            60, 61, 0xFF, 0xFF, 0xFF,   64, 0xFF, 0xFF,
            0xFF, 0, 1, 2, 3, 4, 5, 6,    
            7, 8, 9, 10, 11, 12, 13, 14,
            15, 16, 17, 18, 19, 20, 21, 22,   
            23, 24, 25, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 26, 27, 28, 29, 30, 31, 32,   
            33, 34, 35, 36, 37, 38, 39, 40,
            41, 42, 43, 44, 45, 46, 47, 48,   
            49, 50, 51, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
        };

        static string val2char = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

        static byte[] val2byte = new byte[]
        {
            (byte)'A',(byte)'B',(byte)'C',(byte)'D',(byte)'E',(byte)'F',(byte)'G',(byte)'H',
            (byte)'I',(byte)'J',(byte)'K',(byte)'L',(byte)'M',(byte)'N',(byte)'O',(byte)'P',
            (byte)'Q',(byte)'R',(byte)'S',(byte)'T',(byte)'U',(byte)'V',(byte)'W',(byte)'X',
            (byte)'Y',(byte)'Z',(byte)'a',(byte)'b',(byte)'c',(byte)'d',(byte)'e',(byte)'f',
            (byte)'g',(byte)'h',(byte)'i',(byte)'j',(byte)'k',(byte)'l',(byte)'m',(byte)'n',
            (byte)'o',(byte)'p',(byte)'q',(byte)'r',(byte)'s',(byte)'t',(byte)'u',(byte)'v',
            (byte)'w',(byte)'x',(byte)'y',(byte)'z',(byte)'0',(byte)'1',(byte)'2',(byte)'3',
            (byte)'4',(byte)'5',(byte)'6',(byte)'7',(byte)'8',(byte)'9',(byte)'+',(byte)'/'
        };


        [SecuritySafeCritical]
        unsafe public override int GetByteCount(char[] chars, int index, int count)
        {
            if (chars == null)
            {
                throw new ArgumentNullException("chars");
            }
            if (index < 0)
            {
                throw new ArgumentOutOfRangeException("index", "Value must be non-negative");
            }
            if (index > chars.Length)
            {
                throw new ArgumentOutOfRangeException("index", "Offset exceeds the buffer size.");
            }
            if (count < 0)
            {
                throw new ArgumentOutOfRangeException("count", "Value must be non-negative.");
            }
            if (count > chars.Length - index)
            {
                throw new ArgumentOutOfRangeException("count", "Value exceeds the remaining buffer size.");
            }
            if (count == 0)
            {
                return 0;
            }
            if ((count % 4) != 0)
            {
                throw new FormatException("Invalid Base64 length.");
            }
            fixed (byte* _char2val = char2val)
            {
                fixed (char* _chars = &chars[index])
                {
                    int totalCount = 0;
                    char* pch = _chars;
                    char* pchMax = _chars + count;
                    while (pch < pchMax)
                    {
                        Debug.Assert(pch + 4 <= pchMax, "");
                        char pch0 = pch[0];
                        char pch1 = pch[1];
                        char pch2 = pch[2];
                        char pch3 = pch[3];

                        if ((pch0 | pch1 | pch2 | pch3) >= 128)
                        {
                            throw new FormatException("Invalid Base64 sequence.");
                        }

                        int v1 = _char2val[pch0];
                        int v2 = _char2val[pch1];
                        int v3 = _char2val[pch2];
                        int v4 = _char2val[pch3];

                        if (!IsValidLeadBytes(v1, v2, v3, v4) || !IsValidTailBytes(v3, v4))
                        {
                            throw new FormatException("Invalid Base64 sequence.");
                        }

                        int byteCount = (v4 != 64 ? 3 : (v3 != 64 ? 2 : 1));
                        totalCount += byteCount;
                        pch += 4;
                    }
                    return totalCount;
                }
            }
        }

        [SecuritySafeCritical]
        unsafe public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
        {
            if (chars == null)
            {
                throw new ArgumentNullException("chars");
            }
            if (charIndex < 0)
            {
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            }
            if (charIndex > chars.Length)
            {
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            }
            if (charCount < 0)
            {
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            }
            if (charCount > chars.Length - charIndex)
            {
                throw new ArgumentOutOfRangeException("charCount", "Size exceeds remaining buffer size.");
            }
            if (bytes == null)
            {
                throw new ArgumentNullException("bytes");
            }
            if (byteIndex < 0)
            {
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative");
            }
            if (byteIndex > bytes.Length)
            {
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            }
            if (charCount == 0)
            {
                return 0;
            }
            if ((charCount % 4) != 0)
            {
                throw new FormatException("Invalid Base64 length.");
            }
            fixed (byte* _char2val = char2val)
            {
                fixed (char* _chars = &chars[charIndex])
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        char* pch = _chars;
                        char* pchMax = _chars + charCount;
                        byte* pb = _bytes;
                        byte* pbMax = _bytes + bytes.Length - byteIndex;
                        while (pch < pchMax)
                        {
                            Debug.Assert(pch + 4 <= pchMax, "");
                            char pch0 = pch[0];
                            char pch1 = pch[1];
                            char pch2 = pch[2];
                            char pch3 = pch[3];

                            if ((pch0 | pch1 | pch2 | pch3) >= 128)
                            {
                                throw new FormatException("Invalid Base64 sequence.");
                            }

                            int v1 = _char2val[pch0];
                            int v2 = _char2val[pch1];
                            int v3 = _char2val[pch2];
                            int v4 = _char2val[pch3];

                            if (!IsValidLeadBytes(v1, v2, v3, v4) || !IsValidTailBytes(v3, v4))
                            {
                                throw new FormatException("Invalid Base64 sequence.");
                            }

                            int byteCount = (v4 != 64 ? 3 : (v3 != 64 ? 2 : 1));
                            if (pb + byteCount > pbMax)
                            {
                                // ArgumentException takes (message, paramName), in that order.
                                throw new ArgumentException("Array is too small.", "bytes");
                            }

                            pb[0] = (byte)((v1 << 2) | ((v2 >> 4) & 0x03));
                            if (byteCount > 1)
                            {
                                pb[1] = (byte)((v2 << 4) | ((v3 >> 2) & 0x0F));
                                if (byteCount > 2)
                                {
                                    pb[2] = (byte)((v3 << 6) | ((v4 >> 0) & 0x3F));
                                }
                            }
                            pb += byteCount;
                            pch += 4;
                        }
                        return (int)(pb - _bytes);
                    }
                }
            }
        }

        [SecuritySafeCritical]
        unsafe public int GetBytes(byte[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
        {
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");

            if (charCount < 0)
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            if (charCount > chars.Length - charIndex)
                throw new ArgumentOutOfRangeException("charCount", "Size exceeds remaining buffer space.");

            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");

            if (charCount == 0)
                return 0;
            if ((charCount % 4) != 0)
                throw new FormatException("Invalid Base64 length.");
            fixed (byte* _char2val = char2val)
            {
                fixed (byte* _chars = &chars[charIndex])
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        byte* pch = _chars;
                        byte* pchMax = _chars + charCount;
                        byte* pb = _bytes;
                        byte* pbMax = _bytes + bytes.Length - byteIndex;
                        while (pch < pchMax)
                        {
                            Debug.Assert(pch + 4 <= pchMax, "");
                            byte pch0 = pch[0];
                            byte pch1 = pch[1];
                            byte pch2 = pch[2];
                            byte pch3 = pch[3];
                            if ((pch0 | pch1 | pch2 | pch3) >= 128)
                                throw new FormatException("Invalid Base64 sequence.");

                            int v1 = _char2val[pch0];
                            int v2 = _char2val[pch1];
                            int v3 = _char2val[pch2];
                            int v4 = _char2val[pch3];

                            if (!IsValidLeadBytes(v1, v2, v3, v4) || !IsValidTailBytes(v3, v4))
                                throw new FormatException("Invalid Base64 sequence.");

                            int byteCount = (v4 != 64 ? 3 : (v3 != 64 ? 2 : 1));
                            if (pb + byteCount > pbMax)
                                throw new ArgumentException("Array is too small.", "bytes"); // (message, paramName)

                            pb[0] = (byte)((v1 << 2) | ((v2 >> 4) & 0x03));
                            if (byteCount > 1)
                            {
                                pb[1] = (byte)((v2 << 4) | ((v3 >> 2) & 0x0F));
                                if (byteCount > 2)
                                {
                                    pb[2] = (byte)((v3 << 6) | ((v4 >> 0) & 0x3F));
                                }
                            }
                            pb += byteCount;
                            pch += 4;
                        }
                        return (int)(pb - _bytes);
                    }
                }
            }
        }

        public override int GetCharCount(byte[] bytes, int index, int count)
        {
            return GetMaxCharCount(count);
        }

        [SecuritySafeCritical]
        unsafe public int GetChars(byte[] bytes, int byteIndex, int byteCount, byte[] chars, int charIndex)
        {
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            if (byteCount < 0)
                throw new ArgumentOutOfRangeException("byteCount", "Value must be non-negative.");
            if (byteCount > bytes.Length - byteIndex)
                throw new ArgumentOutOfRangeException("byteCount", "Exceeds remaining buffer size.");
 
            int charCount = GetCharCount(bytes, byteIndex, byteCount);
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
  
            if (charCount < 0 || charCount > chars.Length - charIndex)
                throw new ArgumentException("Array is too small.", "chars"); // (message, paramName)
 
            // We've computed exactly how many chars there are and verified that
            // there's enough space in the char buffer, so we can proceed without
            // checking the charCount.
 
            if (byteCount > 0)
            {
                fixed (byte* _val2byte = val2byte)
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        fixed (byte* _chars = &chars[charIndex])
                        {
                            byte* pb = _bytes;
                            byte* pbMax = pb + byteCount - 3;
                            byte* pch = _chars;
 
                            // Convert chunks of 3 bytes to 4 chars
                            while (pb <= pbMax)
                            {
                                pch[0] = _val2byte[(pb[0] >> 2)];
                                pch[1] = _val2byte[((pb[0] & 0x03) << 4) | (pb[1] >> 4)];
                                pch[2] = _val2byte[((pb[1] & 0x0F) << 2) | (pb[2] >> 6)];
                                pch[3] = _val2byte[pb[2] & 0x3F];
 
                                pb += 3;
                                pch += 4;
                            }
  
                            // Handle 1 or 2 trailing bytes
                            if (pb - pbMax == 2)
                            {
                                // 1 trailing byte
                                pch[0] = _val2byte[(pb[0] >> 2)];
                                pch[1] = _val2byte[((pb[0] & 0x03) << 4)];
                                pch[2] = (byte)'=';
                                pch[3] = (byte)'=';
                            }
                            else if (pb - pbMax == 1)
                            {
                                // 2 trailing bytes
                                pch[0] = _val2byte[(pb[0] >> 2)];
                                pch[1] = _val2byte[((pb[0] & 0x03) << 4) | (pb[1] >> 4)];
                                pch[2] = _val2byte[((pb[1] & 0x0F) << 2)];
                                pch[3] = (byte)'=';
                            }
                            else
                            {
                                // 0 trailing bytes
                                Debug.Assert(pb - pbMax == 3, "");
                            }
                        }
                    }
                }
            }
 
            return charCount;
        }

        [SecuritySafeCritical]
        unsafe public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
        {
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            if (byteCount < 0)
                throw new ArgumentOutOfRangeException("byteCount", "Value must be non-negative.");
            if (byteCount > bytes.Length - byteIndex)
                throw new ArgumentOutOfRangeException("byteCount", "Size exceeds remaining buffer size.");

            int charCount = GetCharCount(bytes, byteIndex, byteCount);
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            if (charCount < 0 || charCount > chars.Length - charIndex)
                throw new ArgumentException("Array is too small.", "chars"); // (message, paramName)

            // We've computed exactly how many chars there are and verified that
            // there's enough space in the char buffer, so we can proceed without
            // checking the charCount.

            if (byteCount > 0)
            {
                fixed (char* _val2char = val2char)
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        fixed (char* _chars = &chars[charIndex])
                        {
                            byte* pb = _bytes;
                            byte* pbMax = pb + byteCount - 3;
                            char* pch = _chars;

                            // Convert chunks of 3 bytes to 4 chars
                            while (pb <= pbMax)
                            {
                                pch[0] = _val2char[(pb[0] >> 2)];
                                pch[1] = _val2char[((pb[0] & 0x03) << 4) | (pb[1] >> 4)];
                                pch[2] = _val2char[((pb[1] & 0x0F) << 2) | (pb[2] >> 6)];
                                pch[3] = _val2char[pb[2] & 0x3F];

                                pb += 3;
                                pch += 4;
                            }

                            // Handle 1 or 2 trailing bytes
                            if (pb - pbMax == 2)
                            {
                                // 1 trailing byte
                                pch[0] = _val2char[(pb[0] >> 2)];
                                pch[1] = _val2char[((pb[0] & 0x03) << 4)];
                                pch[2] = '=';
                                pch[3] = '=';
                            }
                            else if (pb - pbMax == 1)
                            {
                                // 2 trailing bytes
                                pch[0] = _val2char[(pb[0] >> 2)];
                                pch[1] = _val2char[((pb[0] & 0x03) << 4) | (pb[1] >> 4)];
                                pch[2] = _val2char[((pb[1] & 0x0F) << 2)];
                                pch[3] = '=';
                            }
                            else
                            {
                                // 0 trailing bytes
                                Debug.Assert(pb - pbMax == 3, "");
                            }
                        }
                    }
                }
            }

            return charCount;
        }
        
        public override int GetMaxByteCount(int charCount)
        {
            if (charCount < 0)
            {
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            }
            if ((charCount % 4) != 0)
            {
                throw new FormatException("Invalid Base64 length.");
            }
            return charCount / 4 * 3;
        }

        public override int GetMaxCharCount(int byteCount)
        {
            if (byteCount < 0 || byteCount > MAX_CHAR_COUNT)
            {
                throw new ArgumentOutOfRangeException("byteCount", 
                    string.Format("Value must be within the range of 0 and {0}", MAX_CHAR_COUNT));
            }
            return ((byteCount + 2) / 3) * 4;
        }

        private bool IsValidLeadBytes(int v1, int v2, int v3, int v4)
        {
            return ((v1 | v2) < 64) && ((v3 | v4) != 0xFF);
        }

        private bool IsValidTailBytes(int v3, int v4)
        {
            return !(v3 == 64 && v4 != 64);
        }

    }
}

I apologize for the length, but as you can see, there is a lot going on here. The algorithm makes heavy use of bit-shifting and static look-up tables for the best possible performance. We also make heavy use of pointers, which avoids the bounds check the runtime performs on every array access and so reduces overhead. Whenever we take a pointer into an array or string, we use the "fixed" keyword to pin the object so the garbage collector cannot move it while we are using it. This is a very important point: if the object were not pinned, the garbage collector could relocate it and leave the pointer referring to stale memory, which leads to corrupted data and unexpected results. (The C# compiler enforces this; it will not let you take the address of a movable object outside a fixed statement.) We override the members of the abstract base class that the "plumbing" relies on, which lets us call GetBytes(...) and GetString(...) on this class. You may also have noticed the [SecuritySafeCritical] attribute. It belongs to the .NET Framework's security transparency model and marks a method as security-critical yet safe to call from transparent (partially trusted) code. It does not change how the code compiles: because these methods use pointers, the project must still be built with unsafe code enabled (the /unsafe compiler option, or "Allow unsafe code" in the project's build settings).
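
If the pointer code hides the bit-shifting, the following sketch shows the same 3-bytes-to-4-chars step in safe code. It is an illustration only, not a replacement for the class above; the names Base64Sketch and Encode are made up, and it trades the pointer arithmetic for ordinary array indexing.

```csharp
using System;

// Hypothetical, safe-code illustration of the shifts used in GetChars above.
public static class Base64Sketch
{
    const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    public static string Encode(byte[] data)
    {
        // Every group of up to 3 input bytes becomes exactly 4 output chars.
        var chars = new char[(data.Length + 2) / 3 * 4];
        int i = 0, o = 0;

        // Full groups: 24 bits split into four 6-bit table indexes.
        for (; i + 3 <= data.Length; i += 3)
        {
            chars[o++] = Alphabet[data[i] >> 2];
            chars[o++] = Alphabet[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
            chars[o++] = Alphabet[((data[i + 1] & 0x0F) << 2) | (data[i + 2] >> 6)];
            chars[o++] = Alphabet[data[i + 2] & 0x3F];
        }

        int remaining = data.Length - i;
        if (remaining == 1)
        {
            // 1 trailing byte: two data chars, then "==" padding.
            chars[o++] = Alphabet[data[i] >> 2];
            chars[o++] = Alphabet[(data[i] & 0x03) << 4];
            chars[o++] = '=';
            chars[o++] = '=';
        }
        else if (remaining == 2)
        {
            // 2 trailing bytes: three data chars, then "=" padding.
            chars[o++] = Alphabet[data[i] >> 2];
            chars[o++] = Alphabet[((data[i] & 0x03) << 4) | (data[i + 1] >> 4)];
            chars[o++] = Alphabet[(data[i + 1] & 0x0F) << 2];
            chars[o++] = '=';
        }

        return new string(chars);
    }
}
```

For the ASCII bytes of "Man", "Ma" and "M", Encode returns "TWFu", "TWE=" and "TQ==", which matches what Convert.ToBase64String produces for the same input.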

Next we'll create the Binary encoding class as follows:

using System;
using System.Diagnostics;
using System.Globalization;
using System.Security;

namespace System.Text
{
    public sealed class BinaryEncoding : Encoding
    {
        const int MAX_CHAR_COUNT = int.MaxValue / 8;

        static string val2char = "01";

        public override int GetByteCount(char[] chars, int index, int count)
        {
            return GetMaxByteCount(count);
        }

        [SecuritySafeCritical]
        unsafe public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
        {
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            if (charCount < 0)
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            if (charCount > chars.Length - charIndex)
                throw new ArgumentOutOfRangeException("charCount", "Size exceeds remaining buffer space.");
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            int byteCount = GetByteCount(chars, charIndex, charCount);
            if (byteCount < 0 || byteCount > bytes.Length - byteIndex)
                throw new ArgumentException("Array size is too small.", "bytes");
            if (charCount > 0)
            {
                fixed (byte* _bytes = &bytes[byteIndex])
                {
                    fixed (char* _chars = &chars[charIndex])
                    {
                        // Read the range from its last char to its first instead of
                        // calling Array.Reverse(chars), which would modify the caller's
                        // array and ignore charIndex/charCount. The last 8 chars of the
                        // range therefore produce the first byte.
                        char* pch = _chars + charCount;
                        byte* pb = _bytes;
                        byte* pbMax = _bytes + byteCount;
                        while (pb < pbMax)
                        {
                            // Build the byte in a local so the result does not depend
                            // on the destination array being zeroed beforehand.
                            int value = 0;
                            for (int bit = 0; bit < 8; bit++)
                            {
                                char ch = *--pch;
                                if (ch == '1')
                                    value |= 1 << bit;
                                else if (ch != '0')
                                    throw new FormatException("Invalid binary character.");
                            }
                            *pb++ = (byte)value;
                        }
                    }
                }
            }
            return byteCount;
        }

        public override int GetCharCount(byte[] bytes, int index, int count)
        {
            return GetMaxCharCount(count);
        }

        [SecuritySafeCritical]
        unsafe public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
        {
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            if (byteCount < 0)
                throw new ArgumentOutOfRangeException("byteCount", "Value must be non-negative.");
            if (byteCount > bytes.Length - byteIndex)
                throw new ArgumentOutOfRangeException("byteCount", "Size exceeds remaining buffer space.");
            int charCount = GetCharCount(bytes, byteIndex, byteCount);
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            if (charCount < 0 || charCount > chars.Length - charIndex)
                throw new ArgumentException("Array size is too small.", "chars");
            if (byteCount > 0)
            {
                fixed (char* _val2char = val2char)
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        fixed (char* _chars = &chars[charIndex])
                        {
                            // Fill the range from its last char to its first, so no
                            // Array.Reverse call is needed and chars outside the range
                            // are left untouched. The first byte lands in the last
                            // 8 chars, most significant bit first.
                            char* pch = _chars + charCount;
                            byte* pb = _bytes;
                            byte* pbMax = _bytes + byteCount;
                            while (pb < pbMax)
                            {
                                for (int bit = 0; bit < 8; bit++)
                                    *--pch = _val2char[(pb[0] >> bit) & 1];
                                pb++;
                            }
                        }
                    }
                }
            }
            return charCount;
        }

        public override int GetMaxByteCount(int charCount)
        {
            if (charCount < 0)
            {
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            }
            if ((charCount % 8) != 0)
            {
                throw new FormatException("Invalid binary length.");
            }
            return charCount / 8;
        }

        public override int GetMaxCharCount(int byteCount)
        {
            if (byteCount < 0 || byteCount > int.MaxValue / 8)
            {
                throw new ArgumentOutOfRangeException("byteCount", 
                    string.Format("Value must be within range of 0 and {0}.", MAX_CHAR_COUNT));
            }
            return byteCount * 8;
        }
    }
}

There is a little less going on in the Binary encoding class because there are only two possible character values ('1' and '0').
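
One detail that is easy to miss is the ordering: the class writes the bits so that the whole string reads like one large binary number, which puts the first byte in the last 8 characters. The safe-code sketch below reproduces that ordering. The names BinarySketch, ToBits and FromBits are made up for this illustration and are not part of the class above.

```csharp
using System;

// Hypothetical, safe-code illustration of the bit ordering used by the Binary encoding.
public static class BinarySketch
{
    public static string ToBits(byte[] data)
    {
        var chars = new char[data.Length * 8];
        int o = chars.Length;
        foreach (byte b in data)
        {
            // Fill from the end: bit 0 of data[0] becomes the last char.
            for (int bit = 0; bit < 8; bit++)
                chars[--o] = ((b >> bit) & 1) != 0 ? '1' : '0';
        }
        return new string(chars);
    }

    public static byte[] FromBits(string bits)
    {
        if (bits.Length % 8 != 0)
            throw new FormatException("Invalid binary length.");

        var data = new byte[bits.Length / 8];
        int i = bits.Length;
        for (int n = 0; n < data.Length; n++)
        {
            // Read from the end: the last 8 chars rebuild data[0].
            int value = 0;
            for (int bit = 0; bit < 8; bit++)
            {
                char c = bits[--i];
                if (c == '1')
                    value |= 1 << bit;
                else if (c != '0')
                    throw new FormatException("Invalid binary character.");
            }
            data[n] = (byte)value;
        }
        return data;
    }
}
```

For the bytes 0x01, 0x02, ToBits returns "0000001000000001" (the 0x02 byte comes first in the text), and FromBits turns that string back into the original two bytes.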

Finally, let's create the Hexadecimal encoding class as follows:

using System;
using System.Diagnostics;
using System.Globalization;
using System.Security;

namespace System.Text
{
    public sealed class HexEncoding : Encoding
    {
        const int MAX_CHAR_COUNT = int.MaxValue / 2;

        static byte[] char2val = new byte[128]
        {
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
            0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
        };

        static string val2char = "0123456789ABCDEF";


        static HexEncoding()
        {
            for (char ch = '0'; ch <= '9'; ch++)
                Debug.Assert(char2val[ch] == ch - '0', "");

            for (char ch = 'A'; ch <= 'F'; ch++)
                Debug.Assert(char2val[ch] == ch - 'A' + 10, "");

            for (char ch = 'a'; ch <= 'f'; ch++)
                Debug.Assert(char2val[ch] == ch - 'a' + 10, "");
        }


        public override int GetByteCount(char[] chars, int index, int count)
        {
            return GetMaxByteCount(count);
        }

        [SecuritySafeCritical]
        unsafe public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
        {
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            if (charCount < 0)
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            if (charCount > chars.Length - charIndex)
                throw new ArgumentOutOfRangeException("charCount", "Size exceeds remaining buffer space.");
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            int byteCount = GetByteCount(chars, charIndex, charCount);
            if (byteCount < 0 || byteCount > bytes.Length - byteIndex)
                throw new ArgumentException("Array size is too small.", "bytes"); // ArgumentException takes (message, paramName)
            if (charCount > 0)
            {
                fixed (byte* _char2val = char2val)
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        fixed (char* _chars = &chars[charIndex])
                        {
                            char* pch = _chars;
                            char* pchMax = _chars + charCount;
                            byte* pb = _bytes;
                            while (pch < pchMax)
                            {
                                Debug.Assert(pch + 2 <= pchMax, "");
                                char pch0 = pch[0];
                                char pch1 = pch[1];
                                if ((pch0 | pch1) >= 128)
                                    throw new FormatException("Invalid hexadecimal sequence.");
                                byte d1 = _char2val[pch0];
                                byte d2 = _char2val[pch1];
                                if ((d1 | d2) == 0xFF)
                                    throw new FormatException("Invalid hexadecimal sequence.");
                                pb[0] = (byte)((d1 << 4) + d2);
                                pch += 2;
                                pb++;
                            }
                        }
                    }
                }
            }
            return byteCount;
        }

        public override int GetCharCount(byte[] bytes, int index, int count)
        {
            return GetMaxCharCount(count);
        }

        [SecuritySafeCritical]
        unsafe public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
        {
            if (bytes == null)
                throw new ArgumentNullException("bytes");
            if (byteIndex < 0)
                throw new ArgumentOutOfRangeException("byteIndex", "Value must be non-negative.");
            if (byteIndex > bytes.Length)
                throw new ArgumentOutOfRangeException("byteIndex", "Offset exceeds buffer size.");
            if (byteCount < 0)
                throw new ArgumentOutOfRangeException("byteCount", "Value must be non-negative.");
            if (byteCount > bytes.Length - byteIndex)
                throw new ArgumentOutOfRangeException("byteCount", "Size exceeds remaining buffer space.");
            int charCount = GetCharCount(bytes, byteIndex, byteCount);
            if (chars == null)
                throw new ArgumentNullException("chars");
            if (charIndex < 0)
                throw new ArgumentOutOfRangeException("charIndex", "Value must be non-negative.");
            if (charIndex > chars.Length)
                throw new ArgumentOutOfRangeException("charIndex", "Offset exceeds buffer size.");
            if (charCount < 0 || charCount > chars.Length - charIndex)
                throw new ArgumentException("Array size is too small.", "chars"); // ArgumentException takes (message, paramName)
            if (byteCount > 0)
            {
                fixed (char* _val2char = val2char)
                {
                    fixed (byte* _bytes = &bytes[byteIndex])
                    {
                        fixed (char* _chars = &chars[charIndex])
                        {
                            char* pch = _chars;
                            byte* pb = _bytes;
                            byte* pbMax = _bytes + byteCount;
                            while (pb < pbMax)
                            {
                                pch[0] = _val2char[pb[0] >> 4];
                                pch[1] = _val2char[pb[0] & 0x0F];
                                pb++;
                                pch += 2;
                            }
                        }
                    }
                }
            }
            return charCount;
        }

        public override int GetMaxByteCount(int charCount)
        {
            if (charCount < 0)
            {
                throw new ArgumentOutOfRangeException("charCount", "Value must be non-negative.");
            }
            if ((charCount % 2) != 0)
            {
                throw new FormatException("Invalid hexadecimal length.");
            }
            return charCount / 2;
        }

        public override int GetMaxCharCount(int byteCount)
        {
            if (byteCount < 0 || byteCount > int.MaxValue / 2)
            {
                throw new ArgumentOutOfRangeException("byteCount", 
                    string.Format("Value must be within range of 0 and {0}.", MAX_CHAR_COUNT));
            }
            return byteCount * 2;
        }
    }
}

There you have it! These classes will ensure valid encoding and decoding of their respective types and are used just like any other encoding class such as UTF8, ASCII, etc.
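If the pointer arithmetic in HexEncoding is hard to follow, the same nibble logic can be sketched in safe code. This is only an illustration, not a replacement for the class above: the HexSketch name and its methods are made up for this example, and it uses range comparisons instead of the 128-entry lookup table, so it will likely be slower.

```csharp
using System;

public static class HexSketch
{
    private const string Digits = "0123456789ABCDEF";

    // Encode: each byte becomes two characters, high nibble first.
    public static string ToHex(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        char[] chars = new char[bytes.Length * 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[2 * i] = Digits[bytes[i] >> 4];
            chars[2 * i + 1] = Digits[bytes[i] & 0x0F];
        }
        return new string(chars);
    }

    // Maps one hex digit (either case) to its value, or -1 if it is not a hex digit.
    private static int DigitValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    // Decode: every pair of characters becomes one byte.
    public static byte[] FromHex(string hex)
    {
        if (hex == null) throw new ArgumentNullException("hex");
        if (hex.Length % 2 != 0) throw new FormatException("Invalid hexadecimal length.");
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            int hi = DigitValue(hex[2 * i]);
            int lo = DigitValue(hex[2 * i + 1]);
            // If either digit was invalid (-1), the OR of the two is negative.
            if ((hi | lo) < 0) throw new FormatException("Invalid hexadecimal sequence.");
            bytes[i] = (byte)((hi << 4) | lo);
        }
        return bytes;
    }
}
```

Like HexEncoding, the sketch accepts upper- and lowercase digits when decoding, always emits uppercase when encoding, and rejects odd-length input with a FormatException.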

Unfortunately, we are not able to attach these classes as static properties to the Encoding class, which would have allowed us to call them like Encoding.Base64.GetBytes() or Encoding.Binary.GetString(). Extension methods can only be attached to object instances, not to a type itself, so there is no way to add a static member to a class we don't own.
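To show what extension methods can do here, this is a minimal sketch of an instance-level extension on Encoding. The EncodingExtensions class and the Transcode method are hypothetical names invented for this example; nothing like them ships with the framework.

```csharp
using System;
using System.Text;

public static class EncodingExtensions
{
    // Hypothetical helper: turns text into bytes with `source`, then renders
    // those bytes as a string with `target`. It is called on an Encoding
    // instance, which is the only thing an extension method can extend.
    public static string Transcode(this Encoding target, string text, Encoding source)
    {
        if (target == null) throw new ArgumentNullException("target");
        if (text == null) throw new ArgumentNullException("text");
        if (source == null) throw new ArgumentNullException("source");
        return target.GetString(source.GetBytes(text));
    }
}
```

With this in place, a call such as Encoding.ASCII.Transcode("abc", Encoding.UTF8) compiles because Encoding.ASCII is an instance. What we still cannot write is an extension that makes Encoding.Hex itself exist.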

Something we can do, however, is create our own class to provide easy access to the encodings. Let's do that just for demonstration purposes. This class will provide static access to all of the encodings that the abstract Encoding class exposes, plus our own.

The class will be called Encodings. Create the Encodings class as follows:

namespace System.Text
{
    public sealed class Encodings
    {
        private static readonly Encoding binaryEncoding = new BinaryEncoding();
        private static readonly Encoding base64Encoding = new Base64Encoding();
        private static readonly Encoding hexEncoding = new HexEncoding();

        public static Encoding ASCII
        {
            get { return Encoding.ASCII; }
        }

        public static Encoding Hex
        {
            get { return hexEncoding; }
        }

        public static Encoding Binary
        {
            get { return binaryEncoding; }
        }

        public static Encoding Base64
        {
            get { return base64Encoding; }
        }

        public static Encoding BigEndianUnicode
        {
            get { return Encoding.BigEndianUnicode; }
        }

        public static Encoding Default
        {
            get { return Encoding.Default; }
        }

        public static Encoding Unicode
        {
            get { return Encoding.Unicode; }
        }

        public static Encoding UTF32
        {
            get { return Encoding.UTF32; }
        }

        public static Encoding UTF7
        {
            get { return Encoding.UTF7; }
        }

        public static Encoding UTF8
        {
            get { return Encoding.UTF8; }
        }
    }
}

Unlike the System.Text.Encoding class, this class contains no methods itself. It is simply designed as a way to access static instances of each encoding type. I marked the class as sealed in case you choose to add other functionality to it, but if you don't plan on doing so, you could mark this class as static, since it contains only static members.

Let's give our awesome new encodings a test run! Create a console application that either has these classes in it or references the assembly that contains them (if created in a class library, which I would personally suggest).

In the console application's Program.cs file, add a using directive for the System.Text namespace, since this is where our classes reside. Next, modify your Main() method to match this:

        static void Main(string[] args)
        {
            string s = "Hello world!";

            var binaryEncoded = Encodings.Binary.GetString(Encodings.UTF8.GetBytes(s));

            var binaryDecoded = Encodings.UTF8.GetString(Encodings.Binary.GetBytes(binaryEncoded));

            var hexEncoded = Encodings.Hex.GetString(Encodings.UTF8.GetBytes(s));

            var hexDecoded = Encodings.UTF8.GetString(Encodings.Hex.GetBytes(hexEncoded));

            var base64Encoded = Encodings.Base64.GetString(Encodings.UTF8.GetBytes(s));

            var base64Decoded = Encodings.UTF8.GetString(Encodings.Base64.GetBytes(base64Encoded));


            Console.WriteLine(s);

            Console.WriteLine(binaryEncoded);

            Console.WriteLine(binaryDecoded);

            Console.WriteLine(hexEncoded);

            Console.WriteLine(hexDecoded);

            Console.WriteLine(base64Encoded);

            Console.WriteLine(base64Decoded);


            Console.ReadLine();
        }
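If you want something to compare the output against, here is a small sketch that builds reference strings using only framework calls (BitConverter.ToString, Convert.ToBase64String and Convert.ToString). The ReferenceOutput class is a made-up name for this example. Note that its binary string is written most significant bit first, byte by byte, which is an assumption on my part about the ordering you want and may not match the BinaryEncoding output exactly.

```csharp
using System;
using System.Text;

public static class ReferenceOutput
{
    public static string Hex(byte[] data)
    {
        // BitConverter yields "48-65-6C-..."; strip the dashes.
        return BitConverter.ToString(data).Replace("-", "");
    }

    public static string Base64(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    public static string Binary(byte[] data)
    {
        StringBuilder sb = new StringBuilder(data.Length * 8);
        foreach (byte b in data)
        {
            // Convert.ToString(b, 2) drops leading zeros, so pad back to 8 digits.
            sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        }
        return sb.ToString();
    }

    // Prints the three reference strings for a given piece of text.
    public static void Demo(string text)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(Hex(data));
        Console.WriteLine(Base64(data));
        Console.WriteLine(Binary(data));
    }
}
```

For "Hello world!" the hex line should read 48656C6C6F20776F726C6421 and the base-64 line SGVsbG8gd29ybGQh, so the hexEncoded and base64Encoded values printed by Main() should match those.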

If you get curious and benchmark the above classes, you should find that they perform roughly on par with the built-in encoders in the .NET framework. Congratulations! You just did something many programmers have shuddered at the thought of, and you barely even broke a sweat doing it!
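If you do want to measure this yourself, a rough timing harness could look like the sketch below. The Bench class and its Measure method are names I made up for this example, and a Stopwatch loop like this only gives a ballpark figure, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs the action once as a warm-up (so JIT compilation is not timed),
    // then runs it `iterations` more times and returns the elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations < 0) throw new ArgumentOutOfRangeException("iterations");
        action();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

You could then compare, for example, Bench.Measure(() => Encodings.Hex.GetString(data), 100000) against Bench.Measure(() => BitConverter.ToString(data), 100000) for the same byte array, building in Release mode and running outside the debugger.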

As always, feel free to use the code provided in this tutorial or modify it any way you see fit, as I am providing it completely open-source (of course, a mention of gratitude would always be appreciated :) ). Feel free to post questions or any cool modifications you come up with in the comments below. Happy coding!
