Tag: Streams

Decompressing text

This post shows how to compress a String to reduce the amount of memory it consumes, and this post shows how to use the CompressText function. To read the content of the string, it must be decompressed (or inflated) again. The DecompressText function is one way to do this.

Private Function DecompressText(ByVal B() As Byte) As String
   Using MemStream As New System.IO.MemoryStream(B)
      Using GZStream As New System.IO.Compression.GZipStream(MemStream, _
         System.IO.Compression.CompressionMode.Decompress)
         Using Inflated As New System.IO.MemoryStream()
            'Note that Buffer(1023) makes 1024 bytes (index 0 to 1023) in VB.
            Dim Buffer(1023) As Byte
            Do
               'Read may return fewer bytes than requested; only 0 means end of stream.
               Dim BytesRead As Integer = GZStream.Read(Buffer, 0, Buffer.Length)
               If BytesRead = 0 Then
                  Exit Do
               End If
               Inflated.Write(Buffer, 0, BytesRead)
            Loop
            'Decode once at the end, so that a multi-byte UTF-8 character
            'split across two reads is not corrupted.
            Return System.Text.Encoding.UTF8.GetString(Inflated.ToArray())
         End Using
      End Using
   End Using
End Function

Now, imagine that B is a byte array returned from the CompressText function. B holds the bytes of a compressed text string. B is passed to the DecompressText function and the function returns the inflated string again. Example:

'Create some text.
Dim S As String = "This is some text that I want to compress. Preferably it's " & _
"a long string loaded from a text file or some XML document."

'Assign the compressed version to the variable B.
Dim B() As Byte = CompressText(S)

'Decompress it, and display the result.
Dim Decompressed As String = DecompressText(B)
Console.WriteLine(Decompressed)
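
As a quick sanity check, the round trip can be verified by comparing the decompressed string with the original. The following is only a sketch under a few assumptions: the module name RoundTripDemo and the sample text are made up for illustration, both helper functions are repeated so the module compiles on its own, and the decompression uses Stream.CopyTo (available from .NET Framework 4) in place of a manual read loop.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module RoundTripDemo

    'Encodes the text as UTF-8 and GZip-compresses the bytes.
    Function CompressText(ByVal T As String) As Byte()
        Dim Raw() As Byte = Encoding.UTF8.GetBytes(T)
        Using MemStream As New MemoryStream()
            Using GZStream As New GZipStream(MemStream, CompressionMode.Compress)
                GZStream.Write(Raw, 0, Raw.Length)
            End Using
            'The GZip stream is closed at this point, so everything is flushed.
            'ToArray still works on a closed MemoryStream.
            Return MemStream.ToArray()
        End Using
    End Function

    'Inflates the bytes and decodes them as UTF-8 in one go.
    Function DecompressText(ByVal B() As Byte) As String
        Using Source As New MemoryStream(B)
            Using GZStream As New GZipStream(Source, CompressionMode.Decompress)
                Using Inflated As New MemoryStream()
                    GZStream.CopyTo(Inflated)
                    Return Encoding.UTF8.GetString(Inflated.ToArray())
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        'Made-up sample: a few non-ASCII characters plus enough padding
        'to make the inflated data span several internal reads.
        Dim Original As String = "Smørrebrød og rødgrød: " & New String("x"c, 5000)
        Dim Restored As String = DecompressText(CompressText(Original))
        Console.WriteLine("Round trip intact: {0}", _
            String.Equals(Original, Restored, StringComparison.Ordinal))
    End Sub

End Module
```

The non-ASCII characters are there on purpose: they take more than one byte in UTF-8, which is the case most likely to break if the bytes are decoded piece by piece.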

Have you seen a more elegant way to handle strings in memory than what the .NET Framework offers?

Compressing Genesis

From here, I have downloaded Genesis to see what the GZip stream is good for. GZip suits plain text well: the file is stored uncompressed and is highly redundant, so the Deflate algorithm can compress it to a high ratio. And I like the simile that deflating Genesis leaves very little behind. It sort of works with my naturalistic worldview.

The CompressText function is unchanged, so I only show the Main subroutine.

Sub Main()

    'Load genesis.
    Dim S As String
    Using Sr As New System.IO.StreamReader("genesis.txt", System.Text.Encoding.UTF8)
        S = Sr.ReadToEnd()
        Sr.Close()
    End Using

    Console.WriteLine("{0} characters, {1} bytes.", S.Length, S.Length * 2)

    'Compress it, and display the result.
    Dim B() As Byte = CompressText(S)
    Console.WriteLine("Compressed to {0} bytes.", B.Length.ToString())

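    'Uncompressed (in-memory, 2 bytes per character) size as a percentage of the compressed size.
    Console.WriteLine("Uncompressed is {0}% of compressed size.", (((S.Length * 2) / B.Length) * 100).ToString("n0"))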
End Sub

In this case, the uncompressed version is 588% of the size of the compressed version.
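
A percentage like this can be read in two ways, so it may help to spell out the arithmetic. The sketch below uses made-up sizes (1000 and 170 bytes, chosen only because they give a similar percentage; they are not the measured Genesis numbers), and the helper names PercentOfCompressed and PercentSaved are hypothetical.

```vbnet
Imports System

Module RatioDemo

    'Hypothetical helper: uncompressed size as a percentage of the compressed size.
    Function PercentOfCompressed(ByVal UncompressedBytes As Integer, _
                                 ByVal CompressedBytes As Integer) As Double
        'The / operator always performs floating-point division in VB.
        Return UncompressedBytes / CompressedBytes * 100
    End Function

    'Hypothetical helper: share of the original size that compression removed.
    Function PercentSaved(ByVal UncompressedBytes As Integer, _
                          ByVal CompressedBytes As Integer) As Double
        Return (1 - CompressedBytes / UncompressedBytes) * 100
    End Function

    Sub Main()
        'Made-up sizes for illustration only.
        Dim UncompressedBytes As Integer = 1000
        Dim CompressedBytes As Integer = 170

        Console.WriteLine("Uncompressed is {0}% of compressed.", _
            PercentOfCompressed(UncompressedBytes, CompressedBytes).ToString("n0"))
        Console.WriteLine("Space saved: {0}%", _
            PercentSaved(UncompressedBytes, CompressedBytes).ToString("n0"))
    End Sub

End Module
```

With these sizes, the first figure comes out at roughly 588% and the second at roughly 83%. Both describe the same result: the first says how many times larger the original is, the second says how much storage was saved.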

Compressing text

I had a situation today where I had to fit about 12,000 bytes of text into 5,000 bytes of storage. In .NET, this is easy to do: just a few lines of code are required to convert a long String to a short Byte array. Examine the CompressText function. This particular example turns 252 characters into 287 bytes, but on larger pieces of text the effect is better. Enjoy!

Module Module1

    Sub Main()

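        'For the example, create some text!
        'Note: the 504 bytes mentioned in the text is the in-memory (UTF-16) size,
        'at 2 bytes per character. Encoded as UTF-8, this ASCII text is 252 bytes.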
        Dim S As New System.Text.StringBuilder()
        S.Append("This is some text that I want to compress. ")
        S.Append("Uncompressed, this text of 252 characters ")
        S.Append("consumes 504 bytes of memory with UTF-8 ")
        S.Append("encoding. The effect of the compression ")
        S.Append("is much better with a larger piece of text. ")
        S.Append("Compressing XML this way is very effective.")

        'Compress it, and display the result.
        Dim B() As Byte = CompressText(S.ToString())
        Console.WriteLine("Compressed to {0} bytes.", B.Length.ToString())

    End Sub

    Private Function CompressText(ByVal T As String) As Byte()
        Dim B() As Byte = System.Text.Encoding.UTF8.GetBytes(T)
        Using MemStream As New System.IO.MemoryStream()
            Using GZStream As New System.IO.Compression.GZipStream(MemStream, _
             IO.Compression.CompressionMode.Compress)
                GZStream.Write(B, 0, B.Length)
                GZStream.Close()
                Return MemStream.ToArray()
            End Using
        End Using
    End Function

End Module
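
To see why short strings gain little or nothing, it can help to compare a tiny input with a long, repetitive one. The GZip format wraps the compressed data in a fixed header and trailer, so a very short input comes out larger than it went in. The following is a minimal sketch, assuming the same approach as CompressText; the module name OverheadDemo and both sample strings are made up, and the exact byte counts depend on the .NET version, so none are quoted here.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module OverheadDemo

    'Same approach as CompressText, repeated so the sketch compiles on its own.
    Function CompressText(ByVal T As String) As Byte()
        Dim Raw() As Byte = Encoding.UTF8.GetBytes(T)
        Using MemStream As New MemoryStream()
            Using GZStream As New GZipStream(MemStream, CompressionMode.Compress)
                GZStream.Write(Raw, 0, Raw.Length)
            End Using
            Return MemStream.ToArray()
        End Using
    End Function

    'Made-up long sample: a repetitive XML-like fragment, 500 times over.
    Function BuildLongSample() As String
        Dim Sb As New StringBuilder()
        For I As Integer = 1 To 500
            Sb.Append("<item>some value</item>")
        Next
        Return Sb.ToString()
    End Function

    Sub Main()
        For Each Sample As String In New String() {"Hi", BuildLongSample()}
            'Compare against the UTF-8 size, since those are the bytes being compressed.
            Dim RawBytes As Integer = Encoding.UTF8.GetByteCount(Sample)
            Dim PackedBytes As Integer = CompressText(Sample).Length
            Console.WriteLine("{0} UTF-8 bytes -> {1} compressed bytes", RawBytes, PackedBytes)
        Next
    End Sub

End Module
```

If storage is tight, one option is to compare the two sizes like this before deciding whether a given string is worth compressing at all.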