Recently, I visited a friend of mine who is the owner of a small business. I found him frantically typing in Excel 2003. He told me that he had recently started working on a project with a client and my friend’s obligation was to provide his expertise to his client on particular matters. The client provided my friend with data and my friend had to analyze them and report his findings to the client.
Every day, the client provided my friend with a text file of no more than 1,000 rows. A few of these rows were blank. The other rows contained tokens (words) separated with one space from one another. My friend needed to obtain an Excel spreadsheet with the data, but here is the catch: for each line of the text file, he needed to separate each one of the last five words. This was because the first part of each line was a string that resembled a sentence or a description and, thus, it had spaces between the words.
Let me give you an example. If the text line was
11 22 33 44 55 66 77 88 99
then my friend needed to produce the following strings:
“11 22 33 44”, “55”, “66”, “77”, “88”, “99”.
An another example: If the text line was
aa bbb c ddd ee fffff gg hhhh ii j kkkkk ll mm nn oo pp qq
then my friend needed to produce the following strings:
“aa bbb c ddd ee fffff gg hhhh ii j kkkkk ll”, “mm”, “nn”, “oo”, “pp”, “qq”.
So, each day, my friend obtained the text file from his client for that day and he pasted its contents in Excel. The contents would go into the column A. Then my friend cut and pasted each one of the last five words from each line into the columns B, C, D, E and F, respectively.
Each daily file contained hundreds of lines and my friend would spend up to six hours doing the tedious, housekeeping work of separating the last five tokens from each line. Even if my friend knew about the “Text to Columns” functionality in Excel, that would nit have benefited him, because “Text to Columns” separates every word from the others. My friend needed only the last five words to be separated and the problem was that the first part had variable numbers of spaces from line to line.
My friend was tired and frustrated and was thinking about dropping this account. The only part that was tedious was the separation of the last five words from each line. From then on, things were easy for my friend. He would easily complete his analysis, and anyway, that was what he wanted to do and knew how to do it.
He also told me that he though about giving this work to his secretary or another one of his employees, but he was afraid that they might be miffed. He once gave the daily file to his secretary and she was not able to finish it during her work hours. He was also afraid that whoever of his employees would get this tedious work, would be overwhelmed and miffed and he did not want to have unhappy subordinates. So, he took upon himself this unthankful job and now he was thinking of dropping the account altogether.
I told my friend that I would try to find a solution and that this might not be hard. The next day, I visited him again and presented him with an Excel VBA that I wrote specifically for his situation. When the Excel VBA macro would run, it would process the first 1,000 lines of the Excel spreadsheet. If the line would not be blank, then the macro would delete the last five tokens from the first cell and would put them in the columns B, C, D, E and F next to it. I also created test data for the occasion. I used the lyrics of one of my favorite Stones song and I also created other test data and even wrote a small poem! I merged all that together to create the following test file:
Love is strong and you're so sweet aa bb cc dd ee You make me hard you make me weak ff gg hh ii jj Love is strong and you're so sweet kk ll mm nn oo And some day, babe we got to meet pp qq rr ss tt A glimpse of you was all it took uu vv ww xx yy A stranger's glance it got me hooked zz abc def ghi jkl And I followed you across the stars mno pqr stu vwx yz I looked for you in seedy bars a bc def ghij klmno What are you scared of, baby This is only a test. It's more than just a dream It is boring coming up I need some time With test data like that We make a beautiful team Maybe I'll just write a-song Beautiful team To which you'll sing along Love is strong and you're so sweet So what do you think And some day, babe we got to meet my friend? Will you stay Just anywhere out in the park till the end? Will you Out on the street and in the dark help me pull this through? I followed you through swirling seas If you do, I'll help Down darkened woods with silent trees you, too. Cause this is Your love is strong and your so sweet what friends are for and You make me hard you make me weak you should learn this for I wait for you until the dawn sure. Now the test comes My mind is ripped my heart is torn to an end, no more And love is strong and your so sweet me data left to churn. Hurray! Your love is bitter it's taken neat OK, just this last line.
I created an Excel spreadsheet and added a module to which I copied my macro:
Option Explicit Sub BreakItApart() Dim myRowCounter As Integer Dim myLetterCounter As Integer Dim myInitialString As String Dim myLength As Integer Dim myPlace1 As Integer Dim myPlace2 As Integer Dim myPlace3 As Integer Dim myPlace4 As Integer Dim myPlace5 As Integer For myRowCounter = 1 To 1000 myInitialString = "" myLength = 0 myPlace1 = 0 myPlace2 = 0 myPlace3 = 0 myPlace4 = 0 myPlace5 = 0 'I use Trim() to get rid of any leading and trailing spaces. myInitialString = Trim(Cells(myRowCounter, 1).Value) myLength = Len(myInitialString) If myLength = 0 Then 'Do nothing, skip blank lines Else For myLetterCounter = 1 To myLength If Mid(myInitialString, myLetterCounter, 1) = " " Then myPlace1 = myPlace2 myPlace2 = myPlace3 myPlace3 = myPlace4 myPlace4 = myPlace5 myPlace5 = myLetterCounter End If Next Cells(myRowCounter, 1).Value = Left(myInitialString, myPlace1) Cells(myRowCounter, 2).Value = Mid(myInitialString, myPlace1, myPlace2 - myPlace1) Cells(myRowCounter, 3).Value = Mid(myInitialString, myPlace2, myPlace3 - myPlace2) Cells(myRowCounter, 4).Value = Mid(myInitialString, myPlace3, myPlace4 - myPlace3) Cells(myRowCounter, 5).Value = Mid(myInitialString, myPlace4, myPlace5 - myPlace4) Cells(myRowCounter, 6).Value = Mid(myInitialString, myPlace5) End If Next End Sub
I copied the contents of my test file in the first column, thus producing the following:
Then I run the macro, thus producing the following: