torsdag, juli 08, 2010

Macros to handle Excel time series


I was looking at some climatological indices like SOI, NP, AMO, NAO, DMI and some solar stuff as well, but also things like GISS's global temperature anomaly table. Best way to do that really - unless you're truly weird - is to load the indices tables into a program like Excel and run them through the diagram wizard. I have Office 2003, and there are some nice options to play with:

You can plot comparable diagrams with two Y-axes, and compute running means, linear trends and polynomials for phasing analyses , which are all nice things... I ran into a problem however, and it goes sort of like this:

If you have a 2D table like this one:

                      Jan                Feb
1880              -1                  1
1881              -2                  2

Then Excel will plot it as separate time series on the same time line; which means that you will get two coloured curves: 1 for January and 1 for February. If someone knows of a way where you mark up that data and get Excel to plot one curve with 4 data points then I would very much like to hear about it! - I have looked high and low, not finding a clue that someone would even consider this an issue.

Now the problem, which may be only mine, because I'm an idiot and haven't seen some obvious light in sky can be remedied, at a cost, depending on your solution. What you can do is:


1) Accept the garbage that Excel will actually feed you, which renders a coherent trend analyses a practical impossibility.

2) Compute a mean value by month or year and then plot that, which effectively removes outliers and hence some things of interest.

3) Rearrange the data so that they look something like this:

1880 Jan -1
1880 Feb 1
1881 Jan -2
1881 Feb 2

Indeed, out there on the web, some of the data actually comes in this format, but it is a rare phenomenon.

Anyhow, if you're into trend analysis and spotting all that odd stuff in the data, what you'll want to reach for is that third option, except: you must now face the gruesome reality of hours of labour rearranging a table stretching from say 1856 to 2009, and then realise you have 10 - 15 of those.

Imagine my horror: after days of searching for an automated way of doing this, to come up completely empty. Well, not entirely completely, although for most people it would have been; I actually came up with a somewhat practical solution: all I had to do was program an algorithm in Visual Basic to do the trick.

Now, for a lot of people this is clearly not the option, that they would choose, but fortunately I have some programming experience; indeed I have previously programmed more difficult solutions than this one, albeit that coding was done in Basic, Pascal and Java, but Visual Basic looked at first glance like a mix of those languages, so why not give it a go? - After all I only had to learn the basics.

And I'll even share my hard work with all of you, just in case some poor soul have the same problem, and as of yet, has not solved it - all because I believe in that sort of thing - sharing, that is...

But I must briefly return to the beginning, because in order to produce this problem for myself in the first place, I had had to solve another problem: Often times data indices are served cold on the web, that is: in the shape of plain text tables; if you copy paste those to Excel initially all you'll get is garbage, however, if you keep the selection selected after pasting and choose the Text to Column option in the Data menu, you'll get acquainted with a nice wizard which solves that problem neatly - it can even convert a period notation to a comma notation and vice versa.

Well, now that we know how to quite easily convert all those nasty text tables into something that Excel can actually plot, we'll naturally be or not be more interested in what was originally, and likely ultimately, only my problem.

Let's start with the simple part:

Sub ManyToOne()

 Dim z, x, y As Double

 For y = 2 To Selection.Rows.Count
  For x = 2 To Selection.Columns.Count
   z = z + 1
   Sheets("Makro").Cells(z, 1) = Selection.Cells(y, 1) & " " & Selection.Cells(1, x)
   Sheets("Makro").Cells(z, 2) = Selection.Cells(y, x)
  Next x
 Next y

End Sub

Of course, this is as simple as it gets. Though I will not bother you by explaining the actual mechanics, I can tell you, that this small algorithm will convert a 2D table into the single column (or row) like structure required by excel for a contiguous time series, and as suggested in solution 3.

Create a macro with the dialog boxes in the Functions menu. In the editor paste the above replacing all other text. Insert a new Spreadsheet called: Makro. This is important, because the algorithm will only output to a spreadsheet called Makro, unless you change that in the source code.

Now, find yourself a nice 2D table in a format like the one I showed in the beginning. Mark up everything in a box: years, months and the data itself. You have to be precise with your selection or something might crash. Run the macro from the Functions menu, and then go to the Makro spreadsheet to view the result. You can now plot all the data in a single series.

Using settings in the Functions menus dialogs you can even assign a shortcut key to the macro.

If you prefer rows to columns select everything, copy it, and right-click on a cell to the right of the table columns, choose paste-special from the menu and tag transpose in the dialog - click . The columns are now inverted to rows.

And now I wonder, wouldn't there be at least one person maybe even two, out of all the people in the world, who would lean back in an arrogant posture and say something like: Well, that's very neat of you, but that was an easy trick; what if I wanted to go the other way around? You said yourself, that the data sometimes come in the format we now got, but isn't there something to be said for the 2D format?

And that might very well be so, so here it is:

Sub OneToMany()

 Dim z, y, x, tx, bx As Long
 Dim t As String
 Dim m As Variant

 m = Array("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

 y = 2
 x = 1

 For z = 1 To Selection.Rows.Count
  t = Selection.Cells(z, 1)
  tx = InStr(1, t, " ")
  While tx > 0 And tx < 5
   t = Mid(t, tx + 1, Len(t) - tx)
   tx = InStr(1, t, " ")
  Wend
  If Len(t) > 4 Then t = Left(t, 4)
  If z = 1 Then Sheets("Makro").Cells(y, 1) = t
  If z > 1 And Val(t) <> Val(Sheets("Makro").Cells(y, 1)) Then
   If bx < x Then
    For tx = 0 To x - 2
     If tx < 12 Then Sheets("Makro").Cells(1, tx + 2) = m(tx)
     If y = 3 Then
      If bx > 1 Then
       Sheets("Makro").Cells(y - 1, x - tx) = Sheets("Makro").Cells(y - 1, bx)
       bx = bx - 1
      Else
       Sheets("Makro").Cells(y - 1, x - tx) = ""
      End If
     End If
    Next tx
   End If
   y = y + 1
   bx = x
   x = 2
   Sheets("Makro").Cells(y, 1) = t
  Else
   x = x + 1
  End If
  Sheets("Makro").Cells(y, x) = Selection.Cells(z, 2)
 Next z

End Sub

See, that's programming I like!

Even though this algorithm assumes a data format that goes something like this: [1880 1] or [1 1880] it can handle some deviation from that assumption. It will provide up to 12 months a year, but if your data are 3 months means or even weekly or daily means you can ignore that, and set up your own stickers when the routine is done. It does require labels though, that are 4 characters long, like 1881 to identify each unique column. A label as a whole can be longer than 4 characters if its individual components are separated by spaces (and only spaces). It's imperative however, that the unique identifying component be 4 characters or digits wide, and that no other component be more than 3 characters or digits wide. If you use wisely the Text to columns wizard in the Functions menu when you have pasted a plain text table into Excel, you should be able to provide the algorithm with a format it can work on.

Sometimes these tables come with only the last months of the first year and the first months of the last year, the algorithm will account for such a possibility.

When all that's said and done, be aware, that this algorithm runs on a set of assumptions, which are not always true, and that it can and will return garbage if you do not handle it according to above specifications.

If you want to set up a timeline which it can handle without any problems the following algorithm will do the trick:

Sub YearMonthSeries()

 Dim m As Variant
 Dim i, a, y As Integer
 Dim s, e As String

 m = Array("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

 s = InputBox("Input start year", "Time Series")
 e = InputBox("Input end year", "Time Series")

 For i = Val(s) To Val(e)
  For a = 0 To 11
   y = y + 1
   Sheets("Makro").Cells(y, 1) = Str(i) & " " & m(a)
  Next a
 Next i

End Sub

That's all folks! Hope I've been helpful to one or two people...

Per

Ingen kommentarer: