
Monday, November 12, 2012

Real-Time Graphing of Arduino Data (Python, matplotlib, and PyQt)

A simple LED controlled by a photocell. adafruit.com
This is just going to be a quick example of how to read some serial data off of the arduino and make a real-time plot of that data using python (python 3.2), matplotlib, and PyQt. Nothing really fancy going on here and nothing that hasn't been done before (check the Resources) but this does combine a few disparate ideas and does get it working with python3, so that's something. Our test device is a simple LED hooked up to a photocell so that as it gets darker the LED gets brighter. The raw analog input from the photocell is written out to the serial port. On the python end, we simply read off the serial port and update a graph. All code is available on github.

Arduino

The arduino setup is fairly simple and mostly follows the tutorial supplied by adafruit.com listed under 'Simple Demonstration of Use'. Interfacing properly with the serial port, or, rather, getting python to correctly read the data from the serial port, did require altering a few things in the arduino code. Full source here.

The first thing to notice is our byte val = 0; at the top of the file. This is the variable we use to store the raw analog value from the arduino, and I couldn't get the python side to read the data correctly unless the value was stored directly in a byte (as opposed to an int). Other than that, there is not much different from the adafruit tutorial except that I split the bulk of the code into a separate function, adjust_led.

void loop(void) {
  val = analogRead(photocellPin);  // raw reading, 0-1023
  Serial.println(val, DEC);        // write it out with a trailing newline
  adjust_led(val);
  delay(100);                      // roughly ten readings per second
}

void adjust_led(int photocellReading){
  // LED gets brighter the darker it is at the sensor
  // that means we have to -invert- the reading from 0-1023 back to 1023-0
  photocellReading = 1023 - int(photocellReading);

  if(photocellReading < 0){
    photocellReading = 0;
  }
  if(photocellReading > 250){
    //now we have to map 0-1023 to 0-255 since thats the range analogWrite uses
    // for our purposes we only map 250-600 to get a better light range. 
    // feel free to experiment here.
    LEDbrightness = map(photocellReading, 250, 600, 0, 255);
  } else {
    LEDbrightness = 0;
  }

  analogWrite(LEDpin, LEDbrightness);
}

Pretty standard stuff. In the loop we read the analog pin and write it out to the serial port with a newline and using a DEC format. This is all that is needed in order to plot from python and the rest of the arduino code deals with adjusting the lighting on the LED. Two things to note: if the photocellReading is less than zero we bump it to zero; if less than 250 we don't turn on the light. So, ideally, the LED should stay off until it is "sufficiently" dark and then it should turn on and proceed to get brighter as it gets darker. The code is not perfect. For some reason I see some flickering and spiking going on and the overall transition isn't as smooth as I would like in terms of the visible effect of the LED. However, since our purpose is to get a graph going, and because looking at the graph might help us debug what is going on with our arduino, we just ignore all that for now and go on.

Python

There are two python files that we deal with. The first, SerialData.py, is fairly generic and should be able to use any serial data from an arduino. The second, light_sensor_plot.py, creates our PyQt gui, deals with our data, and creates the graph.

SerialData.py

The code is lifted almost directly from Eli Bendersky's tutorial (on his github this file is called Arduino_Monitor.py). A few differences:

        buffer = buffer + ser.read(ser.inWaiting()).decode()

I had to add .decode() while reading the buffer to get anything to work. In python3, ser.read() returns bytes rather than a str (a change from python2), so we have to decode before appending to our string buffer.

        if not self.ser:
            return 0

I return a zero (0) instead of 100 if our serial port was never opened; ideally we shouldn't even get here. I also removed the line that prints "bogus" on the ValueError exception. See my full code.
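For reference, here is a minimal sketch of the kind of read loop SerialData.py implements. The class and method names here are illustrative rather than the exact file contents, so check the full code on github for the real thing:

import serial

class SerialData(object):
    def __init__(self, port='/dev/ttyUSB0', baud=9600):
        self.buffer = ''
        try:
            self.ser = serial.Serial(port, baud, timeout=0.1)
        except serial.SerialException:
            self.ser = None

    def next(self):
        if not self.ser:
            return 0  # no serial port; return 0 rather than a fake value
        # python3: ser.read() gives us bytes, so decode to str first
        self.buffer = self.buffer + self.ser.read(self.ser.inWaiting()).decode()
        if '\n' in self.buffer:
            lines = self.buffer.split('\n')
            self.buffer = lines[-1]  # keep any partial line for next time
            for line in reversed(lines[:-1]):
                try:
                    return float(line)  # most recent complete reading
                except ValueError:
                    pass  # quietly skip any garbled lines
        return 0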

light_sensor_plot.py

Most of the basics of this file are lifted from Chapter 6 of Sandro Tosi's Matplotlib for Python Developers. In that chapter you will find a section called 'Real-time update of a Matplotlib graph', which does what it says on the tin. In his example Tosi graphs some cpu values from psutil; all I really do is strip that out, get our data from SerialData.py instead, and graph away. A lot of Tosi's code deals with processing the cpu values, whereas our data needs no additional processing. Perhaps the biggest difference is that Tosi reads values once per second for a maximum of 30 seconds and sets up his graph accordingly. Our graph, however, needs to update more than once per second and also needs to scroll as new data comes in. The update value is easy enough:

self.timer = self.startTimer(100)

This will give us an update every 100 ms, which should be smooth enough. The scrolling graph turns out to be just as easy:

        # force a redraw of the Figure - we start with an initial
        # horizontal axes but 'scroll' as time goes by
        if(self.cnt >= self.window_size):
            self.ax.set_xlim(self.cnt - self.window_size, self.cnt + 15)

So all we do here is check to see if our iteration count has reached our window_size (30) and, if so, set the x limit to a 'window' that follows the data, with window_size units behind the current point and 15 ahead.
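Putting the pieces together, here is a small self-contained sketch of the scrolling-plot idea. It assumes PyQt4 and uses random numbers in place of the serial readings so it runs without an arduino attached; the real script wires in SerialData.py instead:

import random, sys
from PyQt4 import QtGui
from matplotlib.figure import Figure
from matplotlib.backends.backend_qt4agg import FigureCanvasQTAgg as FigureCanvas

class PlotWindow(QtGui.QMainWindow):
    def __init__(self):
        QtGui.QMainWindow.__init__(self)
        self.window_size = 30
        self.values = []
        self.cnt = 0

        fig = Figure()
        self.ax = fig.add_subplot(111)
        self.ax.set_xlim(0, self.window_size + 15)
        self.ax.set_ylim(0, 1024)          # raw analogRead range
        self.line, = self.ax.plot([], [])
        self.canvas = FigureCanvas(fig)
        self.setCentralWidget(self.canvas)

        self.timer = self.startTimer(100)  # update every 100 ms

    def timerEvent(self, evt):
        # swap this random value for SerialData.next() to plot real data
        self.values.append(random.randint(0, 1023))
        self.cnt += 1
        self.line.set_data(range(len(self.values)), self.values)
        # 'scroll' once we have more points than the initial window
        if self.cnt >= self.window_size:
            self.ax.set_xlim(self.cnt - self.window_size, self.cnt + 15)
        self.canvas.draw()

app = QtGui.QApplication(sys.argv)
win = PlotWindow()
win.show()
sys.exit(app.exec_())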

Other than that there is not actually much to the example. Ideally I would like to split it up so that we have one file for reading from the arduino (currently SerialData.py), one for generating the gui, and one for dealing with the context-specific data. You can see that our gui is really simple; Tosi and Eli go on to add fancy navigation bars and all that, so you can build from here.
Real-time graph of serial data values read from arduino.

Resources

- adafruit.com photocell tutorial ('Simple Demonstration of Use')
- Eli Bendersky's tutorial on plotting serial data from an arduino (Arduino_Monitor.py)
- Sandro Tosi, Matplotlib for Python Developers, Chapter 6 ('Real-time update of a Matplotlib graph')
- Full source for this post on github

Tuesday, October 2, 2012

Data Alignment with Pandas - Scrubbing Data

So I had written, or, rather, attempted to write, a long post about how I successfully installed Pandas, battled the dragons of library version numbers, slayed the inconsistencies in my setup, and emerged victorious with the Princess of Matplotlib firmly saddled to my pony, but in the end it turns out I have no idea how I got things working. But they are working, and it's best to let sleeping dragons lie. (Actually, this is not quite true. I'm basically just using the system python3.2, installing things via apt and pip. Perhaps later I'll go back and set up a proper virtualenv with pythonbrew and blog about that.) From here on out I'll just assume you have things working too.

A common task when starting an analysis is to get all of our data lined up and into a format where we can actually do something useful with it. For this project we have a number of data sources that we want to scrub and then align.

File Format

The dimm and mass data include six datetime columns and one value column.
% head 2009.dimm.dat
2009 09 22 07 36 36 0.41370476
2009 09 22 07 38 02 0.44429658
...
% head 2009.mass.dat
2009 09 22 07 35 41 0.26
2009 09 22 07 37 06 0.23
...

The weather data is as follows:
Its columns are: year month day hour minute(HST) wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure(mb, if present).

% head cfht-wx.2009.dat
2008 12 31 23 59 9 274 -4.79 95 612.8
2009 01 01 00 04 10 275 -4.83 94 612.8
...
Our goal is to put all of these items into one DataFrame with a datetime index and columns for dimm, mass, and the various weather pieces. To do all this we will need to do a little pre-processing of the files, which exist for the years 2009 to 2012, a little pulling of that data into pandas, and then a little manipulation in order to align everything properly. So, let's see how that's done. Note that all of the data is freely available online.

Pre-Processing

It is a whole lot easier to concatenate all the datetime fields into one and then pull that into pandas. While we could do this as we read in the data, it is probably faster to just do it on the command line using our good buddy awk:

% for file in `ls *{dimm,mass}*`; do
for> cp $file $file.bak;
for> awk '{print $1$2$3$4$5$6,$7}' $file.bak > $file;
for> done
% cat *dimm.dat > all_dimm.dat
% cat all_dimm.dat | sort | uniq -w 14 > all_dimm_uniq.dat
% cat *mass.dat > all_mass.dat
% cat all_mass.dat | sort | uniq -w 14 > all_mass_uniq.dat
% head -n2 all_dimm_uniq.dat
20090922073636 0.41370476
20090922073802 0.44429658
% tail -n2 all_dimm_uniq.dat
20120925052836 0.29
20120925190417 nan

There are probably sexier ways to accomplish the above without the unwieldy print statement in the awk, but it took me all of seven seconds to do, so I call it good. Here we process all the dimm and mass files, eliminating the spaces between the datetime fields, and then concatenate everything into all_dimm.dat and all_mass.dat respectively. Because we have identified duplicate time entries in the files, we go ahead and sort and uniq the .dat files, eliminating dupes based on the first 14 characters (the datetime stamp). Wash, rinse, and repeat with the weather data using an even more obtuse print statement. Note that the weather data doesn't include seconds, so we just tack them on in our print statement. In fact, I should show that just because of its hideousness:
awk "{print \$1\"00\",\$2,\$3,\$4,\$5,\$6}" all_weather.dat.bak > all_weather.dat
In the end we should have three files, all_dimm.dat, all_mass.dat, and all_weather.dat (rename the _uniq versions accordingly), each of which has a unique datetime value in the first column, followed by a space and then the remaining values (just dimm for dimm, mass for mass, and all those other columns listed above for weather). So let's start pulling that into pandas.

Pull Into Pandas

For the record, I am using ipython3 notebook --pylab=inline for my work. To get it into fancy blog format, see Fernando Perez's helpful post. I had to alter this to work with python3.2 and will be sending my pull request to him shortly (it's kind of hackish for just my system right now. Hey, I'm a busy guy).

In [50]:
cd /home/wtgee/Projects/seeingstats/
/home/wtgee/Projects/seeingstats
In [51]:
import pandas as pd
In [52]:
dimm = pd.read_table('data/all_dimm.dat', sep=' ', header=None, index_col=0, names=['date','dimm'], parse_dates=True)
In [53]:
mass = pd.read_table('data/all_mass.dat', sep=' ', header=None,   index_col=0, names=['date','mass'], parse_dates=True)
In [54]:
weather = pd.read_table('data/all_weather.dat', sep=' ', header=None, index_col=0, names=['date','wind_speed','wind_dir','temp','rel_humidity','pressure'], parse_dates=True)
So what we have done is use pd.read_table to pull in each of the respective files. Nothing tricky here: we specify the file, a space separator, and no header, and set the index to the first column, which, combined with parse_dates=True, gives us a DatetimeIndex. We also go ahead and name our columns at the same time.
At this point we have three DataFrames that we would like to combine into one. However, before we can do that we need to do a little more scrubbing.
In [55]:
dimm.index
Out [55]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-09-22 07:36:36, ..., 2012-09-25 19:04:17]
Length: 233060, Freq: None, Timezone: None
In [56]:
dimm.head(3)
Out [56]:
dimm
date
2009-09-22 07:36:36 0.413705
2009-09-22 07:38:02 0.444297
2009-09-22 07:40:53 0.421856
In [57]:
mass.head(3)
Out [57]:
mass
date
2009-09-22 07:35:41 0.26
2009-09-22 07:37:06 0.23
2009-09-22 07:38:33 0.22
In [58]:
weather.head(3)
Out [58]:
wind_speed wind_dir temp rel_humidity pressure
date
2008-12-31 23:59:00 9 274 -4.79 95 612.8
2009-01-01 00:04:00 10 275 -4.83 94 612.8
2009-01-01 00:09:00 10 272 -4.83 93 612.8
As you can see above, the DatetimeIndex for each set currently has no frequency. Furthermore, our data is not aligned in terms of timestamps (for instance, our first dimm reading is at 07:36:36 while our first mass reading is at 07:35:41). Since our weather data is in 5 minute intervals, and this seems like a pretty sane default to start with, we will need to do some massaging of the dimm and mass data to get them there as well. The general technique is to convert the DataFrames to a frequency of seconds, forward-filling (or 'pad'ding) the data as we do so, and then resample all that data into 5 minute intervals using the mean values.
Here's where we pull out our beefy computer since we are going to be creating data values for every second for a three year period. Luckily, our work is easy while the computer's work is hard:
In [59]:
dimm = dimm.asfreq('S', how='pad').resample('5min', how='mean')
In [60]:
mass = mass.asfreq('S', how='pad').resample('5min', how='mean')
In [61]:
weather = weather.asfreq('S', how='pad').resample('5min', how='mean')
In [62]:
dimm.head(3)
Out [62]:
dimm
2009-09-22 07:40:00 0.429001
2009-09-22 07:45:00 0.421856
2009-09-22 07:50:00 0.636297
In [63]:
mass.head(3)
Out [63]:
mass
2009-09-22 07:40:00 0.236667
2009-09-22 07:45:00 0.230000
2009-09-22 07:50:00 0.320000
In [64]:
weather.head(3)
Out [64]:
wind_speed wind_dir temp rel_humidity pressure
2009-01-01 00:00:00 9 274 -4.79 95 612.8
2009-01-01 00:05:00 10 275 -4.83 94 612.8
2009-01-01 00:10:00 10 272 -4.83 93 612.8
One last simple task for now is to trim everything down to a common date range, removing, for example, the weather dates from late 2008 and 2009 that we have no dimm and mass data for:
In [66]:
weather = weather['2009-09-22':'2012-09-22']
In [67]:
dimm = dimm['2009-09-22':'2012-09-22']
In [68]:
mass = mass['2009-09-22':'2012-09-22']
So at this point we have data for a specified range of dates. Note, however, that our individual times are going to be off. For the weather we have a reading every five minutes, 24 hours a day, while for the dimm and mass we have five minute intervals but only during observing times, that is, sunset to sunrise. Because of the nature of the instruments, however, the dimm and mass data don't line up precisely on what they consider sunset and sunrise. So next time we will look at how to get values for each of the DataFrames that actually line up.
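As a preview of where this is headed: once the times do line up, pandas will handle the actual combining for us by aligning on the DatetimeIndex. A rough sketch, continuing with the DataFrames above:

# pandas aligns the three frames on their shared DatetimeIndex
combined = pd.concat([dimm, mass, weather], axis=1)
# times missing from any one frame (e.g. daytime for the seeing data)
# come through as NaN; a naive first pass just drops those rows
combined = combined.dropna()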

Sunday, September 16, 2012

Biting Pythons

There, I went ahead and used the obligatory python-pun. I'm sure everyone learning or using Python is required to do this at some point or another. I'm glad I got it out of the way.

So I dove right into the Pandas documentation and tried to run through their examples with some of the data that I have. Almost immediately I ran into a bug in pandas' date_converter. I think, anyway. I'm always really skeptical that I have found an actual bug and think instead I am probably just doing something wrong. Still, this case seems pretty straightforward. So, methinks, I'll just hop into the code and edit it some. Yay open source! And that's when the biting started.

Part of my whole problem with python has been just learning the ecosystem. Lately I've been spoiled with perlbrew, cpanm, and local::lib, and I foolishly started using python without looking for equivalents. So not only was my python3.2 tied up with the system, but all my modules were going into /usr/local/. I was tempted to work around this, but figured that since I am really at the beginning of using python I would start over and use pythonbrew, which supports venv (virtualenv). Then I can use pip (I think this is what the cool kids are using these days, as opposed to easy_install), and that means I can hack on pandas. So, like all things, what was going to be a two minute bug fix now involves re-building my whole environment. :)
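For the record, the rebuild I have in mind goes roughly like this (commands as I understand them from the pythonbrew README; treat the exact invocations and version numbers as approximate):

% curl -kL http://xrl.us/pythonbrewinstall | bash
% pythonbrew install 3.2.3
% pythonbrew switch 3.2.3
% pythonbrew venv create pandas-hacking
% pythonbrew venv use pandas-hacking
% pip install pandas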

Pandas, however, looks like it will be a great boon. Since it is designed specifically for what I am doing, it has a lot of features I can utilize. In fact, the entire script I have so far can probably be reduced to about three lines.