Monday, July 31, 2006

Retrieve from Sender

Columnist Bill Thompson contemplates the problems caused by spammers forging his address, and he comes up with his own solution. And I just don't like it.

The problem here is that it's very easy to forge addresses. If I want to send a mail from pm@pm.gc.ca and claim to be the prime minister (the prime minister of Canada has a crummy email address, but that's irrelevant...), all I have to do is say I'm sending from there, and nothing on the Internet will tell me that I can't.
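
To see just how little the protocol cares, here's a minimal Python sketch (the relay host is hypothetical) showing that nothing ever asks me to prove I own the From address I write:

```python
import smtplib
from email.message import EmailMessage

# Compose a message claiming to come from the Prime Minister's address.
msg = EmailMessage()
msg["From"] = "pm@pm.gc.ca"         # nothing stops me from writing any address here
msg["To"] = "someone@example.com"
msg["Subject"] = "Official business, honest"
msg.set_content("SMTP never asks the sender to prove they own the From address.")

# Hand it off to whatever relay will accept it (hypothetical host name).
with smtplib.SMTP("mail.example.com") as server:
    server.send_message(msg)
```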

Mr. Thompson suggests that the solution is to send only the headers; when someone goes to pick up their mail, their software goes back to pm.gc.ca to fetch the rest of the message... if the mail wasn't really at pm.gc.ca, it would be quite clear at that point that the message was forged. It's simple enough.
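
The column doesn't spell out an actual protocol, so take this as a toy sketch of the general shape, with the lookup URL and function name entirely invented:

```python
import urllib.error
import urllib.request

def retrieve_from_sender(headers, timeout=10):
    """Given only the headers, ask the claimed origin whether it really holds
    this message. The endpoint below is hypothetical, not a real standard."""
    origin_domain = headers["From"].split("@")[-1]
    url = f"https://{origin_domain}/mail-store/{headers['Message-ID']}"  # hypothetical endpoint
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()   # the body, fetched back from the claimed sender
    except urllib.error.URLError:
        return None                  # origin unreachable or denies the message

```

That `return None` branch is exactly where things get awkward: an origin that's merely unreachable looks the same as a forgery.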

So, what happens if someone trips on a cable and my link to pm.gc.ca is down when I happen to check my mail? Well, I guess I have to keep trying. Not so bad with one mail, but I get over a hundred mails per day. Right now I read the news while my mail downloads, but some days I'd be able to read a whole paper if I had to wait on all those extra server connections.

It makes so much more sense for the computers to do the waiting, and figure out all that authorization stuff while I'm still otherwise occupied. And indeed, the 'net was designed with the idea of nuclear war in mind, so sure enough, we've made it so that if someone blows up the province of Saskatchewan, I can still get my mail from there because I have a local copy... Or if malicious terrorists cut all the cables around the prime minister's office, anything he's already sent will have made it to the recipients, and he won't have to pray that people checked their mail that morning.

What really gets ignored here, though, is that this, like all the other make-sure-the-sender-is-authorized schemes, requires that everyone do the same. Even right now, I could log in, check my mail from the local copies, and send out little probes to the servers asking "is this what you sent?" -- but unless they know how to respond, it's not going to make my life any easier.

Thompson notes this himself when he points out that his problem lies in getting his ISP to change their infrastructure. So he obviously understands the underlying issues. But why propose a new solution that doesn't address that more pertinent part of the problem? I agree with him that changing the underlying infrastructure would likely do a lot to curb spam, but it's like telling someone with a broken car to buy a new one rather than get it fixed, when it turns out that the real problem is that they don't have enough money to afford repairs.

Thursday, May 11, 2006

A year of bugs

This interesting little survey of browser bugs doesn't tell me anything I wasn't aware of just from reading bugtraq now and again, but it's always nice to have actual numbers to go with the general sense of malaise I have about browser security:

It looks at the span of time between a vulnerability being found and a patch being created for three browsers: IE, Mozilla (apparently they sort of look at the whole family), and Opera.

They define a browser as "safe" on days when no known, unpatched remote code execution bug exists.

Here's the rundown:

IE: 2% safe. There were a grand total of 7 days in 2004 when no known, unpatched bug existed. For more than half the year, there was a worm or virus in the wild exploiting one of these bugs.

Mozilla: 85% safe. There were 56 days where there was a publicly known vulnerability and no patch. Many bugs were reported privately to the Mozilla team. It is worth noting that Mozilla-type browsers were not being targeted in 2004 by malware writers due to lack of popularity. The more it grows, the bigger the target it becomes.

Opera: 83% safe. 65 days unpatched. The Opera bugs happened to overlap in time, which could have made each patch take longer, or maybe shortened the total window because both bugs could be fixed in one patch. It was also not targeted by malware writers.
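
Out of curiosity, the "percent safe" figure is just the complement of a union of date intervals. A quick sketch with invented disclosure/patch dates (not the survey's real data):

```python
from datetime import date, timedelta

def unsafe_days(windows, year=2004):
    """Count the days in `year` covered by at least one (disclosure, patch) window."""
    start, end = date(year, 1, 1), date(year, 12, 31)
    days = set()
    for disclosed, patched in windows:
        d = max(disclosed, start)
        last = min(patched, end)
        while d <= last:
            days.add(d)              # overlapping windows only count a day once
            d += timedelta(days=1)
    return len(days)

# Two overlapping example windows, loosely like the intersecting Opera bugs.
example = [(date(2004, 3, 1), date(2004, 4, 15)),
           (date(2004, 4, 1), date(2004, 5, 10))]
unsafe = unsafe_days(example)
year_length = 366                    # 2004 was a leap year
print(f"{unsafe} unsafe days -> {100 * (year_length - unsafe) / year_length:.0f}% safe")
```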

All in all, somewhat interesting to put some firm numbers on the gap between discovery of a vulnerability and availability of a patch. And this doesn't even count the days people spend before they have time to patch! (With IE, that sort of gap could completely cover the few safe days they've got!)

Tuesday, March 28, 2006

Choice in the Classroom

I was actually looking up some information on John Aycock, the primary author of this paper, when this title caught my eye. I was lucky enough to hear Dr. Aycock give a presentation earlier today which I hope to discuss later, but I just wanted to make a couple of notes on this paper first.

Aycock and Uhl suggest that it is reasonable to give students more Choice in the Classroom, specifically in a university setting. In particular, they look at flexibility for assignment deadlines and the weights assigned to each assignment to indicate how much of the final mark each comprises.

Assignment due dates are usually arbitrary within a fairly small set of possible dates. There are a number of constraints: there needs to be enough time between each assignment and any tests/exams, there needs to be sufficient time for students to complete each assignment and for TAs to mark it, the students need to have learned the material in class either before the assignment is given out or at least with enough time for them to finish it after the material's been covered, and so on. Usually there's a couple of days' flexibility so that the prof can give an extension, though, and "assignment marking does not always happen with the eagerness of vultures descending upon a carcass." (What a metaphor!) So, given that this is true, why not allow students a little leeway?

What they attempted was a "Time bank" system, where each student starts the term with two banked days, and then if necessary, they could use those days to extend their assignment deadlines. That is, they could choose to turn in two of their assignments one day late, or they could use both days at once and turn in one assignment two days late (or, of course, they could turn everything in on time). There was no penalty or bonus for using the days or not, and days couldn't be used in fractions. This seemed to be very successful, and reduced the number of extensions requested by students. It even seems like a fairly simple thing to implement in conjunction with an online submission system -- I'd love to see it tried. I also wonder what would happen if you were allowed to put days back into your bank by turning in assignments a day or two early. You'd have to limit the maximum total to 2 days out of consideration for the marking TAs, but I wonder if people would take advantage of this?
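
Mechanically there's almost nothing to it. Here's a toy sketch of the scheme as described; the class name is mine, and the deposit method is just my early-submission speculation from above, not something the authors tried:

```python
class TimeBank:
    """Toy model of the time bank: each student starts the term with two whole
    days to spend extending assignment deadlines, no penalty or bonus either way."""

    def __init__(self, days=2, cap=2):
        self.days = days
        self.cap = cap      # cap only matters for the speculative deposit() below

    def spend(self, assignment, n):
        if n != int(n) or n < 1:
            raise ValueError("days can only be spent in whole numbers")
        if n > self.days:
            raise ValueError(f"only {self.days} banked day(s) left")
        self.days -= n
        return f"{assignment} deadline extended by {n} day(s)"

    def deposit(self, n):
        """My speculative extension: earn days back by submitting early, capped
        so the marking TAs never end up waiting more than `cap` extra days."""
        self.days = min(self.cap, self.days + n)


bank = TimeBank()
print(bank.spend("Assignment 2", 2))    # both days at once on a single assignment
```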

For the marking weights, they used a contract system, where students chose the weights they wanted for each assignment, within the values required. (eg: two assignments, total worth 40% of the mark, each one worth 10%-30% depending upon the student's choice) This was apparently less well-received, and the authors comment that some students seemed to have anxiety about choosing such values. I suspect I'd see this with my students if I tried it -- some of them are very concerned about getting things exactly right, and would have fits assuming there was some "right" way to do this, even though as an instructor I know it's somewhat arbitrary. The other large problem is that students didn't know at the beginning of term how difficult the assignments would be for them, making it hard to make an educated guess. Giving more information about the assignments helped, but I have to say, this seems like less of a benefit to the student than the time banks.
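
The contract itself boils down to a couple of constraints to check; the function and defaults below are mine, matching the 40% / 10%-30% example:

```python
def validate_contract(weights, total=40, low=10, high=30):
    """Check a student's chosen assignment weights: they must sum to `total`
    percent and each must fall between `low` and `high` percent."""
    if sum(weights) != total:
        raise ValueError(f"weights must sum to {total}%, got {sum(weights)}%")
    for w in weights:
        if not low <= w <= high:
            raise ValueError(f"each weight must be {low}%-{high}%, got {w}%")
    return True

validate_contract([15, 25])    # front-load the second assignment: fine
# validate_contract([5, 35])   # rejected: 5% is below the 10% floor
```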

Between these two ideas, though, is the thought that students should be able to tailor their educations to suit them better: know you're going out of town the weekend before an assignment is due? Make sure that assignment is worth less so it won't matter as much if you're distracted. Two assignments due at the same time? Extend one of them a day or two using your banked days. Students always complain that instructors make decisions without thinking about other courses, and instructors know that it's hard to please everyone (although when most of your class is taking the same other course, I do think it's reasonable to expect a prof to try to adjust dates a bit -- many at least try to schedule midterms on different days by asking students). But these seem like they might be interesting techniques for customizing a course... worth thinking about for teachers!

Monday, February 13, 2006

Digital Security Seminar: Optimising Malware

Optimizing Malware (abstract) doesn't sound like something one would really want to do as a "good guy" in computing, and really, it isn't. In fact, this paper (or at least the associated talk, which I attended) is more about developing a set of metrics to see how well optimized a given piece of malware is for its purpose. Measurements make it easier to compare things; goodness knows malware (viruses, spyware, worms, etc.) has a ways to go in efficiency, and it could be useful to put samples on a scale and see if there are any trends.

Dr. Fernandez actually stood up and said, up front, that there was nothing particularly new here, just a nice organization of existing knowledge about the ways in which malware is typically not good software.

That malware contains horrible code comes as no surprise to most people who've looked at virus code, but I find a lot of people are shocked the first time they find out how utterly inefficient and buggy some viruses are. (Mind you, people are also shocked when they find out you can just read virus code without getting a virus, so what can I say?) If you're interested, read up on the evolution of Code Red for a nice example of buggy virus code (and how it's still causing problems world-wide). Also, read up on the Morris/Internet worm if you're curious about more advanced virus technology. Yes, that's right, the first Internet worm contains stuff rarely seen in more recent viruses. (I have good academic papers on these subjects, but I don't seem to have links handy right now, so try Wikipedia or something.)

The point is, it's known that current malware is not terribly efficient and could be much better. Dr. Fernandez suggested that his metrics could eventually be used for security types to make guesses about more advanced malware so that we could build more advanced systems. And here's where the most interesting question of the talk came up: why?

Current malware, as inefficient as it is, isn't being perfectly handled. We haven't built any perfect virus scanners yet. Why push towards problems that don't yet exist when we can't solve those that do?

There's always the "So we're prepared for the future!" argument for doing the co-evolution yourself (as in, trying to evolve both sides on your own). But as fun as academic thought-exercises are, I have to admit the person who asked the question has a point: it might be better to use our combined brain power to solve real-world problems than to spend time making up ones that don't yet exist.

All in all, though, an interesting talk, and a potentially useful way to look at malware.

Thursday, January 12, 2006

How do you find academic papers of interest?

Keywords: research papers, search, tools

I'm currently in what I'll call an "exploration" phase of my research. I have a number of ideas percolating in the back of my head, but since I'll be committing to a thesis topic sometime soonish, I'd like to take a bit of extra time to explore a broader base of ideas and see if something more compelling emerges.

Today, I was doing a bit of academic surfing -- trying to find papers without having a specific topic in mind, just some broader research areas -- and I got to thinking about how we find information of interest. How do people find papers of interest to them?

To summarize what I do, here's a list in no particular order:
For specific searches:
  • Web search engines (eg Google)
  • Academic search engines (eg Citeseer)
  • Information sites (eg Wikipedia)
  • Reading papers cited by ones I found interesting
  • Reading papers that cite ones I found interesting


For more general topics of interest:
  • Checking bibliographies
  • Looking up colleagues' web footprints (their personal webpages, papers that mention their names, you name it)
  • Asking colleagues what they've seen of interest
  • Conference proceedings
  • Following along from one point of interest


When I'm searching for a particular subject, it can be fairly easy to find papers. A few keywords in Google is usually the place to start, since then I can find researchers' personal pages, which are often very handy (note to self: my own academic page is just a listing of papers, and it wasn't even up-to-date until a few minutes ago when I looked at it and noticed. I'll have to do more work on that.), as well as links to papers. The problem with Google as an academic search tool is well-known, however: it is difficult to tell at a glance which papers come from reputable, peer-reviewed sources, and which ones are just random term papers put online by students or professors, or random essays written by the internet's plethora of know-it-alls.

This is not to say that non-refereed papers aren't useful. I've found that on occasion they provide excellent summaries of work, and those simple overviews often help me figure out where to keep looking. In addition, their bibliographies can provide a wealth of useful citations for further reading on a subject. Of course, they can be completely useless or, worse, incorrect or misleading. That's true of anything you read, though.

Of similar value is Wikipedia. Although the entries themselves have to be taken with a grain of salt, they often contain beautiful plain-language summaries as well as links to useful readings and researchers of note. Similarly, any site that collects links and summarizes information can be handy. I'll admit that even about.com has been known to help me find things.

If I'm looking for more purely academic work, then I'll switch to something like the venerable Citeseer. Because Citeseer seems to have had a lot of issues with slowness and downtime in the past year, I'll occasionally bite my tongue and use the more questionable Google Scholar. (I could write a whole essay on what I like and don't like about Google Scholar, but that is for another time.)

However, all of these things work better when I have a fairly specific subject in mind. For today, I was looking for "recent papers that might be of interest to me" -- pretty broad, and although there's some fascinating work on AI agents to handle "of interest to me" (mental note: look for more papers on that subject), currently I can't just type that into a search engine.

So what do I do then?

I started by looking up a bibliography that's maintained on my specific field, artificial immune systems. (AIS bibliography) It doesn't seem to contain my latest paper, even though it was published in the artificial immune systems conference this summer, but who am I to complain when I hadn't even listed it on my website until today?

I also went and looked up some academic colleagues. The people who I've met and enjoyed conversations with at conferences are likely to have other interesting papers, or links of interest on their websites. One even had a slightly old reading list which I'm poking through at the moment. (He also studies methods for finding documents of interest to people. Maybe I should be trying his software.)

When I'm on campus, I'll sometimes just stop by and talk to people and ask if they've read anything interesting lately. Sometimes, people have excellent recommendations, and the ensuing discussion is always fun. Access to so many smart, interesting people who have time to discuss research interests is one of the major perks of university affiliation. Not that my coworkers in other jobs haven't been smart and interesting, but for programming coworkers, chatting about research tended to be outside the job description, whereas even busy profs can make time to discuss things because that *is* part of their jobs.

I can also check out conference proceedings themselves for papers of interest. The last conference I went to had all sorts of interesting presentations whose related papers I still haven't followed up on, and there are of course plenty of interesting conferences I haven't gotten to of late.

And, of course, once I find one source of interest, I follow through on all those things I discussed earlier: look up the author(s), read papers in the bibliography or papers that cite this one, check out other papers at the same conference/journal, etc.

So, those are some of the things that work for me, from an academic perspective. How do other people find things to read?

Tuesday, June 14, 2005

Building a Reactive Immune System for Software Services

Keywords: self-healing computing, computer security, automated patching, zero-day attacks, patch creation

I got excited seeing the title of this article, but Building a Reactive Immune System for Software Services doesn't really contain any immunology. There's no immune system involved, per se; it's just a self-protecting system of sorts.

The idea is quite clever. First, the bug must be found using some extra instrumentation. Then, once the fault has been pinpointed, the code is recompiled so that the problem section can be run in an emulator. The emulator makes a backup of the state before executing any code, and if, for example, a buffer overflow occurs (the emulator checks for memory faults), the program is reverted to the pre-fault state. Thus, when execution returns to non-emulated code, there will be an error since some code has not been executed, but the hope is that this error is something the rest of the application can handle. The emulator tries to make intelligent guesses about return codes with some heuristics: -1 if the return value is an int, 0 if it's an unsigned int, and some cleverness with pointers so that NULLs won't be dereferenced.
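
As a way of fixing the idea in my head, here's a toy Python analogue of that roll-back-and-return-an-error control flow. The real system is an instruction-level emulator wrapped around C code, so this is strictly a sketch, with all the names my own:

```python
import copy

def run_supervised(risky_section, state, *args, return_kind="int"):
    """Sketch of the rollback idea: snapshot the state (a dict here), run the
    risky section, and on a fault revert the state and hand back a heuristic
    'error' value that the surrounding application will hopefully handle."""
    snapshot = copy.deepcopy(state)       # backup taken before executing anything
    try:
        return risky_section(state, *args)
    except (IndexError, MemoryError, OverflowError):  # stand-ins for the emulator's memory-fault checks
        state.clear()
        state.update(snapshot)            # revert to the pre-fault state
        if return_kind == "int":
            return -1                     # heuristic: signed-int error convention
        if return_kind == "unsigned":
            return 0
        return None                       # pointer case: never return something dereferenceable
```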

I wasn't too sure if blocking out random sections of code was really all that viable, but it turns out that servers really are pretty robust that way. They tested with apache (Where would academia be without open source software to play with?), and tried applying their technique to 154 functions. In 139 of the 154 cases, their tests showed that the altered apache did not crash, and often all of the pages were served. Results for bind and sshd were similar. Tests with actual attacks showed that rather than crashing, the servers could continue execution and serve other requests. It certainly sounds promising for servers that need high availability!

The performance impact for this selective emulation is not too severe, but it does require access to the source code to compile in the patch, and there needs to be some fairly heavy instrumentation to determine where the patch should be put. There is also some risk that the emulation will open up new security flaws -- imagine what would happen if the emulated function was required for input validation.

But overall, I found this pretty interesting. I'd been working on a somewhat related idea for a term project, and the talk given by Dr. Keromytis about STEM has gotten me thinking that what I thought was a fun but not-terribly-viable idea might actually be worth pursuing.

I'm hoping to look at all the papers cited in the related works section of this one, but for those looking for a little bit more right now, I direct you to Val's summary of the failure-oblivious computing work. Neat stuff!

Wednesday, June 08, 2005

Patch-on-Demand Saves Even More Time?

Keywords: computer security, automated patching, zero-day attacks, patch creation

As mentioned in the previous post, I went to see a talk by this author and was really interested in what he's been doing, so I'm looking through his papers. This little 3-page magazine article, Patch-on-Demand Saves Even More Time?, is a very short introduction to a problem in security patching: how do we make sure it happens fast enough? Right now, quite a lot of the newsworthy security issues are related to flaws that have been known (and patched) for months, but not all of them are like that. The idea of a "zero-day attack" is that the flaw and its exploit can appear on the same day, leaving users with no advance warning to prepare their systems. There's great worry that these will become more common.

So how should we handle this issue? His answer: Automation! Actually, it's a pretty common answer; it's just that the system he's looking at creates the patches as well as applying them, whereas most of the systems we see are all about applying existing (human-created) patches in an automated or semi-automated way.

Automated patch creation is not a perfect answer, but it's got a lot of potential... for good and bad. How can we be sure the cure isn't worse than the disease, as it were? "The risks of relying on automated patching and testing as the only real-time defense techniques are not fully understood." While I doubt they'll ever be fully understood, I am curious if there's any way we can understand them in a general way short of implementing a bunch of ideas and generalizing.

Anyhow, he's had some good results (which he discussed in the talk I attended) with such an automated patching system, which took source code and basically emulated the bits that were known to have problems. I'll probably talk about his system later, but I thought I'd mention this paper first as an easy read and an introduction to this particular problem, before I start looking at proposed solutions.