Project Ideas

Security

Sandbox cookies, or dilute their ability to undermine your privacy by generating fake traffic out of line, behind the scenes.

Build a sandbox that fakes permissions. It defaults to pass-through while learning. Once it has learned from enough examples, it fakes the responses until the user tells it the app is failing or the app is updated, at which point it returns to its learning state.
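The learning/faking cycle described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name, the min_examples threshold, and the "replay the most common response" policy are all hypothetical choices, not part of the project description.

```python
from collections import Counter, defaultdict
from enum import Enum


class Mode(Enum):
    LEARNING = "learning"
    FAKING = "faking"


class PermissionSandbox:
    """Hypothetical sketch: pass requests through while learning, then fake replies."""

    def __init__(self, real_handler, min_examples=3):
        self.real_handler = real_handler      # performs the real permission-guarded call
        self.min_examples = min_examples      # assumed threshold for "enough examples"
        self.mode = Mode.LEARNING
        self.observed = defaultdict(Counter)  # permission -> Counter of seen responses
        self.total = 0

    def request(self, permission):
        if self.mode is Mode.LEARNING:
            response = self.real_handler(permission)
            self.observed[permission][response] += 1
            self.total += 1
            if self.total >= self.min_examples:
                self.mode = Mode.FAKING
            return response
        seen = self.observed.get(permission)
        if not seen:
            # Never observed: this sketch passes through rather than invent a reply.
            return self.real_handler(permission)
        # Replay the most frequently observed response (an assumed policy).
        return seen.most_common(1)[0][0]

    def reset(self):
        """Return to learning, e.g. when the user reports a failure or the app updates."""
        self.observed.clear()
        self.total = 0
        self.mode = Mode.LEARNING
```

A real tool would need a richer model of responses than "most common value", and a policy for unseen permissions that does not leak; both are open design questions for the project.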

Bimodal Program Analysis

RefiNym learns type refinements for C#. RN++ would do the same for C/C++ or for TypeScript.

Some tests in a test suite validate that the program has some behaviour; other tests validate that the program does NOT have some behaviour. The former are positive tests; the latter are negative. This project would build a tool to automatically classify tests into positive and negative.

Program Analysis

The goal is to build a tool that can separate a program's source code into its core logic and its ancillary logic for handling corner cases and errors. The tool would be useful in code review, in focusing static analysis, and as a new kind of test coverage measure: core logic covered. The first step would be an empirical study to learn and harvest features that separate the two kinds of code: is there enough signal for humans to separate the two? The next problem will be establishing the ground truth. The final task would be to build a classifier, possibly using deep learning. This is an ambitious project; a distinction thesis would only need to make progress on it.

Classical Software Engineering

Write a tool that overlays a GUI. For a web app, the tool could be a browser extension. The tool lets an end user capture the steps, and their locations in the GUI, that trigger a bug. This tackles the problem of field bug reproduction.

Test Case Intent: When a developer adds a test case to a test suite, they do so for a reason: to validate either that a correct behaviour exists or that an incorrect behaviour does not. This project would seek to recover that intent.

Infer an input probability distribution from a test suite. Does a test suite just define a finite support?
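As a baseline for the question above, the most naive reading treats the suite's inputs as samples and takes their empirical distribution, whose support is exactly the finite set of distinct test inputs. The sketch below shows only that baseline; the function name is hypothetical, and the project would presumably aim to generalise beyond it.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(test_inputs):
    """Map each distinct (hashable) test input to its relative frequency."""
    counts = Counter(test_inputs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty test suite")
    # Exact fractions make it easy to check that the probabilities sum to 1.
    return {value: Fraction(n, total) for value, n in counts.items()}


# Inputs harvested from an imaginary test suite for a one-argument function.
dist = empirical_distribution([0, 1, 1, -1])
```

Any input outside the suite gets probability zero under this reading, which is what makes the finite-support question interesting.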

git blame stops at the last write to a line. The project would implement recursive git blame, which would use a similarity measure to chase writes until no line is found within a specified similarity. One challenge will be when more than one line exceeds the similarity threshold.
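The similarity chase can be sketched without touching git at all, by treating history as an in-memory list of revisions. Everything here is an assumption made for illustration: the chase_line name, the use of difflib's ratio as the similarity measure, the 0.6 default threshold, and the tie-breaking rule (the first best match wins, which sidesteps rather than solves the multiple-candidate challenge).

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two lines, using difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def chase_line(revisions, line, threshold=0.6):
    """Follow `line` backwards through `revisions` (a list of line lists, oldest first).

    Returns (revision_index, ancestor_line) pairs, newest first. The chase stops
    at the first revision in which no line reaches `threshold`.
    """
    chain = []
    current = line
    for index in range(len(revisions) - 1, -1, -1):
        candidates = revisions[index]
        if not candidates:
            break
        # max() returns the first maximal candidate, so ties are broken by position.
        best = max(candidates, key=lambda c: similarity(current, c))
        if similarity(current, best) < threshold:
            break
        chain.append((index, best))
        current = best
    return chain


history = [
    ["total = 0", "print(total)"],
    ["total = 0  # running sum", "print(total)"],
    ["count = 0  # running sum", "print(count)"],
]
ancestors = chase_line(history, "count = 0  # running sum", threshold=0.5)
```

A real implementation would read revisions from the repository and would have to decide what to do when several lines exceed the threshold, for instance by following all of them and returning a tree rather than a chain.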

Replicate Reid Holmes' surprising coverage claim in "Coverage is not strongly correlated with test suite effectiveness".

Program Transformation

Functionalizer: takes a code snippet and turns it into a function.

Idempofier: Extract functions into a test harness that makes their execution idempotent, like system calls, then loop them to learn a function summary. The use case is concretisation in symbolic execution.

Build a test harness to discover, with high probability, loop-carried dependencies.
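One possible randomised approach, offered only as a sketch: run the loop body over shuffled iteration orders and compare the final state with the in-order run. The function name and the body(state, i) calling convention are hypothetical. Note the limitation: this detects only order-sensitive dependencies, so a commutative reduction such as a running sum carries a dependency that this check will not flag.

```python
import copy
import random


def looks_order_dependent(body, initial_state, n_iterations, trials=20, seed=0):
    """Return True if some shuffled iteration order changes the final state."""
    rng = random.Random(seed)

    def run(order):
        # Each run starts from a fresh copy so trials cannot contaminate each other.
        state = copy.deepcopy(initial_state)
        for i in order:
            body(state, i)
        return state

    baseline = run(range(n_iterations))
    for _ in range(trials):
        order = list(range(n_iterations))
        rng.shuffle(order)
        if run(order) != baseline:
            return True
    return False


def independent(state, i):
    state[i] = i * i  # touches only its own slot


def prefix_sum(state, i):
    state[i] = (state[i - 1] if i > 0 else 0) + i  # reads the previous iteration's slot
```

More trials raise the probability of catching an order-sensitive dependency; a False result is evidence, not proof, of independence.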

Build an automocker, a tool that takes a function signature and produces a mock object for it.
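For Python, the standard library already covers the basic case: unittest.mock.create_autospec builds a mock that enforces a function's signature. The sketch below uses it as a reference point for what an automocker might produce; fetch_user and its stubbed return value are hypothetical, and a research tool would presumably go further, for example by synthesising plausible return values.

```python
from unittest.mock import create_autospec


def fetch_user(user_id: int, *, verbose: bool = False) -> dict:
    """Hypothetical function whose signature we want to mock."""
    raise NotImplementedError("the real implementation is not needed for the sketch")


# The mock accepts only calls that match fetch_user's signature and returns
# the stubbed value instead of running the real body.
mock_fetch = create_autospec(fetch_user, return_value={"id": 1, "name": "stub"})

result = mock_fetch(1, verbose=True)
mock_fetch.assert_called_once_with(1, verbose=True)
```

Calling mock_fetch() with no arguments raises TypeError, because the mock checks calls against the original signature.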

Empirical Software Engineering

Study the LibreOffice takeover and the SSL takeover. https://people.gnome.org/~michael/blog/2015-08-05-under-the-hood-5-0.html

Identify a program's core logic: See above.

Life cycle/expectancy of code snippets: Vary organism size from subline to file. See http://www.karolikl.com/2015/08/the-results-of-my-developer-survey.html?m=1

GitHub project viability/predict popularity: What is project popularity? And so on. May relate to the tweet cascade work of the Stanford professor, who studied under Kleinberg.
http://www.pnas.org/content/109/16/5962

Miscellaneous

Can we use information theory to separate random and benchmark SAT formulae and escape Koenig’s critique?