Languages are defined by their grammar and their semantics.
The grammar says that "a $ 3 ;" is incorrect Java. The semantics
describe what exactly a grammatically correct piece of code actually
means. For example, what does it mean to write something like "a = 4;"? It's
context dependent - it's an error if you haven't defined "a" to be some
number (int/long/float/etc.) variable.
Imagine that I give you the grammar of the Greek language. Using it
you can identify which Greek sentences are correct and which are wrong
but without knowing the semantics you cannot say what the correct
sentences actually mean - it's all Greek to you! Well, the same
happens with computers.
Now it's getting trickier - how do you know that your C++ compiler correctly implements the C++ standard (i.e., the C++ semantics as expressed in English by the standard committee)? The C++ compiler developers have tried to interpret what the committee had written in English and codify that into their compiler but it's not an easy task - bugs have been identified on many occassions.
There's a way to make this task easier by using a recently developed framework called the K Framework. There you use a specific formal (not just English) language to describe the semantics of a language and there are tools to produce the compilers for you automatically (like writing a UML statechart and then getting running code from it directly).
This framework has been used to produce executable formal semantics for the C language (!!!!), published at POPL'12 (a very interesting but maybe a bit difficult to understand paper).
A much easier but slightly outdated primer into the K Framework is available, showing how the framework works for a simple calculator language - should be much easier to follow (and I'm more than glad to help with it). Figure 4, on p. 19 shows the whole input for defining the grammar of a simple calculator and its semantics - have a quick look at it, to see how simple it is.
There's a (hopefully up-to-date...) set of tutorial videos.
The tools of the K Framework are available from the K Framework's Tools Github page.
A paper describing the translation of FSP into another formal language (Lotos) may
be helpful in understanding how FSP is supposed to work when unsure
(mainly a note to myself).
You need to be able to think abstractly
and be very, very methodical - giving super attention to detail.
Initial steps:
We have an informal definition of the Xcd language semantics, described in English and pseudo-code in Mert Ozkaya's PhD thesis. Mert has also written the tool that translates Xcd designs into the Spin tool's language so that they can be verified. This tool is supposed to have followed the English language Xcd semantics but it's quite possible that some errors have been done (or indeed that the original semantics were not always correct - we're only humans...).
When you write a Java program, you compile to see if there are any
grammar/type errors, and, if not, you run it with different input
values to test that it's correct.
When you produce a UML diagram, how do you know that it's correct?
You don't... :-(
We have created a new language to describe architectures of software
systems and their behaviour, called Xcd (pronounced "exceed"). Here's
the Xcd
home page.
This language allows to specify architectural designs in a textual
form, as if you were programming. It also allows you to check these
designs for errors. There's a tool that translates the designs
into another language that is easier to check automatically
(through an external tool called Spin - a model checker, see What is Spin).
An article describing the Xcd language has been published at
the CBSE'14 conference:
Issue:
Spin, is not just a tool
- it has so many configuration options that it'd be more right to call it a family of tools.
When compiling its verifier, pan (Protocol ANalyser), it can use a
number of options: http://spinroot.com/spin/Man/Pan.html#D1, e.g., -DBITSTATE, -DBFS,
-DNOREDUCE, etc.
The code is then compiled selectively according to these directives
- whether it'll be exhaustive or bitstate exploration, depth-first or
breadth-first, with or no partial-order reduction, etc.
In a perfect world, I'd like to refactor its code entirely, so that instead of selecting code based on these directives I can use C++ Template Metaprogramming to do the selection.
As a first step, it'd be nice if we could:
enum class search_type { exchaustive, bitstate }; #ifndef BITSTATE search_type search_type_val = search_type::exhaustive; #else search_type search_type_val = search_type::bitstate; #fi(same for the other options).
/* Your job! Set of classes that when instantiated at the end will print the list of directives used at compilation */
What's needed:
This will allow the compiler to compile the *whole* code, not just the one that has been selected by the directives.
This will allow us to look for bugs in Spin, some of which could be masked now. I'm already developing some crazy C pre-processor macros to facilitate this task (one must have fun from time to time...;-).
The whole code? Well, not exactly... You see, Spin contains code inside strings (in files called pangenX.[ch]), that it uses to produce another executable called pan (Promela ANalyzer) given some model - it's the pan executable that actually checks the model. Obviously, this code is never seen by the compiler when Spin is compiled - only seen when compiling pan. In itself that's not too bad, as one can refactor it as we've just described for the Spin code and check for bugs. But some of this code is quite likely similar to the code that Spin itself uses - Spin can simulate a model (i.e., execute one of its execution paths), while pan model-checks it (i.e., executes all its execution paths).
So it'd be really, really nice if we could find a way to refactor this code so that common parts are only written once and so forth. This may require using C++, especially lambda functions and/or template meta-programming. It might also require some crafty C pre-processor macros and shell scripts to turn the same code into a string version (for pan) and a normal version (for Spin).
What's in it for you? You can demonstrate that you can understand and modify a complex piece of software and do so in a systematic manner. Employers with brains would appreciate that, cause that's the majority of what professional programmers do - few write programs from scratch.
Issue: The code that Spin has internally to be able to simulate models is quite similar to the C code it produces to verify models. The verification code can also execute fragments of C code that can be included in Spin's models (as they're getting compiled along with everything else), while the simulator cannot execute them.
We'd like a version of Spin that enables the produced C program to also simulate the model - one small step towards merging the two facets of Spin.
Even better if the produced C program can support all of Spin's parameters (so one can use it as a full replacement for Spin when simulating a model).
(or use hobbes to execute the checker code directly inside Spin...)
There's a tool called Spin, that takes a model and checks it automatically.
Developed at Bell Labs, received the very prestigious ACM System Software award in 2002, its developer now heads a research lab at NASA.
https://spinroot.com/spin/whatispin.html
Spin's written in C. Plain, good, old C. No classes/generic code - simple... (he, he, he).
We'd like to modify Spin itself so that it better supports our Xcd language.
More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
/* receive a message consisting of three data values from channel chan1 and assign these into d1, d2, d3. */ chan1 ? d1, d2, d3;
You can now also do:
/* only receive a three valued message and assign the first value to d1 if the second value in the message is equal to 3 and the third value is equal to the current value of d4 */ chan1 ? d1, 3, eval(d4);
We need:
/* only receive a three valued message into d1, d2, d3, if its values satisfy the predicate after the : character, where $i is the i-th message component */ chan1 ? d1, d2, d3 : ($1 * $2 == $3 + d4);Note: Spin has different types of message reception.
// https://spinroot.com/spin/Man/receive.html c ? m; /* receive the first message in the buffer if it matches m */ c ?? m; /* receive any matching m non-deterministically */ c ? < m >; /* receive the first message in the buffer if it matches m but don't remove it from the channel buffer */ c ?? < m >; /* receive any matching m non-deterministically but don't remove it from the channel buffer */ // https://spinroot.com/spin/Man/poll.html c ? [ m ]; /* poll channel to see if you could receive the first message in the buffer if it matches m */ c ?? [ m ]; /* poll channel to see if you could receive any matching m non-deterministically */
/* select(i, min, max); */ i = min; do /* Non-deterministic loop, can choose any value between min and max */ :: i < max -> ++i :: break od
We'd like a select(i, min, max, predicate) - assign i a value between
min/max that satisfies predicate; block if no such value exists. The latter part is important (and difficult) - the statement shouldn't be executable at all if there is no value within the range for which predicate is true.
And of course, if the predicate is true for some value then it
shouldn't block, so the following "implementation" is wrong:
/* select(i, min, max, pred); */ i = min; do /* Non-deterministic loop, can choose any value between min and max */ :: i < max -> ++i :: pred(i) -> break /* Bug: if pred(i) is true, one may choose to advance i nevertheless and get stuck in values of i for which pred(i) isn't true. We could store the last value of i that satisfies pred(i) */ odAnother attempt - but is it correct?
/* select(i, min, max, pred); */ i = min; last_i=min-1; do /* Non-deterministic loop, can choose any value between min and max */ :: pred(i) -> last_i = i ; /* Store the last value of i that satisfies pred(i) */ if :: i < max -> ++i :: break fi :: else -> if :: i < max -> ++i :: (i == max) && (last_i != min) -> i = last_i ; break fi od
We can then modify the Xcd translator to take advantage of these features (especially the generalized message reception, which should help decrease the state space considerably).
This would also help demonstrate the usefulness of these features by
comparing how the two versions of the translator behave
memory/time-wise.
Note: The following project ideas may depend on the semantics project above as the current Xcd tool may not be able to treat the cases explored - so I consider them risky and would rather have someone complete the Xcd semantics project first.
Here is a paper discussing how to represent the patterns in the Gang of Four book using aspect-orientation. https://www.cs.ubc.ca/labs/spl/papers/2002/oopsla02-patterns.html
The code from that was here: https://hannemann.pbworks.com/w/page/16577895/Design%20Patterns It's still accessible through the Internet archive https://web.archive.org/web/20090826161450/http://hannemann.pbworks.com/Design+Patterns
Along with a PhD student here at City we're developing a language to describe architectures. All related publications, the tool, and some case studies are here: https://www.staff.city.ac.uk/c.kloukinas/Xcd/index.html
The goal is to take the patterns described in POSA (vol 1) and describe them in Xcd.
POSA books: https://www.dre.vanderbilt.edu/~schmidt/POSA/
Similar to the patterns project - the goal here is to describe some algorithms, like (from C++): swap, sort, find, etc. Also map-reduce (don't think that the Xcd language can support it now but it's good to try).
There are two ways to describe a protocol: a decentralized one, where you give the behaviour of each participant, and a centralized one where you describe at each state which participant can move and what the next state would be.
The Xcd language supports the former.
We'd like to extend it so as to be able to describe a property for such a protocol, which would be expressed as a centralized behaviour.
Concerns language design, and compiler changes.
The current version of the Xcd language does not have any notion of time (like FSP).
But in many systems time needs to be modelled.
So we'd need to extend the syntax to include clocks and expressions on them, and then extend the compiler that translates Xcd models into the language of a model-checker called Spin.
More info (+ binaries and source code) on Spin can be found here: https://spinroot.com/spin/whatispin.html
Note - Spin itself does not support time.
There are two groups that have tried to extend it so that it does:
We'd preferrably like to use the first extension, as it can support more than time.
Looking at the syntax extensions that these two have done for Spin, would be a good way to start on deciding how to extend the syntax of Xcd.
You don't need to understand the underlying theory about model-checking time (well, not all of it).
Like previous projects, it involves working on the syntax of the language and on its compiler.