NEWSLETTER

From the AAR President

Announcement of CADE Elections

New Journal

Annual Logic Summer School

IJCAI

Call for Papers - RTA 2005

Postdoc Position

The Nature of Mathematical Proof

The Paradox of the Case Study

Now that you have recovered from a relaxing summer,
I know you will be eager to take on the challenge of writing papers,
preparing tutorials, and attending conferences.
This *AAR Newsletter* provides numerous opportunities for meeting
that challenge.
For a different kind of challenge, I recommend
that you read Alan Bundy's article on "The Paradox of the Case Study."
I hope that this is the beginning of a series of discussions
about the value of such case studies--not just for Ph.D. candidates
but for the field of automated reasoning in general.

E-mail: afelty@site.uottawa.ca

An election of CADE trustees will be held soon and all AAR members will receive a ballot. The following people are currently serving as trustees of CADE Inc.:

Franz Baader (CADE-19 Program Chair, elected 10/2003)

David Basin (IJCAR 2004 Program Co-chair: term expired)

Peter Baumgartner (elected 10/2003)

Maria Paola Bonacina (Secretary, appointed 9/1999, resigned 5/2004)

Gilles Dowek (elected 9/2001: term expired)

Amy Felty (Secretary, appointed 5/2004)

John Harrison (elected 9/2001: term expired)

Michael Kohlhase (elected 10/2000, reelected 10/2003)

Neil Murray (Treasurer)

Robert Nieuwenhuis (CADE-20 Program Chair)

Frank Pfenning (President, elected 10/1998, re-elected 9/2001: term expired)

Andrei Voronkov (Vice President, CADE-18 Program Chair, elected 10/2002)

Toby Walsh (elected 10/2002)

David Basin (IJCAR 2004 Program Co-chair: term expired)

Peter Baumgartner (elected 10/2003)

Maria Paola Bonacina (Secretary, appointed 9/1999, resigned 5/2004)

Gilles Dowek (elected 9/2001: term expired)

Amy Felty (Secretary, appointed 5/2004)

John Harrison (elected 9/2001: term expired)

Michael Kohlhase (elected 10/2000, reelected 10/2003)

Neil Murray (Treasurer)

Robert Nieuwenhuis (CADE-20 Program Chair)

Frank Pfenning (President, elected 10/1998, re-elected 9/2001: term expired)

Andrei Voronkov (Vice President, CADE-18 Program Chair, elected 10/2002)

Toby Walsh (elected 10/2002)

The terms of David Basin, Gilles Dowek, John Harrison, and Frank Pfenning are expiring. The position of David Basin is taken by Robert Nieuwenhuis, as CADE-20 program chair. The position of Maria Paola Bonacina is now taken by Amy Felty. We must also replace Harald Ganzinger. Thus, there are four positions to fill.

The following candidates were nominated:

David Basin
nominated by T. Walsh and R. Constable

Gilles Dowek
nominated by F. Pfenning and C. Kirchner

Juergen Giesl
nominated by B. Gramlich and C. Lynch

Rajeev Goré
nominated by D. Galmiche and J. Slaney

Hans de Nivelle
nominated by F. Baader and P. Baumgartner

Geoff Sutcliffe
nominated by S. Schulz and W. McCune

Cesare Tinelli
nominated by A. Voronkov and M. Kohlhase

**Statement by Maria Paola Bonacina**

It is a pleasure and honor to be nominated for the CADE Inc. board of trustees. CADE is in many ways my home conference, that I attended for the first time back in 1986, right after finishing my undergraduate degree, and I have attended most of the times since then. Automated deduction is at the heart of my research programme, including definition, design, implementation, evaluation, both empirical and machine-independent, of theorem-proving methods: more details can be found on my web page. Being a CADE trustee is about service and leadership, or leadership understood as service in the highest sense. It is about listening, remembering, foreseeing consequences, taking responsibilities. To that I would bring the rare blend of experience and institutional memory that comes from having served as secretary of AAR (1997-2004), secretary and ex-officio trustee of CADE (1999-2004), chair of the steering committee of FTP (1999-2003), and coordinator of the steering committee of IJCAR (2002-2004). As secretary of AAR and CADE, I contributed to increase membership by more than 100%, I extended the Newsletter with summaries of CADE workshops and PhD thesis abstracts, I advocated growth of funding of the Woody Blesdoe Student Travel Award, I was involved with the reform that made CADE democratic, and I handled five trustee elections. As ex-officio trustee, FTP president and IJCAR coordinator, I promoted systematically the IJCAR concept, that combines unity with respect for diversity. I foresee CADE and IJCAR alternating fruitfully for many years to come. My vision for either conference is to strive for scientific excellence, nurturing creativity, cross-fertilization of ideas, openness to both new theoretical paradigms and new applications, preserving historical ties with neighbour conferences, chiefly within FLoC, while being open to new ones.

**Statement by David Basin**

I have been working in automated deduction for roughly 18 years, on a range of topics including logics, logical frameworks and metalogics, inductive theorem proving, decision procedures, and applications. Currently much of my work centers around theorem proving in higher-order logic and model checking applied to problems in information security.

Over the last decades, there has been a growing interest in different facets of deduction, which although desirable, has been accompanied by a fragmentation of research communities. I would like to help CADE retain its status as the premier conference in Automated Deduction both by helping to ensure its standards and the topicality of its scope. To reduce fragmentation, I am strongly in favor of continuing the successful IJCAR series, as well as the participation within FLOC.

**Statement by Gilles Dowek**

In my previous statement, three years ago, I emphasized two points. First, the need to develop community structuring events such as the IJCAR conference. Second, the need to strengthen the links of our community with closely related communities, in particular the proof checking and proof theory communities.

I think that in the last three years a lot has been done in these two directions by many people in the CADE community and in the steering committee. I think that this effort must be continued, in particular with participation in IJCAR and FLOC.

Comparing the situation today with the situation three years ago, I see a third point we should work on: the development of automated theorem proving outside Europe and North-America. In my opinion, this would not weaken automated theorem proving in Europe and North-America, but strengthen it.

**Statement by Juergen Giesl**

CADE is and should remain the major conference for research on automated deduction. Therefore, I think it is important also to attract researchers from neighboring fields and to keep the scope of CADE as broad as possible. CADE should be open to any aspect of automated deduction and it should cover everything from theoretical results and practical contributions (e.g., implementation techniques and system descriptions) up to applications of automated reasoning. Attracting more submissions on applications may for example be achieved by inviting suitable PC-members and by selecting invited speakers from application areas. In general, I think that the current form of organizing CADE (sometimes as a stand-alone event, sometimes as IJCAR, and sometimes within FLoC) is very good. In addition, we may also consider co-locating CADE with other conferences on potential application areas.

I would be very happy to join the board of CADE trustees in order to help keeping CADE successful in the future. More information about me is available on my web page.

**Statement by Rajeev Goré**

I was a programme co-chair of the first IJCAR and am a very strong supporter of IJCAR, particularly as part of FLoC. I believe that it is important to let the sub-communities have their own meetings in non-IJCAR years, if they so desire, since this often opens up new problem areas which eventually become a part of automated reasoning.

With various colleagues, I have worked on modal tableaux (TABLEAUX), interactive theorem proving (TPHOLs), proof-theory (CSL, CADE), and term rewriting (CSL). I therefore believe that I have a fairly broad perspective on automated reasoning. I particularly welcome the move towards accepting more papers on implementations, as long as they are of a standard judged as acceptable by experts.

I lament the low quality of reviews returned by some of our colleagues. I believe that you should not accept to be on a PC if you are too busy to write good reviews or too busy to find colleagues who can write them for you.

**Statement by Hans de Nivelle**

My research contributions are in the area of resolution refinements, resolution decision procedures, implementation, and proof transformations.

In our field, we currently have a trend towards diversification. Instead of trying to solve the big fundamental undecidable problems, researchers now work on methods for specialized problems with lower computational complexity. I support this trend, and hope that it will bring new applications, and nice theoretical results.

On top of the diversification by subject, there has been a fragmentation by ideology in the last ten years, which I think is harmful. Therefore, I support all attempts to join the main conferences, or at least colocate them as often as possible.

In our field, researchers working on implementation do not get the credits that they deserve. Successful implementation requires a sense for the practical that is hard to make explicit, and therefore hard to judge by theorists. One could consider having implementors as invited speakers, and ensuring that implementation papers are always reviewed by a mixture of theorists and implementors. I do not favour creating a different category for implementation papers on conferences.

**Statement by Geoff Sutcliffe**

I believe that the advancement and application of automated reasoning is dependent on recognizing and supporting the synergetic interaction of theoretical progress, high performance implementation (techniques), and application requirements. It is not possible for work in any one area to prosper in a vacumn without the others. I strongly support research and development that leads to empirically successful implemented systems. I expect CADE to remain the major forum for the presentation of new research in all aspects of automated deduction, and as such must reflect the depth, breadth, and interaction of all aspects of automated reasoning. Such an approach allows a broad automated reasoning community to use CADE as the preferred conference for the presentation of research results. I support CADE's efforts to provide financial assistance that enables young and poor researchers to attend the conference. I am in favour of maintaining CADE's leading role in the IJCAR and FLoC conference series.

Personal background: I started my academic career at Edith Cowan University in 1986, taught at James Cook University from 1992 to 2001, and am now an Associate Professor at the University of Miami. I am probably best known in the automated reasoning community for the CADE ATP System Competitions, which I have been (co-)organizing since 1996, and my ongoing maintenance of the TPTP Problem Library. Additionally, my "ARTists" research group is active in several aspects of automated reasoning. I have been attending CADE since CADE-10 in 1990, and have hosted CADE twice - in Australia in 1997 and in Miami in 2003. I have been on the CADE program committee four times, and have organized three CADE workshops. My web page provides all the details.

**Statement by Cesare Tinelli**

My main research interest is automated reasoning, with a focus on reasoning modulo built-in theories--theories implemented by means of specialized reasoners. I have been particularly active in the combination of decision procedures and their integration into general purpose calculi. I have served as a PC member or co-organizer for a number of AR related events (see my web page for a complete list), and I am currently a steering committee member of FTP and FroCoS. I'm also a coordinator of the SMT-LIB initiative, whose main goal is to establish a library of benchmarks for satisfiability modulo theories (SMT).

As a trustee I would push for CADE to continue its efforts in attracting papers over the whole spectrum of AR research, with a renewed emphasis of attracting more work on applications. One way to achieve this at the trustees level could be to establish a set of (non-strictly binding and renewable) guidelines for future CADE PC chairs on the composition of the PC, the selection of invited speakers, and the type of solicited submissions. On a personal level, I would use my ties to the growing SMT community--which seems to be moving more and more toward forums such as CAV, CP and SAT--to encourage a continued flow of CADE submissions from that community.

As a current member of the FTP and FroCoS SC's I strongly support the idea of having a larger comprehensive event like IJCAR on a regular basis, to avoid an excessive fragmentation of the AR community into several smaller events. Consistently with that, I also support the participation of CADE to FLoC as IJCAR, not as a stand-alone conference. As a CADE trustee I would work toward achieving a better interaction and coordination between CADE and other AR events such as TABLEAUX, FTP, FroCoS and so on, especially for what concerns matters related to IJCAR and FLoC. In this respect, I am in favor of establishing a permanent IJCAR SC, made of representatives of the various events, with the main goals of coordinating the organization of IJCAR and the interaction with FLoC.

A new journal, *Logical Methods in Computer Science,*
will open to submissions on September 1, 2004.
The journal, which
will be devoted to all theoretical and practical topics in
computer science related to logic in a broad sense,
will be freely available on the
Web.

The Australian National University will host the annual Logic Summer School December 6-17, 2004. The School will consist of short courses on aspects of pure and applied logic taught by experts from Australia and overseas. In addition, time will be set aside each day for practical classes, discussions, and software demos.

The school is suitable for IT professionals using formal methods or problem-solving technologies; teachers who teach logic whether in computing, mathematics, or philosophy; and students who are going on to do research in logic or a related field;

The cost of the school is $1,650/person (GST inclusive). Students in full-time education are eligible for a reduced fee of $120/person (GST inclusive). Scholarships are also available for a limited number of students.

The closing date for early registrations is October 29, 2004. Registrations received after this date are subject to 20% surcharge.

For additional information about this course, please contact Ms Diane Kossatz, the Automated Reasoning Group, phone: 61 (02) 6125 8630, Fax: 61 (02) 6125 8651.

The Artificial Intelligence community in the United Kingdom is pleased to host the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05) in Edinburgh, Scotland, July 30-August 5, 2005. For full details and the most up-to-date information regarding IJCAI-05, please see the Web site.

**Call for Papers and Posters**

The IJCAI-05 Program Committee invites submissions of full technical papers for IJCAI-05. Submissions are invited on substantial, original, and previously unpublished research on all aspects of artificial intelligence. Papers reporting work of high quality and high promise, but which is found by the program committee to be insufficiently mature for publication in the conference proceedings, may be considered for inclusion in the poster session track. Authors that wish to use this option must explicitly state so in the submission and submit to the poster track in parallel.

The IJCAI-05 Program Committee also invites submissions for posters. These are intended for the presentation of work that meets the high standards of the IJCAI conference but that is more topical and preliminary than the work presented in formal papers.

*Important Dates for Technical Papers and Posters:*

February 1, 2005: Electronic Paper and Title Page Submission Deadline

February 15, 2005: Electronic Poster Submission Deadline

April 1, 2005: Paper and Poster Author Notifications Sent

April 15, 2005: Camera-Ready Copy Deadline

August 2-5, 2005: IJCAI-05 Technical Sessions

February 15, 2005: Electronic Poster Submission Deadline

April 1, 2005: Paper and Poster Author Notifications Sent

April 15, 2005: Camera-Ready Copy Deadline

August 2-5, 2005: IJCAI-05 Technical Sessions

**Call for Workshop Proposals**

The IJCAI-05 Program Committee invites proposals for the Workshop Program, to be held July 30-August 1, 2005, immediately prior to the technical conference. IJCAI-05 workshops provide an informal setting where participants discuss specific technical topics in an atmosphere that fosters the active exchange of ideas. Workshops at the boundaries between subareas of AI, and between AI and other fields, are particularly encouraged, as are those that focus on new and emerging topics or on applications. In order to encourage interaction and a broad exchange of ideas, each workshop is limited to 40 participants, with ample time allotted for general discussion. Attendance is limited to active participants only.

Workshops can vary in length, but most will last a full day. Workshop attendees need not register for the main IJCAI conference but are encouraged to do so.

*Important Dates for Workshops:*

October 15, 2004: Proposal Submission Deadline

November 15, 2004: Acceptance Notification

December 15, 2004: Deadline for Posting of Call for Participation

January 15, 2005: IJCAI-05 Workshops Programme Announced

May 5, 2005: Camera-ready Workshop Notes Deadline

July 30-August 1, 2005: IJCAI-05 Workshops

November 15, 2004: Acceptance Notification

December 15, 2004: Deadline for Posting of Call for Participation

January 15, 2005: IJCAI-05 Workshops Programme Announced

May 5, 2005: Camera-ready Workshop Notes Deadline

July 30-August 1, 2005: IJCAI-05 Workshops

**Call for Tutorial Proposals**

The IJCAI-05 Program Committee invites proposals for tutorials, to be held July 30-31, 2005, immediately prior to the technical conference. Tutorials should serve one or more of the following objectives:

- Motivate and explain a topic of emerging importance for AI
- Survey a mature area of AI research and/or practice
- Provide instruction in established but specialized AI methodologies
- Present a novel synthesis combining distinct lines of AI work

To broaden and improve the topic coverage provided, we also invite suggestions as to what tutorial topics and presenters might be welcome.

*Important Dates for Tutorials:*

November 1, 2004: Proposal Submission Deadline

December 3, 2004: Acceptance Notification

January 1, 2005: Title, Abstract, and Speaker Biography Deadline

May 23, 2005: Syllabus and Course Handouts Deadline

July 30-31, 2005: IJCAI-05 Tutorials

December 3, 2004: Acceptance Notification

January 1, 2005: Title, Abstract, and Speaker Biography Deadline

May 23, 2005: Syllabus and Course Handouts Deadline

July 30-31, 2005: IJCAI-05 Tutorials

**Call for Nominations for IJCAI-05 Awards**

At each conference, IJCAI presents awards to distinguished members of the AI community.

- The IJCAI Award for Research Excellence is given to a scientist who has carried out a program of research of consistently high quality, yielding several substantial results.
- The IJCAI Computers and Thought Award is presented to an outstanding young scientist in the field.

Nominations for these two awards are solicited from the AI community at large, with two reference letters being required in support of the nominations.

*Important Dates for IJCAI-05 Awards:*

October 15, 2004: Names of Nominees Due

November 15, 2004: Nominations Due

August 2, 2005: IJCAI-05 Computers and Thought Award Presentation

August 4, 2005: IJCAI-05 Research Excellence Award Presentation

November 15, 2004: Nominations Due

August 2, 2005: IJCAI-05 Computers and Thought Award Presentation

August 4, 2005: IJCAI-05 Research Excellence Award Presentation

The 16th International Conference on Rewriting Techniques and Applications (RTA 2005) will be held April 19-21, 2005, in Nara, Japan, as part of the Federated Conference on Rewriting, Deduction, and Programming (RDP).

RTA is the major forum for the presentation of research on all aspects of rewriting. Typical areas of interest include the following:

- Applications: case studies; rule-based (functional and logic) programming; symbolic and algebraic computation; theorem proving; system synthesis and verification; proof checking.
- Foundations: matching and unification; narrowing; completion techniques; strategies; constraint solving; explicit substitutions; tree automata.
- Frameworks: string, term, and graph rewriting; lambda-calculus and higher-order rewriting; proof nets; constrained rewriting/deduction; categorical and infinitary rewriting.
- Implementation: compilation techniques; parallel execution; rewriting tools.
- Semantics: equational logic; rewriting logic.

Submission categories include regular research papers and system descriptions. Also problem sets and submissions describing interesting applications of rewriting techniques will be very welcome. Authors should send the title and an abstract of their submission by November 12, 2004; full papers are due November 19, 2004.

Please consult the Web for further details.

The increasing use of computers both within mathematics and to automate mathematical reasoning has raised new questions about the nature of mathematical proof. The meeting "The Nature of Mathematical Proof" will take place October 18-19, 2004, at the Royal Society, London. The meeting is free to all, but registration is required; on-line registration is on the Web.

The meeting will present and contrast the different viewpoints, including experimental mathematics vs mathematical rigor, automated vs human proofs, and formal vs rigorous arguments. A recurrent theme will be, What role does proof play in the way mathematicians learn and think?

The meeting is organized by Professor Alan Bundy; Professor Donald MacKenzie; Sir Michael Atiyah, OM FRS; and Professor Angus MacIntyre.

Speakers, chairs, and panelists include the following:

Richard A. DeMillo, Georgia Institute of Technology

Michael J. C. Gordon, The University of Cambridge Computer Laboratory

Paul J. Cohen, Stanford University

Robert D. MacPherson, School of Mathematics, Institute for Advanced Study, Princeton

Don Zagier, Max-Planck-Institut fuer Mathematik, Bonn

Michael Aschbacher, Department of Mathematics at the California Institute of Technology

Sir Peter Swinnerton-Dyer, Bt KBE FRS, University of Cambridge

Cliff Jones, University of Newcastle

Rod Chapman, Praxis Critical Systems

E. Brian Davies, King's College London

Michael J. C. Gordon, The University of Cambridge Computer Laboratory

Paul J. Cohen, Stanford University

Robert D. MacPherson, School of Mathematics, Institute for Advanced Study, Princeton

Don Zagier, Max-Planck-Institut fuer Mathematik, Bonn

Michael Aschbacher, Department of Mathematics at the California Institute of Technology

Sir Peter Swinnerton-Dyer, Bt KBE FRS, University of Cambridge

Cliff Jones, University of Newcastle

Rod Chapman, Praxis Critical Systems

E. Brian Davies, King's College London

Further information and registration details can be found on the Royal Society Web site or from Suzi White (suzi.white@royalsoc.ac.uk).

The new Radon Institute for Computational and Applied Mathematics (RICAM) of the Austrian Academy of Science in Linz, Austria, offers a postdoc position in the frame of the Theorema Project.

The Theorema Project aims at creating a system that supports the entire process of Mathematical Theory Exploration (inventing mathematical concepts, inventing and verifying propositions, inventing mathematical problems, inventing and verifying algorithms, building up and manipulating structured mathematical knowledge bases, etc.).

**Prerequisites:**

- Ph.D. in mathematics or computer science
- Expertise or, at least, interest in computational mathematics, computational logic, and software development.

Applications (CV, publication list, etc.) should be sent to the Theorema project leader, Professor Bruno Buchberger, buchberger@risc.uni-linz.ac.at.

For information on Theorema see the Web site.

In order to demonstrate the scalability of automated reasoning techniques, it is important to embark on large-scale case studies. As a field, we need to present a reward system for the completion of such case studies. However, there is a paradox at the heart of the case study. It is this paradox, and its resolution, that I intend to explore in this note.

The classic example of the theorem-proving case study is Shankar's proof of Gödel's incompleteness proof using the Nqthm theorem prover [3]. Shankar's reward for this case study was that he received a Ph.D. Note that the theorem prover was developed by third parties: Shankar's supervisors Bob Boyer and J Moore.

It is my contention that any good piece of scientific or engineering research develops a hypothesis and then provides evidence to support (or refute) that hypothesis. What is the hypothesis in the case of a theorem-proving case study? The obvious hypothesis is that the theorem prover is capable of proving hard theorems. The evidence supporting this hypothesis is the automated proof of a hard theorem.

What would be the ideal such evidence? Presumably, that using
the most obvious and direct representation of the theory and the
theorem, that the theorem prover could prove the theorem totally
automatically and within minutes if not seconds. Herein lies the
paradox. If it was that easy, then this work would scarcely be
worthy of a Ph.D. However, if the work was so hard that it *was* worthy of a Ph.D., then the evidence supporting the hypothesis
would be rather thin. If the student had had to work really hard
to find an appropriate representation of the theory and the
theorem, and if the student had had to work even harder to search a huge
search space, guiding the prover at nearly every step, then the
prover would have been shown to be barely capable of proving the
hard theorem, and then only under the direction of an expert in
the field. It seems that the student's success is inversely
proportional to the success of the theorem prover. This is the
paradox of the case study. What industry wants is push-button and
fast technology, but it seems that we are unable to reward case
studies that demonstrate such technology^{1}.

Of course, this analysis is superficial. Any theorem prover has a
range of application: from simple theorems that it can prove totally
automatically, to hard ones at the limit of its capacity, requiring
expert guidance over multiple steps. The case study typically
explores the outer limits of this range: those theorems that are
barely reachable and then only under expert guidance. So a more
sophisticated hypothesis is that the outer limits of the theorem
prover under test exceeds that of rival theorem provers. But then, of
course, the evidence should include unsuccessful experiments to prove
the same theorem using each of these rival provers. This is seldom
demonstrated. Indeed, although the experimenter will usually not attempt any
case studies using rival provers, the champions of many rival
provers frequently take up the challenge, subsequently showing that
their provers are equally, if not more, capable of completing the case
study. This refutation of the implicit hypothesis often occurs *after*
the original student has been rewarded with a Ph.D.

A further problem with this kind of case study is it becomes an awful slog. The student has to put his head down and push on relentlessly for weeks and months, even years, overcoming one difficulty after another, until he finally produces a huge proof, which no one else may ever study in detail. It is easy to get disheartened, to wonder whether it is all worth it or what the point is. The attrition rate for case-study students is much higher than for other kinds of project.

One way out of this paradox is to seek an alternative hypothesis
to those presented in the preceding section. I propose that the case
study be seen as an *analysis* of the system under test. We
are asking not just whether the system can prove this hard
problem but how *easy* the prover makes producing this
proof: what kind of tools does it provide to support the
representation of the theory or the theorem; what kind of
search-control tools help the user guide the proof; what kind of
visualization tools help the user to understand the structure of
proof; what kind of analysis tools help the user to identify
problems and solve them. The student might also try to classify
the types of intervention that were required to guide the proof.
For instance, was it enough to set the parameters of existing
heuristics or tactics, was it necessary to write new kind of
tactics, or was a major overhaul of the theorem prover necessary?
In such an analysis, a large part of the work will be in the * development* of an appropriate hypothesis.

As an exemplar of what I have in mind, I will describe the kind of analysis that we did in Francisco Cantu's Ph.D. project, in which he used our Clam inductive proof planner to verify various hardware algorithms and systems [1]. He kept a record of the kind of intervention that was necessary to guide the proofs. He classified these interventions by the kind of person who would be needed to make such interventions. For instance, choices between existing tactics might be made by the hardware developer within a user company without any deep knowledge of the theorem prover. However, if new tactics were required, then it might be necessary to call in the company's local expert. Finally, if a major overhaul of the prover was needed, then it would probably be necessary to ask the prover's supplier to issue a new version. Francisco tried to demonstrate that most interventions were of a simple nature that could be handled by the user or, at least, within the user's company, and did not require supplier intervention. He further demonstrated that the interventions were front-loaded and lumpy. That is to say, most interventions were required at the beginning of a series of experiments on a new class of algorithms or systems. After tuning the prover to this new class, most subsequent proofs required little or no human guidance. In this way, he was able to develop a usage scenario for the prover, showing that it could be a practical tool within a company doing hardware development.

This kind of analysis is both more sophisticated and more interesting than testing the simple hypotheses that this prover can prove hard theorems, or harder ones than its rivals. It avoids the case study paradox since the effort required from the student is now directly proportional to the value of the analysis that is produced from the case study. Moreover, it is much more interesting work, since the student must continually pay attention, not only to the course of the proof, but to the experience of proving and to the assistance provided (or not provided) by the theorem prover. It is this analysis, rather than the proof, that is the product of the case study. This is much more likely to be useful to the developers of the theorem prover and to the user community, so, unlike the gory details of the proof itself, the analysis will be widely disseminated and appreciated.

In this note, we have argued for a more sophisticated hypothesis to underlie case study research. The simplistic hypothesis that the prover can prove hard theorems--or harder ones than its rivals--leads to a paradox in which the success of the student is inversely proportional to the success of the prover, and where the proof process becomes a relentless and unrewarding slog. The more sophisticated hypothesis consists of an analysis of the strengths and weaknesses of the prover. It consists of a series of small hypotheses: the prover provides good visualization or analysis tools, the prover provides good automatic search control guidance, and so forth. It leads to more interesting research because the student must demonstrate awareness at multiple levels simultaneously: at the object-level of guiding the search for a proof and at the meta-level of the experience of the proof process. It also leads to research that will be more useful to both the developers of the prover and their user community.

Superficially, this discussion seems relevant only to the interactive theorem-proving community. This is not correct. Case studies are also carried out within the totally automated theorem proving community, for instance, EQP's proof of the Robbins algebra conjecture [2]. Note that this proof was not produced from a single run of the prover on the problem. Rather, it took place over a decade in which the proof was divided up into lemmas, the theorem prover was developed and refined, and a series experiments was tried with a wide variety different parameter settings. The kind of analysis proposed above is just as relevant to this kind of case study as to those using explicitly interactive provers. Indeed, we badly need such analytical case studies of automated provers, to better understand and improve their usability.

1. Cantu, Francisco, Bundy, Alan, Smaill, Alan, and Basin, David. (1996). Experiements in automating hardware verification using inductive proof planning.
In M. Srivas and A. Camilleri, eds., *Proceedings of the Formal Methods for Computer-Aided Design Conference, Lecture Notes in Computer Science* no. 1166,
Springer-Verlag, pp. 94-108.

2. McCune, W. (1997) Solution of the Robbins problem. *J. Automated Reasoning,* 19(3):263-276.

3. Shankar, N. (1994) *Metamathematics, Machines, and Gödel's Proof.* Cambridge University Press.

Gail Pieper

pieper@mcs.anl.gov

2004-09-09