In this section I offer some texts, free to download, on computer science subjects I have been involved in, either professionally or out of personal interest.
Texts are provided in HTML or PDF; the latter are also given in the DVI format (compressed as ZIP files) commonly used in scientific communication. Free software is available to read them: Acrobat Reader for PDF files and TeX for DVI files.
Sorry, but much of the material is in Italian; I will try to translate it whenever possible...
These are short notes which introduce the main cryptographic techniques in a way that is (I hope) simple even for a non-mathematician, touching on various topics, obviously without going into depth.
In 2005, having joined the Quantitative Finance group at Capitalia, I drafted a proposal to standardize the style of source code in C and C#. I am now putting it online, because it contains some general remarks on coding which may prove useful, as well as some advice about the pitfalls of C.
The following programs, free to download from my pages, are given as source code, which anyone can use and modify for personal purposes, but not for profit (the GPL license is attached to every source file). Most of these programs are written in C, my favorite language, mainly according to the ANSI 89 standard, which any compiler supports: some free compilers are available for download on the Net and are listed in this section of my site. To download a program just click its icon or its name.
Writing an app for the iPhone is about as difficult as being invited to a Papal Audience: one needs to register on a developer site, to own an Apple computer, to learn Objective-C, to pay $99 for publication and to hope that everything works. As an alternative one may use HTML5 features, in particular the JavaScript APIs, which the iPhone version of Safari supports. Useful information can be found on Matt Might's blog. As an exercise I wrote a small app which plots the graph of a function of one variable x and which lets the user move and zoom the graph with finger gestures. In addition to algebraic expressions in the variable x, some simple commands let the user clear the graph and define variables and functions to be used in expressions. Much could be done to improve it, but it is just an exercise, perhaps useful to a calculus student studying a function of a real variable. To install the app, open the URL http://www.caressa.it/iphone/funplot.html in Safari and, once it has loaded, choose "Add to Home Screen" from the Safari menu used to add a URL to favourites: an icon with an f(x) should then appear on your home screen, ready to be used. More information is at http://www.caressa.it/iphone/funplot_help.html. Of course the "app" also works on an ordinary computer if opened with Firefox (or any other HTML5-compliant browser): in this case use the arrow keys to move the graph and the PGUP and PGDOWN keys to zoom.
It is a library, provided as interface files (.h) along with their implementations (.c), which implements some classic algorithms of numerical analysis, namely methods for matrices, linear systems and eigenvalues. Initially it was a simple collection of C functions which I wrote while studying for a Ph.D. course (Numerical Calculus, taught by Prof. A. Pasquali). I also put here some notes (in Italian!) taken during that course [.dvi version]. Gradually this collection of functions evolved into an actual library which exports types (vectors, matrices, complex numbers, ...) and which makes it possible to work with vectors and matrices with coefficients in any numerical field, even a user-defined one: the latest version is zipped into this file and is described in a tutorial.
This is a program I used to assemble these web pages from HTML pieces: starting from a configuration file and the various pieces, it writes both the HTML source of the pages and a JavaScript file for the menus (which uses the free package Tigra Menu from SoftComplex). Of course it is an ad hoc program, which I wrote just to simplify composing and maintaining these pages, and which will hardly be useful to other people unless they change it to fit their aims, or just want to clone my site.
Another small program I wrote to help in writing the web pages of my sites, but this one may also be useful to someone else: it is an HTML pre-processor, that is, a program which takes one or more text files, parses them and performs substitutions on their HTML tags. This makes it possible to use shorthand forms for complicated tags, and also to define tags which depend on parameters. Moreover an HTML file may include other files. Of course I designed and implemented it before learning PHP; still, it is small and fun.
It is a didactic program which solves systems of linear equations (even non-square ones) by means of the Gauss elimination algorithm: the equations must be written, in the usual algebraic notation, in a text file, and the program shows all the steps of the solution as it computes it.
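To give an idea of the algorithm, here is a minimal sketch of Gauss elimination with partial pivoting followed by back substitution. It is not the code of the program described above: the function name `gauss_solve`, the fixed size `N` and the singularity threshold are assumptions made for illustration, the sketch handles square systems only, and it uses C99 syntax rather than strict ANSI 89.

```c
#include <stddef.h>

#define N 3  /* hypothetical fixed size, chosen only for this sketch */

static double absd(double v) { return v < 0 ? -v : v; }

/* Solves a*x = b by Gauss elimination with partial pivoting.
   The arrays a and b are overwritten during the elimination.
   Returns 0 on success, -1 if the matrix is (numerically) singular. */
int gauss_solve(double a[N][N], double b[N], double x[N])
{
    for (size_t k = 0; k < N; k++) {
        /* pick the row with the largest entry in column k as pivot */
        size_t p = k;
        for (size_t i = k + 1; i < N; i++)
            if (absd(a[i][k]) > absd(a[p][k]))
                p = i;
        if (absd(a[p][k]) < 1e-12)   /* assumed threshold */
            return -1;
        if (p != k) {                /* swap rows k and p */
            for (size_t j = 0; j < N; j++) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* subtract multiples of row k from the rows below it */
        for (size_t i = k + 1; i < N; i++) {
            double m = a[i][k] / a[k][k];
            for (size_t j = k; j < N; j++)
                a[i][j] -= m * a[k][j];
            b[i] -= m * b[k];
        }
    }
    /* back substitution on the upper triangular system */
    for (size_t i = N; i-- > 0; ) {
        double s = b[i];
        for (size_t j = i + 1; j < N; j++)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return 0;
}
```

Calling `gauss_solve(a, b, x)` on the system 2x+y-z=8, -3x-y+2z=-11, -2x+y+2z=-3 should leave the solution (2, 3, -1) in `x`, up to rounding.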
A simple and short program which dumps a directory listing to an HTML file, selects the files to print and, optionally, descends into sub-directories; it can be compiled either on Unix(TM)-like systems or on the Windows(TM) operating system (an executable file for Windows(TM) is enclosed in the distribution).
This is a working and documented version of a program whose pieces I have used for some years to build documentation for my code: of course it has neither the power nor the versatility of, say, doxygen, but when I started to write it I didn't know doxygen (which maybe did not exist at the time). I don't know whether it can be useful to anyone other than me, since it relies strongly on my own way of developing code (starting from source files with only comments and gradually adding code). Anyway, here it is.
It is just a simple program, which needs the Windows API (the Win32 SDK and NOT MFC), that plots the graph of a function of type double f(double) provided in a file to be compiled together with the source code of the program: a simple way to plot even if one does not know the Win32 API, but has a compiler which can compile it.
A weird and inefficient small program, which needs the Windows API (the Win32 SDK and NOT MFC), that plots the graph of an implicitly defined function provided in a file to be compiled together with the source code of the program: a simple way to plot even if one does not know the Win32 API, but has a compiler which can compile it.
It is just a simple program, which needs the Windows API (the Win32 SDK and NOT MFC), that plots the graphs of one or more functions given as a table of numbers, that is, defined at a finite number of points: it suffices to provide an array double A[n_points][n_values]. A simple way to plot even if one does not know the Win32 API, but has a compiler which can compile it.
Contains some simple classes for numerical linear algebra (vectors, matrices, symmetric matrices, linear systems) which constitute a C++ version of the previous library, with in addition an interface for linear algebra (operations between matrices and all that). It is no longer maintained and was practically never tested, hence it is surely full of bugs: but maybe it can be useful to someone.
It is a Java class implementing an algebraic calculator: one builds an object of this class by initializing it with a string denoting an expression to evaluate; next one can define variables (whose values are in turn strings) and evaluate the expression. Variables behave like macros, but they can also be constants.
It is a Java applet which asks for a function and draws its graph: it contains a parser for algebraic expressions which understands the usual mathematical notation (for example "3|x|-2log x/sen(x+pi)" instead of "3*abs(x)-2*log(x)/sin(x+3.14)"). The program requires an expression which contains the variable "x" as the unknown (warning: one can juxtapose letters to mean multiplication, but they must be separated by one or more spaces: "xe^x" is written as "x e^x", or as usual "x*e^x").
Below I list some books on computer science which I liked, for several reasons, and which I therefore strongly suggest reading or consulting: some of them are not recent but, even if technically dated, they are often still worthwhile reading.
A. Aho, R. Sethi, J. Ullman, Compilers. Principles, Techniques and Tools, Addison-Wesley, 1986.
One of the classic references on compiler writing: it contains many examples on data structures and the theory of formal languages, and excellent exercises. A mine of information.
M. Arbib, A. Kfoury, R. Moll, A Programming Approach to Computability, Springer, 1982.
Here computability theory is developed in a way familiar to people interested in computer science: instead of Turing machines, Markov systems or recursive function theory, the topics are treated in terms of programming languages.
J. Bentley, Programming Pearls, Addison-Wesley, 1999.
This is a precious collection of lessons on the art of software development, distilled from "columns" published in the Communications of the ACM: they constitute a delightful, brilliant and deep introduction to various aspects of software development, with examples, suggestions and much more. To be read carefully even if, since it is very well written, one tends to read it like a novel, or a collection of tales.
P. Darnell, P. Margolis, Software Engineering in C, Springer, 1986.
It stands, in my opinion, as the best book for learning how to program in C: it is full of precious suggestions useful for learning any language, and contains a complete project, a C interpreter written in C itself and developed according to the techniques explained in the book.
D. Gries, Compiler Construction for Digital Computers, Wiley, 1972.
A text on compiler writing, very dated, but still valuable for understanding the structure of a compiler and for its chapters on formal languages, macro processors and data structures.
D.R. Hanson, C Interfaces and Implementations, Addison-Wesley, 1997.
This book shows that all claims about the supposed superiority of object oriented languages (C++, Java, Modula-3, ...) over procedural ones (C, Fortran, Modula-2, ...) are just fiction: the text consists of a series of modules written in pure C which show how dynamic data structures, threads, exceptions and other things can be implemented in an elegant, efficient and useful way in C. Reading it is a duty for anyone who wants to implement projects of a certain size in C.
B. Kernighan, R. Pike, The Practice of Programming, Addison-Wesley, 1999.
A text any programmer should read carefully: not only does it contain a brilliant presentation of the basic themes of programming (algorithms, data structures, complexity), but it is also rich in very elegant examples in C, C++, Java and other languages, and it explains in detail how to face the practical problems which arise in software development (testing, debugging, profiling, etc.).
B. Kernighan, P.J. Plauger, Software Tools.
This book is only apparently dated: it is a collection of complete and working programs, written in Ratfor (a structured dialect of Fortran), which are, after all, the basic tools of the UNIX system: editors, macro processors, text formatters and much more.
B. Kernighan, D. Ritchie, The C Programming Language, Prentice Hall, 1978.
The classic handbook, reference and tutorial on C (the second edition explains the ANSI standard), which is actually a complete book on programming, rich in examples and brilliant in the exposition.
D. Knuth, The Art of Computer Programming, 3 vols., Addison-Wesley, 1968, 1973, 1975.
It needs no introduction: it is simply the definitive text on algorithms, data structures and programming, and it has gone through several editions. The programs are written in the machine language of the MIX processor (and of its successor MMIX), invented by Knuth for this very purpose. It is a text which requires some effort, especially to solve the exercises, which are very well designed, but it repays that effort handsomely. It is not only a beautiful book, but also a well written one, a feature seldom found in technical literature.
A. LaMothe, Tricks of the Windows Game Programming Gurus, 2 vols., SAMS, 1999.
A book which teaches how to build video games (2D in the first volume, 3D in the second volume) for Windows systems; it contains a short and effective introduction to Win32 programming, and much other interesting information on data structures, artificial intelligence and the applications of linear algebra and analytic geometry to computer graphics. It is written in a (maybe too) informal style, but it is very clear.
M. Minsky, S. Papert, Perceptrons, MIT, 1969.
An old text on artificial intelligence, mathematically rigorous but easy to understand: it develops the theory of perceptrons, among the most criticized machines of artificial intelligence. In spite of that, they are the ancestors of the famous neural networks, and they have the advantage of resting on a solid theoretical foundation, even though that same theory establishes their limits.
T. Mitchell, Machine Learning, McGraw-Hill, 1997.
A concise and complete exposition of what is (to me) the most fascinating topic in artificial intelligence: learning machines. The discussion of Bayesian methods is a masterpiece.
R. Sedgewick, Algorithms in C, Addison-Wesley, 1990.
A collection of fundamental algorithms and of programs which implement them in C (there are also versions of this book written in C++ and Java): practically all the basic algorithms of computer science. Each algorithm is explained in great detail and motivated, so that the book, a sort of non-theoretical and simplified version of Knuth's books, is good both as a reference and for self-study.
N. Wirth, Algorithms + Data Structures = Programs, Prentice Hall, 1976.
A classic of programming: the text refers to programming in languages of the Pascal family, but I think it is still up to date, especially in its theoretical aspects, since it introduces the basic tools and the fundamental algorithms, explained in a complete and rigorous way. The source code is among the most elegant one can find in a book, and the exposition is brilliant.
N. Wirth, Programming in MODULA-2, Springer, 1986.
This is a handbook on the Modula-2 language, but it can also serve as an elegant and short introduction to programming tout court. Beautiful, simple and elegant.
Computer languages, like human languages, come and go and are subject to changes and dialects (not to mention extinction). Below I leave some links on the programming languages I find interesting and aesthetically pleasing, while I do not mention the ones I use (such as Java) but which I regard merely as tools imposed by the fashion of the time.
In my opinion, C still remains the best general purpose programming language (of course specific problems require specific languages, and if efficiency is not the crucial issue in a project, one can also use an interpreted language such as Perl or Awk). In many recent books, say of the last decade, it is claimed that the programming paradigm underlying C, namely imperative programming, is obsolete and that object oriented programming is the present and future paradigm (even though it goes back to 1967). But I think that object oriented programming is just an avatar of structured programming in an event-driven environment: one can program with public and private data in C too, create and destroy dynamic objects, and so on.
Rather, object oriented programming is just a technique useful in those applications in which the problem can be formulated in terms of objects: for example in discrete event simulations, where many different objects belonging to the same class must be generated, or in the development of operating systems or GUIs. In other words it is just like recursion, a guise in which, in some contexts, problems naturally appear: but, just as it is nonsense to say that recursion oriented programming (aka functional programming) is better than any other programming scheme, the same applies to object oriented programming, in my opinion. Indeed in object oriented languages like Java it is impossible not to use objects, even when they are not needed: for example there's no point in defining a class which will only ever have one instance, but the language, with its cumbersome syntax, obliges one to do so!
The core of Java, as an object oriented programming language, is very simple and elegant, but, to make it work for any kind of application, its library has been turned into a chaotic heap of stuff. Moreover this library evolved in parallel with its implementations, and this ultimately makes the language heavy and difficult. A rational and authoritative criticism of object oriented programming as a universal programming paradigm is given in a page on Paul Graham's site.
That's why in what follows I do not list Java, nor C#, nor other popular and widely used languages, but I follow my own aesthetic criteria, providing some information on languages I like, which I believe to be elegant, beautiful and, in a word, which I wish I had designed!
Computer programming is a job which may be pleasant and creative, like composing novels or symphonies, or boring, like compiling bank forms or telephone directories. The most interesting programming activity is doubtless the implementation of compilers (not the most complicated: video games are perhaps the most complex programs to develop). To my mind, programming, languages and compilers are strongly related matters: here I leave some links on the latter.
Artificial Intelligence Laboratory at M.I.T.: a fundamental collection of resources about artificial intelligence and logic and functional programming.
Algol 60 is the star whose light shines over every other imperative language: elegant, simple and rigorous, it is an invitation to programming. I think it is absurd that it died in the mid '70s and is today considered a language no less dead than Assyrian, while dinosaurs like Fortran or Cobol still waste RAM inside the computers of this world. With Algol, in 1958, all the syntactic and semantic structures still living in programming languages were born, the same ones which would be repeated ad nauseam in the following decades: for instance in PL/1, in Pascal, in C, down to Java. As C.A.R. Hoare, one of the greats of computer science, strikingly wrote:
Algol 60 was a great achievement; it was a significant advance over most of its successors.
Algol 68, on the other hand, was a language alas born ahead of its time, and prematurely killed by a lack of implementations and users, but its ashes fertilized C++: polymorphism, operator overloading and many other features of C++ were borrowed from Algol 68. The other offspring of Algol 60 was Simula 67, in which the concept of a class first appeared: in other words, the following proportion holds: Algol 60 : Simula 67 = C : C++, and Stroustrup applied to C the same procedure Nygaard applied to Algol 60 to get Simula 67.
Algol 60 References is the best starting point to explore the world of Algol.
The Algol Bulletin on line! The complete collection of this magazine devoted to Algol, for the joy of scholars and amateurs, with papers written by great names in computer science.
marst - Algol to C translator translates Algol 60 programs into C: feeding its output to a C compiler, you get an Algol 60 compiler.
Algol 68 Genie is an interpreter for a subset of Algol 68, developed by Marcel van der Veer. Remarkable, as are the references given on the site.
Actually there's just one programming language, namely assembly, the one spoken by the machine: all the others are intermediate forms of communication between it and human language. I think it is a good investment to spend some time learning machine language, even if it changes from processor to processor. Consider, just to quote the most outstanding example, that The Art of Computer Programming by D. Knuth, the best programming book ever written (and to be written), uses machine language for its examples.
WEBster is a site full of resources: in particular Art of Assembly Programming, an online book on 80x86 assembly programming starting from scratch, by Randall Hyde, and HLA, a free assembler.
The language I use most, and which I consider the most concise, efficient and interesting to program with, is C (for example these web pages are set up with the aid of some C programs I wrote for this purpose): from the short program of a few lines to the huge project with many thousands of lines of code, C, designed in the early '70s along with the development of the UNIX operating system, still remains the best solution, after assembly of course. Neither C++ nor Java can be viewed as evolutions of it, but rather as degenerations (C++ was meant to replace it, but it is less efficient, more complicated and too sprawling) or as languages with different aims (Java is oriented to web and graphical applications).
C is a low level language; this does not mean that it is worse than other ones, but only that it allows a deep mastery of the machine on which programs run. Other features of a low level language are:
the possibility of addressing chunks of memory directly (by means of pointers);
the possibility of addressing chunks of code directly (e.g. by means of the goto statement);
easy conversion between different data types which share a similar internal representation, even if they are conceptually different;
a strong interaction with the hosting operating system and the possibility of accessing peripheral units;
a simple but flexible syntax and an expressive semantics.
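A few of these features can be shown in a handful of lines. The following is only an illustrative sketch (in C99): the function names `byte_sum`, `find_in_grid` and `digit_value` are hypothetical and are not taken from any of the programs on this site.

```c
#include <stddef.h>

/* Pointers: visit the raw bytes of any object by addressing its memory directly. */
unsigned byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned sum = 0;
    while (size-- > 0)
        sum += *p++;
    return sum;
}

/* goto: leave two nested loops at once by jumping to a labelled statement. */
int find_in_grid(int grid[3][3], int value, int *row, int *col)
{
    int found = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (grid[i][j] == value) {
                *row = i;
                *col = j;
                found = 1;
                goto done;
            }
done:
    return found;
}

/* Implicit conversion: a char is just a small integer, so arithmetic on it is allowed. */
int digit_value(char c)
{
    return c - '0';
}
```

None of this is possible, or at least not so directly, in a language which hides addresses and representations from the programmer.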
For instance C is a low level language since it has pointers, goto statements, implicit conversion of numerical data, machine-oriented I/O and graphics libraries, etc. Conversely, a high level language must have:
Built in data structures (strings, lists, tables, trees, ...);
High level libraries to interact with the hosting operative system: logical devices, windows, ...;
A rigorous syntax and an orthogonal semantics;
For example, Java and Visual Basic are high level languages: the best high level languages I know, among the ones still in use, are Lisp and its relatives, like Scheme, while the best high level language of all time is the dead and buried Algol 68.
These distinctions are not rigid: a low level language may have data structures, provided this does not make its syntax heavy or add superfluous concepts: it makes no sense to include lists if there are pointers, for example. Similarly, a high level language may have some features of the low level ones.
And indeed there are many hybrids: for example FORTRAN is not a low level language, since it includes neither pointers nor a strong interaction with the machine, but it has no data structures either, nor well designed control structures, since its syntax is prehistoric. At the other extreme we have C++, which contains both low level structures (it includes C as a subset!) and high level structures which can be implemented by means of the former (for example it has pointers, lists, vectors, strings, arrays, tables, and so on).
Moreover, the distinction between low and high level languages has a lot to do with how these languages are used, and thus involves the style and the skills of the programmer: a lazy programmer with poor style should not use a low level language, which leaves him/her free to write confused and complicated programs; instead, he/she should adopt a high level language, which forces a rational programming style and can limit the damage caused by his/her indolence.
Conversely, a gifted programmer with good style can write well-structured, elegant and complex programs in a low level language, with a gain in efficiency in addition: a good programmer eventually moves on to assembly language.
Anyway, here I list some superstitions about C, which are obviously false, but which are often given as a motivation for using C++:
With C one cannot separate interfaces and implementations;
With C one cannot perform data and algorithm hiding;
With C one cannot develop in a rational way large and complicated projects;
With C one cannot write a program aimed at symbolic manipulation;
With C it is difficult to handle dynamic data structures;
All that is of course false: what C++ does to meet these requirements is to extend C by introducing many unnecessary features, and some dangerous ones. Actually, C++ is not a single language, but a family of languages compressed into one. One could transform it into an excellent language by dropping most of the C features it includes (but the cunning reason for the success of C++ is precisely the fact that it does include C, and can thus reuse millions of lines of already developed code).
The latest version of C++ (the one described in the third edition of Stroustrup's book) is a complete multi-level language, rather difficult to handle globally, but useful locally: in some sense it is a family of languages, since according to your programming style and software engineering skills, it can be an extension of C, an object-oriented language, a module oriented language, a high level language with polymorphism and generic programming facilities, and much more: it is a great language when its resources are used horizontally, but a huge and nasty machinery when used vertically... However its standard library is well designed and useful, and it is its main feature: notice the difference from the chaotic heap of classes which is the Java library!!!
Nevertheless C++ is not as satisfactory as it promises to be: for a comprehensive criticism of C++ (and also of C) see C++??: A critique of C++ by Ian Joyner. I don't share his viewpoint about Java, but his remarks are interesting and stimulating.
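As a hedged illustration of the first, second and last points, here is a minimal sketch of an opaque type in plain C. The names (`Stack`, `stack_new` and so on) and the split into a hypothetical stack.h and stack.c are assumptions made for the example; the two parts are shown in a single listing only so that it compiles as it stands.

```c
#include <stdlib.h>

/* ---- interface: what would go in a hypothetical stack.h ---- */
typedef struct Stack Stack;              /* opaque: clients never see the fields */
Stack *stack_new(void);
int    stack_push(Stack *s, int value);  /* 0 on success, -1 if out of memory */
int    stack_pop(Stack *s, int *value);  /* 0 on success, -1 if the stack is empty */
size_t stack_size(const Stack *s);
void   stack_free(Stack *s);

/* ---- implementation: what would go in a hypothetical stack.c ---- */
struct Stack {
    int   *items;      /* dynamically grown array */
    size_t count;
    size_t capacity;
};

Stack *stack_new(void)
{
    Stack *s = malloc(sizeof *s);
    if (s != NULL) {
        s->items = NULL;
        s->count = 0;
        s->capacity = 0;
    }
    return s;
}

int stack_push(Stack *s, int value)
{
    if (s->count == s->capacity) {       /* grow the array when it is full */
        size_t cap = s->capacity ? 2 * s->capacity : 8;
        int *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->count++] = value;
    return 0;
}

int stack_pop(Stack *s, int *value)
{
    if (s->count == 0)
        return -1;
    *value = s->items[--s->count];
    return 0;
}

size_t stack_size(const Stack *s)
{
    return s->count;
}

void stack_free(Stack *s)
{
    if (s != NULL) {
        free(s->items);
        free(s);
    }
}
```

With the struct definition kept in the .c file, client code can only go through the five functions: the representation can change (say, to a linked list) without touching any caller.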
Here I leave some links about C (history, programming resources, Win32 library, free compilers, &c.):
General information about C
Dennis Ritchie home page: the inventor of C; the page contains precious historical documents, such as the code of the first C compiler under a UNIX system.
Bjarne Stroustrup Home Page: the inventor of C++, who of course extols the virtues of his creature on his page, which contains a lot of interesting material.
C programming: a page with many resources: programs, compilers, links, books, ...
Forth is a compact and extremely efficient language, close to the logic of the machine but surprisingly elegant: its main feature is that it is both interpreted and compiled, in the sense that its basic functions are written in machine code, while the others, which can be built upon the former, are compiled into sequences of pointers, whose traversal makes the interpretation process very fast.
Moreover, the mechanism for passing parameters to functions is not the usual one, but is implemented directly by leaving the parameters on a stack (which is actually the way compilers implement it); this makes the language quite efficient, and also explains its notation, bizarre at first sight, namely reverse Polish notation.
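The stack discipline behind reverse Polish notation is easy to sketch in C. What follows is not a Forth implementation, only a hypothetical `rpn_eval` function (C99, with an assumed stack depth `STACK_MAX`) which handles non-negative numbers and the four arithmetic operators.

```c
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64   /* assumed maximum stack depth */

/* Evaluates an expression in reverse Polish notation, e.g. "3 4 + 2 *".
   Numbers are pushed on a stack; each operator pops its two operands
   and pushes the result. Returns 0 and stores the value in *out on
   success, -1 on malformed input or stack overflow. */
int rpn_eval(const char *s, double *out)
{
    double stack[STACK_MAX];
    int top = 0;

    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        if (isdigit((unsigned char)*s)) {
            char *end;
            if (top == STACK_MAX)
                return -1;
            stack[top++] = strtod(s, &end);   /* push the operand */
            s = end;
            continue;
        }
        if (top < 2)                          /* every operator needs two operands */
            return -1;
        double b = stack[--top];
        double a = stack[--top];
        switch (*s++) {
        case '+': stack[top++] = a + b; break;
        case '-': stack[top++] = a - b; break;
        case '*': stack[top++] = a * b; break;
        case '/': stack[top++] = a / b; break;
        default:  return -1;
        }
    }
    if (top != 1)                             /* exactly one value must remain */
        return -1;
    *out = stack[0];
    return 0;
}
```

For example `rpn_eval("3 4 + 2 *", &v)` leaves 14 in `v`: no parentheses and no precedence rules are needed, since the order of the tokens already says when each operation is applied.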
Forth Interest Group is the main resource for people interested in Forth: here you'll find documents, compilers, programs and so on.
colorForth is the (old) site of Chuck Moore, Forth designer and implementor (amongst other things).
Lisp
Lisp is a masterpiece of elegance, simplicity and theoretical depth: it was invented by John McCarthy as a notation for defining rigorously the semantics of function evaluation: this notation and its basic principles were so well founded and simple that McCarthy immediately realized that he could write an interpreter for that notation, written in the notation itself!
Lisp is the reification of Church's λ-calculus, a formulation of computability theory equivalent to the one given by Turing machines. The idea is so simple and brilliant that Lisp is still alive and well, and it remains the oldest and not yet surpassed ancestor of a brood of languages, the functional languages, which make it possible, in many problems of a non-numerical nature, to get efficient and elegant solutions: namely, these languages are aimed at solving problems which present themselves in a recursive way.
The most widely used version of Lisp nowadays is Common Lisp (although Clojure is the state of the art), a complex and huge language (maybe too much so), with which one can program to meet any need. Derivatives of Lisp are Scheme and Emacs Lisp and, through the functional language ML, the languages of the Caml family. Moreover logic languages, like Prolog and Gödel, are strongly related to Lisp.
John McCarthy Home Page: a giant of computer science, inventor of Lisp.
The Gödel Programming Language (a declarative, general-purpose programming language in the family of logic programming languages.)
Modula
Modula-2 is, in my opinion, the best language of the Pascal family: Pascal was a direct descendant of Algol-W, created for didactic purposes at the ETH in Zurich by Niklaus Wirth. Pascal was a neat language, simple and compact but not aimed at the development of complex projects. Modula was its successor, which incorporated a mechanism to subdivide programs into modules, that is, independent parts which can be compiled separately and which can cooperate by using each other's pieces.
Modular programming is also possible in C, but Modula provides a rigorous syntax for developing projects according to this method and, like C, includes a library of system modules to perform operations on the operating system (I/O, graphics, etc.). The object oriented languages so widespread nowadays (C++, Java, etc.) proudly provide data hiding and encapsulation, which are in fact synonyms of modular programming. The latest version of Modula, namely Modula-3, has been contaminated by object oriented programming and constitutes a huge and inconvenient language, no better than Ada.
modula-2 is a site with many resources: tutorials, links to (mostly free) compilers, Win32 API, source code, and more.
www.modulaware.com Oberon-2 and Modula-2 Technical Publication Ubaye's First Independent Modula-2 & Oberon-2 Journal.
ETH Oberon Home page Oberon is the name of a modern integrated software environment for single-user workstations. Oberon is also the name of a programming language in the Pascal/Modula tradition and a highly effective and compact operating platform. The Oberon project was launched in 1985 by Niklaus Wirth and Jürg Gutknecht. While this project was originally targeted towards in-house built hardware, ported versions of the Oberon language and system are now available for numerous commercial platforms.
TeX (pronounced "tek", as in Greek) is a programming language invented by Donald Knuth and oriented to the production of perfectly typeset text. The source code consists of a description of the text (a document, an article, a book) and, after compilation, the output is produced in the form of a "dvi" file, which can be used to print or view the document. TeX has completely changed the world of scientific text production and exchange, since it makes it possible to write any kind of content, in particular mathematical formulas, without resorting to such complicated and inefficient instruments as Word's Equation Editor, to name just one. The text produced by TeX is on a par with high quality professional printing, and it outperforms any word processor.
However, TeX is a real programming language, with which one can do amazing things. The most used version is LaTeX, by Leslie Lamport, which is just a huge collection of TeX macros which are a de facto standard and which simplify life for people who use TeX just to produce uniform texts, without particular formatting needs.
Donald Knuth Home Page: the inventor of TeX, METAFONT and CWEB, a system of programming languages aimed at producing professional texts with no typographical limitation whatsoever, and the author of The Art of Computer Programming, the masterwork of computer science.
TeX Users Group Home Page is the main site about TeX: documentation, compilers, and much more, of course all for free.
XML
XML is a language whose source code contains both data and the description of these data (or just their description), so as to make it simple to retrieve or process them. It is not a programming language but a description language: nevertheless, its simplicity and generality make it a flexible and powerful tool of expression. For instance XHTML, the XML formulation of HTML (the language in which web pages are written) used by the present page, is an application of XML (in turn, XML is a simplified subset of SGML, a language judged too general for most purposes, from which HTML was originally derived).
W3C: the World Wide Web Consortium maintains all the information on XML and related languages.
In 2001 (from April to June) I was a consultant at Engineering s.p.a., working on a project of automatic text classification: I implemented the classic IR-style Rocchio algorithm, a famous but not very effective method, and I designed and implemented in C a library whose functions form a complete classification system based on the vector space model, which uses the classical Rosenblatt perceptron learning model with the Widrow-Hoff delta rule.
In practice, it is a system which, given a corpus of documents whose classification is known, learns to classify them correctly, and as such can be employed to classify new documents on the basis of what it learned from the given ones. By "to classify" I mean to attach labels to documents: for example, to identify the author, the subject of the document and so on. The system I developed does not depend on the language, in the sense that its analysis of the texts is purely statistical and does not use techniques involving the grammatical structure of one or more languages: this makes it less accurate than systems based upon linguistic analysis, but more flexible.
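The learning step described above can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the library itself: documents are assumed to be already turned into fixed-length vectors of term weights, there is a single label encoded as +1 or -1, and the names DIM, unit_output, delta_update, train and classify are all hypothetical.

```c
#include <stddef.h>

#define DIM 4  /* hypothetical vocabulary size: one component per term */

/* Linear output of the unit: bias plus the weighted sum of the term weights. */
double unit_output(const double w[DIM], double bias, const double x[DIM])
{
    double s = bias;
    for (size_t i = 0; i < DIM; i++)
        s += w[i] * x[i];
    return s;
}

/* One Widrow-Hoff (delta rule) step: move the weights along the input
   vector, proportionally to the error between target and linear output. */
void delta_update(double w[DIM], double *bias, const double x[DIM],
                  double target, double eta)
{
    double err = target - unit_output(w, *bias, x);
    for (size_t i = 0; i < DIM; i++)
        w[i] += eta * err * x[i];
    *bias += eta * err;
}

/* Sweep the labelled corpus `epochs` times, updating after each document. */
void train(double w[DIM], double *bias, const double docs[][DIM],
           const double labels[], size_t n, size_t epochs, double eta)
{
    for (size_t e = 0; e < epochs; e++)
        for (size_t k = 0; k < n; k++)
            delta_update(w, bias, docs[k], labels[k], eta);
}

/* Classify a document by the sign of the linear output: +1 or -1. */
int classify(const double w[DIM], double bias, const double x[DIM])
{
    return unit_output(w, bias, x) >= 0.0 ? 1 : -1;
}
```

To use it, one would start from weights and bias set to zero, call train on the labelled vectors with a small learning rate eta, and then call classify on the vector of a new document. Nothing in the code looks at the words themselves, only at their counts, which is the sense in which such a classifier is independent of the language.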
Below are some links on automatic classification, including a link to McCallum's page, which offers a free and very powerful system to classify English documents:
Texts, text centres, resources and programs on the Web.
CiteSeer: Computer Science, a huge electronic archive of technical papers in computer science, and in particular on artificial intelligence, machine learning and text classification.
Yiming Yang's HomePage: here you can find a lot of interesting works by this researcher and a version of the Reuters corpus, a collection of texts used for tests and experiments.
GNU + Cygnus + Windows is a spectacular collection of programs which in practice provides a UNIX system inside Windows (it also includes an implementation of the X graphic system).
www.spychecker.com offers Ad-Aware, a free program which inspects your computer looking for intruders.
Steve Gibson maintains a very interesting site on spyware, that is, those little spy programs which are scattered around the net and which you may have inside your computer without knowing it.
Numerical Recipes is the web site of a collection of free books which offer a comprehensive and thorough treatment of numerical analysis from an operational perspective, with Fortran or C programs.
Netlib Repository at UTK and ORNL is a collection of mathematical software, papers, and databases. The Netlib repository contains freely available software, documents, and databases of interest to the numerical, scientific computing, and other communities. The repository is maintained by AT&T Bell Laboratories, the University of Tennessee and Oak Ridge National Laboratory, and by colleagues world-wide. The collection is replicated at several sites around the world, automatically synchronized, to provide reliable and network efficient service to the global community.
DoCIS Documents in Computer and Library & Information Science, is a service of the rclis digital library, which, in turn, is dedicated to promoting free access to data about documents in computing and library and information science.
Kolekcja matematyczno-fizyczna at Biblioteka Wirtualna Nauki in Poland makes some issues of the International Journal of Applied Mathematics and Computer Science (2001-2004) available online.
Marco Liverani's web pages contain a lot of interesting material on computer science, in particular his notes on Unix and Perl.