Second Coursework for Internet and Web Technologies module (2019/20)


The purpose of this coursework is to help you learn about using server-side PHP to process XML documents and return results over the web. The coursework will be assessed and counts for 10% of the final mark for this module.

The task

DBLP is a web site providing information about the publication record of academics in Computer Science. For example, it includes my publication record. This information can also be retrieved as XML. Although the web site rewrites the specified URI (so what you see in the address bar is different), the actual URI used for my record is https://dblp.org/pid/w/PeterTWood.xml. Between pid/ and .xml in the URI is the person identifier (pid), in this case w/PeterTWood.

The following table gives the pid values for a number of colleagues in Computer Science:
Person   pid                XML file
Alex     p/APoulovassilis   Poulovassilis-Alexandra.xml
Andrea   c/AndreaCali       Cali-Andrea.xml
Jan      h/JanHidders       Hidders-Jan.xml
Mark     l/MarkLevene       Levene-Mark.xml
Nigel    m/NigelJMartin     Martin-Nigel.xml
Peter    w/PeterTWood       Wood-Peter.xml
Also included in the table above are links to XML files for each person. You should save these to the directory where you plan to develop your solution. Here is a zip file containing all of them. We could retrieve the above files from the DBLP server when needed, but it's better if you store them in your web server space to avoid overloading the DBLP server.
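As a rough sketch of how the table above might be held in PHP, the following keeps the names, pid values and local file names in one array, builds the DBLP URI from a pid, and loads a locally stored file with SimpleXML. The constant PEOPLE and the functions dblpXmlUri and loadPerson are hypothetical names chosen for illustration, and the code assumes the XML files sit in the same directory as the script.

```php
<?php
// Hypothetical lookup table mirroring the table of people above.
const PEOPLE = [
    'Alex'   => ['pid' => 'p/APoulovassilis', 'file' => 'Poulovassilis-Alexandra.xml'],
    'Andrea' => ['pid' => 'c/AndreaCali',     'file' => 'Cali-Andrea.xml'],
    'Jan'    => ['pid' => 'h/JanHidders',     'file' => 'Hidders-Jan.xml'],
    'Mark'   => ['pid' => 'l/MarkLevene',     'file' => 'Levene-Mark.xml'],
    'Nigel'  => ['pid' => 'm/NigelJMartin',   'file' => 'Martin-Nigel.xml'],
    'Peter'  => ['pid' => 'w/PeterTWood',     'file' => 'Wood-Peter.xml'],
];

// The pid sits between "pid/" and ".xml" in the DBLP URI.
function dblpXmlUri(string $pid): string
{
    return 'https://dblp.org/pid/' . $pid . '.xml';
}

// Load the local copy for one person; returns null if the name is
// unknown, the file is missing, or the XML cannot be parsed.
function loadPerson(string $name, string $dir): ?SimpleXMLElement
{
    if (!isset(PEOPLE[$name])) {
        return null;
    }
    $path = $dir . '/' . PEOPLE[$name]['file'];
    if (!is_readable($path)) {
        return null;
    }
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = simplexml_load_file($path);
    return $doc === false ? null : $doc;
}

// Report which local files are available next to this script.
foreach (array_keys(PEOPLE) as $name) {
    $status = loadPerson($name, __DIR__) === null ? 'missing or unreadable' : 'loaded';
    echo $name, ': ', $status, "\n";
}
```

Reading from local copies rather than from the URI returned by dblpXmlUri keeps requests away from the DBLP server, in line with the advice above.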

You should take time to study the structure of these XML files and note the following:

We are interested in finding out the numbers of co-authorships and co-editorships for subsets of those members of the department listed in the above table. For example, if we are interested in Alex, Andrea, Mark, Nigel and Peter, we should get the following table:
         Alex  Andrea  Mark  Nigel  Peter
Alex      152       8    19      9     27
Andrea      8     133     0      1      8
Mark       19       0   180      0      3
Nigel       9       1     0     25      1
Peter      27       8     3      1     79
Here we see that Alex and Peter, for example, have been co-authors or co-editors of 27 publications (note that the entries in the table are symmetrical). If you look at the XML file for my publications, you will see that the first publication includes both Alex and me as authors (among others), as indicated by our pid attribute values. The entries on the "diagonal" in the above table represent the number of publications a person has authored or edited with themselves, i.e., the total number of their publications.
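One possible way to compute a single table entry is an XPath count over person X's file, looking for publications that carry person Y's pid. The sketch below runs against a small made-up sample rather than a real DBLP file. The element names (dblpperson, r, author, editor) are assumptions about the file structure that you should check against the files you downloaded, and countShared is a hypothetical helper name.

```php
<?php
// Made-up sample standing in for one person's XML file (structure assumed).
$sample = <<<XML
<dblpperson name="Peter T. Wood" pid="w/PeterTWood">
  <r><article key="a1">
    <author pid="p/APoulovassilis">Alexandra Poulovassilis</author>
    <author pid="w/PeterTWood">Peter T. Wood</author>
    <title>First.</title><journal>J1</journal>
  </article></r>
  <r><inproceedings key="c1">
    <author pid="w/PeterTWood">Peter T. Wood</author>
    <title>Second.</title><booktitle>PODS</booktitle>
  </inproceedings></r>
  <r><proceedings key="p1">
    <editor pid="p/APoulovassilis">Alexandra Poulovassilis</editor>
    <editor pid="w/PeterTWood">Peter T. Wood</editor>
    <title>Third.</title><booktitle>BNCOD</booktitle>
  </proceedings></r>
</dblpperson>
XML;

// Count the publications in one person's file that list $pid as an
// author or an editor. $pid is taken from the fixed table of people,
// never from user input, so interpolating it into the expression is safe.
function countShared(DOMXPath $xp, string $pid): int
{
    $expr = "count(/dblpperson/r/*[author/@pid='$pid' or editor/@pid='$pid'])";
    return (int) $xp->evaluate($expr); // count() yields a float in XPath 1.0
}

$dom = new DOMDocument();
$dom->loadXML($sample);
$xp = new DOMXPath($dom);

echo countShared($xp, 'p/APoulovassilis'), "\n"; // off-diagonal entry: 2
echo countShared($xp, 'w/PeterTWood'), "\n";     // diagonal entry: 3
```

Passing the file owner's own pid gives the diagonal entry, since every publication in a person's file lists that person.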

We might also be interested only in publications that appeared in the PODS (Principles of Database Systems) venue. In this case the result will be as follows:
         Alex  Andrea  Mark  Nigel  Peter
Alex        0       0     0      0      0
Andrea      0       2     0      0      0
Mark        0       0     0      0      0
Nigel       0       0     0      0      0
Peter       0       0     0      0      2
We see that none of the people in the table published a PODS paper with anybody else in the table.
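Restricting the count to a venue could be done as in the sketch below. Because the venue comes from the user, the comparison is made in PHP rather than by pasting the text into an XPath expression, which avoids problems with quotes in the input. The sample data is made up, the element names booktitle and journal are assumptions about where the venue is recorded, and countSharedAt and the pid x/Other are hypothetical.

```php
<?php
// Made-up sample standing in for one person's XML file (structure assumed).
$sample = <<<XML
<dblpperson name="Peter T. Wood" pid="w/PeterTWood">
  <r><inproceedings key="c1">
    <author pid="w/PeterTWood">Peter T. Wood</author>
    <title>One.</title><booktitle>PODS</booktitle>
  </inproceedings></r>
  <r><inproceedings key="c2">
    <author pid="x/Other">Somebody Else</author>
    <author pid="w/PeterTWood">Peter T. Wood</author>
    <title>Two.</title><booktitle>PODS</booktitle>
  </inproceedings></r>
  <r><proceedings key="p1">
    <editor pid="p/APoulovassilis">Alexandra Poulovassilis</editor>
    <editor pid="w/PeterTWood">Peter T. Wood</editor>
    <title>Three.</title><booktitle>BNCOD</booktitle>
  </proceedings></r>
</dblpperson>
XML;

// Count publications that list $pid as author or editor. If $venue is
// non-empty, only publications whose booktitle or journal equals it
// exactly are counted.
function countSharedAt(SimpleXMLElement $doc, string $pid, string $venue = ''): int
{
    $count = 0;
    foreach ($doc->xpath('/dblpperson/r/*') as $pub) {
        $pids = [];
        foreach ($pub->xpath('author/@pid | editor/@pid') as $attr) {
            $pids[] = (string) $attr;
        }
        if (!in_array($pid, $pids, true)) {
            continue; // this person is not on the publication
        }
        if ($venue !== '') {
            $venues = [];
            foreach ($pub->xpath('booktitle | journal') as $v) {
                $venues[] = (string) $v;
            }
            if (!in_array($venue, $venues, true)) {
                continue; // different venue
            }
        }
        $count++;
    }
    return $count;
}

$doc = simplexml_load_string($sample);
echo countSharedAt($doc, 'w/PeterTWood', 'PODS'), "\n";     // 2
echo countSharedAt($doc, 'p/APoulovassilis', 'PODS'), "\n"; // 0
echo countSharedAt($doc, 'p/APoulovassilis'), "\n";         // 1
```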

A user should be able to specify for which people and for which venue they wish to see the results using a form on an HTML page. The information they enter should be processed by PHP and the results returned as an HTML page. Your solution should work in any browser since the HTML will comprise only a very simple form. The techniques you need to use are discussed in the material on Server-side processing and that on XPath in the Extensible style language part of the module. Extra information is given below.

The tasks you need to perform are as follows:

  1. Create a web page containing an HTML form which will allow a user to select a subset of the 6 people in the first table above, as well as specify a venue of publication (only an exact match with the venue entered is required to be implemented). I suggest that you use checkboxes for the people and a text box for the venue. Examples of venues in the data are BNCOD, ICDT, PODS and VLDB (for conferences), and Comput. Networks, JOCCH and PVLDB (for journals), along with very many others.
  2. The returned HTML page should have a heading which includes the venue entered in the form by the user, if any.
  3. The table in the output should include a header row which includes the names of only those people selected by the user.
  4. The table should include rows for only those people selected. The data cell for row X and column Y should contain the number of publications which person X and person Y have co-authored or co-edited
    1. in total, if no venue is given by the user, or
    2. for only the venue entered by the user.
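The output side of these tasks could be organised roughly as follows. This is only a sketch under stated assumptions: the form is taken to use checkboxes named people[] (which PHP delivers as an array) and a text box named venue, and PERSON_NAMES, selectedPeople, enteredVenue and renderPage are hypothetical names. The counting itself is passed in as a callback, so a dummy that always returns 0 is used here.

```php
<?php
// Hypothetical whitelist of the names offered as checkboxes.
const PERSON_NAMES = ['Alex', 'Andrea', 'Jan', 'Mark', 'Nigel', 'Peter'];

// Keep only known names, in table order; anything else is ignored.
function selectedPeople(array $input): array
{
    $chosen = $input['people'] ?? [];
    if (!is_array($chosen)) {
        return [];
    }
    $chosen = array_filter($chosen, 'is_string');
    return array_values(array_intersect(PERSON_NAMES, $chosen));
}

// Read the venue text box, trimming surrounding whitespace.
function enteredVenue(array $input): string
{
    $venue = $input['venue'] ?? '';
    return is_string($venue) ? trim($venue) : '';
}

// Build the heading and the table. $count is a callback taking two
// names and returning the number for row X, column Y.
function renderPage(array $names, string $venue, callable $count): string
{
    $heading = 'Co-authorships and co-editorships';
    if ($venue !== '') {
        $heading .= ' in ' . htmlspecialchars($venue); // escape user input
    }
    $html = "<h1>$heading</h1>\n<table>\n<tr><th></th>";
    foreach ($names as $name) {
        $html .= '<th>' . htmlspecialchars($name) . '</th>';
    }
    $html .= "</tr>\n";
    foreach ($names as $x) {
        $html .= '<tr><th>' . htmlspecialchars($x) . '</th>';
        foreach ($names as $y) {
            $html .= '<td>' . (int) $count($x, $y) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// Simulated form input; in a real script this would be $_GET or $_POST.
$input = ['people' => ['Peter', 'Alex', 'Mallory'], 'venue' => ' PODS '];
$names = selectedPeople($input);
$venue = enteredVenue($input);
echo renderPage($names, $venue, fn(string $x, string $y): int => 0);
```

Checking the submitted names against a fixed list and escaping the venue before echoing it back are two simple forms of error checking.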

Handing in the coursework

The deadline for submission is 6pm on Tuesday 28th April 2020. Please submit the coursework via Moodle as a single zip file containing your HTML and PHP files. You should not submit any instructions or explanations in a separate file. Instead, the interface should be self-explanatory and the code should be commented appropriately.

Remember that plagiarism is taken very seriously by the Department and the College (see the relevant section in your programme booklet). By submitting the coursework, you are implicitly declaring that your coursework is entirely your own work, except where explicitly stated otherwise. (Of course, you are welcome to reuse code presented during lectures; any other code that is not yours should be acknowledged in comments.) Your submission may be submitted to an online plagiarism detection service. The College's disciplinary procedure will be invoked in any cases of suspected plagiarism.

The College policy with regard to late submission of coursework is described in the MSc/MRes programme booklet. No extensions will be granted. The cut-off date for submissions is 6pm on Tuesday 5th May 2020. Submissions after this date will not be marked. Those submitted after 6pm on the 28th April and before 6pm on the 5th May, where mitigating circumstances are not accepted, will receive a maximum mark of 50%.

Marking guide

Your PHP program should be properly structured and should include comments and some simple error checking.

Marks will be awarded out of 20. The areas in which marks will be awarded and the maximum mark possible in each case are as follows:
PHP code structure and comments   2
error handling in the code        1
part 1                            2
part 2                            1
part 3                            2
part 4(a)                         6
part 4(b)                         6
Full marks for the first 2 items above will not be awarded if only a partial solution is provided for the other parts.

Comments on your coursework, along with the mark you were awarded, should be returned to you within 4 weeks of the cut-off date.

Hints and useful information