The purpose of this coursework is to help you learn about using server-side PHP to process XML documents and return results over the web. The coursework will be assessed and counts for 10% of the final mark for this module.
DBLP is a web site providing information about the publication record of academics in Computer Science. For example, it includes my publication record. This information can also be retrieved as XML. Although the web site rewrites the specified URI (so what you see in the address bar is different), the actual URI used for my record is https://dblp.org/pid/w/PeterTWood.xml. Between `pid/` and `.xml` in the URI is the person identifier (pid), in this case `w/PeterTWood`.
The following table gives the pid values for a number of colleagues in Computer Science:
Person | pid | XML file |
---|---|---|
Alex | p/APoulovassilis | Poulovassilis-Alexandra.xml |
Andrea | c/AndreaCali | Cali-Andrea.xml |
Jan | h/JanHidders | Hidders-Jan.xml |
Mark | l/MarkLevene | Levene-Mark.xml |
Nigel | m/NigelJMartin | Martin-Nigel.xml |
Peter | w/PeterTWood | Wood-Peter.xml |
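As a minimal sketch of how the table and the URI pattern fit together, the pids can be held in an associative array and turned into XML URIs. The helper name `dblpXmlUri` is an assumption made for illustration; it is not part of DBLP or of the module material.

```php
<?php
// Person name => pid, copied from the table above.
$pids = array(
    'Alex'   => 'p/APoulovassilis',
    'Andrea' => 'c/AndreaCali',
    'Jan'    => 'h/JanHidders',
    'Mark'   => 'l/MarkLevene',
    'Nigel'  => 'm/NigelJMartin',
    'Peter'  => 'w/PeterTWood',
);

// Hypothetical helper: build the XML URI for a pid by placing the pid
// between "pid/" and ".xml", following the pattern described above.
function dblpXmlUri(string $pid): string
{
    return 'https://dblp.org/pid/' . $pid . '.xml';
}

echo dblpXmlUri($pids['Peter']), "\n";
// prints https://dblp.org/pid/w/PeterTWood.xml
```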
You should take time to study the structure of these XML files and note the following:
- The root element is `dblpperson`.
- Each publication is contained in an `r` element, and is (usually) one of an `article` (journal article), `book`, `incollection` (chapter in a book), `inproceedings` (publication in a conference) or `proceedings` (the conference proceedings themselves).
- A person can be an `author` of an `article`, `book`, `incollection` or `inproceedings`, and an `editor` of a `book` or `proceedings`.
- Each `author` and `editor` has a `pid` attribute which takes a value such as those in the table above.
- The venue of a publication is given by the `journal` element for an `article`, and the `booktitle` element for `incollection`, `inproceedings` and `proceedings` (a `book` doesn't have a venue).
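The structure noted above can be explored with a short script. The sketch below runs on an invented fragment that only imitates the shape of a DBLP file (the pids `a/AuthorA` and `b/AuthorB`, the keys and the titles are all made up), and lists each publication's type, venue and contributor pids.

```php
<?php
// Invented fragment in the shape described above; not real DBLP data.
$xml = <<<XML
<dblpperson name="Author A" pid="a/AuthorA">
  <r><article key="k1">
    <author pid="a/AuthorA">Author A</author>
    <author pid="b/AuthorB">Author B</author>
    <title>Sample journal paper.</title>
    <journal>Sample Journal</journal>
  </article></r>
  <r><inproceedings key="k2">
    <author pid="a/AuthorA">Author A</author>
    <title>Sample conference paper.</title>
    <booktitle>PODS</booktitle>
  </inproceedings></r>
  <r><proceedings key="k3">
    <editor pid="a/AuthorA">Author A</editor>
    <title>Sample proceedings.</title>
    <booktitle>PODS</booktitle>
  </proceedings></r>
</dblpperson>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$summary = array();
// Every child of an r element is one publication.
foreach ($xpath->query('/dblpperson/r/*') as $pub) {
    $people = array();
    // The pid attribute may sit on an author or on an editor element.
    foreach ($xpath->query('author/@pid | editor/@pid', $pub) as $attr) {
        $people[] = $attr->nodeValue;
    }
    // string() of an empty node-set is '', which covers a book (no venue).
    $venue = $xpath->evaluate('string(journal | booktitle)', $pub);
    $summary[] = $pub->nodeName . ' [' . $venue . '] ' . implode(', ', $people);
}
echo implode("\n", $summary), "\n";
```

For real data, `$doc->load(...)` with the file name or URI would replace `loadXML`; some simple error checking on its return value would then be appropriate.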
We are interested in finding out the numbers of co-authorships and co-editorships for subsets of those members of the department listed in the above table. For example, if we are interested in Alex, Andrea, Mark, Nigel and Peter, we should get the following table:
| | Alex | Andrea | Mark | Nigel | Peter |
---|---|---|---|---|---|
Alex | 152 | 8 | 19 | 9 | 27 |
Andrea | 8 | 133 | 0 | 1 | 8 |
Mark | 19 | 0 | 180 | 0 | 3 |
Nigel | 9 | 1 | 0 | 25 | 1 |
Peter | 27 | 8 | 3 | 1 | 79 |
People are identified by their `pid` attribute values.
The entries on the "diagonal" in the above table represent the number of publications a person has authored or edited with themselves, i.e., the total number of their publications.
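One possible way to compute a single entry of such a table is to count, with XPath, the publications that carry both pids. The sketch below is only an illustration under stated assumptions: the helper `countShared` and the sample pids `a/A` and `b/B` are hypothetical, authors and editors are treated alike, and pids are assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: number of publications in $doc that list both pids
// as author or editor. With $pidA === $pidB this counts every publication
// involving that person, which gives a diagonal entry.
function countShared(DOMDocument $doc, string $pidA, string $pidB): int
{
    $xpath = new DOMXPath($doc);
    // Node-set of pid attributes on the author/editor children of a publication.
    $has = "*[self::author or self::editor]/@pid";
    // Assumption: pids contain no quote characters, so they can be
    // placed directly inside XPath string literals.
    $expr = "count(/dblpperson/r/*[$has = '$pidA'][$has = '$pidB'])";
    return (int) $xpath->evaluate($expr);
}

// Invented sample, not real DBLP data.
$sample = '<dblpperson>'
    . '<r><article><author pid="a/A"/><author pid="b/B"/></article></r>'
    . '<r><proceedings><editor pid="a/A"/><editor pid="b/B"/></proceedings></r>'
    . '<r><inproceedings><author pid="a/A"/></inproceedings></r>'
    . '</dblpperson>';
$doc = new DOMDocument();
$doc->loadXML($sample);

echo countShared($doc, 'a/A', 'b/B'), "\n"; // prints 2
echo countShared($doc, 'a/A', 'a/A'), "\n"; // prints 3
```

A full table would call such a function once per pair of selected people, presumably loading each row's data from that person's own XML file.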
We might also be interested only in publications that appeared in the PODS (Principles of Database Systems) venue. In this case the result will be as follows:
| | Alex | Andrea | Mark | Nigel | Peter |
---|---|---|---|---|---|
Alex | 0 | 0 | 0 | 0 | 0 |
Andrea | 0 | 2 | 0 | 0 | 0 |
Mark | 0 | 0 | 0 | 0 | 0 |
Nigel | 0 | 0 | 0 | 0 | 0 |
Peter | 0 | 0 | 0 | 0 | 2 |
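Restricting the count to one venue can be sketched by adding a further XPath predicate on `journal` or `booktitle`. The helper name `countSharedAt`, the sample pids and the exact-match comparison of the venue string are assumptions made for illustration.

```php
<?php
// Hypothetical helper: count publications listing both pids whose
// journal or booktitle is exactly $venue.
function countSharedAt(DOMDocument $doc, string $pidA, string $pidB, string $venue): int
{
    // XPath 1.0 string literals have no escape mechanism, so this sketch
    // simply refuses values containing quotes (an assumption, not a rule).
    foreach (array($pidA, $pidB, $venue) as $value) {
        if (strpbrk($value, "'\"") !== false) {
            throw new InvalidArgumentException('Quotes are not supported');
        }
    }
    $xpath = new DOMXPath($doc);
    $has = "*[self::author or self::editor]/@pid";
    $at = "journal = '$venue' or booktitle = '$venue'";
    return (int) $xpath->evaluate(
        "count(/dblpperson/r/*[$has = '$pidA'][$has = '$pidB'][$at])"
    );
}

// Invented sample, not real DBLP data.
$sample = '<dblpperson>'
    . '<r><inproceedings><author pid="a/A"/><author pid="b/B"/>'
    . '<booktitle>PODS</booktitle></inproceedings></r>'
    . '<r><inproceedings><author pid="a/A"/>'
    . '<booktitle>Other Venue</booktitle></inproceedings></r>'
    . '<r><article><author pid="a/A"/><author pid="b/B"/>'
    . '<journal>Sample Journal</journal></article></r>'
    . '</dblpperson>';
$doc = new DOMDocument();
$doc->loadXML($sample);

echo countSharedAt($doc, 'a/A', 'b/B', 'PODS'), "\n"; // prints 1
```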
A user should be able to specify for which people and for which venue they wish to see the results using a form on an HTML page. The information they enter should be processed by PHP and the results returned as an HTML page. Your solution should work in any browser since the HTML will comprise only a very simple form. The techniques you need to use are discussed in the material on Server-side processing and that on XPath in the Extensible style language part of the module. Extra information is given below.
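The form and the server-side reading of its values might be sketched as follows. The function names `renderForm` and `selectedPids`, the parameter names `pid[]` and `venue`, and the button label are all assumptions chosen for illustration; using a name ending in `[]` is the usual PHP convention for collecting several checkbox values into an array.

```php
<?php
// Person name => pid, copied from the table of colleagues.
$people = array(
    'Alex'   => 'p/APoulovassilis',
    'Andrea' => 'c/AndreaCali',
    'Jan'    => 'h/JanHidders',
    'Mark'   => 'l/MarkLevene',
    'Nigel'  => 'm/NigelJMartin',
    'Peter'  => 'w/PeterTWood',
);

// Hypothetical helper: a GET form with one checkbox per person and a
// text field for the venue.
function renderForm(array $people): string
{
    $html = '<form method="get">';
    foreach ($people as $name => $pid) {
        $html .= '<label><input type="checkbox" name="pid[]" value="'
               . htmlspecialchars($pid) . '"/> '
               . htmlspecialchars($name) . '</label>';
    }
    $html .= '<label>Venue <input type="text" name="venue"/></label>';
    $html .= '<input type="submit" value="Show"/></form>';
    return $html;
}

// Hypothetical helper: return name => pid for every submitted pid that
// is in the known list; anything unknown is ignored.
function selectedPids(array $query, array $people): array
{
    if (!isset($query['pid']) || !is_array($query['pid'])) {
        return array(); // no box ticked, so nothing was sent
    }
    $sent = array_filter($query['pid'], 'is_string');
    return array_intersect($people, $sent);
}

echo renderForm($people), "\n";
print_r(selectedPids($_GET, $people));
```

Checking submitted values against the known list is one simple form of error checking; because the form uses GET, the same check can be exercised by editing the query string by hand.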
The tasks you need to perform are as follows:
The deadline for submission is 6pm on Tuesday 28th April 2020. Please submit the coursework via Moodle as a single zip file containing your HTML and PHP files. You should not submit any instructions or explanations in a separate file. Instead, the interface should be self-explanatory and the code should be commented appropriately.
Remember that plagiarism is taken very seriously by the Department and the College (see the relevant section in your programme booklet). By submitting the coursework, you are implicitly declaring that your coursework is entirely your own work, except where explicitly stated otherwise. (Of course, you are welcome to reuse code presented during lectures; any other code that is not yours should be acknowledged in comments.) Your submission may be submitted to an online plagiarism detection service. The College's disciplinary procedure will be invoked in any cases of suspected plagiarism.
The College policy with regard to late submission of coursework is described in the MSc/MRes programme booklet. No extensions will be granted. The cut-off date for submissions is 6pm on Tuesday 5th May 2020. Submissions after this date will not be marked. Those submitted after 6pm on the 28th April and before 6pm on the 5th May, where mitigating circumstances are not accepted, will receive a maximum mark of 50%.
Your PHP program should be properly structured and should include comments and some simple error checking.
Marks will be awarded out of 20. The areas in which marks will be awarded and the maximum mark possible in each case are as follows:
PHP code structure and comments | 2 |
error handling in the code | 1 |
part 1 | 2 |
part 2 | 1 |
part 3 | 2 |
part 4(a) | 6 |
part 4(b) | 6 |
Comments on your coursework, along with the mark you were awarded, should be returned to you within 4 weeks of the cut-off date.
- Use `pid` values as identifiers for people rather than their names (which might include special characters as well as misspellings). The `pid` attribute can appear on either an `author` or an `editor` element.
- Use the `GET` method to send values to the server. This means that checks can be run simply by changing the URI sent.
- A checkbox can be declared in HTML as `<input type="checkbox" id="x" name="y" value="z"/>`. If the user selects this box, then `y=z` will be appended to the URI (assuming that `GET` was used); otherwise nothing will be sent for this checkbox.
- Whether a value was sent can be tested using `isset`.
- If `$xmlDoc` references an XML DOM document, then `$xpath = new DOMXPath($xmlDoc);` will create a new `DOMXPath` class object referenced by `$xpath`. One can then use the `query` or `evaluate` methods as follows: `$elements = $xpath->query("...");` This will return a list of DOM nodes selected by the XPath expression (`...`). See `DOMXPath::query` and `DOMXPath::evaluate`.
- An associative array can be created in PHP as follows: `$letter = array('a' => 'A', 'b' => 'B', 'c' => 'C');` You can then iterate over this array using `foreach ($letter as $lower => $upper) { ... }`, where `$lower` will be instantiated to each key and `$upper` to each value in turn.
- In PHP, `.` is used for concatenation.
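The last two hints can be combined to produce the HTML for a results table. The sketch below assumes the counts are already held in a nested associative array indexed by person name; the function name `renderTable` and that array layout are illustrative choices, not requirements.

```php
<?php
// Hypothetical helper: turn a matrix indexed by person name into an HTML
// table. $matrix[$rowName][$colName] holds the count for that pair.
function renderTable(array $matrix): string
{
    // Header row: an empty corner cell, then one heading per person.
    $html = '<table><tr><th></th>';
    foreach ($matrix as $name => $row) {
        $html .= '<th>' . htmlspecialchars($name) . '</th>';
    }
    $html .= '</tr>';

    // One row per person, built up by concatenation.
    foreach ($matrix as $name => $row) {
        $html .= '<tr><th>' . htmlspecialchars($name) . '</th>';
        foreach ($row as $count) {
            $html .= '<td>' . $count . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}

// Values for Andrea and Peter taken from the PODS table above.
$matrix = array(
    'Andrea' => array('Andrea' => 2, 'Peter' => 0),
    'Peter'  => array('Andrea' => 0, 'Peter' => 2),
);
echo renderTable($matrix), "\n";
```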