ISWC 2010 Demo – Filtered Browsing

Today’s work on the demo was mostly aesthetic, based on some feedback that I got about it. Some of the fixes were easy, like changing the e-mail addresses to use _AT_ instead of @ to ward off spambots and adding the event type to the Times display. Others were more complicated, such as checking over the information pages and giving all of the URLs more helpful labels (where possible).

The link changes were done in two different ways. First, where it was available, I pulled the rdfs:label information from the dataset by editing the query, and added extra processing so that the link used the much easier to read label instead. Second, in cases where no rdfs:label data was available, the URL itself could often be shortened, mostly for location links to dbpedia.org and data.semanticweb.org.
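As a rough illustration of the two approaches, here is a minimal sketch of a label-picking helper. The function names and the exact shortening rule (keep the last path segment and turn underscores into spaces) are my assumptions for the example, not the code the page actually uses.

```php
<?php
// Hypothetical helper: choose a readable label for a link.
// Prefers the rdfs:label value when the query returned one; otherwise
// shortens URIs from the two hosts mentioned above to their last path segment.
function linkLabel(string $uri, ?string $label = null): string
{
    if ($label !== null && $label !== '') {
        return $label;
    }
    $host = parse_url($uri, PHP_URL_HOST);
    $path = parse_url($uri, PHP_URL_PATH);
    $shortenable = ['dbpedia.org', 'data.semanticweb.org'];
    if (is_string($host) && is_string($path) && in_array($host, $shortenable, true)) {
        $segment = basename(rtrim($path, '/'));
        if ($segment !== '') {
            // Assumed rule: "New_York_City" becomes "New York City"
            return str_replace('_', ' ', rawurldecode($segment));
        }
    }
    return $uri; // nothing better available, show the URI itself
}

// Hypothetical helper: wrap the chosen label in an anchor tag.
function linkHtml(string $uri, ?string $label = null): string
{
    return "<a href='" . htmlspecialchars($uri, ENT_QUOTES) . "'>"
         . htmlspecialchars(linkLabel($uri, $label), ENT_QUOTES) . "</a>";
}
```

With this, linkLabel('http://dbpedia.org/resource/New_York_City') gives a short name, while a URI from any other host falls through unchanged.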

Although the aesthetic work makes the page look much better, all of the additional parsing has the unfortunate effect of making the code much more specialized. In particular, there were several cases where I relied on the dataset only having certain kinds of location data as the object of the location predicate when parsing, which may cause odd behavior if I tried to reuse the code on another dataset. However, I think it would still be quite easy to adapt for similar purposes, since it would mostly be a case of deleting a bunch of the conditionals and writing some new ones to cope with the new dataset’s particular needs. The same goes for the endpoint output: if its format differed, the processing would need a bunch of changes as well.

All in all, I’m pretty happy with how my demo turned out, especially since I’ve only been working on it for about two weeks and knew little to no PHP when I started. It’s a little slow and it doesn’t have the searching/visualization that I hoped for, but the browsing functionality that I actually finished looks much better than I was expecting. I’m kind of curious about how it would normally be done, since I’m pretty sure that my way of processing the results is not optimal in the least (giant fgets loop with gratuitous use of conditionals?).

I’ll go over some of how the page works internally, starting with the query generation. The following are the basic queries that are used in the code; each is chosen according to the GET variables, which are the ?var=value&var2=value2 pairs that you see in the URL.
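To make the selection step concrete, here is a small sketch of reading the page choice from the GET variables with a whitelist and a default. Only the value eventinfo appears in the page's own links; the other filter values, the constant name, and the function name are hypothetical.

```php
<?php
// Assumed filter values; only 'eventinfo' is taken from the page's links.
const KNOWN_FILTERS = ['times', 'papers', 'people', 'organizations', 'eventinfo'];

// Hypothetical helper: pick the page to render from the GET variables,
// falling back to the Times page for missing or unknown values.
function chooseFilter(array $get): string
{
    $filter = $get['filter'] ?? 'times';
    if (!is_string($filter)) {
        return 'times'; // e.g. ?filter[]=x arrives as an array
    }
    $filter = strtolower($filter);
    return in_array($filter, KNOWN_FILTERS, true) ? $filter : 'times';
}
```

In the page itself this would be called as chooseFilter($_GET) before deciding which query to send.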

The query used for the Times page:

SELECT ?s ?p ?o ?eventType WHERE {
  ?r ?p ?o.
  ?r a ?eventType.
  ?r ?time ?o.
  ?r rdfs:label ?s.
  FILTER(
    (?time = 'http://www.w3.org/2002/12/cal/ical#dtstart' ||
     ?time = 'http://www.w3.org/2002/12/cal/ical#dtend') &&
    (?eventType = 'http://data.semanticweb.org/ns/swc/ontology#SessionEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TalkEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TrackEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#MealEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#BreakEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#AcademicEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#SocialEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TutorialEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#PanelEvent' ||
     ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#WorkshopEvent')
  )
}

The query used for the Papers page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?paper. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?paper = 'http://swrc.ontoware.org/ontology#InProceedings')} ORDER BY ASC(?s)

The query used for the People page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?person. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?person = 'http://xmlns.com/foaf/0.1/Person')} ORDER BY ASC(?s)

The query used for the Organizations page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?organization. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?organization = 'http://xmlns.com/foaf/0.1/Organization')} ORDER BY ASC(?s)

The last three queries are pretty similar, each grabbing all matching instances of its type. The only interesting thing is the use of ?r to make sure that ?s is actually the label for the instance, not the instance URI itself. I also ordered the results alphabetically so that the pages would list them in order. The first query is quite large only because I had to make sure it pulled all kinds of events, and that it only pulled the triples for each instance that carry its time data.
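Since the three listing queries differ only in the type URI, they could be generated from one template. The sketch below is one way to do that; the type URIs are copied from the queries above, while the page keys, the constant, the function name, and the shared ?type variable (the originals use ?paper, ?person and ?organization) are my own choices for the example.

```php
<?php
// Type URIs copied from the three listing queries; the keys are hypothetical page names.
const LISTING_TYPES = [
    'papers'        => 'http://swrc.ontoware.org/ontology#InProceedings',
    'people'        => 'http://xmlns.com/foaf/0.1/Person',
    'organizations' => 'http://xmlns.com/foaf/0.1/Organization',
];

// Hypothetical helper: build the listing query for one page.
function listingQuery(string $page): string
{
    if (!array_key_exists($page, LISTING_TYPES)) {
        throw new InvalidArgumentException("No listing query for page '$page'");
    }
    $type = LISTING_TYPES[$page];
    // Same shape as the hand-written queries: ?r is the instance, ?s its label.
    return "SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?type. ?r rdfs:label ?s. "
         . "FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?type = '$type')} "
         . "ORDER BY ASC(?s)";
}
```

The lookup table keeps user input out of the query text entirely, since only a known key ever selects a URI.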

The selected query is used to build the full endpoint URL, which is opened and read in the processing step of the page. I used the endpoint with JSON output set, mostly because I had already worked with JSON output on my TWC Locations demo and was familiar with what I had to do to process the results. The processing itself is mostly a giant while loop that grabs each line of the results and examines it, reading the data and writing out the table that you see.
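For comparison with the line-by-line loop, here is a minimal sketch of the same two steps using http_build_query and json_decode. It assumes the endpoint returns the standard SPARQL JSON results layout (results, then bindings, then one value per variable); the endpoint address and the parameter names query and output are placeholders, and both function names are hypothetical.

```php
<?php
// Build the request URL. The parameter names ('query', 'output') are
// assumptions; check what the endpoint actually expects.
function endpointUrl(string $endpoint, string $query): string
{
    $params = ['query' => $query, 'output' => 'json'];
    return $endpoint . '?' . http_build_query($params, '', '&');
}

// Flatten a SPARQL JSON results document into rows of variable => value.
// Returns an empty array when the text is not the expected JSON.
function parseBindings(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results']['bindings'])) {
        return [];
    }
    $rows = [];
    foreach ($data['results']['bindings'] as $binding) {
        $row = [];
        foreach ($binding as $var => $cell) {
            $row[$var] = $cell['value'] ?? null;
        }
        $rows[] = $row;
    }
    return $rows;
}
```

A page would then do something like parseBindings(file_get_contents(endpointUrl($endpoint, $query))) and loop over the rows, instead of inspecting the raw text one line at a time.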

        
		// Write the display code to $output.
		// $start, $end and $type are keyed by event label; times apparently look like "<date>T<hh-mm-ss>".
		asort($start);       // earliest start time first
		$ctime = "temp";     // start time of the last header row written
		$output = "<table align='center'>";
		$preoutput = "<form action='ISWC2010.php' method='get'><select name='datetime'>";
		foreach ($start as $name => $time) {
			// Split the start and end values into a day part and an HH:MM part.
			// strtok("T") with one argument continues on the string from the previous call.
			$startDay = strtok($time, "T");
			$startTime = substr(str_replace("-", ":", strtok("T")), 0, 5);
			$endDay = strtok($end[$name], "T");
			$endTime = substr(str_replace("-", ":", strtok("T")), 0, 5);
			$startStamp = strtotime($startDay);
			$sdY = date("y", $startStamp);
			$sdM = date("m", $startStamp);
			$sdD = date("d", $startStamp);
			$startDayName = date("l", $startStamp);   // full weekday name
			// Start a new header row and drop-down entry whenever the start time changes
			if ($ctime !== $time) {
				$ctime = $time;
				$anchor = $startDayName.$startTime;
				$preoutput .= "<option value='".$anchor."'>".$startDayName." (".$sdM."/".$sdD."/".$sdY."), ".$startTime."</option>";
				$output .= "<tr><th colspan='3' id='".$anchor."'>".$startDayName." - ".$sdM."/".$sdD."/".$sdY."<br>".$startTime."</th></tr>";
			}
			// One row per event: type, link to the info page, end time.
			// urlencode() keeps quotes and ampersands in the label from breaking the link.
			$link = "ISWC2010.php?filter=eventinfo&subject=".urlencode($name)
				."&stime=".$startDay."T".str_replace(":", "-", $startTime)."-00"
				."&etime=".$endDay."T".str_replace(":", "-", $endTime)."-00";
			$output .= "<tr>";
			$output .= "<td>".$type[$name]."</td><td><a href='".$link."' target='_blank'>".$name."</a></td>";
			$output .= "<td>Ends at ".$endTime."</td>";
			$output .= "</tr>";
		}
		$output .= "</table>";
		$preoutput .= "</select><input type='submit' value='Go' /></form>";

In the case of Times, the processing fills three arrays (one for start times, one for end times, and one for event types), each using the instance label as the key, and a block after the processing loop writes the entire table to the output variable. The other pages write to the output variable as they go, instead of waiting for the end.

The way that I did the output, a block before the processing initializes the output and writes the header/style/form HTML, the processing concatenates the table onto it, and finally the end tags are added and everything is printed at once. Doing it this way made it easier to change the output format, since I could easily change the order of the main table. It also meant that I could write output later that would still appear above the earlier output, since I’d just write it to a preoutput variable and print that first. That is how I generated the Times drop-down menu and the anchor tag navigation listings in the other categories; I’d write the data for that alongside the table output, but send the anchor information to $preoutput and the table to $output.
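The array-filling step for Times is not shown above, so here is a sketch of what it might look like, given result rows that have already been reduced to variable => value pairs. Two things are assumptions on my part: that the time values arrive as URIs under the /time/ prefix and are stripped down to the bare timestamp, and that the event type is shortened to the part after the #. The function and constant names are hypothetical.

```php
<?php
const ICAL_START  = 'http://www.w3.org/2002/12/cal/ical#dtstart';
const ICAL_END    = 'http://www.w3.org/2002/12/cal/ical#dtend';
// Assumed prefix of the time URIs, taken from the info-page query.
const TIME_PREFIX = 'http://data.semanticweb.org/conference/iswc/2010/time/';

// Hypothetical helper: build the three label-keyed arrays used by the Times table.
// Each row is expected to have the keys 's', 'p', 'o' and 'eventType'.
function collectEvents(array $rows): array
{
    $start = $end = $type = [];
    foreach ($rows as $row) {
        $name = $row['s'];
        // Assumption: keep only the timestamp part of the time URI
        $time = str_replace(TIME_PREFIX, '', $row['o']);
        if ($row['p'] === ICAL_START) {
            $start[$name] = $time;
        } elseif ($row['p'] === ICAL_END) {
            $end[$name] = $time;
        }
        // Assumption: display only the local name, e.g. "TalkEvent"
        $hash = strrpos($row['eventType'], '#');
        $type[$name] = $hash === false ? $row['eventType'] : substr($row['eventType'], $hash + 1);
    }
    asort($start); // earliest start first, keys (labels) preserved
    return [$start, $end, $type];
}
```

Because the label is the key, two events that share a label would overwrite each other, which is one of the ways this approach is tied to the dataset.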

// Times info page: match the event by its label and by its start and end times.
// Note that the GET values are inserted unescaped, so a quote in the label would break the query.
$query = "SELECT ?s ?p ?o ?l WHERE {
			?s ?p ?o.
			?s rdfs:label '".html_entity_decode($_GET['subject'])."'.
			?s ?start 'http://data.semanticweb.org/conference/iswc/2010/time/".html_entity_decode($_GET['stime'])."'.
			?s ?end 'http://data.semanticweb.org/conference/iswc/2010/time/".html_entity_decode($_GET['etime'])."'.
			FILTER(?p != 'http://www.w3.org/2000/01/rdf-schema#label' && ?start = 'http://www.w3.org/2002/12/cal/ical#dtstart' && ?end = 'http://www.w3.org/2002/12/cal/ical#dtend').
			OPTIONAL { ?o rdfs:label ?l }
		}";

// All other info pages: match the instance by its label only.
$query = "SELECT ?s ?p ?o ?l WHERE {
			?s ?p ?o.
			?s rdfs:label '".html_entity_decode($_GET['subject'])."'.
			FILTER(?p != 'http://www.w3.org/2000/01/rdf-schema#label').
			OPTIONAL { ?o rdfs:label ?l }
		}";

These were the queries used for the Times info page and the other info pages, respectively. They differ from the listing queries because the query actually changes depending on the specific instance it is searching for. They also have the OPTIONAL clause for ?l, which is the label of each object in the result. The processing for the info pages was also different, since it had the additional task of making the predicates and objects readable, with all sorts of filters put in to make the output look better.
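Because these queries splice GET values straight into a quoted literal, a label containing an apostrophe would break them. Below is a minimal sketch of escaping the value first, using the backslash escapes that SPARQL string literals allow. The helper names are hypothetical, and the sketch leaves out the html_entity_decode step that the page applies to the value beforehand.

```php
<?php
// Hypothetical helper: quote a value as a single-quoted SPARQL string literal.
function sparqlLiteral(string $value): string
{
    $escaped = strtr($value, [
        "\\" => "\\\\",  // backslash
        "'"  => "\\'",   // closing quote character
        "\n" => "\\n",
        "\r" => "\\r",
        "\t" => "\\t",
    ]);
    return "'" . $escaped . "'";
}

// Hypothetical helper: the label-only info query, with the subject escaped.
function infoQuery(string $subject): string
{
    return "SELECT ?s ?p ?o ?l WHERE { "
         . "?s ?p ?o. "
         . "?s rdfs:label " . sparqlLiteral($subject) . ". "
         . "FILTER(?p != 'http://www.w3.org/2000/01/rdf-schema#label'). "
         . "OPTIONAL { ?o rdfs:label ?l } "
         . "}";
}
```

The same helper would apply to the stime and etime values in the Times version of the query.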
