Tag Archive: HTML

Tonight was the second hackathon. Whereas last time most of the group was working on migrating pages from the old site to the new one, this one focused more on coding issues. I was part of the team creating the forms used to make new pages, which involved writing the HTML to be put into the Drupal site to provide the form, a PHP page that processed the form data and output both the RDF and the Drupal page (if applicable), and some Javascript files that go along with the HTML. Within our group, I was working on the form for adding In The News instances for the Announcements.

After figuring out how to get onto the server, I grabbed all of the example files that Evan had made for the Presentations creation form and looked at those. I started out by writing the HTML, figuring that was the easiest place to start, since it was just some simple editing to add the needed fields, which I identified by exploring the existing RDF files on the server and examining the OWL schema file using Protege. After this, I worked on the PHP, since I’d need it to test the new HTML, and since I didn’t have write access to the dev site yet to test it anyhow.

At first, it was just some simple editing of the processing to match the form data in the HTML. I also removed the section of code that created the Drupal page, since the announcements don’t have a page of their own. After tracking down and editing everything I thought was needed and getting write access, I managed to get the HTML into a Drupal page. I ran into several issues here because I had not noticed the Javascript files, which were essential for filling in the multiple-selection menus (they pulled a listing of the necessary URIs to be placed into a given element based on the URI type, and enabled filling of the selection boxes and autocompletion on other fields). This turned out to be a pretty simple fix, just needing basic edits to a new Javascript file, and the form became functional.

Most of the time was spent on the next part: making sure that the PHP called by the form processed everything correctly. At first it seemed to be working, since I would submit the form and get what looked like valid RDF out of it. However, several fields were not being written to the RDF, because I had forgotten to correctly edit the section grabbing all of the GET variables. I kept testing throughout, but I was not able to get the RDF to be displayed on the front page. Evan looked at it and confirmed that the RDF was finally complete (this was 3 hours into the hackathon already…), but ran into an odd problem that was preventing it from being displayed. He tested the RDF by querying the site’s triple store for my RDF file; although it returned fine when searching for tw:InTheNews, it would not return when searching for tw:Announcements, even though InTheNews is a subclass of Announcements.

By the time I left, this problem was still unresolved, but at least the form seems to work in grabbing the input and generating the RDF, and we think the error lies somewhere in how the announcements are being generated.


Today was the TWC Hackathon for the upcoming shiny new website, which I came to (although I had to leave early). I had thought I would try to help with the migration of the old pages onto the new site, but I actually ended up doing some debugging for the site. Patrick had started the meeting by listing all of the various things that we hoped to finish, one of which was a set of issues when viewing the site in Internet Explorer, so I volunteered to look into those.

The known issue was the front page tables exploding, so that was the first thing I worked on. I started out by just looking through the source to get an idea of how the front page was put together. Most of the others thought it was a CSS issue, so I examined those first, looking through the various CSS files to see what affected the area, which I had narrowed down to some issue with the sizing of the announcements column. I didn’t really see anything, so I went line by line through the encompassing tables to try to figure out where the sizing issue was coming from.

A really useful note: before examining the code, I realized that I wanted to be able to test changes locally, so that I could determine more definitively what was broken and how to fix it. However, we suspected that the issue might lie in the CSS, and a saved copy would not work right without the accompanying images/CSS/etc. So I used a trick that I figured out earlier this year (for a reason I now forget): downloading a page and its attached files without breaking the links, so that they still import correctly. To use this method, you need access to the wget utility, which is most likely installed already if you use Linux/Unix. You can also install it on Windows, which I had done already.

A brief intro to wget…or at least the features of it that I use. Wget is a really great utility when the opportunity to use it comes up (this is only the second time I’ve actually used it…but it is definitely THE method I know for quickly getting a local copy for editing). It basically downloads files from a given URL, but with flags it can do really useful things like downloading a site recursively (-r), converting the links to work with the other locally downloaded files (-k), and grabbing needed imports (-p). I just wanted to download the front page so I could edit and test it offline, so I used wget -k -p URL to grab it.

Back to the actual front page work: I eventually figured out that the issue was due to an <img> tag whose image source was actually something like 900 pixels wide, although the page was displaying it at 100 pixels. Because the tag had no width attribute specified, the browser displayed the resized 100px image but still tried to make the table fit a 900px-wide space, causing horrible things to happen to the front page. I tried to edit the page myself to fix it, then realized that the back end was actually using Drupal to generate the HTML via the .rq query and the .xsl formatting code, so I talked to Patrick and he edited the .xsl file to fix the front page.

Another reported issue was some sort of error with people’s profile pages, but I did not see any issues, so I moved on to exploring the site in IE to see if any other problems were present. A major one was with the search form: results would display fine if the user was not logged in, but seemed to lose all CSS styling if the user was logged in. I attempted to figure out what the issue was, but could not determine it from looking at the code, and I could not test changes to find out that way since I could not log in using my local copy. After talking to Patrick, I submitted a trac report so that it can be resolved later.

While writing that part of the blog, I just realized that this could have been why I didn’t see any problems with people’s profile pages. I quickly checked it on IE while logged on and sure enough, the profile pages have the same loss of information. I guess this means that my suspicion about it being something about the search code was wrong, but it does mean that the error must be something similar between the two, which should hopefully make fixing it easier. Now to go track down my report to append this to…

Today I was going to move away from working on what data to gather (and how to gather it) for the scientific locations, because I realized that I had really skipped to the end, which probably won’t work very well. The plan was to work on it the same way I did the ISWC demo: write out the framework for everything (the skeleton blocks/loops, variable initialization, comments for later), which really helps in getting a good idea of how everything will fit together, and then slowly fill parts in to create the functionality piece by piece, eventually leading to the battle of the main data gathering/visualization. However, I hit a snag immediately: the CS lab servers were down once again, and I don’t really have any other place where I can test my PHP code.

I started writing up the framework anyhow, since I shouldn’t really need to debug that, but stalled again while working on the initial query used to find the starting location. The problem was that I was trying sample queries for, say, New York, and getting no answers! This seemed pretty bad, considering that it should be a simple query, and it is rather important that I be able to search the data by name. After much longer than it should have taken, I realized that SPARQL does not treat two literals as equal if they differ in language tags…and that includes a literal without any tag versus one with a tag! Since I had been searching for matches of “New York”, I was getting no results, but as soon as I remembered to add the @en, it worked.

^Almost an hour of work because of a tag that I should have remembered from previous run-ins with them….
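The pitfall in miniature, sketched with a hypothetical helper (illustrative, not the demo’s actual code):

```python
def label_query(name, lang=None):
    # Plain literals and language-tagged literals are distinct RDF terms:
    # "New York" does not match "New York"@en, so a query built without
    # the tag silently returns nothing against a tagged dataset.
    literal = '"%s"' % name + ("@%s" % lang if lang else "")
    return "SELECT ?place WHERE { ?place rdfs:label %s }" % literal

no_match = label_query("New York")          # matches only untagged literals
match = label_query("New York", lang="en")  # matches the @en-tagged labels
```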

The first part of the project that I will be working on is this initial search, which will search for the location you enter, either by name or directly by latitude/longitude. If there are no results (when searching by name), it’ll return to the search form letting the user know. If there is just one result, it’ll proceed to a page with some useful information on the location, as well as a form with some filters that can be submitted to generate the corresponding visualization. At some point, I want to try to make this information page an actual mash-up, if I can find other sources to draw data from; this could really be a neat demo by itself if enough time is spent on it. Until the entire thing is functional, though, it’ll just be some info straight from the dbpedia results. Finally, if there is more than one result, it’ll display a small bit of info on each result and prompt the user to select one, which will redirect the user to the info page/form for visualization.
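The three-way flow above can be sketched as a simple dispatch (the page names here are placeholders, not the real pages):

```python
def route_search(results):
    # 0 matches: back to the search form with a notice; exactly 1:
    # straight to the location info page; 2+: let the user disambiguate.
    if not results:
        return "search_form_with_notice"
    if len(results) == 1:
        return "location_info_page"
    return "disambiguation_list"
```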

Another thing that I found was that using a regex filter on a dbpedia query kills performance to the point where none of the queries even finish. I might look into this later to see if there’s a better way to do it, but it makes sense to me that this happens, seeing as it would need to run some sort of string processing on every result. As a result, the search might be much less flexible than I was hoping, since it may need exact matches to find the location. I’m hoping to at least find a way to handle capitalization issues, but any of these optimizations will wait until I get this thing functional, which may or may not be this semester.
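One cheap way to soften the exact-match requirement without a server-side regex would be to normalize capitalization on the client before building the query; this is just an idea sketched here, not something the demo does:

```python
def normalize_place_name(raw):
    # Title-case each word so user input like "new york" matches the
    # dataset label "New York" exactly, avoiding a costly regex FILTER.
    return " ".join(word.capitalize() for word in raw.strip().split())
```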

Next time I hope to get the initial search functional as well as the functionality for 0 and 2+ results to work. Then I’ll work on the info page, and then finally move onto the form for the visualization and the actual map itself.

Today’s AHM was split between talking about the ISWC 2010 Demo and a presentation about having user annotations in provenance.

Jie started out by giving an overview again of the Semantic Dog Food goals and the ISWC 2010 dataset. Alvaro showed his mobile browser, and I presented my filtered browser. It was pretty short, since most of the functions work the same between areas, so I just showed a bit of what the various displays look like and how the navigation works. No one seemed to have any questions or suggestions, so I didn’t really go into detail about the implementation.

Before the meeting, I worked on some more aesthetic changes, as well as enabling local links to use my browser to go to things like people or papers that show up in the retrieved data as objects. I also noticed an issue where any data containing colons, such as literals like owl:sameAs, was cut off because of the way I used strtok. I was unable to finish fixing this before the meeting, and I think Professor McGuinness actually noticed, because she asked about the made predicate, one of the areas affected. I did finish fixing this afterwards, however, as well as a similar issue with single quotes, which were breaking both the query and the search URL. Professor Hendler noted that I should move the demo to a more appropriate server than my CS account (Evan noted that it’s especially important since the CS servers have been breaking all semester), so I’ll have to look into that.
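The colon bug in miniature; Python’s split stands in for PHP’s strtok here (an illustration, not the demo’s code):

```python
def first_token(value, delim):
    # How the buggy tokenizing behaved: keep only what precedes the first
    # delimiter, so a value like "owl:sameAs" loses everything after ":".
    return value.split(delim, 1)[0]

truncated = first_token("owl:sameAs", ":")  # yields just "owl"
```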

Some screenshots from the demo:


ISWC 2010 Demo – Filtered Browsing

Today’s work on the demo was mostly aesthetic, based on some feedback that I got about it. This included some easy fixes, like changing the e-mails to have _AT_ instead of @ to ward off spambots and adding the event type to the Times display, as well as some more complicated ones, such as checking over the information pages and getting all of the URLs to have more helpful labels (where possible).
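The e-mail change is a one-line substitution; a sketch (not the demo’s exact code):

```python
def obfuscate_email(addr):
    # Swap @ for _AT_ so harvested page text no longer contains a
    # machine-recognizable e-mail address.
    return addr.replace("@", "_AT_")
```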

The link changes were done in two different ways. First, where it was available, I pulled the rdfs:label information from the dataset by editing the query, and added extra processing to make sure that the link used the much easier-to-read label instead. There were also cases where, although no rdfs:label data was available, the URL itself could be shortened, mostly for location links to dbpedia.org and data.semanticweb.org.
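The two link treatments described above can be sketched together (the helper name and prefix list are assumptions for illustration):

```python
def link_text(uri, labels):
    # Prefer an rdfs:label pulled from the dataset when one exists;
    # otherwise shorten URIs from known hosts; otherwise show the URI.
    if uri in labels:
        return labels[uri]
    for prefix in ("http://dbpedia.org/resource/",
                   "http://data.semanticweb.org/"):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    return uri
```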

Although the aesthetic work makes the page look much better, all of the additional parsing has the unfortunate effect of making the code much more specialized. In particular, there were several cases where I relied on the dataset only having certain kinds of location data as the object of the location predicate when parsing, which may cause odd behavior if I tried to reuse the code on another dataset. However, I think it would still be quite easy to adapt for similar purposes, since it would mostly be a matter of deleting a bunch of the conditionals and writing some new ones to cope with the new dataset’s particular needs. Similarly, differences in the endpoint output would require a bunch of changes to the processing.

All in all, I’m pretty happy with how my demo turned out, especially since I’ve only been working on it for about two weeks and knew little to no PHP when I started. It’s a little slow and it doesn’t have the searching/visualization that I hoped for, but the browsing functionality that I actually finished looks much better than I was expecting. I’m kind of curious about how it would normally be done, since I’m pretty sure that my way of processing the results is not optimal in the least (giant fgets loop with gratuitous use of conditionals?).



I’ll go over some of how the page works internally, starting with the query generation. The following are the basic queries used in the code; each is called according to the GET variables, which are the (?var=value&var2=value2) pairs that you see in the URL.

The query used for the Times page:

SELECT ?s ?p ?o ?eventType WHERE { ?r ?p ?o. ?r a ?eventType. ?r ?time ?o. ?r rdfs:label ?s. FILTER((?time = 'http://www.w3.org/2002/12/cal/ical#dtstart' || ?time = 'http://www.w3.org/2002/12/cal/ical#dtend') && (?eventType = 'http://data.semanticweb.org/ns/swc/ontology#SessionEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TalkEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TrackEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#MealEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#BreakEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#AcademicEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#SocialEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#TutorialEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#PanelEvent' || ?eventType = 'http://data.semanticweb.org/ns/swc/ontology#WorkshopEvent'))}

The query used for the Papers page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?paper. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?paper = 'http://swrc.ontoware.org/ontology#InProceedings')} ORDER BY ASC(?s)

The query used for the People page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?person. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?person = 'http://xmlns.com/foaf/0.1/Person')} ORDER BY ASC(?s)

The query used for the Organizations page:

SELECT ?s ?p ?o WHERE { ?r ?p ?o. ?r a ?organization. ?r rdfs:label ?s. FILTER(?p = 'http://www.w3.org/2000/01/rdf-schema#label' && ?organization = 'http://xmlns.com/foaf/0.1/Organization')} ORDER BY ASC(?s)

The last three queries are pretty similar, grabbing all matching instances for their type. The only interesting thing is the use of ?r to make sure that ?s is actually the label for the instance, not the instance URI itself. I also ordered them alphabetically so that the pages would iterate through correctly. The first query is quite large, only because I had to make sure it pulled all kinds of events, as well as making sure that it only pulled the triples for each instance where it had its time data.
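The long event-type disjunction in the Times query doesn’t have to be hand-written; it could be generated from a list (a sketch, with the type list copied from the query above):

```python
SWC = "http://data.semanticweb.org/ns/swc/ontology#"
EVENT_TYPES = ["SessionEvent", "TalkEvent", "TrackEvent", "MealEvent",
               "BreakEvent", "AcademicEvent", "ConferenceEvent",
               "SocialEvent", "TutorialEvent", "PanelEvent", "WorkshopEvent"]

# Rebuild the (?eventType = '...' || ...) clause from the list.
event_filter = "(" + " || ".join(
    "?eventType = '%s%s'" % (SWC, t) for t in EVENT_TYPES) + ")"
```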

This query was used to build the full endpoint URL, which is opened and read in the processing step of the page. I used the endpoint with JSON output set, mostly because I had already worked with JSON output on my TWC Locations demo and was familiar with what I had to do to process the results. The processing itself is mostly a giant while loop, grabbing each line of the results and examining them such that it would read the data and output the table that you see.
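The query-to-URL step can be sketched in a few lines of Python (the endpoint address is a placeholder; the query/output parameter names follow the pattern described here):

```python
import urllib.parse

ENDPOINT = "http://example.org/sparql"  # placeholder endpoint address

def endpoint_url(query):
    # Percent-encode the query and ask the endpoint for JSON output,
    # producing the full URL that the page then opens and reads.
    return ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "output": "json"})

url = endpoint_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
```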

		//Write the display code to $output
		$ctime = "temp";
		$output = "<table align='center'>";
		$preoutput = "<form action='ISWC2010.php' method='get'><select name='datetime'>";
		foreach ($start as $name => $time) {
			//Split the time
			$startDay = strtok($time,"T");
			$sdY = date("y",strtotime($startDay));
			$sdM = date("m",strtotime($startDay));
			$sdD = date("d",strtotime($startDay));
			$startDayName = jddayofweek(cal_to_jd(CAL_GREGORIAN,$sdM,$sdD,$sdY),1);
			$startTime = substr(str_replace("-",":",strtok("T")),0,5);
			$endDay = strtok($end[$name],"T");
			$endTime = substr(str_replace("-",":",strtok("T")),0,5);
			//Start a new section (header row plus drop-down entry) whenever the start time changes
			if (strstr($ctime,$time) === false) {
				$ctime = $time;
				$preoutput = $preoutput."<option value='".$startDayName.$startTime."'>".$startDayName." (".$sdM."/".$sdD."/".$sdY."), ".$startTime."</option>";
				$output = $output."<tr><th colspan='3' id='".$startDayName.$startTime."'>".$startDayName." - ".$sdM."/".$sdD."/".$sdY."<br>".$startTime."</th></tr>";
			}
			//One row per event: type, linked label, end time
			$output = $output."<tr>";
			$output = $output."<td>".$type[$name]."</td><td><a href='ISWC2010.php?filter=eventinfo&subject=".$name."&stime=".$startDay."T".str_replace(":","-",$startTime)."-00&etime=".$endDay."T".str_replace(":","-",$endTime)."-00' target='_blank'>".$name."</a></td>";
			$output = $output."<td>Ends at ".$endTime."</td>";
			$output = $output."</tr>";
		}
		//Close the table and the form after the loop
		$output = $output."</table>";
		$preoutput = $preoutput."</select><input type='submit' value='Go' /></form>";

In the case of Times, it uses three arrays, one for start times, one for end times, and one for event types, filling them in using the instance label as the key, and has a block after the processing loop that writes the entire table to the output variable. The others print to the output variable as they go, instead of waiting for the end. For the output, I had a block that initialized and wrote the header/style/form HTML before the processing; the processing then kept concatenating the table into the output; and finally the end tags were added on and everything was printed at the end. Doing it this way made it easier to change the output format, since I could easily change the order of the main table, and I could write output late that would still appear above the earlier output, by just writing it to a preoutput variable and printing that first. That is how I generated the Times drop-down menu and the listings in the other categories for the anchor-tag navigation: I’d write the data for that alongside the table output, but send the anchor information to $preoutput and the table to $output.
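The two-buffer scheme reduced to a sketch (the names and sample data are illustrative):

```python
# Navigation goes to one buffer, the table to another; printing the
# nav buffer first lets content generated late still appear up top.
preoutput = ["<select name='datetime'>"]
output = ["<table>"]

events = [("Keynote", "09:00"), ("Session 1", "10:30")]
for name, start in events:
    preoutput.append("<option value='%s'>%s</option>" % (start, start))
    output.append("<tr><td>%s</td><td>%s</td></tr>" % (name, start))

preoutput.append("</select>")
output.append("</table>")
page = "".join(preoutput) + "".join(output)
```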

	//$subject, $stime, and $etime are taken from the GET variables in the page URL
	$query = 	"SELECT ?s ?p ?o ?l WHERE {
				?s ?p ?o.
				?s rdfs:label '".$subject."'.
				?s ?start 'http://data.semanticweb.org/conference/iswc/2010/time/".$stime."'.
				?s ?end 'http://data.semanticweb.org/conference/iswc/2010/time/".$etime."'.
				FILTER(?p != 'http://www.w3.org/2000/01/rdf-schema#label' && ?start = 'http://www.w3.org/2002/12/cal/ical#dtstart' && ?end = 'http://www.w3.org/2002/12/cal/ical#dtend').
				OPTIONAL { ?o rdfs:label ?l } }";

	$query = 	"SELECT ?s ?p ?o ?l WHERE {
				?s ?p ?o.
				?s rdfs:label '".$subject."'.
				FILTER(?p != 'http://www.w3.org/2000/01/rdf-schema#label').
				OPTIONAL { ?o rdfs:label ?l } }";

These were the queries used for the Times info page and the others, respectively. They differ because the query actually changes depending on the specific instance being searched for. Also, each has the OPTIONAL clause for ?l, which is the label of each object in the result. The processing for the info pages was also different, since it had the additional task of making the predicates and objects readable, with all sorts of filters put in to make it look better.

I’ve been working since the last update on my demo idea for the ISWC 2010, and it is finally to the point where it has some real functionality. I have finished making the browsing sections, and will probably spend time tomorrow trying to add some sort of visualization to part of it. I don’t think I’ll have time to implement and test the searching that I wanted to have, but I did add some anchor links to try to help the user quickly find what they need that way within the full results.

ISWC 2010 Demo Page

Here is a sample screenshot from the demo; I’ll be talking about what the different parts are and how they work. In general, the form manipulates the GET variables in the URL, which are used to determine both the query that is used and the specific parsing/display actions that are performed. Each main page has an organized listing that is the parsed output of the query, and each entry is linked to an information page containing all of its information, with all links made active for easy navigation. The original plan was three-fold: browsing, searching, and visualization. Only browsing is done right now, but I’m hoping to at least get one visualization done by Friday’s All-Hands Meeting.

The first element is a simple drop-down menu, which allows filtering by either People, Papers, Organizations, or Times. The default view is Times, which displays the list of all the events, sorted by day/time. This was one of my key goals for this demo, and although it is a different form than I had planned (due to time constraints), I am happy with how it came out.  It is clearly readable, although it might not be immediately clear where overlaps occur.  I had originally wanted to have it in a visualization form, where overlaps could be easily seen.

When on the Times menu, there is also a second drop-down menu, which contains a list of all the different start times that are listed. This is so the user can select one and immediately go to that section.  A similar setup is used for the other sections, with small differences.

The People, Papers, and Organization sections are sorted by alphabetical order.  Also, People are organized in groups of four per row, Organizations with two per row, and Papers with just one per row; this was just to keep the size of the page more compact and easier to read.  Alphabetical anchor-points are used in all three sections to enable fast navigation by users.

In the previous screenshots, you might have noted that all the results are linked.  Each link opens a new window which displays all of the information available in the triple store about that particular thing.  I had planned on creating a nicer format for the output, but did not have time to do so.  However, I did implement a lot of parsing to try to make it more readable, including making the links active and making people’s depiction-tagged links automatically load in <img> form.

There were a few points in the process where I spent a large amount of time figuring things out.  The first was figuring out how to parse the query results to get all the information in a way that let me easily output what I wanted (append &debug=results onto the end of any of the pages to see the raw query results, if you’d like).  This took me until around Sunday, when I finally started to write the code to simply output all the triples in a simple form so I could further decide how I actually wanted to structure everything (there were a lot of ideas that were later simplified down to simple listings due to time).  Only two of these brainstorm areas are left (the others were deleted or turned into the current pages), but you can see them by editing the URL to have filter=types (for a listing of all instances and their types) or filter=events (for a listing of all event-related data).

Today was the biggest chunk of work, where I wrote the code for the information display pages as well as converting most of the brainstorm areas into the listings with anchors and nice-looking formatting.  The next step would be to add visualizations to the pages, so I was hoping to make a visualization tomorrow for organizations.  This would show the locations of each of them using their long/lat information and the Google Maps API, which Jie was hoping someone would do.  The search aspect will definitely not be reached in time, so I’ll just focus on visualization(s) tomorrow.

As for the code structure itself, it started out pretty organized but mutated rather horribly over the course of today, since I was more concerned with getting everything working.  It is divided into a few sections, which from top to bottom are: the big code block for the information pages, the form, all the query generation code, the first part of the display output, the giant parsing section that loads/processes results and sets up the output, and the final display section.  All of the code is pretty simple by itself, just conditionals/loops mixed with a lot of string parsing/concatenation and a bunch of SPARQL for the queries, but it gets very complicated when put together to work the way you want.  I won’t be posting snippets right now since the code and queries are still in flux (and super disorganized right now…there’s debug stuff and code that’s no longer used scattered everywhere).

So, as sort of a continuation of my work on the SPARQL visualization, I decided to try a different approach.  My visualization was generated by a Python script pulling query results from an endpoint, outputting the visualization as pure HTML.  The drawbacks were that the generated page was static (although this was also a strength, since if the endpoint goes down, the page will still work) and that the query itself was hardcoded, so it could only visualize that one query.  I was thinking of trying to reverse that by making a browsing/searching page for a dataset, where the query would be based on a search form, so that the user can choose what output they want to see.  I plan on doing this in PHP, and spent today working out how to access the endpoint from PHP, with some testing to make sure it works with the endpoint/dataset I plan on using. I had hoped to get the initial view and basic search form made, but figuring out the PHP access took too long.

At last week’s All-Hands Meeting, one of the ideas raised was that browsing, searching, visualization, and more were uses they wanted to explore for the Semantic Web Dog Food project.  Specifically, they were trying to think of ideas for ISWC 2010.  With this in mind, I’ll be basing this work on the ISWC 2010 endpoint, which I tracked down through the source code of the demo they showed at the meeting.  I don’t know if I’ll actually get it working in time to see whether it would be useful, but it seemed like a good dataset to use, since it is actually something such tools were wanted for.  Originally, I had planned on making a page that would display the schedules for all the sessions/papers and such, but after testing the endpoint, it doesn’t look like any of the events actually have times set…those fields are all blank nodes.

The plan I had for this browse/search was to have a front page that displayed all the general information about the conference in a neat table below the search form.  The user could click the information to see more about each person/session/paper, or they could use the search form (probably implemented in radio buttons/drop down menus) to filter by various things.  I was hoping to write the code so that various filters would display differently, so if it was filtered by time (again, these fields seem to all be blank, so I don’t know if I’ll try to make this) then it would display a table of times and events (events could filter too), or if it was filtered by person, it would show a list of people, with all their papers. I’m not entirely sure of what filters and formats I’ll use, I’ll have to make more detailed plans once I finish getting the PHP framework done and I figure out the result parsing.

This post will detail the important segments of the Python script that I am using to generate my visualization demo, found here, with accompanying snippets. Since part of the reason for making the script was in the hopes of allowing future visualizations of similar queries and graphs to be created quickly by adapting the script, hopefully this will help anyone trying to do so.

The first segment of the script deals with the query itself, loading the query results from the endpoint into an output file for use by the parsing section. Basically, this code downloads the contents of the URL, which is the direct link to the JSON-format output from the wineagent SPARQL endpoint.

import urllib

# Fetch the JSON-format results from the endpoint and save them to a file
queryResults = urllib.urlopen('http://wineagent.tw.rpi.edu:2020/books?query=PREFIX+rdf%3A+%0D%0APREFIX+tw%3A+%0D%0APREFIX+twi%3A+%0D%0APREFIX+foaf%3A+%0D%0ASELECT+%3FLocation+(count(%3FLocation)+AS+%3FNumResearchers)%0D%0AWHERE+{+%0D%0A+++++%3FResearcher+a+foaf%3APerson.+%0D%0A+++++%3FResearcher+tw%3AhasAffiliation+twi%3ATetherlessWorldConstellation.%0D%0A+++++%3FResearcher+tw%3AhasLocation+%3FLocation.%0D%0A}%0D%0AGROUP+BY+%3FLocation&output=json')
queryFile = open("queryResults","w")
for lines in queryResults.readlines():
	queryFile.write(lines)
queryFile.close()

The code to do so is fairly straightforward. It opens the URL, hardcoded into the script, and opens the queryResults file for writing the output to. It then proceeds to read each line from the URL and outputs the line into the output file. After this loop finishes, the file is closed.

After this, it reopens the results file, this time as read-only, as well as the visualization.html output file, to prepare for the following section of code, for the parsing of the actual data needed for the visualization from the query results.

data = {}
valCheck = False
for lines in queryFile.readlines():
	if valCheck:
		loc = lines.find('"value":') # Location of the URI's associated value
		if loc >= 0:
			uriValue = lines[loc+10:-4]
			data[uriName] = uriValue
			valCheck = False
	else:
		loc = lines.find("http://tw.rpi.edu/instances/") # Location of the URI
		if loc >= 0:
			uriName = lines[loc-1:-5]
			valCheck = True

To do so, a dictionary is created (Python's associative container, similar to the map structure in the C++ STL). A loop then reads the file line by line. The else block is the first one that should trigger, since it finds the first line of data for each record (the location URI, in this case). Once found, it sets valCheck to signify that the next line must contain the value associated with the location. This loop is specifically tailored to the output of the endpoint, and would have to be changed anytime that output changes significantly; the actual changes would not take long, though, thanks to the consistent formatting of the endpoint output. Another note about the code is that the saved data depends on the string slicing, which just cuts out a specific substring…again very specifically tailored to the output, but also very easy to alter. After the dictionary is complete, the next step is to take all that data and write it into the formatted JSON object string for the Google Visualization API.
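To see what the loop actually does, here is a minimal, self-contained sketch run against two sample lines. The line format here is my assumption about what the endpoint's JSON output looks like, so the exact slicing offsets may differ against the real endpoint:

```python
# Two hypothetical lines mimicking the endpoint's JSON output
# (the real output may differ slightly; the offsets below match these samples)
sampleLines = [
    '      { "Location": { "type": "uri", "value": "http://tw.rpi.edu/instances/Winslow_2104" } ,\n',
    '        "NumResearchers": { "type": "typed-literal", "value": "5" }\n',
]

data = {}
valCheck = False
for line in sampleLines:
    if valCheck:
        loc = line.find('"value":')  # the count sits on the line after the URI
        if loc >= 0:
            data[uriName] = line[loc+10:-4]
            valCheck = False
    else:
        loc = line.find("http://tw.rpi.edu/instances/")
        if loc >= 0:
            uriName = line[loc-1:-5]  # keeps the surrounding quotes
            valCheck = True

print(data)  # {'"http://tw.rpi.edu/instances/Winslow_2104"': '5'}
```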

jsonObjectList = []

# Column names
jsonObjectList.append("var JSONObject={cols:[{id:'s',label:'Locations',type:'string'},{id:'o',label:'Number of researchers',type:'number'}],rows:[")

# The rest of the JSON object: one row per location/count pair
# (the parsed keys already carry their surrounding quotes)
for k, v in sorted(data.items()):
    jsonObjectList.append("{c:[{v:" + k + "},{v:" + v + "}]},")

# Close off the rows array and the object
jsonObjectList.append("]};")

# Generate full string
jsonObject = ''.join(jsonObjectList)

A list is created to hold the different segments of the string, which are joined together at the end. The first addition is the column names, which are hardcoded in. The rest is generic, simply pulling all of the dictionary key/value pairs and outputting them in the correct row format.
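As a concrete illustration, here is the same assembly run on a hypothetical two-entry dictionary. The column definition is abbreviated to `cols:[...]` for readability, and the real keys would be full tw.rpi.edu instance URIs:

```python
# Hypothetical parsed data; keys carry their quotes, values are count strings
data = {'"Winslow_2104"': '5', '"Winslow_1148"': '3'}

jsonObjectList = ["var JSONObject={cols:[...],rows:["]
for k, v in sorted(data.items()):
    jsonObjectList.append("{c:[{v:" + k + "},{v:" + v + "}]},")
jsonObjectList.append("]};")

print(''.join(jsonObjectList))
# var JSONObject={cols:[...],rows:[{c:[{v:"Winslow_1148"},{v:3}]},{c:[{v:"Winslow_2104"},{v:5}]},]};
```

The trailing comma before the closing bracket is legal in JavaScript array literals, so the string can be dropped into the page as-is.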

The largest section is the HTML generation, simply because so many lines of HTML are hardcoded in. You basically just need to find a Google Visualization example for your desired chart (I used a horizontal bar chart), edit the caption/label/options information to match your visualization, and write it into a string. The string is written in three parts: the first half of the HTML, stopping where the JSON object string is needed; the newly generated string; then the rest of the HTML. Finally, the whole string is written to the visualization.html file and all the files are closed. Done!
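The three-part assembly can be sketched as follows; htmlTop and htmlBottom are heavily abbreviated stand-ins for the hardcoded Google Visualization boilerplate (not the actual markup from my script), and jsonObject stands in for the string generated earlier:

```python
# Abbreviated stand-ins; the real script hardcodes the full chart boilerplate
jsonObject = "var JSONObject={cols:[],rows:[]};"  # placeholder for the generated string

htmlTop = ("<html><head>"
           "<script type='text/javascript' src='https://www.google.com/jsapi'></script>"
           "<script type='text/javascript'>")
htmlBottom = ("</script></head>"
              "<body><div id='chart_div'></div></body></html>")

# First half of the HTML, then the JSON object, then the rest
outputFile = open("visualization.html", "w")
outputFile.write(htmlTop + jsonObject + htmlBottom)
outputFile.close()
```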

The result is a script that you just run to get a ready-to-go HTML file to upload. As noted in my previous blog post, there are several advantages to doing it this way. In short, it is much better for future maintenance than manually translating the endpoint output into the JSON object each time, and it is more robust than a dynamic webpage that tries to reload the results every time it is loaded. This approach strikes a compromise: a dynamic script that generates a static page whenever an update is needed.

Screenshot of the Visualization

Today I did the actual coding of the Python script I had planned last time. What started as an idea to use the script to convert the SPARQL endpoint output into the input needed for the JSON object grew to include generating the entire HTML of the visualization, and finally ended up having the script grab the query data from the endpoint by itself.

The script is divided into four main elements.  First, the script accesses the URL for the actual JSON-format results from the SPARQL endpoint, copying the output into a queryResults file.  It then reads these results, parsing out the needed data into a dictionary, using the rooms as a key and the number of researchers as the value.  Using this dictionary, the JSON object needed for the Google visualization API is built.  Finally, the HTML for the page is output, inserting the JSON object line into the correct place.  The final script output is visualization.html, which can be uploaded and viewed online.

This method seems roundabout, but there were a few reasons I wanted to do it this way. I considered doing the same thing entirely within a dynamic webpage, which would have the benefit of always being up to date. However, when I was looking at other visualization demos, I realized that several no longer functioned because their source endpoint was gone. By making my actual page completely static, this won't be a problem. On the other hand, if I had simply typed out the JSON object manually, I would have to do it manually again every time I wanted to update the information. This method means I can generate a new page anytime by running the script, but the current webpage will never have endpoint connection issues. Even if the endpoint goes down and I need to update (unlikely, since this is a demo, but I was trying to treat it as a maintainable task), I can just change the URL in the script, and probably adjust the parsing if the new endpoint's output format is different. The final reason is that when I was brainstorming how to do this, I didn't know whether a general-purpose language like C++ would allow easy access to a webpage's contents, and I didn't like the risk of broken links with a web language like ASP/PHP. I did know from past brainstorming that Python had all the needed methods and capabilities, so I just used that. Ironically, this was my first time using Python, other than one or two small utilities I made for work…I got a lot of use out of Google and a reference book from the library while doing this! It was sort of a crash course in both SPARQL and Python.

My SPARQL visualization demo can be found at http://www.rpi.edu/~ngp2/TWC/visualization.html