This post details the important segments of the Python script I am using to generate my visualization demo, found here, with accompanying snippets. Part of the reason for writing the script was the hope that future visualizations of similar queries and graphs could be created quickly by adapting it, so hopefully this walkthrough will help anyone trying to do so.

The first segment of the script deals with the query itself, loading the query results from the endpoint into an output file for use by the parsing section. Basically, this code downloads the contents of the URL, which is the direct link to the JSON-format output from the wineagent SPARQL endpoint.

import urllib

# Download the JSON-format query results from the SPARQL endpoint.
queryResults = urllib.urlopen('http://wineagent.tw.rpi.edu:2020/books?query=PREFIX+rdf%3A+%0D%0APREFIX+tw%3A+%0D%0APREFIX+twi%3A+%0D%0APREFIX+foaf%3A+%0D%0ASELECT+%3FLocation+(count(%3FLocation)+AS+%3FNumResearchers)%0D%0AWHERE+{+%0D%0A+++++%3FResearcher+a+foaf%3APerson.+%0D%0A+++++%3FResearcher+tw%3AhasAffiliation+twi%3ATetherlessWorldConstellation.%0D%0A+++++%3FResearcher+tw%3AhasLocation+%3FLocation.%0D%0A}%0D%0AGROUP+BY+%3FLocation&output=json')

# Write each line of the response to the queryResults file.
queryFile = open("queryResults", "w")
for line in queryResults.readlines():
    queryFile.write(line)
queryFile.close()

The code to do so is fairly straightforward. It opens the URL, hardcoded into the script, and opens the queryResults file to write the output to. It then reads each line from the URL and writes it to the output file. Once the loop finishes, the file is closed.

After this, the script reopens the results file, this time as read-only, and opens the visualization.html output file, in preparation for the next section of code, which parses the data needed for the visualization out of the query results.
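The reopening itself isn't shown in the snippets here; a minimal sketch of that step would look like the following (htmlFile is my name for the output handle, not necessarily the script's):

# Reopen the query results read-only, and open the HTML output file
# for writing. The htmlFile name is a placeholder, not from the script.
queryFile = open("queryResults", "r")
htmlFile = open("visualization.html", "w")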

data = {}
valCheck = False
for line in queryFile.readlines():
    if valCheck:
        loc = line.find('"value":')  # Position of the URI's associated value
        if loc >= 0:
            uriValue = line[loc+10:-4]
            data[uriName] = uriValue
            valCheck = False
    else:
        loc = line.find("http://tw.rpi.edu/instances/")  # Position of the URI
        if loc >= 0:
            uriName = line[loc-1:-5]
            valCheck = True

To do so, a dictionary is created; this is Python's associative container, analogous to the map structure in the C++ STL. The loop then reads the file line by line. The else block is the first one to trigger, since it finds the first line of data for each record (the location URI, in this case). Once found, valCheck is set to signal that the next matching line contains the value associated with that location. This loop is tailored specifically to the output of the endpoint, and would have to be changed any time that output changes significantly; thanks to the consistent formatting of the endpoint output, though, such changes would be quick to make. Note also that the saved data depends on the slice subscripts, which cut a specific substring out of each line. Again, this is very specifically tailored to the output, but also very easy to alter. After the dictionary is complete, the next step is to take all that data and write it into the formatted JSONObject string for the Google Visualization API.
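As an aside, if the endpoint emits standard SPARQL JSON results, the same dictionary could be built in a less format-sensitive way with Python's json module. This is only a sketch, assuming the standard "results"/"bindings" layout and the ?Location and ?NumResearchers variables from the query above:

import json

# A sketch of a format-insensitive alternative, assuming the endpoint
# returns standard SPARQL JSON results for the query shown earlier.
results = json.load(open("queryResults"))
data = {}
for binding in results["results"]["bindings"]:
    location = binding["Location"]["value"]
    count = binding["NumResearchers"]["value"]
    data[location] = count

Unlike the slicing above, these values come back without any surrounding quote characters, so quoting for the JavaScript string would need to be added when building the JSONObject.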

jsonObjectList = []

# Column names
jsonObjectList.append("var JSONObject={cols:[{id:'s',label:'Locations',type:'string'},{id:'o',label:'Number of researchers',type:'number'}],rows:[")

# The rest of the JSONObject
for k, v in sorted(data.items()):
	jsonObjectList.append("{c:[{v:")
	jsonObjectList.append(k)
	jsonObjectList.append("},{v:")
	jsonObjectList.append(v)
	jsonObjectList.append("}]},")
jsonObjectList.append("]};")

# Generate full string
jsonObject = ''.join(jsonObjectList)

A list is created to hold the different segments of the string, which are joined together at the end. The first addition is the column definitions, which are hardcoded. The rest is generic: it walks the dictionary's key/value pairs in sorted order and emits each one in the row format the Visualization API expects.
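For illustration, with a hypothetical entry mapping a location to a researcher count, the finished string would look roughly like this (the exact quoting of each value depends on the sliced substrings, and the trailing comma the loop leaves behind is tolerated in a JavaScript array literal):

var JSONObject={cols:[{id:'s',label:'Locations',type:'string'},{id:'o',label:'Number of researchers',type:'number'}],rows:[{c:[{v:"Troy, NY"},{v:5}]},]};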

The largest section is the HTML generation, simply because so many lines of HTML are hardcoded in. You basically just need to find a Google Visualization example for your desired chart (I used a horizontal bar chart), edit the caption/label/options information to match your visualization, and write it into a string. The string is built in three parts: the first half of the HTML, up to where the JSONObject string is needed; the newly generated string; and then the rest of the HTML. Finally, the whole string is written to the visualization.html file and all the files are closed. Done!
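A minimal sketch of that assembly, where htmlFirstHalf and htmlSecondHalf stand in for the hardcoded HTML strings (the names are mine, not from the script):

# Stitch the page together: HTML up to the point where the JSONObject
# belongs, then the generated string, then the remainder of the HTML.
# htmlFirstHalf and htmlSecondHalf are hypothetical placeholders for
# the hardcoded HTML strings.
htmlFile.write(htmlFirstHalf + jsonObject + htmlSecondHalf)
htmlFile.close()
queryFile.close()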

The result is a script that you just run to get a ready-to-go HTML file to upload. As noted in my previous blog post, there are a number of advantages to doing it this way. In short, it is much better for future maintenance than manually translating the endpoint output into the JSONObject each time, and it is more robust than a dynamic webpage that tries to reload the results every time it is loaded. This approach strikes a compromise: a dynamic script that generates a static page whenever an update is needed.
