Tag Archive: Meeting


Today’s meeting was interesting in that it ended half an hour earlier than usual yet fit in twice as many topics, though that may have been because the first few were summaries and reminders.

The meeting started with thanks to everyone who came to the Drupal Site Hackathon, with some discussion of the work that was done, the quality (and lack thereof) of the pizza, and some final tasks that still need attention, such as the RDFa parsing needed for an improved editor GUI. After that, Tim came up once again to remind everyone to keep an eye on the lab’s efforts to coordinate SPARQL endpoint information and management, with some explanation of how that has been working out so far. The last of the organizational talk was about the planned visual presentations on the monitors that will be placed around Winslow so visitors can see our work. It didn’t really seem to differ much from the initial meeting, since there are still debates over whether to use PowerPoint slideshows, web slideshows, interactive demos, or Concerto.

Next, Dominic talked a bit about the recent trip to D.C., and Professor Hendler talked about the rising interest in semantic applications and how large numbers of people stopped him to ask where and how they could get them. The TWC professors also emphasized that, in the future, all trips ought to be accompanied by a blog post outlining what people did, saw, heard, and thought was interesting.

Finally, someone from dotCIO came to talk about their own approach to the campus’s decentralized data. They have a huge number of different content management systems, so their solution is to aggregate all of the data and then disaggregate it in various ways for individual purposes. The example shown was how they aggregated all of the building names and data to allow easy tagging of each building, and how the same method applies to people: everyone’s RCS IDs are annoying to look up, and this system allows for much easier tagging. He noted that although this is not as elegant as a semantic solution, they are looking to add support for things like RDFa.

After that, the meeting ended early so that we could welcome two new people working with the lab to manage finance and funding issues (I think).

I didn’t actually make a post about last Friday’s AHM, as it was almost entirely organizational stuff, recaps, and a presentation on some health-ontology application work. Much of this week’s was similar, with meetings and deadlines being discussed, as well as a congratulatory speech by Professor Hendler on the collaboration and hard work he has been seeing, which led up to the success at ISWC 2010.

Patrick also gave a presentation with a general overview of some of the work he has been doing in Australia with CSIRO, the national science agency there. Specifically, he was working with a group applying provenance to a hydrology project, where a sensor network gathers and processes data for the models and forecasts behind water-usage predictions and decisions. In order to justify and defend the results of such a project, they were looking to provenance to help document what leads to trust in the results. He mentioned that a big point of discussion was the split between internal and external provenance: the CSIRO people could work on provenance within their own scope, but the more external provenance closer to the actual sensors would be under the jurisdiction and responsibility of other groups. Other issues he mentioned were in the problem space, provenance collection points, modeling methods, and infrastructure design and development. He also noted that his group helped provide motivation, validation, and confidence by being there to talk about and really present the concepts in a way that bolstered the Australian team’s efforts.

Today’s AHM was split between the ISWC 2010 demo and a presentation about including user annotations in provenance.

Jie started out by giving another overview of the Semantic Web Dog Food goals and the ISWC 2010 dataset. Alvaro showed his mobile browser, and I presented my filtered browser. My part was pretty short, since most of the functions work the same across the different areas, so I just showed a bit of what the various displays look like and how the navigation works. No one seemed to have any questions or suggestions, so I didn’t really go into detail about the implementation.

Before the meeting, I worked on some more aesthetic changes, as well as enabling local links so my browser can navigate to things like people or papers that show up in the retrieved data as objects. I also noticed an issue where any data containing colons, such as the term owl:sameAs, was cut off because of the way I used strtok. I was unable to finish fixing this before the meeting, and I think Professor McGuinness actually noticed, because she asked about the made predicate, one of the affected areas. I did finish fixing it afterwards, however, along with a similar issue where single quotes were breaking both the query and the search URL. Professor Hendler noted that I should move the demo to a more appropriate server than my CS account (Evan noted that it’s especially important since the CS servers have been breaking all semester), so I’ll have to look into that.
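For reference, here is a simplified PHP sketch of the colon problem and the kind of fix I made (the tab-separated row format and the variable names here are just stand-ins for illustration, not the real code):

```php
<?php
// strtok() treats every character of its second argument as a delimiter,
// so tokenizing on ":" cuts a value like "owl:sameAs" off at the colon.
$value = "owl:sameAs";
echo strtok($value, ":");                  // prints "owl" -- the rest is lost

// The fix: split only on the real field separator and leave colons alone.
$row    = "subject\towl:sameAs\tobject";   // made-up result row
$fields = explode("\t", $row);             // colons inside fields survive

// And for the single-quote problem: escape quotes bound for the SPARQL
// query, and URL-encode anything bound for the search URL.
$term    = "O'Brien";
$escaped = str_replace("'", "\\'", $term); // safe inside a '...' literal
$link    = "search.php?q=" . urlencode($term);
```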

[Some screenshots from the demo]

So, during today’s All-Hands Meeting, we first went over some of the planning from last time, including the ISWC 2010 visualization plans. As it turns out, the problem with all the date/time information was fixed, so it is now available for my demo idea, and I checked with Evan to make sure that the endpoint I was using is updated and not going to randomly vanish. The presentation during the meeting was an evaluation of various inference-rule approaches tested on Smart Grid data. It went through different features, such as forward/backward chaining, built-in rules, and subclass relationships, and their effects on performance and usability.

I also tried to think through my plans for the browse/search demo, which I have definitely decided to attempt. I don’t know how far I will get with it, but at the very least I want to finish some basic browsing capabilities, where the user can click through links to easily access the information, as well as a basic time/schedule display with some filtering capabilities, such as by specific date ranges, specific papers/workshops, or something similar. I worked some more on the skeleton/framework code, which is still changing a lot as my implementation plans shift, but I think it is almost to the point where I can start on the actual functionality (parsing/displaying).

It’s still really vague, but it’ll become clearer as I work on it and see what I can feasibly finish implementing.
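To make it a little less vague, here is a rough sketch of the date-range filtering I’m picturing, as PHP building a SPARQL query; the ical:dtstart property is a guess at how the dataset models event start times, not something I’ve confirmed:

```php
<?php
// Rough sketch: build a SPARQL query that restricts events to a date range.
// ical:dtstart is an assumption about the dataset's vocabulary.
$from = '2010-11-08T00:00:00';
$to   = '2010-11-10T00:00:00';

$query = <<<SPARQL
PREFIX ical: <http://www.w3.org/2002/12/cal/ical#>
SELECT ?event ?start WHERE {
  ?event ical:dtstart ?start .
  FILTER (str(?start) >= "$from" && str(?start) < "$to")
}
ORDER BY ?start
SPARQL;
```

Comparing the start times as strings works here because ISO-formatted timestamps sort lexically.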

Yesterday’s All-Hands Meeting consisted mostly of two presentations and a few brainstorming sessions. The first brainstorm was about ideas for making the lab areas more interesting so visitors get a good first impression. The ideas mostly centered on a Concerto-like system, with TVs placed around the building. Unlike Concerto, most of the suggestions involved sound and video instead of still slideshows, although an overview slideshow was also discussed.

After that discussion, Jie talked about the Semantic Web Dog Food project, an attempt to gather semantic metadata for semantic web conferences, workshops, and the like. The name reflects its motivation, since it is based on the saying that people working on something should “eat their own dog food” and use what they make. The project can be found at data.semanticweb.org. Jie talked about the hope of using it to help look up people/papers/events before, during, and after conferences, and how it is populated with basic data for people/papers/programs/etc. using a variety of methods, including spreadsheets, dumps, PDFs, LaTeX, and online scraping. For an upcoming conference, he was also looking for ideas, and for people to finish usable visualizations, browsing tools, and/or searches, which were discussed at the end of the meeting. Unfortunately, I had to leave in the middle of that discussion to go to class for a midterm, so I don’t have notes on the ideas and plans that were made.

Afterwards were the two presentations, starting with Tim’s talk on trust in aggregated government data. He noted that the process of data aggregation can produce untrustworthy data for many reasons, such as the opaque “cloud of conversion” when information is taken from the raw sources (reliable) and translated by the aggregator (less reliable). A key factor in resolving this is provenance information: metadata about the data, its sources, and so on. This is needed for distinguishing between sources, minimizing the number of manual modifications needed during conversion, and tracing and attributing data. For capturing the conversion provenance, he listed the steps as following redirects, retrieving and unzipping the raw data, manually tweaking it, converting it, and populating the dataset. Finally, he suggested a three-part scheme for dataset organization to achieve this trust using provenance data. Each URI would be reached through …/source/…/dataset/…/version/…, where the source would be the organization’s DNS name, the dataset would be its ID, and the version would be some broad designation such as a release date or modification time. He further elaborated on other features that would be used in conjunction with this, such as using interpretation knowledge to go from raw data to parameters instead of naïve CSV conversion, renaming properties, typing a property’s domain, promoting a value to a resource, and more.
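As a purely made-up illustration of that three-part layout (none of these names are from the talk), a versioned dataset URI would look something like:

```
http://example.org/source/epa-gov/dataset/air-quality/version/2010-09-23
```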

The second presentation was by Johanna, who presented her work over the summer on automatic generation of implicit links in data.gov datasets. Basically, the data.gov datasets are very siloed, without links to one another, and they contain many ambiguous literals, such as “New York”; her work was on automatically resolving these ambiguous literals into correct links. The program depends on a mapping set and word banks, using methods such as direct matching by regex or keyword, as well as approximate matching using edit distance and prefix filtering.
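To illustrate the approximate-matching idea, here is my own sketch (not Johanna’s code), using PHP’s built-in levenshtein() with a crude prefix filter; the mapping-set format (literal => URI) is an assumption:

```php
<?php
// Resolve an ambiguous literal against a mapping set (literal => URI).
// Direct match first, then approximate matching by edit distance, with a
// cheap prefix check so we don't compute distances for every candidate.
function resolve_literal($literal, array $mapping, $maxDist = 2) {
    if (isset($mapping[$literal])) {
        return $mapping[$literal];               // direct match
    }
    $best = null;
    $bestDist = $maxDist + 1;
    foreach ($mapping as $name => $uri) {
        if (strncasecmp($name, $literal, 2) !== 0) {
            continue;                             // prefix filter
        }
        $dist = levenshtein(strtolower($literal), strtolower($name));
        if ($dist < $bestDist) {
            $bestDist = $dist;
            $best = $uri;
        }
    }
    return $best;
}

// e.g. resolve_literal("New Yrok", array("New York" => "http://example.org/place/new-york"))
// returns the New York URI despite the typo.
```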

Today’s All-Hands Meeting was split into two main parts. First were some introductions and a lot of organizational stuff, figuring out who was doing what, with some outlines of current issues with the site migration, such as the site’s search speed (a SPARQL cache in PHP was suggested and delegated; I sketch the idea at the end of this post). This took a lot of the time, so there was only one presentation, which covered the sameAs construct and its implications and usages in very large linked-data networks.

The main point seemed to be that using sameAs between pay-level domains (roughly, registerable domain names) rather than between individual terms allows for interesting clustering. His examples were based on category clusterings created by queries using sameAs between domains in a very large dataset from the Billion Triples Challenge. There were some questions about the assumptions made when using sameAs this way, such as its transitivity and symmetry.
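To make the domain-level idea concrete, here is my own sketch (not the speaker’s code) of bucketing sameAs links by domain. The last-two-labels heuristic is a simplification of real pay-level-domain extraction, which needs the public suffix list, and $sameAsPairs is a hypothetical list of subject/object URI pairs:

```php
<?php
// Count owl:sameAs links between domains rather than between terms.
function pld($uri) {
    $host   = parse_url($uri, PHP_URL_HOST);
    $labels = explode('.', (string) $host);
    return implode('.', array_slice($labels, -2)); // naive PLD heuristic
}

$counts = array();
foreach ($sameAsPairs as $pair) {       // hypothetical (subject, object) URIs
    $key = pld($pair[0]) . ' <-> ' . pld($pair[1]);
    $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
}
arsort($counts);                        // heaviest-linked domain pairs first
```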

A point he made at the beginning was how the idea of rdfs:seeAlso evolved into the owl:sameAs construct.
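On the SPARQL cache idea from the organizational discussion: I don’t know what form it will actually take, but a minimal PHP sketch might look like the following (the output parameter and its value are endpoint-specific assumptions):

```php
<?php
// Minimal sketch of a query cache: key each SPARQL query by its hash and
// reuse the endpoint's response until the cached copy expires.
function cached_sparql($endpoint, $query, $ttl = 300) {
    $file = sys_get_temp_dir() . '/sparql_' . md5($endpoint . $query) . '.cache';
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);           // cache hit
    }
    $url = $endpoint . '?' . http_build_query(array(
        'query'  => $query,
        'output' => 'json',                        // endpoint-specific
    ));
    $result = file_get_contents($url);             // cache miss: ask endpoint
    file_put_contents($file, $result);
    return $result;
}
```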

Today I attended my first All-Hands Meeting for the TWC, where several people talked about what they have been working on. The following is mostly just a summary from my notes during the meeting, since much of the software and many of the languages discussed are ones I am not familiar with.

The first person summarized what happened at the International Conference on Web Reasoning and Rule Systems. They talked a bit about the SPARQL presentation by Axel Polleres that I had previously looked at when trying to learn SPARQL, which was interesting. They summarized a lot of points, with topics including work bridging Datalog and description logic, three-valued logic (yes/no/unknown) and contradictions, work on a visualization language and its translation to concepts, access control on ontologies, and a paper on removing redundancies in RDF, which apparently won Best Paper and included a proof of NP-completeness.

After that, the next person talked about work on optimizing real-time RDF data streams, which is basically about integrating social/sensor data with the semantic web in real time, such as a semantically enabled Twitter widget. He listed some RDF update formats, such as SPARQL Update, Talis changesets, delta ontologies, and more, none of which I had really heard of except for SPARQL. He also spent some time going over how the data would be transported: using HTTP POST, or possibly UDP, which has higher throughput and lower lag but more constraints. He noted that HTTP/TCP uses persistent connections, while UDP uses small atomic messages, allowing some data to be lost in exchange for much faster delivery. He ended with some examples and discussion of how such a widget might be implemented and what its throughput limits, if any, would be.
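As a concrete (and entirely made-up) example of the HTTP POST transport he described, pushing a single update to a hypothetical collector endpoint might look like this in PHP; the URL and payload are purely illustrative:

```php
<?php
// Push one N-Triples statement to a (hypothetical) collector over HTTP POST.
// TCP gives reliable, ordered delivery at the cost of connection overhead,
// which is the tradeoff against UDP from the talk.
$payload = '<http://example.org/tweet/1> '
         . '<http://purl.org/dc/terms/created> '
         . '"2010-09-18T12:00:00Z" .';

$ch = curl_init('http://example.org/rdf-stream/update');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => array('Content-Type: text/plain'),
    CURLOPT_RETURNTRANSFER => true,
));
$response = curl_exec($ch);
curl_close($ch);
```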

Finally, the last speaker talked about linked LISP in ontologically oriented programming. As a project, he had apparently written a LISP interpreter in Java with semantic web integration. The architecture was an ANTLR-based grammar, with types from Java and the Jena memory model. There was some discussion of how the various functors were implemented, and how semantic web features were integrated using the Jena memory model and inference engine.