Advanced Campus Services
Directory Services Project
Several of us met and discussed foci for the remainder of the term (this is not a complete list).
1. Tools on Web. This is critical to our research work and specifically to the ITR grant.
ACTION: Rishi, Taruna, Manish, Roop, Susan (web components architecture) all have roles here.
2. WordNet. Susan provided a preliminary review of WordNet. We believe that implementing some LDAPnet (or BioinformaticsNet) could be a promising approach to knowledge discovery.
Look at the second URL below - its mission seems aligned with our clustering and reference set work.
ACTION: Susan, Lei. Susan agreed to give a report at the next biweekly meeting (see below).
ACTION: Perhaps Nicole and Anish can provide source data in their Grid Projects Related DB?
3. LDAP COI. Implementation of version 0.1 should be completed this term. We think this will provide the annotation function that we've planned, which is important to the ITR, SEIII and NIH grants/proposals.
ACTION: Jijie, Roop, Anish?
4. Experiments. We must proceed. Lei said (earlier in Monday's meeting) that he was looking at a Genetic Algorithm to discover SOM parameter value sets. I will draft experiment proposals (we did this before; see page 2), and we'll agree on assignments and deadlines and report at the next meeting.
ACTION: Lei, Jijie, Susan, Roop, Nova, Imran
Next meeting: Wednesday, March 17, 10am, Classroom South 514
From September 16, 2003 meeting agenda (reporting on Sept 2 meeting)
On Tuesday September 2, 2003, Vijay, Lei, Jijie, and Art talked about what experiments may be appropriate to our current state of work. We mentioned some possible candidates and agreed to continue thinking about this. The following comments are from Art's notes:
We noted that "reference sets" was a potentially rich area for work (see Vijay's Aug 24 email attached).
Experiments with human subjects (say, experts validating clustering or users using an interface à la Roussinov) would need Institutional Review Board (IRB) approval. Since the IRB process is being reviewed (and tightened up), we might need to adjust.
We discussed the possibility that simple heuristics may be "just as good" as SOM - i.e. is the clustering that results from SOM really just a reflection of inheritance? If so, a heuristic algorithm that develops an "inheritance tree" may be sufficient - just observe the resulting "branching nodes."
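The inheritance-tree heuristic above can be sketched roughly as follows. This is only an illustration, not something we have built: the `SUPERIOR` sample table and the function names are assumptions made up for the sketch, and a real run would read the SUP relationships from each server's schema.

```python
from collections import defaultdict

# Hypothetical sample: object class -> its superior (SUP) class.
# A real experiment would extract these pairs from the LDAP schema.
SUPERIOR = {
    "person": "top",
    "organizationalPerson": "person",
    "residentialPerson": "person",
    "inetOrgPerson": "organizationalPerson",
    "organization": "top",
}


def build_children(superior):
    """Invert the class -> superior table into superior -> [children]."""
    children = defaultdict(list)
    for cls, sup in superior.items():
        children[sup].append(cls)
    return children


def branching_nodes(superior):
    """Return classes with more than one direct subclass, sorted by name."""
    children = build_children(superior)
    return sorted(cls for cls, kids in children.items() if len(kids) > 1)


def subtree(superior, root):
    """Return root plus all of its descendants: one candidate 'cluster'."""
    children = build_children(superior)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return sorted(found)


if __name__ == "__main__":
    print(branching_nodes(SUPERIOR))
    print(subtree(SUPERIOR, "person"))
```

If the subtrees under the branching nodes line up with the groups SOM produces, that would support the "just a reflection of inheritance" reading; if they do not, SOM is picking up on something more.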
The challenge with clustering of attributes' metadata is that the metadata is sparse: OID, NAME, SYNTAX, DESCRIPTION. And the latter two are perhaps limited as distinguishing factors (each being "one of only several different values").
With that said, here are some possible experiments:
a) Repeat the 320 SOM experiment exactly. Confirm results.
b) Conduct the 320 SOM experiment, except using <Novell, OpenLDAP, SecureWay> objects.
c) Conduct a) or b) but vary the domain of the 320 values (x, y, neighborhood size, iterations)
d) SOM using attributes for <iPlanet, Novell, OpenLDAP, SecureWay>.
e) Treat the whole schema (thanks, Susan Qu), i.e. object classes, attributes, and matching rules, and cluster. This has aspects of a "DNA" (directory node analysis) signature. Hypothesis: using the same configuration (same LDAP, same SOM parameters) results in exactly the same mapping.
f) Continue with genetic algorithm.
g) Find reference set as intersection of <iPlanet, Novell, OpenLDAP, SecureWay>. Conduct clustering using this reference set as the "expert solution" to achieve. Compare to results of experiments b) or c) which used a "universal vector."
h) Find reference set as <person, organizationalPerson, inetOrgPerson, eduPerson, other_ eduPerson>. Conduct clustering using this reference set as the "expert solution" to achieve. Compare to results of experiments b) or c) which used a "universal vector." Hypothesis: at least from the perspective of a specific filter, this then clusters appropriately. I.e. "As an expert in a certain area (person info), I'm only interested in those objects anyway."
i) Calculate distances between the resulting SOM object mappings (i.e. don't just use a fixed rectangular matrix) and determine clustering. Compare for different LDAPs. Hypothesis: we can determine clusters more accurately.
j) Using a similar calculation of object distances on the resulting mappings, determine a threshold of "nearness" that identifies clusters. If the g) or h) reference sets have a certain nearness factor, is that helpful?
k) Consider reference sets that aggregate. Start with a core, cluster, include new items in the core that are close, recluster. Is there a reasonable point at which one now has a robust reference set that works in general? Hypothesis: we can build around an "armature" and soon the form becomes self-evident.
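Two of the ideas in the list above, the intersection reference set of experiment g) and the "nearness" threshold on mapped object positions, can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the use of Euclidean distance, and the idea that each object is represented by an (x, y) position on the SOM output map are all choices made for illustration, not agreed design.

```python
import math


def reference_set(*schemas):
    """Names present in every schema (the intersection in experiment g)."""
    return set.intersection(*(set(names) for names in schemas))


def nearness_clusters(positions, threshold):
    """Group names whose map positions are chained together by distances
    of at most `threshold` (single-linkage connected components).

    `positions` maps an object name to its assumed (x, y) map position.
    """
    names = sorted(positions)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical positions, made up purely to exercise the functions.
    demo = {
        "person": (0, 0),
        "organizationalPerson": (0, 1),
        "inetOrgPerson": (1, 1),
        "device": (7, 8),
    }
    print(nearness_clusters(demo, threshold=1.5))
```

Running the clustering at several thresholds and checking when the members of a reference set first fall into one group would give a "nearness factor" for that set.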
January 12, 2004 discussion and agreement.
Tasks that need to be finished by Feb 6, 2004.
These tasks are based on the write-up that each one of us presented at the end of the previous semester.
Grid: (Anish, Nicole, Nova, Imran)
· Catalog: in database; front end may need assistance from Susan and Rishi, if required. (Anish, Nicole)
· MPI: ACS, UAB (Nova)
· Dr. He's lab: need to get the four PCs on the grid
· ACS grid: Install Cygwin, Globus, MPI
· X1 grid: Globus, MPI
· Grid Portal: ACS + Dr. He's Lab, X1: try to get them to talk or communicate (Imran)
Semantic Facilitator™ ℠ SFA Tool: (Susan, Rishi)
· Parameters: get them working
· SFA: some progress (Rishi)
· SOM graphing: Genetic Algorithm (Susan, Rishi)
Database: (Manish, Roop)
· LDAP schemas: want them done; should be on "web services"
· Catalog data: in database, able to tokenize, SOM
Algorithm, SOM, Parallel, Genetic Algorithm: (Lei, Susan, Nova, Taruna, Jijie)
· LSA/LSI: write-up and demo (Taruna)
· Experiments: Reference Sets, i.e. when you apply SOM to the subset of the reference set, see whether it works.
· Genetic Algorithm: tool option use ACS
Last Updated: March 2, 2006