November 21, 2002

Lambda Join Demonstration Wins Award at Supercomputing '02 Conference

Baltimore, Maryland. Project DataSpace, in a collaboration with
researchers from Chicago, Ottawa and Amsterdam, has won the
Supercomputing '02 High Performance Bandwidth Challenge Award for
Innovative, High Speed, Data Correlation--Best Use of Emerging
Infrastructure. The group includes researchers from the National Center
for Data Mining at the University of Illinois at Chicago (UIC), CANARIE,
and SARA, who have spent the past year developing technology for
merging data in real time over lambda networks. At SC02 they presented
the first demonstration of the technology, moving data between
continents at multi-gigabit rates.

For the past two decades, database researchers have worked to optimize
joins, the operation that combines two tables in a database on a common
key, such as an employee or product ID. Joins are one of the core
technologies that make data processing practical.
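
The release does not say which join algorithms were optimized. Purely as
an illustration of what a join on a common key does, here is a minimal
sketch of a textbook in-memory hash join in Python. The function
`hash_join`, the tables `employees` and `salaries`, and the column
`emp_id` are hypothetical examples, not part of any system named in this
release.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on a shared column (hypothetical helper)."""
    # Build phase: index every row of the left table by its join key.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    # Probe phase: scan the right table and emit one combined row per match.
    result = []
    for row in right:
        for match in index.get(row[key], []):
            result.append({**match, **row})
    return result


# Hypothetical example tables joined on an employee ID.
employees = [
    {"emp_id": 1, "name": "Ada"},
    {"emp_id": 2, "name": "Grace"},
]
salaries = [
    {"emp_id": 2, "salary": 120},
    {"emp_id": 1, "salary": 100},
    {"emp_id": 3, "salary": 90},  # no matching employee, so it is dropped
]

for row in hash_join(employees, salaries, "emp_id"):
    print(row)
```

Indexing one table and scanning the other means each row is read once,
which is why hash joins are a common choice when neither input is sorted.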

As more and more data is distributed over the Internet, the ability to
join data held at two geographically distant locations is becoming
critical. There are two fundamental problems: finding efficient
protocols to move data over long distances, and finding efficient
algorithms to merge two data streams. At Supercomputing '02, significant
progress was made on both fronts.

A stream of data was moved over SURFnet, connecting a cluster of
computers at SARA Computing and Networking Services in Amsterdam with a
cluster of computers at StarLight in Chicago, at over 2.8 Gb/s. At the
same time, a second stream was moved over Canada's CA*net 4 network,
connecting a computer cluster at CANARIE in Ottawa with a UIC computer
cluster at StarLight in Chicago, at over 2 Gb/s. Both streams used
SABUL, a new high performance data transport protocol developed by the
National Center for Data Mining/Laboratory for Advanced Computing at the
University of Illinois at Chicago.

At the same conference, using a three-node computer cluster at the
StarLight facility in Chicago, two streams of data were merged at over
500 Mb/s per node (over 1.5 Gb/s in aggregate). These so-called "lambda
joins" are an important component of distributed data mining
applications. The algorithm for joining two lambda streams was developed
by scientists at the National Center for Data Mining at the University
of Illinois at Chicago.
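
The release does not describe how the NCDM lambda-join algorithm works.
For readers unfamiliar with stream joins, the sketch below shows one
generic textbook technique, a sort-merge join over two streams that are
assumed to arrive already sorted by key. It is not the NCDM algorithm,
and the names `merge_join` and `stream` are hypothetical. Each stream is
read once, front to back, which suits data arriving over a network
rather than sitting in a local table.

```python
from itertools import groupby

_END = object()  # sentinel: the stream has no more key groups


def _next_group(groups):
    """Advance a groupby iterator; return (key, rows) or (_END, [])."""
    item = next(groups, None)
    if item is None:
        return _END, []
    k, grp = item
    # Materialize the group now: groupby invalidates it once advanced.
    return k, list(grp)


def merge_join(left, right, key):
    """Yield matching (lrec, rrec) pairs from two iterables sorted by key.

    Assumption: both inputs are sorted ascending by key(record). Each input
    is read once; only the current key group of each side is held in memory.
    """
    lgroups, rgroups = groupby(left, key=key), groupby(right, key=key)
    lk, lrows = _next_group(lgroups)
    rk, rrows = _next_group(rgroups)
    while lk is not _END and rk is not _END:
        if lk < rk:
            lk, lrows = _next_group(lgroups)  # left is behind: skip ahead
        elif rk < lk:
            rk, rrows = _next_group(rgroups)  # right is behind: skip ahead
        else:
            # Keys match: emit every pairing of the two key groups.
            for lrec in lrows:
                for rrec in rrows:
                    yield lrec, rrec
            lk, lrows = _next_group(lgroups)
            rk, rrows = _next_group(rgroups)


def stream(records):
    """Hypothetical stand-in for a network stream of (key, value) records."""
    yield from records


left = stream([(1, "a"), (2, "b"), (2, "c"), (4, "d")])
right = stream([(2, "x"), (3, "y"), (4, "z")])
pairs = list(merge_join(left, right, key=lambda rec: rec[0]))
for lrec, rrec in pairs:
    print(lrec, rrec)
```

The sortedness assumption is what keeps memory bounded. If the streams
arrive in arbitrary order, a hash-based technique such as a symmetric
hash join would be needed instead, at the cost of holding more state in
memory.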

"Lambda data joins are an excellent early example of how CA*net 4's
lightpath provisioning facility can be used to help build new and
innovative distributed services," said Bill St. Arnaud, Senior Director
for Advanced Networks at CANARIE.

Many network engineers use the terms "lambda" and "lightpath"
interchangeably to describe a dedicated, end-to-end communications
channel at a low network layer with effectively guaranteed bandwidth.
With protocols such as SABUL, lambdas can now move large data sets over
long distances as fast as the data can be read from disk. With lambda
joins, two such streams can then be merged and searched for patterns.

"With lambda joins, it is now practical to look for correlation in
data even if the data is scattered around the world," said Robert
Grossman, Director of the National Center for Data Mining at the
University of Illinois at Chicago and President of the Two Cultures Group.

The demonstration received one of the three Qwest Bandwidth Challenge
awards presented at this year's Supercomputing '02 conference.

For more information, contact:

Shirley Connelly, Associate Director, NCDM
312-413-2176, connelly@uic.edu

Robert Grossman, Director, NCDM
312-413-2176, grossman@uic.edu

National Center for Data Mining

The National Center for Data Mining (NCDM) at the University of
Illinois at Chicago (UIC) was established in 1998 to serve as a national
resource for high performance and distributed data mining. The Center
sponsors research projects, standards, testbeds, and outreach. The
Center is coordinating the development of the Predictive Model Markup
Language (PMML), the standard for data mining models, and sponsoring the
Terra Wide Data Mining Testbed, a worldwide testbed for high performance
and distributed data mining. For more information about NCDM, see
www.ncdm.uic.edu.

SURFnet

SURFnet operates and develops the national research network, to which
two hundred institutions in higher education and research in the
Netherlands are connected. To remain in the lead, SURFnet makes a
sustained effort to improve the infrastructure and to develop new
applications that give users faster and better access to new Internet
services. For more information, please visit www.surfnet.nl.

SARA Computing and Networking Services

SARA is the Dutch National Supercomputing Facility. SARA provides high
performance computing and networking services and visualization
(including virtual reality) facilities to Dutch academic and research
institutions and to commercial businesses. SARA is a not-for-profit
foundation and handles the day-to-day operational management of the
SURFnet network. For more information, please visit www.sara.nl.

CANARIE Inc.

CANARIE is Canada's advanced Internet development organization, a
not-for-profit corporation supported by its members, project partners
and the Government of Canada. CANARIE's mission is to accelerate
Canada's advanced Internet development and use by facilitating the
widespread adoption of high-performance, end-user enabled networks and
by stimulating the development of new, next generation products,
applications and services to run on them. Following a $110M funding
agreement with Industry Canada, CANARIE designed, developed and now
operates CA*net 4, Canada's national research and innovation network.
For more information, visit www.canarie.ca.

StarLight

StarLight(sm), the optical STAR TAP(sm) initiative, is an advanced
optical infrastructure and proving ground for network services optimized
for high-performance applications. Operational since summer 2001,
StarLight is a 1GigE and 10GigE switch/router facility for
high-performance access to participating networks and will ultimately
become a true optical switching facility for wavelengths. StarLight is
being developed by the Electronic Visualization Laboratory (EVL) at the
University of Illinois at Chicago (UIC), the International Center for
Advanced Internet Research (iCAIR) at Northwestern University, and the
Mathematics and Computer Science Division at Argonne National
Laboratory, in partnership with Canada's CANARIE and the Netherlands'
SURFnet. For more information, please visit www.startap.net/starlight.

 -----------------------------------------------------------------
 Shirley Connelly                      Tel: 312-413-2176
 Associate Director                    Fax: 312-355-0373
 Laboratory for Advanced Computing     http://www.lac.uic.edu
 National Center for Data Mining       http://www.ncdm.uic.edu

 Univ. of Illinois at Chicago
 322 SEO, M/C 249                      Email: shirley@lac.uic.edu
 851 S. Morgan Street                  Email: connelly@uic.edu
 Chicago, IL 60607-7045
--------------------------------------------------------------------

Project DataSpace:  http://www.dataspaceweb.net

 

 
