November 21, 2002
Lambda Join Demonstration Wins Award at Supercomputing 02 Conference
Baltimore, Maryland. Project DataSpace, in a collaborative project
with researchers from Chicago, Ottawa and Amsterdam, has won the
SuperComputing '02 High Performance Bandwidth Challenge Award for
Innovative, High Speed, Data Correlation--Best Use of Emerging
Infrastructure. The group includes researchers from the National Center
for Data Mining at the University of Illinois at Chicago (UIC), CANARIE,
and SARA, who have been working together over the past year to produce
real-time merging of data over lambda networks. At SC02, they presented
the first demonstration of the technology, with impressive results.
For the past two decades, database researchers have optimized the
ability of databases to join two tables on a common key,
such as an employee or product ID. Database joins are one of the key
technologies that make data processing practical.
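As an illustration of a join on a common key, here is a minimal sketch of a conventional in-memory hash join. The table contents, the column name employee_id, and the function name hash_join are assumptions made for this example; the sketch shows the general technique, not any code used in the demonstration.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    # Build phase: index the left table by its key value.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    # Probe phase: look up each right-hand row and emit merged rows.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical sample tables sharing an employee ID.
employees = [
    {"employee_id": 1, "name": "Ada"},
    {"employee_id": 2, "name": "Grace"},
]
salaries = [
    {"employee_id": 2, "salary": 120},
    {"employee_id": 3, "salary": 95},
]
result = hash_join(employees, salaries, "employee_id")
```

Only employee 2 appears in both tables, so the result holds a single merged row.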
As more and more data is distributed over the internet, the ability to
join data located in two different global locations is becoming
critical. There are two fundamental problems: finding efficient
protocols to move data over long distances and finding efficient
algorithms to merge two data streams. At Supercomputing '02, significant
progress was made on both fronts.
A stream of data was moved over SURFnet connecting a cluster of
computers at SARA Computing and Networking Services in Amsterdam and a
cluster of computers at StarLight in Chicago at over 2.8 Gb/s. At the
same time a stream of data was moved over Canada's CA*net4 network
connecting a computer cluster at CANARIE in Ottawa and a UIC computer
cluster at StarLight in Chicago at over 2 Gb/s. Both streams used a new
protocol called SABUL designed for high performance data transport
developed by the National Center for Data Mining/Laboratory for Advanced
Computing at the University of Illinois at Chicago.
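To put these rates in perspective, the short calculation below estimates how long a data set would take to cross each link at the reported speeds. The 100-gigabyte data set size is a hypothetical figure chosen for illustration, and the calculation ignores protocol overhead and disk speed.

```python
def transfer_seconds(size_gigabytes, rate_gbps):
    """Seconds to move size_gigabytes (10**9 bytes each) at rate_gbps (10**9 bits/s)."""
    gigabits = size_gigabytes * 8  # 8 bits per byte
    return gigabits / rate_gbps

# Hypothetical 100 GB data set at the two reported rates.
amsterdam_link = transfer_seconds(100, 2.8)  # Amsterdam to Chicago
ottawa_link = transfer_seconds(100, 2.0)     # Ottawa to Chicago
```

Under these assumptions the transfer takes a little under five minutes on the Amsterdam link and a little under seven minutes on the Ottawa link.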
At the same conference, using computer clusters at the StarLight
facility in Chicago, two streams of data were merged at over 500 Mb/s
per node in the three-node cluster. These so-called "lambda joins" are
an important component for distributed data mining applications. The
algorithm for joining two lambda streams was developed by scientists at
the National Center for Data Mining at the University of Illinois at
Chicago.
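The NCDM algorithm itself is not described in this release. As a rough sketch of what merging two streams by key can look like, the example below uses a textbook merge join. It assumes that each stream delivers (key, value) pairs in ascending key order with no repeated keys; the stream contents and the name merge_join are hypothetical.

```python
def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both streams.

    Assumes each stream yields (key, value) pairs in ascending key order
    with no repeated keys (an assumption made for this sketch).
    """
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[0] == r[0]:
            # Keys match: emit the merged record and advance both streams.
            yield l[0], l[1], r[1]
            l = next(left_iter, None)
            r = next(right_iter, None)
        elif l[0] < r[0]:
            l = next(left_iter, None)  # left is behind, advance it
        else:
            r = next(right_iter, None)  # right is behind, advance it

# Hypothetical records arriving from the two remote clusters.
amsterdam_stream = [(1, "a"), (3, "b"), (4, "c"), (7, "d")]
ottawa_stream = [(2, "w"), (3, "x"), (7, "y"), (9, "z")]
matches = list(merge_join(amsterdam_stream, ottawa_stream))
```

Because the function is a generator that holds only one record from each stream at a time, it can process data as it arrives rather than waiting for either stream to finish.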
"Lambda data joins are an excellent early example of how CA*net4's
lightpath provisioning facility can be used to help build new and
innovative distributed services," according to Bill St. Arnaud, Senior
Director for Advanced Networks at CANARIE.
Many network engineers use lambda and lightpath interchangeably
to describe a low-layer, end-to-end dedicated communications channel with
effectively guaranteed bandwidth. Using protocols such as SABUL, it is now
possible to use lambdas to move large data sets over long distances as
fast as the data can be pulled from disk. Using lambda joins, it is now
possible to merge two such streams and look for patterns.
"With lambda joins, it is now practical to look for correlation in
data even if the data is scattered around the world," said Robert
Grossman, Director of the National Center for Data Mining at the
University of Illinois at Chicago and President of the Two Cultures Group.
This demonstration received one of the three Qwest Bandwidth
Challenge Awards presented at this year's Supercomputing 02 Conference.
For more information, contact:
Shirley Connelly, Associate Director, NCDM
312 413 2176, connelly@uic.edu.
Robert Grossman, Director, NCDM
312 413 2176, grossman@uic.edu.
National Center for Data Mining
The National Center for Data Mining (NCDM) at the University of
Illinois at Chicago (UIC) was established in 1998 to serve as a national
resource for high performance and distributed data mining. The Center
sponsors research projects, standards, testbeds, and outreach. The
Center is coordinating the development of the Predictive Model Markup
Language (PMML), the standard for data mining models, and sponsoring the
Terra Wide Data Mining Testbed, a worldwide testbed for high performance
and distributed data mining. For more information about NCDM, see
SURFnet
SURFnet operates and innovates the national research network, to which
two hundred institutions in higher education and research in the
Netherlands are connected. To remain in the lead, SURFnet puts in a
sustained effort to improve the infrastructure and to develop new
applications to give users faster and better access to new Internet
services. For more information please visit www.surfnet.nl. For SARA,
see www.sara.nl.
SARA Computing and Networking Services
SARA is the Dutch National Supercomputing Facility. SARA provides High
Performance Computing and Networking Services and Visualization
(including Virtual Reality) facilities to Dutch academic and
research institutions and to commercial businesses. SARA is a
not-for-profit foundation. SARA does the day-to-day operational
management of the SURFnet network.
CANARIE, Inc.
CANARIE is Canada's advanced Internet development organization, a
not-for-profit corporation supported by its members, project partners
and the Government of Canada. CANARIE's mission is to accelerate
Canada's advanced Internet development and use by facilitating the
widespread adoption of high-performance, end-user enabled networks and
by stimulating the development of new, next generation products,
applications and services to run on them. Following a $110M funding
agreement with Industry Canada, CANARIE, Inc. designed, developed, and
operates CA*net4, Canada's national research and innovation network.
For more information, visit www.canarie.ca.
StarLight
StarLight(sm), the optical STAR TAP(sm) initiative, is an advanced
optical infrastructure and proving ground for network services optimized
for high-performance applications. Operational since summer 2001,
StarLight is a 1GigE and 10GigE switch/router facility for
high-performance access to participating networks and will ultimately
become a true optical switching facility for wavelengths. StarLight is
being developed by the Electronic Visualization Laboratory (EVL) at the
University of Illinois at Chicago (UIC), the International Center for
Advanced Internet Research (iCAIR) at Northwestern University, and the
Mathematics and Computer Science Division at Argonne National
Laboratory, in partnership with Canada's CANARIE and Holland's SURFnet.
For more information please visit www.startap.net/starlight.
-----------------------------------------------------------------
Shirley Connelly Tel: 312-413-2176
Associate Director Fax: 312-355-0373
Laboratory for Advanced Computing http://www.lac.uic.edu
National Center for Data Mining http://www.ncdm.uic.edu
Univ. of Illinois at Chicago
322 SEO, M/C 249 Email: shirley@lac.uic.edu
851 S. Morgan Street Email: connelly@uic.edu
Chicago, IL 60607-7045
--------------------------------------------------------------------
Project DataSpace: http://www.dataspaceweb.net