Moving Data - Sheffield Hallam University

Download Report

Transcript Moving Data - Sheffield Hallam University

Moving Data
• The need to move data around is not new,
and will be with us while ever there are
different DBMS
– Internally, between systems:
• Disparate information systems to one EIS
• Transactional to Warehouse
• Upgrading legacy systems
– Externally, between organisations
– Delivering data to clients with browsers
School of Computing and Management Sciences
© Sheffield Hallam University
Processing
Single
database
Create Table B
AS Select *
from Table A
.dmp
.xml
.edi
.csv
.txt
Common
RDBMS
Flat files
of either
variable
or fixed
length
Import/Export
W3C Standard Electronic Data
compliant
Interchange
XML file
conversion
EDI message
conversion
Other data
stores
Line-by-line
processing
Degree of task complexity
Moving Data
• Between applications using same DBMS:
– internal DBMS processes can include:
• Import/Export - caution, this means different things
to different manufacturers!
• Bulk/Batch Copy/Move
• Backup and Restore
• Create …..as select * from…..
– OS level is possible for some (cruder?) DBMS
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data
• Between different environments:
– Enable the internal database functionality of
target by making two systems logically the
same:
• Linking DBMS by internal drivers
• Linking DBMS through external drivers (eg ODBC)
– Dump from source into readable file and then
import to target
School of Computing and Management Sciences
© Sheffield Hallam University
External File formats
• Variable length files
– Need to identify where fields and records
terminate
– Probably the most common is the CSV
• Comma Separated Variable
• most dbms can at least dump to csv file
– EG:
Arnold,Band Music,Dallas Wind Symphony,Junkin,2
Arnold,Film Music,Moscow SO,Stromberg,1
Arnold,Symphonies 5 and 6,Ireland NSO,Penny,1
Arnold,Symphonies 7 and 8,Ireland NSO,Penny,1 © Sheffield Hallam University
School of Computing and Management Sciences
External File formats
• Fixed length files
– Each data field occurs at a set location in a
record, or at a precise offset from position 1
• Watch out for record terminators!
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data - special cases
• Electronic Data Interchange (EDI)
– facilitates the exchange of business documents
between organisations. Steps are:
•
•
•
•
•
Collect data locally
translate into predefined format
transmit
decode the data
insert into relevant target database
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data - EDI cont...
• The case for EDI:
–
–
–
–
–
Decreased Operational Costs
Improved Accuracy
Greater Transaction Processing Efficiency
Increased Productivity
Improved payment cycles ?????
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data - EDI cont...
• The case against EDI:
–
–
–
–
–
Highly formalised
High cost of entry
Often industry specific
For historic reasons, often utilises a VAN
EDI implementation can often fall short of expectations
because 20%-40% of trading partners can fail to come
on board (Chute, 1996)
– Kalakota and Whinston, 1996 estimate less than 5% of
US businesses utilise EDI
• But what about the latest e-trends, don’t they cry
out for EDI?
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data - XML
• Extensible Markup Language (XML)
– “A Web-based markup language that could give
EDI the kick-start it needs…” EDINews
– XML is a language which allows users to create
their own markup language which can describe
one particular content
– It can run on any platform
– Closely related to HTML, via SGML
– Non proprietary - W3C
School of Computing and Management Sciences
© Sheffield Hallam University
XML reminder
• "XML is a human-readable, machine-understandable,
general syntax for describing hierarchical data,
applicable to a wide range of applications,
databases, e-commerce, Java, web development,
searching, and so on. "
– Oracle OTN website, my italics
• Building blocks are: Tags, Elements and attributes
• Separates Data from validity checking and from
presentation
– by using DTD or Schemas to validate, and stylesheets to present
School of Computing and Management Sciences
© Sheffield Hallam University
XML: why do it?
• Because the rest of the world is doing it!
– many tools now available
– Rdbms vendors support its use
• Because it has become the de facto answer to the
need to exchange data between disparate systems
• Because it is self-describing and thus can be
universally used
• XML is an Open standard
• It is platform, database and application independent
• Browser support means cheap delivery
School of Computing and Management Sciences
© Sheffield Hallam University
XML: why not to do?
• Performance
• Massive increase in file sizes (over binary, or even
over CSVs)
• Increased skills level required to query compared
with SQL (arguable)
• New skillset required
School of Computing and Management Sciences
© Sheffield Hallam University
Issues with XML in RDBMS
• Paradigm conflict:
– Hierarchical V Relational
• XML is not designed to be queriable.
– Xpath not as powerful as SQL?
– Whole new syntax to learn
• XML is not normalised
• ISO SQL does not lend itself to generating XML
output
School of Computing and Management Sciences
–http://www.w3.org/TR/xpath
© Sheffield Hallam University
How to store XML in RDBMS
• Assuming XML inbound, do you:
– break it up and store it in relational tables
– store the document as a CLOB (Character Large Object datatype)
– shred and store in a structured way, applying DTD or Schema rules?
• Depends how it will be used:
– part of an existing rdbmd-based system which allows users to do ad hoc
SQL queries
– just to forward on, as XML, to other users but seldom queried
– Held for use by XPath conversant users
– heavily CRUDed
– As permanence for Web-based Distributed Authoring and Versioning
(DAV). Extensions to HTTP which allows users to collaboratively edit
and manage files on remote web servers.
School of Computing and Management Sciences
© Sheffield Hallam University
XMLType in XMLdb
drop table xml_chum ;
create table xml_chum (somexmldata xmltype) ;
Insert into xml_chum Values (
XMLTYPE.createXML('<CHUM><RegistrationID>1</RegistrationI
D><ChumID>st2</ChumID></CHUM>')
);
Insert into xml_chum Values (
XMLTYPE.createXML('<CHUM><RegistrationID>2</RegistrationI
D><ChumID>st4</ChumID></CHUM>')
);
select XMLTYPE.extractValue(somexmldata,'/CHUM/ChumID')
from XML_CHUM
where existsNode(somexmldata,'/CHUM[RegistrationID="2"]') = 1 ;
School of Computing and Management Sciences
© Sheffield Hallam University
Outputing XML with XMLdb
SET ECHO off
set pagesize 0
set long 10000
SET VERIFY off
SET FEEDBACK off
spool F:\OScripts\xml\CD.xml
select '<?xml version="1.0"?>' from dual ;
select '<ROWSET>' from dual ;
SELECT
xmlelement("CD", xmlforest(a.composer, b.title,
b.cdid))
FROM composers a, cds b
where a.composerid=b.composerid
and a.composerid < 10 ;
select '</ROWSET>' from dual ;
spool off ;
Moving Data - XML
• But the W3C Recommendations for XML
include:
–
–
–
–
It shall be straightforward to use XML over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs that process XML
documents.
– The number of optional features in XML is to be kept to an
absolute minimum, ideally zero.
– XML documents should be human-legible and reasonably
clear.
– Terseness in XML markup is of minimal importance.
School of Computing and Management Sciences
© Sheffield Hallam University
Moving Data - summary
• Laudable XML aims, but how data-centric are
they?
• Is anything more portable than a CSV (with
header)?
• However you wrap it up, a computer-based
data handling system needs to convert
information into ons-and-offs!
School of Computing and Management Sciences
© Sheffield Hallam University