Confidential Meta Data in Presentations and Documents
Mads R. Dahl & Søren K. Kjærgaard
Section for Health Informatics, Institute of Public Health, University of Aarhus, Denmark
INTRODUCTION
The Internet has made it possible to share findings and distribute expert
knowledge regardless of time and place. Communication between scientists,
institutions, companies, and private persons has adapted to the
infrastructure of the Internet. To prevent exposure of confidential information,
high level security policies (HLSP) have been established to guard and protect
confidential data (1). Over the years, technologies, high encryption level
software, and data management procedures have been developed and
implemented (2), together with firewalls and server protection measures (3).
Unfortunately, one major threat to confidential data has so far been
overlooked, leaving a backdoor to confidential data and information open.
METHODS
Data mining was conducted using the general search engine Google. The search was
restricted to selected areas, using keywords such as: Health, Health Care or Health
Informatics. The search criteria were further restricted using the commands
filetype:ppt, filetype:pps or filetype:doc.
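The search strategy above can be sketched as a small script that pairs each keyword with each file type restriction. This is only an illustration: the function name `build_queries` is hypothetical, and quoting multi-word keywords as phrases is an assumption, since the poster does not state how the keywords were entered.

```python
from itertools import product

# Keywords and file types as listed in the text.
KEYWORDS = ["Health", "Health Care", "Health Informatics"]
FILETYPES = ["ppt", "pps", "doc"]


def build_queries(keywords, filetypes):
    """Return one query string per keyword/file type pair."""
    queries = []
    for keyword, filetype in product(keywords, filetypes):
        # Assumption: multi-word keywords are quoted so they match as a phrase.
        term = f'"{keyword}"' if " " in keyword else keyword
        queries.append(f"{term} filetype:{filetype}")
    return queries


if __name__ == "__main__":
    for query in build_queries(KEYWORDS, FILETYPES):
        print(query)
```

Three keywords and three file types give nine queries, each of which would be entered into the search engine by hand.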
After downloading, a presentation or document could be viewed, read or
forwarded to colleagues and network partners. Sample files were examined for
confidential data or unintentional linkage of meta data.
After opening the file in the relevant application, it was examined for graphs and
tables. Slides or pages containing graphs or tables were tested by double
clicking the objects. Objects created using the shortcut keys for copy/paste (Ctrl + C
followed by Ctrl + V) were thereby identified.
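The manual double-click test has a rough programmatic analogue for newer files. The sketch below is not the method used in this study, which examined the binary .ppt, .pps and .doc formats by hand. It assumes the zip-based .pptx and .docx formats instead, where embedded objects such as pasted workbooks are stored under an `embeddings` folder inside the package. The function name `find_embedded_objects` is hypothetical.

```python
import zipfile

# Folders where zip-based Office packages store embedded objects.
# Assumption: the file is .pptx or .docx, not the legacy binary formats.
EMBEDDING_DIRS = ("ppt/embeddings/", "word/embeddings/")


def find_embedded_objects(path_or_file):
    """List package members that look like embedded objects."""
    with zipfile.ZipFile(path_or_file) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBEDDING_DIRS) and not name.endswith("/")
        ]
```

An empty list suggests that no object was embedded. A legacy .ppt or .doc file is not a zip archive and raises `zipfile.BadZipFile`, so such files would still need the manual check described above.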
1. Ilioudis C, Pangalos G. A Framework for an Institutional High Level Security Policy for the Processing of Medical Data and their Transmission through the Internet. J Med Internet Res. 2001;3(2):e14
2. Albisser AM, Albisser JB, Parker L. Patient Confidentiality, Data Security, and Provider Liabilities in Diabetes Management. Diabetes Technology & Therapeutics. 2003;5(4):631-403
3. Norifusa M. Internet security: difficulties and solutions. International Journal of Medical Informatics. 1998;49(1):69-74
Figure 1. Test graph: average protein level (µg/ml) in human plasma. Double click on graph.
Figure 2
Health Informatics. Process: Data, Information, Knowledge
Download this poster from: www.hi.au.dk
RESULTS
Within hours of investigating the problem, it became clear that many
scientists were unaware of the mistake. Depending on the
topic, up to 5% of the presentations or documents available on the
Internet contained objects that were linked to original datasheets.
Developments on this topic will be monitored and researched
further.
Figure 1 values. Mean for X233 and MASP-3: 1.464252553 and 1.68866171
CONCLUSION
Unfortunately, when information based on confidential data is uploaded, a serious and
unintentional mistake may occur. The mistake is made during production of the document
in the following way:
Research data are analysed statistically and turned into informative graphs, charts or tables
using MS Excel or MS Word. A graph from the spreadsheet application is then copied and
pasted into a PowerPoint presentation using the shortcut keys for copy/paste (Ctrl + C followed
by Ctrl + V). The pasted object remains linked to the original datasheet, so the underlying
data travel with the file.
The final presentation is published on the Internet or distributed in one of the following file formats:
PowerPoint Presentation (.PPT), PowerPoint Show (.PPS) or Word document (.DOC).
Documents converted into the Portable Document Format (PDF) do not contain the metadata,
and PDF is therefore recommended as the file type for distribution. Alternatively, use the Paste
Special command, found under Edit in the PowerPoint menu bar. Pasting the
graph or table into MS PowerPoint or MS Word as a picture also excludes the metadata.
E.Mail: [email protected]
University of Aarhus, Health Informatics: WWW.HI.AU.DK
A basic PowerPoint presentation containing 10 slides with few
graphical elements has a file size of approximately 50 to 75 kilobits.
The same presentation containing an additional graph (Figure 1),
pasted as an object linked to the original dataset of more
than 120,000 data points, would have a total file size of 150 kilobits.
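The size figures above imply how much a single linked object can add to a file. The short calculation below only restates the poster's numbers. The helper name `size_increase` is hypothetical, and the units are kept as given in the text.

```python
def size_increase(baseline, with_object):
    """Return (absolute increase, growth factor) for a file size."""
    return with_object - baseline, with_object / baseline


# Baseline range and total size as stated in the text (same units as the text).
for baseline in (50, 75):
    extra, factor = size_increase(baseline, 150)
    print(f"baseline {baseline}: +{extra}, {factor:.1f}x")
```

On these figures, the pasted graph doubles or triples the file size, so an unexpectedly large presentation may be a hint that a dataset is embedded.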
Figure 1 y-axis: Average concentration (µg/ml)