Report - Valora Technologies

13 YEARS (11/2000 – 11/2013)
How AutoIndexing Works
The Steps before BlackCat Data Visualization
January, 2014
Confidential
Who We Serve
• Corporate Legal Departments with complex document/data/content management needs
  – Litigation
  – Compliance
  – Records
  – Information Governance
• Government Agencies with limited resources for document/data/content monitoring, analysis, management
  – Litigation
  – Investigations
  – Compliance
  – Records
• Law firms and Service Providers who support these entities
What We Do
• Utilize technology to understand and interpret documents (or files, or records, or streamed text, etc.)
  – Probabilistic Hierarchical Context-Free Grammars
  – Statistical Pattern-Matching
• Tag documents with as many attributes and indices as possible
• Analyze those tags, along with text, and other clues, to provide a disposition on documents
• Report the results in a variety of ways.
(This presentation centers around Tagging.)
Ultimately, Valora is a Consulting Service Provider, utilizing our
own, highly customized tools to deliver excellent, timely and
highly cost-effective work product to our clients.
Many Levels of Analytics & Data Mining
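[Figure: levels of analysis, with examples where the slide gives them]
• Character (e.g., "@")
• Word (e.g., "Covenant")
• Phrase (e.g., "Attorney-client privilege")
• Line (e.g., "SKU 2465 @ $3.41 ea.")
• Paragraph
• Page
• Document
• Multiple Documents
• Sub-population
• Population
• CrossPopulation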
Valora’s Proprietary Technology
REPORTING: BlackCat, Relativity, .CSV …
ANALYSIS/RULES: Year Total, Hot Doc, Priv…
INDEXING/TAGGING: Date, Author, Patent # …
PowerHouse
INDEXING/TAGGING
DocType = Patent Application
Date = 10/18/2007
Date Format = US
Author = Patent Authors, Author City, Author Country
Assignee = RIM
Tone = Neutral to slightly positive
Embedded Graphic with Title
Other Capturable Data Elements:
• Patent Number
• Filing Date
• Key Phrases & Terms
• Managing PTO
• Implied/Attached Docs
• Bar Code Present
• And many more . . .
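As a rough illustration of how a tagger might produce fields like DocType, Date, and Date Format from raw text, here is a minimal Python sketch. It uses simple regular expressions only; the function name `tag_document`, the pattern, and the "Non-US"/"Ambiguous" labels are assumptions made for this example and are not taken from PowerHouse.

```python
import re

# Hypothetical pattern: matches dates written as N/N/YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def tag_document(text: str) -> dict:
    """Return a dict of tags found in the text (illustrative only)."""
    tags = {}
    match = DATE_RE.search(text)
    if match:
        first, second, _year = (int(g) for g in match.groups())
        tags["Date"] = match.group(0)
        # If only the first number can be a month, the date is month/day (US).
        if second > 12 and first <= 12:
            tags["Date Format"] = "US"
        elif first > 12 and second <= 12:
            tags["Date Format"] = "Non-US"
        else:
            tags["Date Format"] = "Ambiguous"
    # Very naive document-type cue; real tagging would weigh many clues.
    if "patent application" in text.lower():
        tags["DocType"] = "Patent Application"
    return tags


sample_tags = tag_document("Patent Application filed 10/18/2007 by RIM")
```

For the sample sentence, the sketch records the date string, infers a US date format because 18 cannot be a month, and sets the document type from the phrase match.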
AutoCoding Defined
• AutoCoding is the application of software and technique to
capture information about a document.
– Bibliographic Fields: Author, Recipient, CC/BCC, Date, Subject/Title,
Document Type.
– Characteristic Fields: Draft, Confidential, Foreign Language, Pages
Missing, Duplicate/NearDuplicate, Conversation Thread
• Many flavors of AutoCoding
– All use software to some degree
– All use OCR and extracted text from e-docs
• Generally accepted that AutoCoding is faster and lower cost
than manual coding, but sometimes lower quality
How AutoCoding Works
PowerHouse: AutoIndexing, AutoBusinessRules, Analytics, Database Prep
• Docs enter the system as extracted or OCR’ed text
• Data is extracted from each document into a database table
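The idea of extracting data from each document into a database table can be sketched with Python's built-in sqlite3 module. The table name `doc_tags`, its columns, and the sample rows are hypothetical and chosen only for illustration; the actual schema is not described in this presentation.

```python
import sqlite3


def load_tags(rows):
    """Store one row of extracted tags per document in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_tags ("
        "doc_id TEXT PRIMARY KEY, doc_type TEXT, doc_date TEXT, author TEXT)"
    )
    # Named placeholders are filled from each dict in rows.
    conn.executemany(
        "INSERT INTO doc_tags VALUES (:doc_id, :doc_type, :doc_date, :author)",
        rows,
    )
    conn.commit()
    return conn


# Hypothetical sample data standing in for tagger output.
rows = [
    {"doc_id": "DOC-0001", "doc_type": "Patent Application",
     "doc_date": "2007-10-18", "author": "Patent Authors"},
    {"doc_id": "DOC-0002", "doc_type": "Memo",
     "doc_date": "2008-01-05", "author": "Unknown"},
]
conn = load_tags(rows)
patents = conn.execute(
    "SELECT doc_id FROM doc_tags WHERE doc_type = ?", ("Patent Application",)
).fetchall()
```

Once the tags sit in a table, later stages (rules, analytics, reporting) can query them like any other structured data.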
Indexing/Tagging
ANALYSIS/RULES
Litigation Document Review Manual
Determining Responsiveness
The document should be marked responsive if any of the following conditions are present:
• Mentions or discusses the specific protocol for handling simultaneous voice and data actions
• Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
• Discusses or is related to patent '009
• Mentions Apple Inc. or Apple Computers, Inc. or is a communication from/to anyone at Apple Computer, Inc., or apple.com.
• And so on…
Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12> and [FullText] contains any of <Data protocol key phrases 25> and [DocType] is not any of [Brochure, Press Release, Website], ...
Rule: Responsive for Patent '009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>, or Parent of Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …
Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or [Author] matches "*@apple.com" …
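To make the shape of such rules concrete, here is a minimal Python sketch of the "Responsive for Apple" rule evaluated over tagged documents. The key phrase list, the contact list, and all function names are hypothetical stand-ins; the rule's "fuzzy match" is simplified here to a case-insensitive substring test, which is a weaker technique than true fuzzy matching.

```python
from fnmatch import fnmatch

# Hypothetical stand-ins for <Apple key phrase list 7> and <Apple contact list 15>.
APPLE_KEY_PHRASES = ["apple inc", "apple computer"]
APPLE_CONTACTS = {"jane.doe@example.com"}


def contains_any(text, phrases):
    """Case-insensitive substring test (a simplification of fuzzy matching)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def responsive_for_apple(doc):
    """True if any of the three conditions of the rule holds for this document."""
    author = doc.get("Author", "")
    people = [author] + doc.get("Recipients", []) + doc.get("CCs", [])
    return (
        contains_any(doc.get("FullText", ""), APPLE_KEY_PHRASES)
        or any(person.lower() in APPLE_CONTACTS for person in people)
        or fnmatch(author.lower(), "*@apple.com")
    )


def family_is_responsive(family, rule):
    """An attachment family matches if any document in it matches the rule."""
    return any(rule(doc) for doc in family)
```

A document authored by an address ending in "@apple.com" satisfies the rule on the wildcard condition alone, while a document that only mentions "pineapple recipes" does not match any of the three conditions.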
Analysis/Rules
REPORTING
• Numerous Reporting Options
  – Hosting, Early Case Assessment & DataVisualization
    – DataVisualization & Hosting in BlackCat™
    – Hosting in other industry platforms
  – Load File Import
    – Opticon/LFP, Summation DII
    – .CSV, other delimited file
    – Loading to proprietary platforms
  – Render to File
    – PDF, Excel, HTML
    – Comprehensive Report
What would you like to know?
Popular Valora Services
• Scanning
  – Image conversion for paper documents into electronic image format (TIF, PDF, JPEG, etc.)
• Electronic File Processing (EFP)
  – File Conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics
• OCR
  – Optical Character Recognition for converting images to searchable text
• NearDuplicateDetection
  – Identify documents that are highly similar, if not identical, across custodians and the entire population. Includes cross-correlation of paper & electronic documents
• EmailThreadGrouping
  – Join separated email conversation threads into a consistent stream from start to finish
• AutoUnitization
  – Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments
• AutoCoding
  – Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title.
• AutoReview
  – Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.)
• AutoBusinessRules
  – Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings.
• AutoRedaction
  – Ability to identify & mark up documents to “black out” select information (such as PII (personally identifiable information), patient data or privileged information)
• AutoTranslation
  – Automatic translation of non-English documents to English text. Supports dozens of originating languages.
• DataVisualization
  – Presentation of data in intuitive, graphical ways with easy navigation, understanding and manipulation of document subsets. Often used for Early Case Assessment.
• Hosting & Database Creation
  – Hosting of pre- or post-processed documents and files in Valora’s BlackCat database or others (iConect, Relativity, etc.).
• Professional Services
  – Options for Project Management, Technical data/file manipulation, Subject Matter Expertise, Resources & Workflow Design & Management
Most services available as Auto, Auto+Manual “Hybrid,” and Manual-Only. Call for specifics.
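For readers unfamiliar with how highly similar documents can be detected, the following Python sketch shows one common textbook approach: compare sets of word shingles using Jaccard similarity. This is an assumption chosen for illustration, not a description of how the NearDuplicateDetection service works; the function names, shingle size, and 0.8 threshold are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.8):
    """Return pairs of document ids whose shingle sets meet the threshold."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if jaccard(sets[left], sets[right]) >= threshold:
                pairs.append((left, right))
    return pairs
```

Comparing every pair is quadratic in the number of documents, so this form only suits small sets; large populations typically need an approximate method such as hashing the shingle sets first.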
Don’t take our word for it, take theirs…
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265
www.valoratech.com
[email protected]