Click Fraud - Data Mining
Click Fraud and Web Mining
Classification of Web Data
Content:
– Web pages data: text and graphics.
Structure:
– Intra-page structure: HTML or XML tags
– inter-page structure: hyper-links
Usage:
– IP addresses, page references, the date and
time of accesses.
User Profile:
– user’s demographic information
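As an illustration of the "Usage" category, the sketch below pulls the IP address, page reference, and access time out of one server log line. It assumes the log uses the Common Log Format; the names `LOG_PATTERN` and `parse_usage` and the sample line are hypothetical and not from the slides.

```python
import re
from datetime import datetime

# Assumed input: one line in Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<page>\S+) [^"]*"'
)


def parse_usage(line):
    """Extract the IP address, page reference, and access time from one log line."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # line does not look like Common Log Format
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), m.group("page"), when


# Hypothetical sample line using a documentation-only IP address.
line = '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_usage(line)
print(record)
```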
Difficulties of Web Usage Data
Single IP address/Multiple Server Sessions:
– Internet service providers (ISPs) typically have a pool
of proxy servers that users access the Web through. A
single proxy server may have several users accessing a
Web site, potentially over the same time period.
Multiple IP address/Single Server Session:
– Some ISPs or privacy tools randomly assign each
request from a user to one of several IP addresses.
Multiple IP address/Single User:
– A user that accesses the Web from different machines
will have a different IP address from session to session.
This makes tracking repeat visits from the same user
difficult.
Multiple Agent/Single User:
– a user that uses more than one browser, even on the
same machine, will appear as multiple users.
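These ambiguities can be seen in a minimal sketch of a common sessionization heuristic: group requests by (IP address, user agent) and start a new session after a period of inactivity. The `Request` type, the `sessionize` function, the 30-minute timeout, and the sample log are all assumptions made for illustration, not part of the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    ip: str
    agent: str
    time: datetime
    page: str


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group requests into sessions keyed by (ip, agent), split on inactivity."""
    by_key = defaultdict(list)
    for r in sorted(requests, key=lambda r: r.time):
        by_key[(r.ip, r.agent)].append(r)

    sessions = []
    for reqs in by_key.values():
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.time - prev.time > timeout:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


# Hypothetical log: one person on one machine, using two browsers.
t0 = datetime(2005, 1, 1, 9, 0)
log = [
    Request("10.0.0.1", "Firefox", t0, "/a"),
    Request("10.0.0.1", "Firefox", t0 + timedelta(minutes=5), "/b"),
    Request("10.0.0.1", "MSIE", t0 + timedelta(minutes=6), "/a"),
    Request("10.0.0.1", "Firefox", t0 + timedelta(hours=2), "/c"),
]
sessions = sessionize(log)
print(len(sessions))  # 3
```

The second browser shows up as a separate session (the multiple agent/single user case), and the same key would also merge different people who share one proxy IP and browser (the single IP address/multiple server sessions case).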
Click Fraud
Internet scammers steal
money with 'click fraud,' by
Newsweek
When he tried to expand into Germany, Nehoray
found that his site was getting lots of new
visitors but unusually few paying customers.
Nehoray (who prefers we don't name his
company) analyzed his Internet logs and made
an unsettling discovery. Someone—perhaps a
competitor—had written a simple software
program that relentlessly clicked on his ads,
burning up his ad budget and pushing his links
off the search sites by lunchtime each day. After
spending weeks complaining to Google about
the problem and getting a partial refund, he
finally yanked the ads. "It was really bad," he
says, estimating that he lost $50,000 in potential
business. "Nobody knows how to solve this
problem."
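SEMPO (Search Engine
Marketing Professional Organization)
Most marketers recognize the problem,
but show little will to solve it
One third of SEM firms rate click fraud as a
moderate or serious problem
Only 15% of marketing managers at large
companies regard click fraud as a problem
23%-33% of all marketers are not particularly
concerned about click fraud
Third-party solutions that have been in
service over the past year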
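What Is Gained Through Click Fraud
Financial gain: search ad affiliates clicking
for dollars
Competitive advantage: draining a competitor's PPC
budget or pushing them to increase it
Revenge: grievances against a company's employees or associates
Extortion: exploit network limitations for profit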
Fraud Techniques
Fake or masked IP addresses
Non-successive clicks
Destroyed referrers
Clickbots
Click armies?
Search Engine Efforts
Dedicated fraud departments
Click filters
Pattern recognition software
ROI analysis
Human intervention
Review of advertiser documentation
From Overture FAQs:
Rules-based and pattern recognition-based inferences
Two patents pending
Each click is evaluated along 20 to 50 points. Some are:
– IP address
– User session information
– User cookie information
– The network to which an IP belongs (e.g., class C)
– The user's browser information
– The search term requested by the user
– The time of the click
– The rank of the advertiser's listing
– The bid of the advertiser's listing
– The time of the search
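A rules-based evaluation over these attributes might look like the sketch below. The slides do not state Overture's actual rules, so every rule and threshold here is hypothetical, as are the `Click` type and the `class_c` and `score` helpers; only the attribute names come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Click:
    ip: str
    cookie: str          # empty string if no cookie was sent
    browser: str         # empty string if no browser information was sent
    search_term: str
    search_time: datetime
    click_time: datetime
    rank: int            # rank of the advertiser's listing
    bid: float           # bid of the advertiser's listing


def class_c(ip):
    """Return the first three octets, i.e. the /24 ("class C") network."""
    return ".".join(ip.split(".")[:3])


def score(click, seen_networks):
    """Count how many hypothetical rules a click trips (higher = more suspicious)."""
    points = 0
    if not click.cookie:
        points += 1      # no cookie information
    if not click.browser:
        points += 1      # no browser information
    if (click.click_time - click.search_time).total_seconds() < 1:
        points += 1      # click implausibly soon after the search
    if class_c(click.ip) in seen_networks:
        points += 1      # same network already clicked this listing
    return points


# Hypothetical click that trips all four illustrative rules.
now = datetime(2005, 1, 1, 12, 0, 0)
c = Click("192.0.2.17", "", "", "cheap flights", now, now, rank=1, bid=0.50)
print(score(c, {"192.0.2"}))  # 4
```

A real filter would weight the rules and combine them with pattern recognition rather than simply counting them.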
Means of Detection
IP address
Successive clicks
Wide click volume variance
3rd-party tools (e.g., browser)
Bill reviews
Odd traffic referrers
Credit notice
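Two of these signals, repeated clicks from one IP address and wide click volume variance, can be checked by an advertiser with very little code. The sketch below is one simple way to do so; the function names, the thresholds (`max_clicks=3`, `z_threshold=2.0`), and the sample data are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def repeated_ips(click_ips, max_clicks=3):
    """Return IPs that clicked more than max_clicks times in the given window."""
    counts = Counter(click_ips)
    return {ip for ip, n in counts.items() if n > max_clicks}


def volume_outliers(daily_clicks, z_threshold=2.0):
    """Return indices of days whose click count deviates strongly from the mean."""
    mu = mean(daily_clicks)
    sigma = pstdev(daily_clicks)
    if sigma == 0:
        return []  # all days identical: nothing stands out
    return [i for i, n in enumerate(daily_clicks)
            if abs(n - mu) / sigma > z_threshold]


# Hypothetical data: one IP clicks five times; the last day spikes.
ips = ["198.51.100.7"] * 5 + ["203.0.113.9", "203.0.113.10"]
print(repeated_ips(ips))        # {'198.51.100.7'}

daily = [100, 104, 98, 101, 97, 103, 99, 400]
print(volume_outliers(daily))   # [7]
```

Neither check catches fake or masked IP addresses or non-successive clicks on its own, which is why the list above also includes bill reviews and referrer inspection.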
Unsolicited Google Refund Notice:
Hello,
…
Google strictly prohibits any method used to
artificially and/or fraudulently generate clicks or
page impressions, and closely monitors clicks on
Google AdWords ads to prevent abuse. We
believe that your AdWords account may have
been affected by invalid clicks, and are crediting
your account for $XXX.XX USD.
…
…
The Google AdWords Team
Google's legal complaint against an AdSense partner