Click Fraud - Data Mining

Click Fraud and Web Mining
Classification of Web Data

Content:
– Web pages data: text and graphics.

Structure:
– Intra-page structure: HTML or XML tags
– Inter-page structure: hyperlinks

Usage:
– IP addresses, page references, and the date and
time of each access.

User Profile:
– User's demographic information
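As a rough illustration of the "Usage" category, the sketch below pulls the IP address, page reference, and access time out of one server log line. It assumes the log is in Common Log Format, which the slides do not state; the `UsageRecord` type, the `parse_usage` helper, and the sample line are all hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout (Common Log Format):
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


class UsageRecord(NamedTuple):
    """The three usage fields named on the slide."""
    ip: str
    page: str
    accessed_at: datetime


def parse_usage(line: str) -> Optional[UsageRecord]:
    """Return the usage fields of one log line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    accessed_at = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return UsageRecord(m.group("ip"), m.group("page"), accessed_at)


# Made-up sample line using a documentation-only IP address.
sample = '192.0.2.7 - - [10/Oct/2005:13:55:36 +0900] "GET /ads/landing.html HTTP/1.0" 200 2326'
record = parse_usage(sample)
```

Content, structure, and user-profile data would come from other sources (page crawls, link graphs, registration forms) and are not covered by this sketch.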
Difficulties of Web Usage Data

Single IP address/Multiple Server Sessions:
– Internet service providers (ISPs) typically have a pool
of proxy servers that users access the Web through. A
single proxy server may have several users accessing a
Web site, potentially over the same time period.

Multiple IP address/Single Server Session:
– Some ISPs or privacy tools randomly assign each
request from a user to one of several IP addresses.

Multiple IP address/Single User:
– A user that accesses the Web from different machines
will have a different IP address from session to session.
This makes tracking repeat visits from the same user
difficult.

Multiple Agent/Single User:
– A user who uses more than one browser, even on the
same machine, will appear as multiple users.
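The four cases above explain why an IP address alone is a weak visitor key. A common heuristic, sketched here as an assumption rather than a method from the slides, is to key visitors by (IP address, user agent) and start a new session after a period of inactivity. The 30-minute cutoff, the `sessionize` helper, and the sample hits are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not from the slides


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (ip, agent, timestamp) hits into sessions keyed by (ip, agent)."""
    by_visitor = defaultdict(list)
    for ip, agent, ts in hits:
        by_visitor[(ip, agent)].append(ts)

    sessions = []
    for visitor, stamps in by_visitor.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > timeout:
                # Gap longer than the timeout: close this session, start another.
                sessions.append((visitor, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((visitor, current))
    return sessions


t0 = datetime(2005, 10, 10, 9, 0)
hits = [
    ("192.0.2.1", "BrowserA", t0),
    ("192.0.2.1", "BrowserB", t0 + timedelta(minutes=1)),  # same proxy IP, other agent
    ("192.0.2.1", "BrowserA", t0 + timedelta(minutes=5)),
    ("192.0.2.1", "BrowserA", t0 + timedelta(hours=2)),    # after the timeout
]
sessions = sessionize(hits)
```

The heuristic only softens the problems listed above. Two proxy users with the same browser are still merged, and one user with two browsers is still counted twice.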
Click Fraud
Internet scammers steal
money with 'click fraud' (Newsweek)

When he tried to expand into Germany, Nehoray
found that his site was getting lots of new
visitors but unusually few paying customers.
Nehoray (who prefers we don't name his
company) analyzed his Internet logs and made
an unsettling discovery. Someone—perhaps a
competitor—had written a simple software
program that relentlessly clicked on his ads,
burning up his ad budget and pushing his links
off the search sites by lunchtime each day. After
spending weeks complaining to Google about
the problem and getting a partial refund, he
finally yanked the ads. "It was really bad," he
says, estimating that he lost $50,000 in potential
business. "Nobody knows how to solve this
problem."
SEMPO (Search Engine
Marketing Professional Organization)

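Most marketers recognize the problem, but show
little will to solve it

One third of SEM firms judge click fraud to be a
moderate or serious problem

Only 15% of marketing managers at large
companies consider click fraud a problem

23% to 33% of all marketers are not particularly
concerned about click fraud

Third-party solutions in service over the past
year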
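What Is Gained from Click Fraud

Financial gain: search ad affiliates clicking
for dollars

Competitive advantage: draining a competitor's PPC
budget, or pushing them to increase it

Revenge: grievances against a company's employees
or associates

Extortion: exploiting network limitations for profit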
Fraud Techniques

Fake or masked IP addresses

Non-successive clicks

Destroyed referrers

Clickbots

Click armies?
Search Engine Efforts

Dedicated fraud departments

Click filters

Pattern recognition software

ROI analysis

Human intervention

Review of advertiser documentation
From Overture FAQs:

Rules-based and pattern recognition-based inferences

Two patents pending

Each click is evaluated along 20 to 50 points. Some are:
– IP address
– User session information
– User cookie information
– The network to which an IP belongs (e.g., class C)
– The user's browser information
– The search term requested by the user
– The time of the click
– The rank of the advertiser's listing
– The bid of the advertiser's listing
– The time of the search
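To make the listed data points concrete, the sketch below gathers them into one record and runs a few named rules over it. This is not Overture's system, whose rules are not given in the slides. The `ClickRecord` fields mirror the list above, while the two rules in `RULES`, the `evaluate` helper, and the sample values are hypothetical placeholders.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickRecord:
    """One click, described by the data points listed on the slide."""
    ip: str
    session_id: str
    cookie_id: str
    user_agent: str
    search_term: str
    search_time: datetime
    click_time: datetime
    listing_rank: int
    listing_bid: float

    @property
    def network(self) -> str:
        """The /24 block (the old class C size) containing the IPv4 address."""
        return str(ipaddress.ip_network(f"{self.ip}/24", strict=False))

    @property
    def seconds_from_search_to_click(self) -> float:
        return (self.click_time - self.search_time).total_seconds()


# Hypothetical rules for illustration only.
RULES = {
    "click_before_search": lambda c: c.seconds_from_search_to_click < 0,
    "missing_cookie": lambda c: not c.cookie_id,
}


def evaluate(click, rules=RULES):
    """Return the names of the rules that the click triggers."""
    return [name for name, rule in rules.items() if rule(click)]


click = ClickRecord(
    ip="192.0.2.77", session_id="s1", cookie_id="", user_agent="BrowserA",
    search_term="flowers", search_time=datetime(2005, 10, 10, 9, 0, 0),
    click_time=datetime(2005, 10, 10, 9, 0, 12), listing_rank=1, listing_bid=0.35,
)
triggered = evaluate(click)
```

Grouping clicks by `network` rather than by exact IP is one way to notice activity spread across neighboring addresses, which matches the slide's mention of the network an IP belongs to.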
Means of Detection

IP address

Successive clicks

Wide click volume variance

3rd party tools (e.g., browser)

Bill reviews

Odd traffic referrers

Credit notice
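Two of these signals, successive clicks and wide click volume variance, can be checked directly against an advertiser's own logs. The sketch below is a minimal illustration. The 10-second window, the run length of 3, the 2.0 standard deviation threshold, the helper names, and the sample data are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def successive_click_ips(clicks, window=timedelta(seconds=10), min_run=3):
    """Return IPs with min_run or more clicks, each within `window` of the previous."""
    by_ip = defaultdict(list)
    for ip, ts in clicks:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, stamps in by_ip.items():
        stamps.sort()
        run = 1
        for prev, cur in zip(stamps, stamps[1:]):
            run = run + 1 if cur - prev <= window else 1
            if run >= min_run:
                flagged.add(ip)
                break
    return flagged


def volume_outliers(daily_counts, threshold=2.0):
    """Return indexes of days more than `threshold` population std devs from the mean."""
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        return []
    return [i for i, n in enumerate(daily_counts) if abs(n - mu) / sigma > threshold]


t0 = datetime(2005, 10, 10, 9, 0)
clicks = [
    ("198.51.100.9", t0),
    ("198.51.100.9", t0 + timedelta(seconds=3)),
    ("198.51.100.9", t0 + timedelta(seconds=6)),
    ("203.0.113.4", t0),
    ("203.0.113.4", t0 + timedelta(minutes=5)),
]
suspicious_ips = successive_click_ips(clicks)

daily_counts = [100, 110, 95, 105, 98, 102, 400]
spike_days = volume_outliers(daily_counts)
```

Neither check addresses the fraud techniques listed earlier, such as masked IP addresses or non-successive clicks, so a flag from this code is only a prompt for closer review.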
Unsolicited Google Refund Notice:

Hello,
…
Google strictly prohibits any method used to
artificially and/or fraudulently generate clicks or
page impressions, and closely monitors clicks on
Google AdWords ads to prevent abuse. We
believe that your AdWords account may have
been affected by invalid clicks, and are crediting
your account for $XXX.XX USD.
…
…
The Google AdWords Team
Google's Complaint Against an AdSense Partner