Slides - Agenda INFN


Analog Memories for Particle Physics
Eric Delagnes
CEA/Irfu Saclay
[email protected]
1
Outline
• Introduction: Front-end architectures.
• Principle of analog memories.
• Ultrafast analog memories.
• Timing with ultrafast analog memories.
2
Front-end architectures
3
Modern « analog » architecture for FE Chips
[Figure: block diagram. Detector → (fast) shaping → discriminator (threshold) → hit → L0 logic and time stamp → Mem; slow shaping → peak detector or T&H → Mem → mux & readout → ADC; optional L1 accept.]
• Optimal data reduction: only the data over the threshold are stored.
• Large dynamic range achievable (up to ~14 bits with a single gain).
• Dead time depends on the structure of the memories.
• Decision taken very early in the chain by the discriminator.
• When the threshold is low:
  - « HIT » rate very sensitive to the noise (power & spectrum)
  - pick-up or common mode noise can kill the acquisition
• The measurement is done once and for all in predefined conditions (shaping).
4
Go Digital As Soon as Possible
[Figure: Detector → filter → ADC (highest resolution, highest speed) → FPGA: time, charge, …]
Up to now the ADC is off-chip:
• High dynamic range, high speed ADCs are difficult to integrate in ASICs.
• State of the art is 12 bit / 20 MHz / 30 mW.
• For:
  - Versatility, reusability.
  - Possible digital filtering before discrimination.
  - Optimal filtering for Q and T.
  - Digital data available to build complex triggers.
  - No limitation on trigger latency: just a matter of memory sizing.
• Against:
  - Cost of ADCs and FPGAs.
  - Power consumption.
  - Cables, integration.
5
Commercial digitizers: State of the art
Big advances on ADCs during the last decades:
⇒ BiCMOS SiGe technology for the ultrafast ones (>500 MSPS, >8 bits available).
⇒ Technology scaling in pure CMOS:
  - Decrease of capacitances => higher speed and bandwidth, lower power consumption.
  - Reduced vdd => use of simpler architectures.
  - Size reduction of digital cells.
  - Rise of algorithmic structures, generalization of on-chip digital corrections.
Figure of merit: $FOM \approx P / (2^{ENOB} \cdot 2 \cdot BW)$ (1 pJ => 40 mW @ ENOB = 9, 40 MS/s)
[Figure: advanced designs (published in 2009-2010), © P. Aspell (CERN).]
Generalization of fully differential structures and use of high speed serial output links:
⇒ Makes the integration of ADCs in systems easier.
[Figure: commercial products (2011), survey of the products from 5 providers.]
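The figure of merit quoted on this slide can be turned into a quick power estimate. The sketch below is only an illustration: the function name is hypothetical, and BW is assumed to be numerically equal to the 40 MS/s sampling rate, which is the reading that reproduces the ~40 mW of the slide.

```python
def adc_power_w(fom_j: float, enob: float, bw_hz: float) -> float:
    """Power implied by FOM = P / (2**ENOB * 2 * BW)."""
    return fom_j * (2.0 ** enob) * 2.0 * bw_hz

# Slide example: FOM = 1 pJ, ENOB = 9, 40 MS/s.
# Assumption: BW is taken as 40 MHz (numerically equal to the sampling rate).
power = adc_power_w(1e-12, 9, 40e6)
print(f"{power * 1e3:.1f} mW")
```

With these inputs the estimate is close to the 40 mW quoted on the slide; taking BW as the 20 MHz Nyquist band instead would halve it.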
6
ALTRO: The only existing « GDSP » chip
Currently used on the ALICE TPC. © L. Musa (CERN)
Two 16-channel chips:
• PASA (0.35 µm): VFE
• ALTRO (ST 0.25 µm):
  - 10 MHz / 10-bit ADC
  - Digital filters & zero suppression
S-ALTRO: 1 single chip, IBM 0.13 µm.
Goal: 64 channels.
Submitted in May 2010 (S-ALTRO 16-channel prototype, ~45 mm²):
⇒ First test results encouraging.
⇒ But power: 40 mW/ch
⇒ Cost!
Next steps:
• 64-channel GDSP chip for CMS Muon Chamber/ILC under study
• Early digitization for the ATLAS calorimeter upgrade foreseen for 2017
7
The Analog memories: an intermediate solution
[Figure: Detector + FEE → analogue memory (Fs) → ADC (lower speed clock, can be on-chip) → FIFO → FPGA: ZOI treatment; trigger input.]
• Analog signal stored in an analog memory.
• Read back slowly on request (trigger).
• Requires an external path for the trigger:
  - Discriminator
  - External trigger
  - …
• Analog data are available for any possible digital treatment.
• Larger amount of data at the chip output than with the pure “analog” solution.
8
When can SCAs be useful?
• Need for a buffer to wait for the trigger latency
• Multiparametric acquisition: charge, time, width…
• Versatility
• GSPS sampling rate
• Large number of channels
• Low cost, high integration, low power
• Need for the signal shape (PSA)
• Optimal filtering (depending on the rate,…)
• Intermediate dynamic range: 10-12 bits
• Need for an early data concentrator
• Several classes of events, changing signal
• Pile-up rejection or treatment
• « ps » timing
• Data from all detector channels needed
• Need for DSP (tail cancellation, baseline subtraction)
• Robustness to pickup & coherent noise
• Clustering before discrimination
9
Interest of analog data availability?: 3 examples
• Pile-up: the 2 pulses can be accurately measured.
• Common mode noise subtraction: on-line calculation and subtraction by the FPGA (COMPASS GEM tracker & RICH, CLAS12 Micromegas tracker) before zero suppression.
[Figure: CLAS12 Micromegas detector prototype (~1 m long) + 1.5 m cable (with AFTER chip): rms noise vs channel number before and after common mode noise subtraction; part of the channels is not connected to the detector.]
10
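The common mode subtraction before zero suppression can be sketched as follows. This is only an illustration: the slides do not say which estimator the FPGA uses, so the median over the channels of one time sample is an assumption, and the function names and sample values are hypothetical.

```python
from statistics import median

def subtract_common_mode(samples):
    """Subtract the per-event common mode, estimated here as the median
    over all channels (assumed estimator, robust against a few real hits)."""
    cm = median(samples)
    return [s - cm for s in samples]

def zero_suppress(samples, threshold):
    """Keep only (channel, amplitude) pairs above the threshold."""
    return [(ch, s) for ch, s in enumerate(samples) if s > threshold]

# Pedestal-subtracted samples of 8 channels: a coherent shift of about +20
# on every channel and a real hit on channel 3.
raw = [20, 21, 19, 80, 20, 22, 18, 20]
corrected = subtract_common_mode(raw)
hits_without = zero_suppress(raw, 10)
hits_with = zero_suppress(corrected, 10)
```

Without the subtraction every channel passes the threshold; after it only the channel carrying the real signal remains.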
The main SCA drawbacks
• Need for a trigger signal
• Large data throughput
• Limited memory size
• Dead time common to all the channels of a chip
• Dead time during RO
Not necessarily true: depends on the architecture of the SCA digital part.
11
Principle of Analog Memories
12
SCAs: First generation
• Analog memories were introduced for HEP experiments at the end of the 1980s by S. Kleinfelder.
• Principle: sample & store an incoming signal in an array of capacitors (SCA), waiting for (selective) readout and digitization = a bank of track & holds.
• High integration: 12 to 128 channels, depth of a few hundred cells.
• Naturally compatible with CMOS technology.
• High dynamic range possible.
• Low power (a few 100 µW/ch).
• In the first designs, the sample & hold commands were generated by flip-flops (for example shift registers) clocked in the 10-100 MHz range => sampling frequency limitation.
[Figure: capacitor array between "in" and "out".]
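The sample-and-store principle can be pictured as a circular buffer of track & hold cells. The toy model below is only a sketch under simplifying assumptions (ideal cells, no noise or leakage); the class and method names are hypothetical and do not describe any particular chip.

```python
class SwitchedCapacitorArray:
    """Toy model of one SCA channel: a circular bank of track & hold cells."""

    def __init__(self, depth: int):
        self.cells = [0.0] * depth
        self.write_ptr = 0

    def sample(self, voltage: float) -> None:
        """Store one sample and advance the write pointer circularly."""
        self.cells[self.write_ptr] = voltage
        self.write_ptr = (self.write_ptr + 1) % len(self.cells)

    def read_back(self, n_cells: int) -> list:
        """Return the n_cells most recent samples, oldest first.
        Sampling is assumed to be stopped during this selective readout."""
        depth = len(self.cells)
        start = (self.write_ptr - n_cells) % depth
        return [self.cells[(start + i) % depth] for i in range(n_cells)]

sca = SwitchedCapacitorArray(depth=8)
for k in range(11):            # 11 samples into 8 cells: the oldest 3 are overwritten
    sca.sample(float(k))
window = sca.read_back(4)      # trigger arrives: read only the 4 most recent cells
```

The memory depth divided by the sampling frequency sets the longest trigger latency that can be absorbed before samples are overwritten.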
13
SCAs around the world: some applications
• HESS (Namibia): 4000 channels (ARS0); HESS-2: 2000 channels (SAM)
• MAGIC (Canary Islands): DRS4
• ATLAS calorimeter: 200,000 channels (HAMAC); CMS, LHCb…; CMS trackers: several million channels
• JLAB (USA): DVCS calorimeter (ARS0)
• Antares (Mediterranean Sea): 1000 channels (ARS1)
• CLAS12 (JLAB): Micromegas tracker, 30,000 channels (DREAM)
• AMANDA-IceCube (Antarctica)
• MEG @ PSI: 3000 channels (DRS4)
• T2K TPC: 120,000 channels (AFTER). Also used for TPC tests around the world (PANDA/FOPI…)
• GET electronics for nuclear physics: several 10,000 channels (AGET), under development
  [Figure labels: 2p emitter, protons, drift of ionisation electrons, identification of implanted isotopes, 120 cm.]
• Cosmic ray radio detection: Codalema (France) (MATACQ)
• Oscilloscopes (MATACQ)
• TOF PET (Siemens)
14
SCAs use (1)
• Time expander: the input signal is sampled in the memory array (SCA) with a fast clock, then read out with a slow clock toward a low-cost ADC and sent to the DAQ. Both clocks are handled by the SCA manager.
• Time expander & mux => data concentration (AFTER chip): several SCA channels, written with the fast clock, are multiplexed toward a single ADC (ADC clock) before being sent to the DAQ.
15
SCAs use (2)
• L1 buffer with zone of interest readout (NECTAr, …)
  [Figure: SCA read out toward the ADC on trigger.]
• L1 buffer / event buffer with zero-suppressed readout (AGET, …)
  [Figure: SCA read out on trigger, data sent to the DAQ.]
16
SCAs use (3)
• Can be integrated together with low noise front-end
(AFTER,AGET, DREAM…) or as a building block of a complex chip.
[Figure: block diagrams of the AFTER and AGET chips. One channel: CSA, filter (tpeak), charge range selection, discriminator with DAC threshold, hit register and inhibit, trigger pulse, SCA write, SCA of 512 cells (labels x64 in, x68). Common blocks: SCA manager (W/R, mode, CK), readout buffer toward a 12-bit ADC [ADS6422], slow control with serial interface, test input, power-on reset, ASIC “spy” mode (CSA; CR; SCAin; DISCRIin).]
• Up to ~ 11 bit with front-end.
• 13.6 bit without (HAMAC, ATLAS calo)
17
SCAs applications (4)
• L1 buffer and event buffer with “dead-time-less” readout (APV (CMS), HAMAC (ATLAS), DREAM (CLAS12)…)
• The SCA is used as a dual-port memory (read during writing).
• Requires more complicated logic to manage the pointers; this logic can be on- or off-chip.
DREAM for CLAS12:
- AMS 0.35 µm.
- 64 ch × 512 cells. AFTER/AGET descendant.
- For CD up to 200 pF.
- Number of cells kept per trigger programmable without any limitation.
- Sustains a 40 kHz trigger rate with 5 samples/trigger.
- Under test.
18
Storage Cell
Noise: the absolute noise limit is the kT/C noise.
• Noise density of the switch on-resistance Ron: $e_n^2 = 4 \gamma k T R_{on}$
• Sampled on Cs: $\langle v_s^2 \rangle = \gamma k T / C_s$
[Figure: sampling switch (gate voltage vG, on-resistance Ron, overlap capacitance Cov) between vin and the storage capacitor Cs (voltage vs).]
Channel charge + command feedthrough injected in Cs when sampling:
$\Delta v_s = \frac{k \, W L C_{ox} (V_G - V_{in} - V_T)}{2 C_s} + \frac{(V_{DD} - V_{SS}) \, C_{ov}}{C_{ov} + C_s}$
• First term dominant: proportional to 1/Cs and to 1/Ron of the switch at minimum L, since W·L·Cox·(VG−Vin−VT) = L²/(µ·Ron).
• At first order: constant + a term proportional to Vin => offset + gain different from 1.
• But transistor mismatches => offset & gain spread along the SCA.
  => Possible calibration & correction.
• “Dummy switch” technique inefficient => increases the spread.
Large Cs is good for noise & uniformity!
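To give an order of magnitude for the kT/C limit, the short sketch below evaluates the rms noise for a few storage capacitances. The temperature (300 K), γ = 1 and the capacitance values are illustrative assumptions; only the 300 fF value appears elsewhere in these slides (SAM chip).

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def ktc_noise_vrms(cs_farad: float, temperature_k: float = 300.0,
                   gamma: float = 1.0) -> float:
    """rms noise voltage sampled on Cs: sqrt(gamma * k * T / Cs)."""
    return math.sqrt(gamma * K_BOLTZMANN * temperature_k / cs_farad)

for cs in (50e-15, 300e-15, 1e-12):
    print(f"Cs = {cs * 1e15:6.0f} fF -> {ktc_noise_vrms(cs) * 1e6:6.1f} uV rms")
```

The noise scales as 1/sqrt(Cs), so gaining a factor 2 in noise costs a factor 4 in capacitance, which in turn costs bandwidth.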
19
Storage Cell: Bottom plate sampling
• The edges of the switch command are not infinitely fast.
• The transistor switches off when its gate voltage reaches Vin + VT
  => the sampling time depends on Vin
  => distortion, jitter.
• For a 100 ps edge => 50 ps error possible!
• Solutions:
  - Live with it: use the fastest possible edges and a reduced dynamic range.
  - Bottom plate sampling:
    • S1 has a constant source voltage.
    • S1 is opened before S2 => sample.
    • The aperture time is now independent of Vin.
    • With “flip around” readout, the charge injected by S2 is cancelled => charge injection does not depend on Vin.
[Figure: Vin → S2 → Cs → S1 → Vref, with parasitic capacitances Cph and Cpl.]
• Drawbacks:
  => S1 added in series => lower BW.
  => Generation of the S1 command.
  => Less compact cell => more parasitic capacitance.
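The "100 ps edge => 50 ps error" figure can be reproduced with a very simple model. The sketch assumes an NMOS switch, a linear falling gate edge from VDD to 0, and an input swing equal to half of the supply; the supply and threshold values are hypothetical.

```python
def switch_off_time(v_in: float, v_t: float, v_dd: float, t_edge: float) -> float:
    """Time after the start of a linear falling gate edge (VDD -> 0 in t_edge)
    at which the gate reaches Vin + VT and the NMOS switch opens."""
    return t_edge * (v_dd - (v_in + v_t)) / v_dd

V_DD, V_T, T_EDGE = 3.3, 0.6, 100e-12   # hypothetical supply, threshold, edge duration
t_low = switch_off_time(0.0, V_T, V_DD, T_EDGE)           # smallest input
t_high = switch_off_time(V_DD / 2.0, V_T, V_DD, T_EDGE)   # largest input (half supply)
spread = t_low - t_high                                   # Vin-dependent sampling-time error
```

In this model the error is t_edge × ΔVin / VDD: it does not depend on VT and shrinks with faster edges or a reduced input range, as stated on the slide.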
20
Storage Cell: Bandwidth
$BW_{cell} = \frac{1}{2 \pi R_{on} C_s}$ with $R_{on} \approx \frac{1}{g_{ds}} = \frac{1}{\mu C_{ox} \frac{W}{L} (V_G - V_{in} - V_T)}$ => small Cs is good for BW.
• Minimum L for minimum Ron with smaller parasitics.
• BWcell varies with Vin => distortion.
• BWcell is affected by transistor mismatch.
• BWcell should not be the contribution limiting the BW.
Example, SAM chip (S1 & S2 switches): RS1 + RS2 = 600 Ohms, BWcell = 820 MHz (Cs = 300 fF).
• Possible strategies to limit distortion:
  - Use NMOS only and limit the range to low voltages (DRS).
  - Linearize by using NMOS & PMOS in parallel and a swing centered on vdd/2.
  - Bootstrapped switches => never used in SCAs.
• On a given technology, for a fixed BWcell, Qinj is independent of Cs.
• Technology scaling:
  ⇒ Lower Ron => higher BW but smaller linear region.
The cell settling time (= ln(precision)·Ron·Cs, with the precision expressed as a ratio, e.g. 2^12 for 12 bits) must be smaller than the switch command duration for accurate signal tracking.
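As a numerical check of the two formulas of this slide, the sketch below uses the SAM values quoted above (600 Ohms, 300 fF) in an ideal single-pole RC model. The function names are hypothetical, and the 12-bit precision is an assumed example.

```python
import math

def cell_bandwidth_hz(r_on: float, c_s: float) -> float:
    """-3 dB bandwidth of the ideal RC cell: 1 / (2*pi*Ron*Cs)."""
    return 1.0 / (2.0 * math.pi * r_on * c_s)

def settling_time_s(r_on: float, c_s: float, n_bits: int) -> float:
    """Time to settle within 1/2**n_bits: ln(2**n_bits) * Ron * Cs."""
    return n_bits * math.log(2.0) * r_on * c_s

bw = cell_bandwidth_hz(600.0, 300e-15)
t12 = settling_time_s(600.0, 300e-15, 12)
print(f"BWcell = {bw / 1e6:.0f} MHz, 12-bit settling = {t12 * 1e9:.2f} ns")
```

The ideal RC gives a slightly higher bandwidth than the 820 MHz quoted for SAM; the difference is presumably due to parasitic capacitances that this model ignores. The 12-bit settling time of about 1.5 ns gives a feeling for the minimum duration of the switch command.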
21
Some readout options
• Wilkinson readout + A/D conversion:
  - 1 comparator/cell.
  - Counters & ramp generators can be inside or outside the chip.
  - Parallel digitization of several cells.
  - Need for one offset & one gain per cell.
• Voltage mode:
  - 1 buffer/cell (cut when not used) => low power.
  - Multiplexing toward an external ADC.
  - Need for one offset per cell.
• « Flip around » readout => cancels the injected charge:
  - Very well defined gain.
  - 1 amplifier per line of cells => critical design, very sensitive to Cp => speed, => noise (amplified by (Cp+Cs)/Cs).
  - Multiplexing toward an ADC (on-chip in NECTAr).
  - Need for one offset per line.
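The Wilkinson option can be illustrated with a behavioural model: one shared ramp and counter, one comparator per cell, and each comparator latching the counter value when the ramp passes the stored voltage. This is only a sketch with an ideal ramp and ideal comparators; the names and values are hypothetical.

```python
def wilkinson_convert(cell_voltages, v_min, v_max, n_bits):
    """Digitize all cells in parallel with one shared ramp and one counter."""
    n_codes = 2 ** n_bits
    lsb = (v_max - v_min) / n_codes
    codes = [None] * len(cell_voltages)
    for count in range(n_codes):
        ramp = v_min + (count + 1) * lsb       # ramp level at the end of this clock tick
        for i, v in enumerate(cell_voltages):
            if codes[i] is None and ramp > v:
                codes[i] = count               # comparator fires: latch the counter
    # Cells never crossed by the ramp saturate at full scale.
    return [n_codes - 1 if c is None else c for c in codes]

codes = wilkinson_convert([0.0, 0.30, 0.99], v_min=0.0, v_max=1.0, n_bits=4)
```

The conversion always takes 2^N clock ticks, whatever the number of cells converted in parallel, which is why the scheme suits large arrays but needs a fast counter clock for high resolution.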
22
Leakages
• Switch leakage currents discharge Cs => voltage drop depending on the time between write and read.
• Not an issue with old-fashioned (>0.25 µm) technologies:
  - Not a source-drain leakage but a current from S/D to bulk.
  - AFTER chip (T2K TPC), AMS 0.35 µm: distribution of the voltage drop on 120 chips × 65000 cells after 2 ms. 1 LSB = 0.5 mV => 55 fA. Not Gaussian.
• A real problem in deep submicron technologies!
  - Low VT + low weak inversion slope.
  - Now a source-drain current.
  - pA-scale leakages in 0.18 µm.
  - 10 pA scale in 0.13 µm => storage time limited to a few µs.
  - Use of low-leakage transistors (but lower Ron).
  - Larger Cs? => against history!
  - Reduce the range to work with negative Vgs in off mode.
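The orders of magnitude on this slide follow from ΔV = I·t/Cs, assuming a constant leakage current. The sketch below applies this relation; the 50 fF capacitance used for the 0.13 µm case is a hypothetical value, not taken from the slides.

```python
def droop_volts(i_leak: float, c_s: float, t_store: float) -> float:
    """Voltage drop on Cs after t_store for a constant leakage current."""
    return i_leak * t_store / c_s

def max_storage_time_s(i_leak: float, c_s: float, dv_max: float) -> float:
    """Longest storage time keeping the droop below dv_max."""
    return c_s * dv_max / i_leak

# AFTER numbers from the slide: 0.5 mV after 2 ms <=> 55 fA.
# The storage capacitance implied by these three numbers:
c_implied = 55e-15 * 2e-3 / 0.5e-3

# Deep submicron case: 10 pA leakage, hypothetical Cs = 50 fF, 0.5 mV allowed droop.
t_max = max_storage_time_s(10e-12, 50e-15, 0.5e-3)
```

With these assumptions the allowed storage time drops from milliseconds to a few microseconds, in line with the statement for the 0.13 µm node.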
23
Ultrafast analog memories
24
Boosting the SCA sampling Freq
• Introduced in the 1990s, again by S. Kleinfelder (ATWR, ATWD chips).
• The sample & hold commands are now generated by a pulse propagating through a delay line with N taps: Fs = 1/δ (δ = delay per tap) => multi-GSPS operation possible even in ~1 µm technologies.
• Fs tunable through an analogue command.
• In the first designs:
  – The digital sampling signal input was a single pulse = trigger => need for an analogue delay on the analogue signal path to generate the “pretrig”.
  – The width of the sampling pulse was defined by the width of the digital pulse.
25
Delay elements zoology
• Basically the same as those used in digital TDCs, made with 2 cascaded inverting cells.
[Figure: delay-cell schematics with their speed-control inputs (labels S, C1, C2, speed); captions below.]
• $t_p \propto t_r \approx \frac{2 \, C_l \, V_{dd} \, L_{MP1}}{W_{MP1} \, K_{MP1} \, (V_{GS,MP1} - V_{TH})^2}$. Only the rising edge is slowed down. Fast. The 2nd inverter reshapes the signal => the sampling edge is always sharp.
• More symmetric output. Highest speed, but requires a low impedance command.
• Slower. Symmetrical edges if C1 = C2.
• Differential. Low jitter. But static power.
Modulating the PMOS conductance is better for low jitter:
⇒ Separate Vdd for the DLL only.
⇒ Vdd – Vcommand easy to filter.
Delay control
©G. Varner (BLAB chip)
• Delay elements sensitive to temperature, process, ageing…:
• 2 possible philosophies:
– Servo-control loop (PLL, DLL).
– No servo-control:
• Delay control voltages externally generated.
– Delay= f(Control voltage) first calibrated and stored in a LUT used to command DAC.
– Temperature dependency can also be calibrated and corrected
• Delays measured using an extra channel to digitize a clock/ timing signal
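The "no servo-control" option with a calibrated look-up table can be sketched as below. The calibration points, the function name and the use of linear interpolation are all hypothetical; in a real system the table would come from the delay measurements described above and the result would be sent to the DAC.

```python
from bisect import bisect_left

# Hypothetical calibration points: (control voltage [V], measured delay per tap [ps]).
CALIBRATION = [(0.8, 1000.0), (1.0, 625.0), (1.2, 400.0), (1.5, 312.5), (2.0, 200.0)]

def control_voltage_for(delay_ps: float, table=CALIBRATION) -> float:
    """Control voltage giving the requested tap delay, by linear
    interpolation between the two nearest calibration points."""
    pts = sorted(table, key=lambda p: p[1])          # sort by increasing delay
    delays = [d for _, d in pts]
    if not delays[0] <= delay_ps <= delays[-1]:
        raise ValueError("requested delay outside the calibrated range")
    j = bisect_left(delays, delay_ps)
    if delays[j] == delay_ps:                        # exact calibration point
        return pts[j][0]
    (v0, d0), (v1, d1) = pts[j - 1], pts[j]
    return v0 + (v1 - v0) * (delay_ps - d0) / (d1 - d0)

v_2gsps = control_voltage_for(500.0)    # 500 ps per tap, i.e. Fs = 2 GSPS
v_3p2gsps = control_voltage_for(312.5)  # 312.5 ps per tap, i.e. Fs = 3.2 GSPS
```

A second table indexed by temperature could be added in the same way for the temperature correction mentioned above.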
27
3rd generation of ultrafast SCAs
Common conclusions of the 4 groups working on this topic: need for
o High bandwidth, low jitter:
  - short analogue busses
  - small Cs
  - use of advanced technologies (0.11 to 0.18 µm nodes)
o Large depth to accommodate longer latencies…:
  - analog bus segmentation
  - and/or two-stage architecture
o Fast readout
o Multiple event buffering to derandomize dead time:
  - simultaneous R/W in a large array with pointer management
  - array of small-size banks of cells
o Auto-triggering capabilities.
28
Future DRS5 @PSI (2013)
• 32 fast sampling cells at 10 GSPS
• 100 ps sample time, 3.1 ns hold time
• Hold time long enough to transfer voltage to secondary
sampling stage with moderately fast buffer (300 MHz)
• Shift register gets clocked by inverter chain from fast
sampling stage
[Figure, © S. Ritt: fast sampling stage (write pointer, trigger) followed by the secondary sampling stage; read pointer, latches and counter for the digital readout; analog readout; FPGA.]
• Multiple buffering => up to 2MHz with negligible deadtime
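The sample and hold times quoted for the fast stage are consistent with a simple reading, assumed here and not stated on the slide: each of the N cells tracks during one sampling period and holds during the N − 1 others.

```python
def fast_stage_timing(n_cells: int, fs_hz: float):
    """Sample time and hold time of one cell of a circular fast sampling stage,
    assuming each cell tracks for one period and holds for the N-1 others."""
    t_sample = 1.0 / fs_hz
    t_hold = (n_cells - 1) * t_sample
    return t_sample, t_hold

t_sample, t_hold = fast_stage_timing(n_cells=32, fs_hz=10e9)
print(f"sample {t_sample * 1e12:.0f} ps, hold {t_hold * 1e9:.1f} ns")
```

With 32 cells at 10 GSPS this gives the 100 ps and 3.1 ns of the slide.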
29
The NECTAR chip for the Cherenkov Telescope array
240Mbit/s LVDS serial output
Compatible with low cost FPGA
3mm x 7mm
AMS CMOS 0.35µm
QFP120 14 × 14 mm2 package
• 4 SCA channels => 2 fully differential channels
• 0.5 to 3.2 GSPS
• 1024 cells/SCA channel (1µs max latency @ 1GSPS, 0.5µs @ 2GSPS)
• On chip 12 bit 20MSPS ADC
• Output data directly usable for calculation (without pedestal/gain spread
correction)
• Low input capacitance < 5pF including package thanks to input buffers
• Power management: write part powered down during RO/digitizing & vice versa
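Two of the figures above can be cross-checked with one-line calculations: the maximum trigger latency is the depth divided by the sampling frequency, and the serial link rate matches the ADC output. The function names are hypothetical.

```python
def max_latency_s(depth_cells: int, fs_hz: float) -> float:
    """Longest trigger latency before the circular memory is overwritten."""
    return depth_cells / fs_hz

def serial_rate_bps(adc_bits: int, adc_rate_hz: float) -> float:
    """Raw output bit rate of the on-chip ADC."""
    return adc_bits * adc_rate_hz

lat_1g = max_latency_s(1024, 1e9)       # about 1 us at 1 GSPS
lat_2g = max_latency_s(1024, 2e9)       # about 0.5 us at 2 GSPS
link = serial_rate_bps(12, 20e6)        # 12 bit x 20 MSPS
```

The 12-bit, 20 MSPS ADC produces exactly the 240 Mbit/s carried by the LVDS serial output.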
30
Some NECTAr chip performances
PMT-like pulse (3ns FWHM)
31
Timing with ultrafast analog memories
32
Ultra fast SCAs for precise timing
• High sampling rates help for timing.
• Higher sampling frequencies => simpler algorithms.
• Continuous ADCs are the perfect digitizers, but at least 99% of the data often go to the bin at the owner’s expense (power, FPGA, …)!
• Ultrafast analogue memories are a good alternative to ADCs for frequencies above 500 MHz.
• Low cost. High integration.
[Figure: coarse timing (no longer critical for timing) + waveform capture (critical path for the time measurement) => fine timing.]
33
Illustrating example: SCA for precise timing
[Figure: J. VaVra’s test setup @ SLAC [Breton]: PILAS laser (35 ps FWHM), two Burle MCP-PMTs (10 µm pores, low gain 2-3 × 10^4, 40 PE each), pulses of 1.5 ns FWHM, sent to the DIGITIZER.]
ANALOG REFERENCE: in the same conditions, using analogue CFD, TAC + ADC (resolution with pulser = 3.4 ps rms):
• σ_SINGLE = 17 ps => Low F = 20%.
• Details in [Breton et al., NIM A Vol 629, Issue 1, 11, p 123-132, 2011].
• Spread of the timing difference measured => timing resolution σ_SINGLE = σ_DIF/√2
• Data taken with the Wavecatcher module:
  – Uses the SAM analogue memory ASIC
  – 2 channels, 3.2 GSPS / 12 bit / BW = 450 MHz
  – < 8 ps rms resolution on electrical pulses
• Very good S/N = 550! σ_noise = 1.5 mV rms
34
d-CFD (Digital Constant Fraction)
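• Time of crossing of a threshold set to a fixed fraction f of the amplitude (or charge); f = 0.1 in the figure, with crossing times t1 and t2 for two pulses.
• If the pulses are homothetic: the time walk is cancelled.
• Compatible with FPGA.
• Easier if f is a power of 2.
[Figure: block diagram. Data → baseline calculation and subtraction → event discrimination threshold → peak finder with optional peak interpolator (ymax) → threshold derived from ymax and f (block labelled Ymax/f); data delayed (> tr + peak finder latency) → threshold interpolator between samples Y(i−d−1) and Y(i−d) → time.]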
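A minimal software version of the d-CFD chain above (baseline subtraction, peak finding, threshold at a fraction of the peak, linear interpolation of the crossing) could look like the following. It is a sketch under simplifying assumptions: positive pulses, baseline estimated from the first samples, no peak interpolation; the function name and the test pulses are hypothetical.

```python
def dcfd_time(samples, dt, fraction=0.2, n_baseline=4):
    """Digital constant-fraction time of a positive pulse (linear interpolation)."""
    baseline = sum(samples[:n_baseline]) / n_baseline
    y = [s - baseline for s in samples]
    i_peak = max(range(len(y)), key=y.__getitem__)
    threshold = fraction * y[i_peak]
    # Walk back from the peak to the first sample at or above the threshold.
    i = i_peak
    while i > 0 and y[i - 1] >= threshold:
        i -= 1
    if i == 0:
        raise ValueError("no threshold crossing found before the peak")
    # Now y[i-1] < threshold <= y[i]: interpolate linearly between the two.
    frac = (threshold - y[i - 1]) / (y[i] - y[i - 1])
    return (i - 1 + frac) * dt

# Two homothetic triangular pulses on a baseline of 10 (amplitudes 100 and 50).
big = [10, 10, 10, 10, 10, 35, 60, 85, 110, 85, 60, 35, 10]
small = [10, 10, 10, 10, 10, 22.5, 35, 47.5, 60, 47.5, 35, 22.5, 10]
t_big = dcfd_time(big, dt=1.0)
t_small = dcfd_time(small, dt=1.0)
```

Both pulses give the same crossing time although their amplitudes differ by a factor 2, which is the time-walk cancellation mentioned above. The results slide that follows reports a small improvement when the linear interpolation is replaced by a higher order one.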
35
CFD-ZCFD: results
[Figure panels: d-CFD with linear interpolation, F = 0.2; d-CFD, max signal slope zone.]
• ΔT = 5.387 ns, optimum σ_T = 16.6 ps rms for Nov ≥ 2
• ΔT = 5.385 ns, optimum σ_T = 17.5 ps rms with linear interpolation
d-CFD: resolution vs fraction with varying interpolation factor (Nov):
- Results plotted here for Lagrange 3rd order interpolation.
- Exactly the same results with spline interpolation or digital filter (tested up to Nov = 5).
- Optimum curve already reached for Nov between 2 and 3.
- Best resolution obtained for F = 0.2.
36
Thank you for your attention !
What kind of digital treatment?
[Figure: three chains Detector + FEE → digitize → processing, ordered by increasing processing time and algorithm complexity and by acceptable rate: FPGA (hard), real time; DSP or processor (can be embedded in a modern FPGA); computer, delayed time / off-line.]
9
Two possible philosophies
Continuous on-the-fly data treatment (high speed clock):
o All the digital data enter the timing electronics at the sampling rate.
o The result is obtained after a fixed latency.
o Only compatible with a continuously sampling ADC.
o For 12 bit / 3 GSPS => 36 Gbit/s stream to treat per channel!
[Figure: Detector + FEE → ADC (Fs) → FPGA: on-the-fly treatment.]
Zone of interest treatment (lower speed clock):
o Pulses are first discriminated and time stamped.
o Compatible with ADCs or with analogue memories; may require an analogue discriminator.
o Only the data within a zone of interest (= 1 event) are treated by the digital timing electronics => strong data flow reduction.
o Possibility to use intermediate digital FIFOs as derandomizing buffers.
o For the same operations, requires a lower clock frequency.
o The timing electronics may have a non-fixed latency.
[Figures: Detector + FEE → ADC (Fs) → store → FIFO → FPGA: ZOI treatment, with threshold comparator (Th) and time stamp; Detector + FEE → analogue memory (Fs) → ADC → FIFO → FPGA: ZOI treatment, with threshold comparator (Th) and time stamp.]
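The data-flow argument can be put in numbers. The continuous stream rate follows directly from the slide (12 bit at 3 GSPS); the zone-of-interest size and the event rate used below are hypothetical values chosen only to illustrate the reduction.

```python
def stream_rate_bps(bits: int, fs_hz: float) -> float:
    """Bit rate of a continuously sampling ADC."""
    return bits * fs_hz

def zoi_rate_bps(bits: int, samples_per_event: int, event_rate_hz: float) -> float:
    """Average bit rate when only a zone of interest is kept per event."""
    return bits * samples_per_event * event_rate_hz

continuous = stream_rate_bps(12, 3e9)        # 36 Gbit/s per channel
zoi = zoi_rate_bps(12, 64, 100e3)            # hypothetical: 64 samples/event at 100 kHz
reduction = continuous / zoi
```

With these assumed numbers the zone-of-interest approach carries a few hundred times less data than the continuous stream.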
Ultra fast switched capacitor arrays in the world
[From an original slide of S. Ritt]
• G. Varner, Univ. Hawaii: Straw3, Labrador, Labrador3, Target, BLAB family
  - Many chips for different projects
  - Buffered and unbuffered
  - Very deep arrays
  - ADC on chip
  - Philosophy => pushing the limit of the SCA technology
• S. Ritt, R. Dinapoli, PSI: DRS1, DRS2, DRS3, DRS4
  - Universal chip for many applications
  - 8 + 1 channels, 1024 cells
  - 5 GSPS, 950 MHz BW
  - Low power consumption
  - Short readout time
  - Several possible modes of operation
• H. Frisch et al., Univ. Chicago: ps family
  - Goal: reach a 1 ps precision!
  - Pioneering R&D work
  - 130 nm IBM
  - 18 GSPS, 256 samples, 6 ch
  - ADC on chip
  - Initiator of a networking activity on SCAs and ps-timing
• D. Breton (IN2P3/LAL), E. Delagnes (CEA/Saclay): ARS, MATACQ, SAM family, Nectar
  - More than 120,000 SCAs operating worldwide
  - Buffered (f-3dB 400-500 MHz), 3.2 GSPS
  - High dynamic range
  - Robust (minimum calibration or external control)
  - Conservative technologies
  - Moderate depth: 256-1024 cells / 2 ch
  - On-chip ADC in the last chip
The matrix structure (SAM, NECTAR…)
• Advantages: robustness => only 1 pedestal per line to calibrate; good timing even with no calibration.
• Drawbacks: complexity; not scalable to a large number of channels per chip.
Recent ultra-fast SCAs
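Columns: ASIC; design team; internal amplifier?; number of channels; depth per channel; sampling rate [GSa/s]; -3 dB BW [MHz]; dynamic range [bit rms]; storage capacitance [fF]; technology; internal ADC?
• DRS4: PSI; no amplifier; 8 channels; 1024 cells; 1-5 GSa/s; 900 MHz; 12 bit; 250 fF.
• SAM: Orsay/Saclay; 2 channels; 256 cells; 0.5-3.2 GSa/s; 500 MHz.
• SAMLONG, NECTAR: Orsay/Saclay; fully differential; 1024 cells; 0.5-3.2 GSa/s; >420 MHz.
• IRS2: 32536 cells; 1-4 GSa/s.
• BLAB3A: Hawaii; Ampli; 8 channels; 32536 cells; 1-4 GSa/s.
• TARGET: Hawaii; Buf; 16 channels; 4192 cells; 1-2.5 GSa/s.
• TARGET2: Ampli; 16384 cells.
• TARGET3: Buf; 16384 cells.
• PSEC3, PSEC4: Chicago; 4 and 6 channels; 256 cells; 1-16 GSa/s.
[Remaining column values as extracted, row assignment not recoverable. Technology / internal ADC: IBM 0.25 / no; AMS 0.35 / no, no, pipelined; TSMC 0.25 / Wilkinson (three rows); IBM 0.13 / Wilkinson. BW, dynamic range and capacitance fragments: 300; 10, 14; 1000, 10, 14; 150, 10, 14; >300, >1600, 10, 20, 20; prelim, >12, 11.3. Other fragments: Hawaii, Buf; 8; no, no, no.]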
References: SCAs
• ATWD, ATWR (S. Kleinfelder)
  – ATWR: S. Kleinfelder’s M.S. thesis, Univ. California, Berkeley, 1992
  – ATWD: IEEE TNS 50-4:955-962, 2003
  – A didactic paper by G. Haller et al.: IEEE JSSC 29-4 (1994) 500-508
• PSI developments (DRS family)
  – IEEE/NSS 2008, TIPP09
  – http://midas.psi.ch/drs
• Orsay/Saclay
  – ARS: IEEE TNS 49-3:1122-1129, 2002
  – MATACQ: IEEE TNS 52-6:2853-2860, 2005 / Patent WO022315
  – SAM: NIM A567 (2006) 21-26, IEEE/NSS 2006, NIM A 629 (2011) 123-132, NDIP 2011 (Lyon)
  – NECTAR: NDIP 2011 (Lyon), IEEE/NSS 2011 (N28-6)
• Hawai’i developments
  – STRAW: Proc. SPIE 4858-31, 2003
  – LABRADOR: NIM A583 (2007) 447-460
  – BLAB: NIM A591 (2008) 534-545; NIM A602 (2009) 438-445
  – STURM: EPAC08-TUOCM02, June 2008
  – Calibration: Nishimura et al., Physics Procedia (TIPP 2011)
• Chicago activities
  – http://psec.uchicago.edu
  – Ps timing: NIM A607:387-393, 2009
  – PSEC 3: IEEE/NSS 2011 (NP2S-75), TIPP 2011