Transcript Document
MICRO LEVEL FORECASTS FOR INDIA’S EXPORT SECTOR
SPECIFIC COUNTRIES AND SPECIFIC COMMODITIES
Analytics & Modelling Division
NATIONAL INFORMATICS CENTRE
Department of Information Technology
Ministry of Communication & Information Technology
New Delhi-110003
Major input to “India’s export model” for a financial year
• Input to an econometric model that derives macro-level forecasts for strategic planning of India’s exports (RIS study).
• NIC has developed micro-level forecasts for a financial year for specific countries and specific commodities (total variables: 319).
Tools and technologies used:
• Monthly time series behavior is captured through neural network methodology.
• The final model selected has been simulated within and outside the sample, and forecasts are generated once the model has stabilized with regard to its error statistics.
• 4Thought/Freefore, a state-of-the-art software tool from COGNOS, has been used to simulate and generate micro-level forecasts of India’s exports for a financial year.
• The reliability of the forecasts and the degree of confidence are part of the final model.
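The within-sample / outside-sample simulation described above can be illustrated with a minimal hold-out sketch. It assumes the last `test_size` months are held back as the outside sample and uses one-step-ahead predictions. A naive "repeat the last value" forecaster stands in for the neural-network model. All function names are hypothetical, and 4Thought's own validation procedure is not reproduced here.

```python
import math

def split_series(series, test_size):
    """Split a monthly series into a within-sample part and an outside-sample (hold-out) part."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must leave at least one point on each side")
    return series[:-test_size], series[-test_size:]

def naive_forecaster(history):
    """Hypothetical stand-in model: the next value equals the last observed value."""
    return history[-1]

def error_statistics(actual, predicted):
    """RMS error and mean absolute error of a set of predictions."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rms_error": rms, "mean_abs_error": mae}

def evaluate(series, test_size, model=naive_forecaster):
    train, test = split_series(series, test_size)
    # Within-sample: one-step-ahead predictions from the 2nd point of the fitting data onward.
    in_pred = [model(train[:t]) for t in range(1, len(train))]
    in_stats = error_statistics(train[1:], in_pred)
    # Outside-sample: each hold-out month is predicted from all data observed before it.
    history = list(train)
    out_pred = []
    for value in test:
        out_pred.append(model(history))
        history.append(value)
    out_stats = error_statistics(test, out_pred)
    return in_stats, out_stats

inside, outside = evaluate([10, 12, 11, 13, 14, 13, 15, 16], test_size=3)
print(inside, outside)
```

A model would be treated as "stabilized" once the two sets of error statistics are acceptable and similar to each other. The threshold for that is a judgement call that the source does not specify.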
Table A: SUMMARY OF COUNTRY-WISE DATA SETS
(Time series forecasting carried out for the listed number of data sets)

Country | ComCodes | Var. for each Code | Total Vbls | Exports (period) | Imports (period) | UVI (period) | ROW (period)
Canada | 13 | 4 | 52 +2 (rest) +1 (all) = 55 | Apr 1996 to June 2003 | Jan 1995 to Nov 2003 | Jan 1995 to Nov 2003 | Jan 1995 to Nov 2003
USA | 17 | 4 | 68 | Apr 1996 to May 2003 | Jan 1993 to Oct 2003 | Jan 1993 to Oct 2003 | Jan 1993 to Oct 2003
China | 10 | 4 | 40 | Apr 1996 to May 2003 | Jan 1995 to Nov 2003 | Jan 1995 to Nov 2003 | Jan 1995 to Nov 2003
Japan | 11 | 4 | 44 | Apr 1996 to June 2003 | Jan 1994 to Nov 2003 | Jan 1994 to Nov 2003 | Jan 1994 to Nov 2003
Table A: Contd.

Country | ComCodes | Var. for each Code | Total Vbls | Exports (period) | Imports (period) | UVI (period) | ROW (period)
Malaysia* | 1 | 1 | 1 | Apr 1996 to Aug 2002 | NA | NA | NA
Singapore* | 1 | 1 | 1 | Apr 1996 to Aug 2002 | NA | NA | NA
Thailand* | 1 | 1 | 1 | Apr 1996 to Aug 2002 | NA | NA | NA
Hong Kong* | 1 | 1 | 1 | Apr 1996 to Aug 2002 | NA | NA | NA
Rest of World* | 1 | 1 | 1 | Apr 1996 to Aug 2002 | NA | NA | NA

* Only a single variable, India’s total export of “All Commodities”, is considered.
Table A: Contd.

Country | ComCodes | Var. for each Code | Total Vbls | Exports (period) | Imports (period) | UVI (period) | ROW (period)
European Union | 26 | 4 | 104 +2 (rest) +1 (all) = 107 | Apr 1996 to June 2003 | Jan 1996 to June 2003 | Jan 1996 to June 2003 | Jan 1996 to June 2003
TOTAL | | | 319 | | | |
No. of Obs (Range) | | | | 77-92 | 90-130 | 90-130 | 90-130
Period Range | | | | Apr 1996 to June 2003 | Jan 1993 to Nov 2003 | Jan 1993 to Nov 2003 | Jan 1993 to Nov 2003

Note: Includes both series, monthly as well as annual, with 26 items in each series.
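As a quick arithmetic check on Table A, the per-country variable counts should add up to the stated total of 319. The sketch below encodes each row as (commodity codes, variables per code, extra series). "Extra" is the +2 (rest) +1 (all) listed for Canada and the EU. The dictionary layout is only an illustrative choice.

```python
# Table A rows: (commodity codes, variables per code, extra "rest"/"all" series).
table_a = {
    "Canada": (13, 4, 3),
    "USA": (17, 4, 0),
    "China": (10, 4, 0),
    "Japan": (11, 4, 0),
    "European Union": (26, 4, 3),
}
# Countries modelled with a single variable (total export of "All Commodities").
single_series = ["Malaysia", "Singapore", "Thailand", "Hong Kong", "Rest of World"]

totals = {name: codes * per_code + extra for name, (codes, per_code, extra) in table_a.items()}
totals.update({name: 1 for name in single_series})

grand_total = sum(totals.values())
print(totals["Canada"], totals["European Union"], grand_total)  # 55 107 319
```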
Univariate ARIMA MODEL
1. In regression analysis, if the error terms are not independent (i.e. they are autocorrelated), the ordinary least-squares (OLS) parameter estimates lose efficiency and the standard error estimates are biased.
2. The Auto Regressive Integrated Moving Average (ARIMA) model is suited to data with autocorrelated errors, which occur frequently in time series data.
3. The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data using the autoregressive moving-average (ARMA) model or the more general autoregressive integrated moving-average (ARIMA) model.
4. An ARIMA model predicts a value in a response time series as a linear combination of its own past values, past errors, and current and past values of other time series.
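The sketch below roughly illustrates point 4 for the purely univariate case. It differences a series once (d = 1) and then estimates an autoregressive AR(p) part by ordinary least squares with NumPy. This is a simplification: a full ARIMA fit also estimates MA terms, usually by maximum likelihood, which plain OLS cannot do. The function names are hypothetical.

```python
import numpy as np

def fit_ar_on_differences(series, p):
    """Difference once, then fit w_t = c + phi_1*w_{t-1} + ... + phi_p*w_{t-p} + e_t by OLS."""
    w = np.diff(np.asarray(series, dtype=float))  # first difference: w_t = y_t - y_{t-1}
    if len(w) <= p + 1:
        raise ValueError("series too short for the requested order")
    # Each row: a constant 1 followed by the lags w_{t-1}, ..., w_{t-p}.
    X = np.vstack([np.concatenate(([1.0], w[t - p:t][::-1])) for t in range(p, len(w))])
    target = w[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return coef[0], coef[1:], residuals

def forecast_next(series, const, phi):
    """One-step forecast of the original (undifferenced) series."""
    w = np.diff(np.asarray(series, dtype=float))
    p = len(phi)
    w_next = const + phi @ w[-p:][::-1]  # predicted next difference
    return series[-1] + w_next           # undo the differencing
```

For example, if the monthly differences follow w_t = 1 + 0.5 w_{t-1} exactly, `fit_ar_on_differences(y, 1)` should recover a constant of about 1 and phi of about 0.5.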
Univariate ARIMA MODEL – Contd.
An ARIMA model contains three different kinds of parameters:
• the p AR parameters;
• the q MA parameters;
• the variance of the error term.
This amounts to a total of p + q + 1 parameters to be estimated.
These parameters are always estimated on a stationary time series (a series whose mean and variance do not change over time), obtained by differencing the original series where necessary.
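For reference, the standard Box-Jenkins form of ARIMA(p, d, q), written here without a constant term, makes the p + q + 1 count explicit. The model is fitted to the d-th difference w_t, which is assumed stationary. The unknowns are φ1, …, φp, θ1, …, θq and σa². Note that the sign convention for θ varies between texts.

```latex
w_t = \nabla^{d} y_t = (1 - B)^{d} y_t, \qquad B\,y_t = y_{t-1}

w_t = \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p}
      + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q},
\qquad a_t \sim \mathrm{WN}(0, \sigma_a^{2})
```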
NEURAL NETWORK
Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things which would otherwise be very difficult (time-consuming).
Neural networks form a model from training data (or possibly input data) alone.
This is particularly useful when time series behavior is complex and the forecast for one period is used as input to the next period’s forecast.
When a time series behaves in a complex way, follows an unknown pattern and involves a large number of variables, a neural network learns from the past behavior to develop a correspondingly complex mapping and then predicts. (ARIMA: univariate, multivariate.)
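Feeding one period's forecast back in as an input for the next period is usually called recursive (iterated) multi-step forecasting. The minimal sketch below assumes a model that predicts the next value from a fixed window of the last `lags` values. The `predict_next` callable is a hypothetical placeholder for a trained network; a simple window average is used only to make the example runnable.

```python
from typing import Callable, List, Sequence

def recursive_forecast(history: Sequence[float],
                       predict_next: Callable[[List[float]], float],
                       lags: int,
                       horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead, feeding each forecast back in as an input."""
    if len(history) < lags:
        raise ValueError("need at least `lags` observations")
    window = list(history[-lags:])      # most recent observed values
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_next(window)    # one-step-ahead prediction
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # drop the oldest value, append the forecast
    return forecasts

def mean_of_window(window: List[float]) -> float:
    """Hypothetical stand-in for a trained network: average of the window."""
    return sum(window) / len(window)

print(recursive_forecast([3.0, 6.0, 9.0, 12.0], mean_of_window, lags=3, horizon=2))  # [9.0, 10.0]
```

Because later forecasts are built on earlier ones, errors can compound as the horizon grows.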
NEURAL NETWORK
Neural networks are a form of multiprocessor computer system, with:
• simple processing elements
• a high degree of interconnection
• simple scalar messages
• adaptive interaction between elements
A biological neuron may have as many as 10,000 different inputs, and may
send its output (the presence or absence of a short-duration spike) to many
other neurons.
Neurons are wired up in a 3-dimensional pattern.
Example
A simple single-unit adaptive network:
The network has 2 inputs, I0 and I1, and one output. All are binary.
If W0 * I0 + W1 * I1 + Wb > 0, then Output is 1
If W0 * I0 + W1 * I1 + Wb <= 0, then Output is 0
We want it to learn the simple OR function: output is 1 if either I0 or I1 is 1.
The network adapts as follows: change each weight by an amount proportional to the difference between the desired output and the actual output. As an equation:
ΔWi = η * (D - Y) * Ii
where η is the learning rate, D is the desired output, and Y is the actual output. The bias weight Wb is updated in the same way, treating its input as a constant 1.
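The single-unit example can be sketched in Python as follows. Zero initial weights and η = 0.1 are assumptions chosen for illustration. Training sweeps over the OR truth table until an epoch produces no errors.

```python
def step(total: float) -> int:
    """Threshold unit: output 1 if the weighted sum is > 0, else 0."""
    return 1 if total > 0 else 0

def train_or(eta: float = 0.1, max_epochs: int = 100):
    """Train W0, W1 and the bias weight Wb with dWi = eta * (D - Y) * Ii."""
    w0, w1, wb = 0.0, 0.0, 0.0
    examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
    for epoch in range(max_epochs):
        errors = 0
        for (i0, i1), d in examples:
            y = step(w0 * i0 + w1 * i1 + wb)
            if y != d:
                errors += 1
            w0 += eta * (d - y) * i0
            w1 += eta * (d - y) * i1
            wb += eta * (d - y) * 1    # bias input is a constant 1
        if errors == 0:                # every pattern classified correctly
            return (w0, w1, wb), epoch + 1
    return (w0, w1, wb), max_epochs

weights, epochs = train_or()
print(weights, epochs)
```

Because OR is linearly separable, this rule is guaranteed to converge to weights that classify all four patterns correctly.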
Feed Forward Neural Network
1. EU
A. 30613 (Import of Shrimps and prawns, frozen)
Model Statistics
Model fit: 75.5004
Test fit: 78.4198
Overall fit: 76.4137
Adjusted fit: 65.3762
Iterations: 69
RMS error: 16.0265
Standard deviation: 16.1163
95% confidence interval: 32.2326
Mean absolute error: 12.5406
Mean absolute error (%): 8.7764
F-Statistic: 20.7884
Durbin-Watson Statistic: 1.0007
STATISTICAL MEASURES
Model fit
A measure of how well the model fits the original data used in modeling. 100% represents a perfect fit. The model fit approaches 0% if you simply guessed the average value of the target; if it is negative, the fit is worse than guessing the average (that is, worse than a naive model). The model fit is based on an adaptation of the standard R^2 statistic (the proportion of the variance in the target that the model explains).
Adjusted fit
The overall fit adjusted for the number of factors and the number of rows of data in the model. The adjustment assumes that a more complex model, or less data, will produce a less predictive model.
Test fit
The percentage of variation in the test set explained by the model. Test fit (or percent test fit) measures how well the model predicts the test data, and is the best measure of the genuine predictive performance of the model. It is also an adaptation of the standard R^2 statistic. Unlike the model fit, the test fit can be negative; this happens if the current model predicts the test set less accurately than the naive model.
Overall fit
An indicator of model quality that combines the model fit and the test fit. The overall fit is the percentage of the variation in the dependent variable that the model explains.
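The exact adaptation of R^2 used by 4Thought is not given here. The sketch below therefore assumes the plain R^2 form expressed as a percentage: 100 * (1 - SSE/SST). Here SST is the squared error of a naive model that always predicts the mean of the actual values. This reproduces the behaviour described above: 100% for a perfect fit, 0% for guessing the mean, and a negative value when the model does worse than the naive model. For the adjusted fit, the textbook adjusted-R^2 correction for k factors and n rows is used as a stand-in, not as 4Thought's formula.

```python
from typing import Sequence

def percent_fit(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """R^2-style fit in percent, relative to a naive model that predicts the mean of `actual`."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # model's squared error
    sst = sum((a - mean) ** 2 for a in actual)                  # naive model's squared error
    if sst == 0:
        raise ValueError("actual values are constant; fit is undefined")
    return 100.0 * (1.0 - sse / sst)

def adjusted_percent_fit(fit_pct: float, n_rows: int, n_factors: int) -> float:
    """Textbook adjusted R^2 in percent: penalises more factors and fewer rows."""
    if n_rows - n_factors - 1 <= 0:
        raise ValueError("need more rows than factors + 1")
    r2 = fit_pct / 100.0
    return 100.0 * (1.0 - (1.0 - r2) * (n_rows - 1) / (n_rows - n_factors - 1))

actual = [2.0, 4.0, 6.0, 8.0]
print(percent_fit(actual, actual))                # 100.0 (perfect fit)
print(percent_fit(actual, [5.0] * 4))             # 0.0 (guessing the mean)
print(percent_fit(actual, [8.0, 6.0, 4.0, 2.0]))  # -300.0 (worse than the naive model)
```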
B. 90111 (Export of Coffee, neither roasted nor decaffeinated)
Model Statistics
Model fit: 75.6046
Test fit: 73.7038
Overall fit: 75.2571
Adjusted fit: 64.0117
Iterations: 54
RMS error: 4.4336
Standard deviation: 4.4593
95% confidence interval: 8.9186
Mean absolute error: 3.1465
Mean absolute error (%): 34.767
F-Statistic: 18.7563
Durbin-Watson Statistic: 0.5446
C. 251611 (Import of Granite, crude/rough)
Model Statistics
Model fit: 67.3539
Test fit: 61.8533
Overall fit: 66.0773
Adjusted fit: 56.5328
Iterations: 66
RMS error: 3.4094
Standard deviation: 3.4285
95% confidence interval: 6.857
Mean absolute error: 2.7858
Mean absolute error (%): 6.6183
F-Statistic: 12.4989
Durbin-Watson Statistic: 2.122
2. CHINA
A. 670300 (Import of Human Hair, dressed, thinned, bleached or otherwise worked; wool or other animal hair or other textile materials, prepared for use in making wigs or the like)
Model Statistics
Model fit: 85.0775
Test fit: 84.3229
Overall fit: 84.9804
Adjusted fit: 74.6557
Iterations: 30
RMS error: 1.0522
Standard deviation: 1.0571
95% confidence interval: 2.1143
Mean absolute error: 0.7224
Mean absolute error (%): 24.07
F-Statistic: 44.3208
Durbin-Watson Statistic: 1.2491
B. CHINA (Import of rest of the codes)
Model Statistics
Model fit: 87.8544
Test fit: 82.4129
Overall fit: 87.1099
Adjusted fit: 76.5264
Iterations: 126
RMS error: 2828.6593
Standard deviation: 2841.9707
95% confidence interval: 5683.9414
Mean absolute error: 2114.0386
Mean absolute error (%): 12.5192
F-Statistic: 52.9366
Durbin-Watson Statistic: 0.8763
C. CHINA (Unit value index for rest of the codes)
Model Statistics
Model fit: 61.607
Test fit: 76.4597
Overall fit: 66.02
Adjusted fit: 57.6874
Iterations: 46
RMS error: 6.1855
Standard deviation: 6.2157
95% confidence interval: 12.4314
Mean absolute error: 4.2899
Mean absolute error (%): 4.5121
F-Statistic: 14.5718
Durbin-Watson Statistic: 0.9655
3. USA
A. 420310 (Import of Articles of apparel)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Statistic | Formula | Value
Number of Residuals (R) | n | 70
Number of Degrees of Freedom | n - m | 62
Residual Mean | Sum R / n | .683103E-02
Sum of Squares | SOS = Sum R**2 | 121.321
Variance | var = SOS / n | 1.73316
Adjusted Variance | SOS / (n - m) | 1.95679
Standard Deviation | SQRT(Adj Var) | 1.39885
Standard Error of the Mean | Standard Dev / SQRT(n - m) | .177655
Mean / its Standard Error | Mean / SEM | .384512E-01
Mean Absolute Deviation | Sum(ABS(R)) / n | .992518
AIC Value (uses var) | n ln(var) + 2m | 54.4962
SBC Value (uses var) | n ln(var) + m ln(n) | 72.4841
BIC Value (uses var) | see Wei p153 | -95.0882
R Square | | .887551
Durbin-Watson Statistic | Sum[A(T) - A(T-1)]**2 / Sum A(T)**2 | 1.95492
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.
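The formulas in the model-statistics table can be reproduced from a residual series in a few lines. The sketch below assumes `m` is the number of estimated parameters (so the degrees of freedom are n - m) and that AIC and SBC use natural logarithms. BIC is omitted because its formula (Wei, p. 153) is not given here. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence

def residual_statistics(residuals: Sequence[float], m: int) -> Dict[str, float]:
    """Residual diagnostics following the formulas listed in the model-statistics table."""
    r = list(residuals)
    n = len(r)
    if n - m <= 0:
        raise ValueError("need more residuals than parameters")
    mean = sum(r) / n
    sos = sum(x * x for x in r)             # sum of squares
    var = sos / n
    adj_var = sos / (n - m)
    sd = math.sqrt(adj_var)
    sem = sd / math.sqrt(n - m)             # standard error of the mean
    dw = sum((r[t] - r[t - 1]) ** 2 for t in range(1, n)) / sos  # Durbin-Watson
    return {
        "n": n, "df": n - m, "mean": mean, "sum_of_squares": sos,
        "variance": var, "adjusted_variance": adj_var, "std_dev": sd,
        "sem": sem, "mean_over_sem": mean / sem,
        "mean_abs_dev": sum(abs(x) for x in r) / n,
        "aic": n * math.log(var) + 2 * m,
        "sbc": n * math.log(var) + m * math.log(n),
        "durbin_watson": dw,
    }

stats = residual_statistics([1.0, -1.0, 1.0, -1.0], m=1)
print(stats["durbin_watson"], stats["aic"])
```

As a cross-check against example A, take n = 70, m = 8 (since n - m = 62) and var = 1.73316. Then 70 ln(1.73316) + 16 ≈ 54.496, which agrees with the reported AIC. A Durbin-Watson value near 2 indicates little lag-1 autocorrelation. The alternating residuals above give 3.0, which signals negative autocorrelation.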
B. 570110 (Import of Carpets and other textile coverings of wool or fine animal hair)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Statistic | Formula | Value
Number of Residuals (R) | n | 103
Number of Degrees of Freedom | n - m | 97
Residual Mean | Sum R / n | -.783408E-14
Sum of Squares | SOS = Sum R**2 | 1578.37
Variance | var = SOS / n | 15.3239
Adjusted Variance | SOS / (n - m) | 16.2718
Standard Deviation | SQRT(Adj Var) | 4.03383
Standard Error of the Mean | Standard Dev / SQRT(n - m) | .409574
Mean / its Standard Error | Mean / SEM | -.191274E-13
Mean Absolute Deviation | Sum(ABS(R)) / n | 3.10562
AIC Value (uses var) | n ln(var) + 2m | 293.130
SBC Value (uses var) | n ln(var) + m ln(n) | 308.938
BIC Value (uses var) | see Wei p153 | -26.2750
R Square | | .858561
Durbin-Watson Statistic | Sum[A(T) - A(T-1)]**2 / Sum A(T)**2 | 1.88808
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.
C. 610510 (Import of Men's or boys' shirts of cotton, knitted or crocheted)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Statistic | Formula | Value
Number of Residuals (R) | n | 105
Number of Degrees of Freedom | n - m | 99
Residual Mean | Sum R / n | -.708456E-01
Sum of Squares | SOS = Sum R**2 | 10575.5
Variance | var = SOS / n | 100.719
Adjusted Variance | SOS / (n - m) | 106.824
Standard Deviation | SQRT(Adj Var) | 10.3355
Standard Error of the Mean | Standard Dev / SQRT(n - m) | 1.03876
Mean / its Standard Error | Mean / SEM | -.682020E-01
Mean Absolute Deviation | Sum(ABS(R)) / n | 7.73821
AIC Value (uses var) | n ln(var) + 2m | 496.295
SBC Value (uses var) | n ln(var) + m ln(n) | 512.219
BIC Value (uses var) | see Wei p153 | 165.540
R Square | | .848765
Durbin-Watson Statistic | Sum[A(T) - A(T-1)]**2 / Sum A(T)**2 | 2.04567
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.
Conclusion:
Time Constraint:
• Number of independent variables for which forecasts are to be generated: approximately 319.
• As new time series data keep arriving, forecasts must be generated from the latest monthly data: forecasts for approximately 319 independent variables have to be produced within a period of approximately 2 weeks.
• Each variable’s forecast is an independent exercise.
• Existing software tools are not fully automated, and intervention by subject and tool specialists is a must.
• Traditional statistical/econometric modelling techniques and software tools are a major constraint in terms of automation.
What is Required:
NIC can develop a fully automated forecasting system by developing algorithms and testing them with the state-of-the-art tools available, with a limited interface.
The state-of-the-art software tools and techniques will require funding, manpower and resource mobilization to the tune of Rs. 10 lakhs over a period of 8 months.