
    >h                     l    S r SSKrSSKrSSKJrJr  SSKJ	r	J
r
JrJrJr  S r " S S5      r  S	S jrg)
zLPrincipal Component Analysis

Author: josef-pktd
Modified by Kevin Sheppard
    N)ValueWarningEstimationWarning)string_like
array_like	bool_like
float_likeint_likec                 Z    [         R                  " [         R                  " X -  5      5      $ )N)npsqrtsum)xs    oC:\Users\julio\OneDrive\Documentos\Trabajo\Ideas Frescas\venv\Lib\site-packages\statsmodels/multivariate/pca.py_normr      s    77266!%=!!    c                       \ rS rSrSr    SS jrS rS rS rS r	S	 r
S
 rS rS rS rS rS rS rS rSS jrS r  SS jrSS jrSrg)PCA   ab  
Principal Component Analysis

Parameters
----------
data : array_like
    Variables in columns, observations in rows.
ncomp : int, optional
    Number of components to return.  If None, returns as many as the
    smaller of the number of rows or columns in data.
standardize : bool, optional
    Flag indicating to use standardized data with mean 0 and unit
    variance.  standardize being True implies demean.  Using standardized
    data is equivalent to computing principal components from the
    correlation matrix of data.
demean : bool, optional
    Flag indicating whether to demean data before computing principal
    components.  demean is ignored if standardize is True. Demeaning data
    but not standardizing is equivalent to computing principal components
    from the covariance matrix of data.
normalize : bool, optional
    Indicates whether to normalize the factors to have unit inner product.
    If False, the loadings will have unit inner product.
gls : bool, optional
    Flag indicating to implement a two-step GLS estimator where
    in the first step principal components are used to estimate residuals,
    and then the inverse residual variance is used as a set of weights to
    estimate the final principal components.  Setting gls to True requires
    ncomp to be less than the min of the number of rows or columns.
weights : ndarray, optional
    Series weights to use after transforming data according to standardize
    or demean when computing the principal components.
method : str, optional
    Sets the linear algebra routine used to compute eigenvectors:

    * 'svd' uses a singular value decomposition (default).
    * 'eig' uses an eigenvalue decomposition of a quadratic form
    * 'nipals' uses the NIPALS algorithm and can be faster than SVD when
      ncomp is small and nvars is large. See notes about additional changes
      when using NIPALS.
missing : {str, None}
    Method for missing data.  Choices are:

    * 'drop-row' - drop rows with missing values.
    * 'drop-col' - drop columns with missing values.
    * 'drop-min' - drop either rows or columns, choosing by data retention.
    * 'fill-em' - use EM algorithm to fill missing values.  ncomp should be
      set to the number of factors required.
    * `None` raises if data contains NaN values.
tol : float, optional
    Tolerance to use when checking for convergence when using NIPALS.
max_iter : int, optional
    Maximum iterations when using NIPALS.
tol_em : float
    Tolerance to use when checking for convergence of the EM algorithm.
max_em_iter : int
    Maximum iterations for the EM algorithm.
svd_full_matrices : bool, optional
    If the 'svd' method is selected, this flag is used to set the parameter
    'full_matrices' in the singular value decomposition method. It is set
    to False by default.

Attributes
----------
factors : array or DataFrame
    nobs by ncomp array of principal components (scores)
scores :  array or DataFrame
    nobs by ncomp array of principal components - identical to factors
loadings : array or DataFrame
    ncomp by nvar array of principal component loadings for constructing
    the factors
coeff : array or DataFrame
    nvar by ncomp array of principal component loadings for constructing
    the projections
projection : array or DataFrame
    nobs by nvar array containing the projection of the data onto the ncomp
    estimated factors
rsquare : array or Series
    ncomp array where the element in the ith position is the R-square
    of including the first i principal components.  Note: values are
    calculated on the transformed data, not the original data
ic : array or DataFrame
    ncomp by 3 array containing the Bai and Ng (2002) information
    criteria.  Each column is a different criterion, and each row
    represents the number of included factors.
eigenvals : array or Series
    nvar array of eigenvalues
eigenvecs : array or DataFrame
    nvar by nvar array of eigenvectors
weights : ndarray
    nvar array of weights used to compute the principal components,
    normalized to unit length
transformed_data : ndarray
    Standardized, demeaned and weighted data used to compute
    principal components and related quantities
cols : ndarray
    Array of indices indicating columns used in the PCA
rows : ndarray
    Array of indices indicating rows used in the PCA

Notes
-----
The default options perform principal component analysis on the
demeaned, unit variance version of data.  Setting standardize to False
will instead only demean, and setting both standardize and
demean to False will not alter the data.

Once the data have been transformed, the following relationships hold when
the number of components (ncomp) is the same as the minimum of the number
of observations or the number of variables.

.. math::

    X' X = V \Lambda V'

.. math::

    F = X V

.. math::

    X = F V'

where X is the `data`, F is the array of principal components (`factors`
or `scores`), and V is the array of eigenvectors (`loadings`) and V' is
the array of factor coefficients (`coeff`).
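
The three relationships above can be checked numerically. The following is a minimal sketch that uses only NumPy rather than the PCA class itself; the names ``X``, ``V``, ``lam`` and ``F`` simply mirror the notation above, the random data is purely illustrative, and the sketch is assumed to correspond to the unnormalized case in which all components are kept.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)          # demean the columns before decomposing

# Eigen-decomposition of the quadratic form: X'X = V Lambda V'
lam, V = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]   # eigh returns ascending order; sort descending
lam, V = lam[order], V[:, order]

F = X @ V                       # principal components: F = X V
X_rebuilt = F @ V.T             # full reconstruction:  X = F V'

decomposition_ok = np.allclose(X.T @ X, V @ np.diag(lam) @ V.T)
reconstruction_ok = np.allclose(X, X_rebuilt)
```

Because V is orthogonal when every component is retained, the reconstruction is exact up to floating point error; with fewer components ``F @ V.T`` is only an approximation of X.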

When weights are provided, the principal components are computed from the
modified data

.. math::

    \Omega^{-\frac{1}{2}} X

where :math:`\Omega` is a diagonal matrix composed of the weights. For
example, when using the GLS version of PCA, the elements of :math:`\Omega`
will be the inverse of the variances of the residuals from

.. math::

    X - F V'

where the number of factors is less than the rank of X.
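
The two-step GLS idea described above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual implementation: the helper ``top_components`` is hypothetical, the simulated data is arbitrary, and the rescaling of the weights is one plausible normalization chosen for the example.

```python
import numpy as np

def top_components(X, ncomp):
    """Hypothetical helper: factors and eigenvectors for the ncomp largest
    singular values of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:ncomp].T
    return X @ V, V

rng = np.random.default_rng(1)
nobs, nvar, ncomp = 200, 6, 2
common = rng.standard_normal((nobs, ncomp)) @ rng.standard_normal((ncomp, nvar))
noise_sd = np.linspace(0.5, 3.0, nvar)     # series differ in noise level
X = common + rng.standard_normal((nobs, nvar)) * noise_sd
X = X - X.mean(axis=0)

# Step 1: ordinary PCA, then residuals X - F V' (requires ncomp < nvar)
F, V = top_components(X, ncomp)
resid = X - F @ V.T
resid_var = (resid ** 2).mean(axis=0)

# Step 2: weight each series by its inverse residual variance and recompute
weights = 1.0 / resid_var
weights = weights / np.sqrt((weights ** 2).mean())   # illustrative rescaling
X_weighted = X * np.sqrt(weights)
F_gls, V_gls = top_components(X_weighted, ncomp)
```

Series with large residual variance receive small weights, so the second-step components are driven mainly by the series that the first-step factors explain well.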

References
----------
.. [*] J. Bai and S. Ng, "Determining the number of factors in approximate
   factor models," Econometrica, vol. 70, number 1, pp. 191-221, 2002

Examples
--------
Basic PCA using the correlation matrix of the data

>>> import numpy as np
>>> from statsmodels.multivariate.pca import PCA
>>> x = np.random.randn(100)[:, None]
>>> x = x + np.random.randn(100, 100)
>>> pc = PCA(x)

Note that the principal components are computed using an SVD and so the
correlation matrix is never constructed, unless method='eig'.

PCA using the covariance matrix of the data

>>> pc = PCA(x, standardize=False)

Limiting the number of factors returned to 1 computed using NIPALS

>>> pc = PCA(x, ncomp=1, method='nipals')
>>> pc.factors.shape
(100, 1)
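
The 'drop-row', 'drop-col' and 'drop-min' choices for ``missing`` can be illustrated without the PCA class. The sketch below is an assumption about how "choosing by data retention" might work (keep whichever result retains more elements); the tie-breaking rule in particular is a guess made for the example.

```python
import numpy as np

x = np.arange(20, dtype=float).reshape(5, 4)
x[0, 1] = np.nan
x[3, 1] = np.nan                            # two missing values, same column

keep_rows = ~np.isnan(x).any(axis=1)        # rows with no missing values
keep_cols = ~np.isnan(x).any(axis=0)        # columns with no missing values

rows_only = x[keep_rows]                    # 'drop-row' result
cols_only = x[:, keep_cols]                 # 'drop-col' result

# 'drop-min' (assumed rule): keep the alternative that retains more data
kept = rows_only if rows_only.size > cols_only.size else cols_only
```

Here dropping the single affected column retains 15 values while dropping the two affected rows retains 12, so the column-dropped array is kept.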
Nc                 	   S U l         / U l        [        U[        R                  5      (       a"  UR
                  U l         UR                  U l        [        USSS9U l        [        US5      U l
        [        US5      U l        [        US5      U l        [        U
S5      U l        SU R                  s=:  a  S	:  d  O  [        S
5      e[!        US5      U l        [!        US5      U l        [        US5      U l        [        US5      U l        [        US5      U l        U R                  R,                  u  U l        U l        [        USS	SS9nUc!  [2        R4                  " U R0                  5      nOv[2        R6                  " U5      R9                  5       nUR,                  S   U R0                  :w  a  [        S5      eU[2        R:                  " US-  R=                  5       5      -  nXpl        [A        U R.                  U R0                  5      nUc  UOUU l!        U RB                  U:  a"  SS K"nSnURG                  U[H        5        Xl!        Xl%        U RJ                  S;  a  [        SU S35      eU RJ                  S:X  a  SU l        [2        RL                  " U R.                  5      U l'        [2        RL                  " U R0                  5      U l(        [S        U	SSS9U l*        U R                  U l+        U RY                  5         U RV                  R,                  u  U l        U l        U RB                  [2        R@                  " U R                  R,                  5      :X  a0  [2        R@                  " U RV                  R,                  5      U l!        OCU RB                  [2        R@                  " U RV                  R,                  5      :  a  [        S5      eSU l-        S U l.        S U l/        S U l0        S U l1        S U l2        S U l3        S =U l4        U l5        S U l6        S U l7        S U l8        S U l9        S U l:        S U l;        S U l<        U R{                  5       U l/        U R}                  5         U(       a5  U R                  5         U R{                  5       U l/        U R}                  5        
 U R                  5         U R                   b  U R                  5         g g )Ndata   )ndimgls	normalizesvd_fmtolr      z$tol must be strictly between 0 and 1r	   max_em_itertol_emstandardizedemeanweightsT)maxdimoptionalz!weights should have nvar elements       @zThe requested number of components is more than can be computed from data. The maximum number of components is the minimum of the number of observations or variables)eigsvdnipalszmethod z is not known.r'   missing)r$   zWhen adjusting for missing values, user provided ncomp must be no larger than the smallest dimension of the missing-value-adjusted data size.        )B_index_columns
isinstancepd	DataFrameindexcolumnsr   r   r   _gls
_normalize_svd_full_matricesr   _tol
ValueErrorr	   	_max_iter_max_em_iter_tol_em_standardize_demeanshape_nobs_nvarr   onesarrayflattenr   meanr"   min_ncompwarningswarnr   _methodarangerowscolsr   _missing_adjusted_data_adjust_missing_tss_esstransformed_data_mu_sigma
_ess_indiv
_tss_indivscoresfactorsloadingscoeff	eigenvals	eigenvecs
projectionrsquareic_prepare_data_pca_compute_gls_weights_compute_rsquare_and_ic
_to_pandas)selfr   ncompr    r!   r   r   r"   methodr)   r   max_iterr   r   svd_full_matricesmin_dimrE   rF   s                     r   __init__PCA.__init__   s    dBLL))**DK LLDMtV!4	c5)	#I{;"+,=x"HsE*	499 q CDD!(J7$[-@!&(3 &k=A 2!%
DJWiDI?ggdjj)Ghhw'//1G}}Q4::- !DEEC(=(=(? @@G djj$**-!&gE;; LD MM$-!K<<77wvhn=>><<5 &*D#IIdjj)	IIdjj)	#GYF"ii "&!4!4!:!:
DJ;;"&&11&&!4!4!:!:;DK[[266$"5"5";";<< A B B 		 $%))dl
 !% 2 2 4		%%'$($6$6$8D!IIK 	$$&;;"OO #r   c                    S nS nU R                   S:X  aN  U" U R                  5      u  U l        n[        R                  " U5      S   U l        U R                  U   U l        GOzU R                   S:X  a:  U" U R                  5      u  U l        n[        R                  " U5      S   U l        GO0U R                   S:X  a  U" U R                  5      u  pEUR                  nU" U R                  5      u  pxUR                  n	X:  a%  Xpl        [        R                  " U5      S   U l        OX@l        U R                  U   U l        [        R                  " U5      S   U l        O}U R                   S:X  a  U R                  5       U l        OWU R                   c?  [        R                  " U R                  5      R                  5       (       d  [        S	5      eO[        S
5      eU R                  b<  U R                  U R
                     U l        U R                  U R                     U l        U R                  R                  S:X  a  [        S5      eg)z5
Implements alternatives for handling missing values
c                     [         R                  " [         R                  " [         R                  " U 5      S5      5      nU S S 2U4   U4$ )Nr   r   logical_notanyisnanr   r0   s     r   keep_col%PCA._adjust_missing.<locals>.keep_col3  s6    NN266"((1+q#9:EQX;%%r   c                     [         R                  " [         R                  " [         R                  " U 5      S5      5      nXS S 24   U4$ )Nr   rm   rq   s     r   keep_row%PCA._adjust_missing.<locals>.keep_row7  s4    NN266"((1+q#9:EAX;%%r   zdrop-colr   zdrop-rowzdrop-minzfill-emNzdata contains non-finite values (inf, NaN). You should drop these values or
use one of the methods for adjusting data for missing-values.zmissing method is not known.z2Removal of missing values has eliminated all data.)rK   r   rL   r   whererJ   r"   rI   size_fill_missing_emisfiniteallr6   r+   r,   )
rc   rr   ru   r0   drop_coldrop_col_indexdrop_col_sizedrop_rowdrop_row_indexdrop_row_sizes
             r   rM   PCA._adjust_missing.  s   
	&	& ==J&)1$)))<&D*DI<<.DL]]j()1$)))<&D*DI]]j('/		':$H$MMM'/		':$H$MMM,&.#HH^4Q7	&.##||N;HH^4Q7	]]i'"&"7"7"9D]]";;t2237799  "A B B :
 ;<<;;" MM$))4DM++dii0DK ##q( ) * * )r   c                 8   [         R                  " U R                  SS95      nU R                  U-
  nU R                  U R
                  :X  a  [        S5      eUS-  R                  S5      nSU-  nU[         R                  " US-  R                  5       5      -  nU R
                  nS[        XDR                  5       -  S-  5      -  U-  nUS:  aD  [        [         R                  " Xe-  5      5      nSSKnS	U S
U S3n	UR                  U	[        5        X@l        g)z6
Computes GLS weights based on percentage of data fit
F)	transformzOgls can only be used when ncomp < nvar so that residuals have non-zero variancer%   r         ?g?Nz3Many series are being down weighted by GLS. Of the z- series, the GLS
estimates are based on only z (effective) series.)r   asarrayprojectrP   rD   r>   r6   rB   r   r   introundrE   rF   r   r"   )
rc   r[   errorsvarr"   nvareff_series_perc
eff_seriesrE   rF   s
             r   r`   PCA._compute_gls_weightsc  s    ZZu =>
&&3;;$**$ H I I}""1%)BGGW^$9$9$;<<zzg&=#%E!FF$NS RXXo&<=>J4486 :'L(<@D MM$ 12r   c                 n    U R                  5         U R                  5         U R                  5       U l        g)z
Main PCA routine
N)_compute_eig_compute_pca_from_eigr   r[   rc   s    r   r_   PCA._pca|  s)     	""$,,.r   c                 j    U R                  5       nUS S nUS[        [        U 5      5      -   S-   -  nU$ )Nz, id: ))__str__hexid)rc   strings     r   __repr__PCA.__repr__  s9    (SD]*S00r   c                    SnUS[        U R                  5      -   S-   -  nUS[        U R                  5      -   S-   -  nU R                  (       a  SnOU R                  (       a  SnOSnUSU-   S-   -  nU R
                  (       a  US	-  nUS
[        U R                  5      -   S-   -  nUS[        U R                  5      -   S-   -  nXR                  S:X  a  SOS-  nUS-  nU$ )NzPrincipal Component Analysis(znobs: z, znvar: zStandardize (Correlation)zDemean (Covariance)Noneztransformation: zGLS, znormalization: znumber of components: r&   zmethod: EigenvalueSVDr   )	strr=   r>   r:   r;   r2   r3   rD   rG   )rc   r   kinds      r   r   PCA.__str__  s    0(S_,t33(S_,t33.D\\(DD$t+d2299gF#c$//&::TAA*S-==DD||u/D+%O#r   c                    U R                   n[        R                  " [        R                  " U5      5      (       a@  [        R                  " UR
                  S   5      R                  [        R                  5      $ [        R                  " USS9U l	        [        R                  " [        R                  " XR                  -
  S-  SS95      U l        U R                  (       a  XR                  -
  U R                  -  nO"U R                  (       a  XR                  -
  nOUnU[        R                  " U R                  5      -  $ )z
Standardize or demean data.
r   r   axisr%   )rL   r   r{   rp   emptyr<   fillnannanmeanrQ   r   rR   r:   r;   r"   )rc   adj_datar   s      r   r^   PCA._prepare_data  s     &&66"((8$%%88HNN1-.33BFF;;::hQ/ggbjj(XX*=#)EANOxx'4;;6D\\xx'DDbggdll+++r   c                     U R                   S:X  a  U R                  5       $ U R                   S:X  a  U R                  5       $ U R                  5       $ )zb
Wrapper for actual eigenvalue method

This is a workaround to avoid instance methods in __dict__
r&   r'   )rG   _compute_using_eig_compute_using_svd_compute_using_nipalsr   s    r   r   PCA._compute_eig  sI     <<5 **,,\\U"**,,--//r   c                     U R                   n[        R                  R                  XR                  S9u  p#nUS-  U l        UR                  U l        g)z/SVD method to compute eigenvalues and eigenvecs)full_matricesr%   N)rP   r   linalgr'   r4   rY   TrZ   )rc   r   usvs        r   r   PCA._compute_using_svd  sA    !!))--1H1H-Iacr   c                     U R                   n[        R                  R                  UR                  R                  U5      5      u  U l        U l        g)zI
Eigenvalue decomposition method to compute eigenvalues and eigenvectors
N)rP   r   r   eighr   dotrY   rZ   )rc   r   s     r   r   PCA._compute_using_eig  s6     !!)+
)C&r   c                    U R                   nU R                  S:  a  US-   nU R                  U R                  U R                  pCn[        R
                  " U R                  5      n[        R
                  " U R                  U R                  45      n[        U5       GHH  n[        R                  " UR                  S5      5      nUSS2U/4   n	Sn
SnX:  a  X:  a  UR                  R                  U	5      U	R                  R                  U	5      -  nU[        R                  " UR                  R                  U5      5      -  nU	nUR                  U5      UR                  R                  U5      -  n	[        X-
  5      [        U	5      -  nU
S-  n
X:  a  X:  a  M  U	S-  R                  5       XW'   WUSS2U/4'   US:  d  GM+  XR                  UR                  5      -  nGMK     XPl        X`l        g)zO
NIPALS implementation to compute small number of eigenvalues
and eigenvectors
r   r*   r   Nr   r   )rP   rD   r5   r7   r   zerosr>   rangeargmaxr   r   r   r   r   r   rY   rZ   )rc   r   r   rf   rd   valsvecsimax_var_indfactor_iterdiffvecfactor_lasts                 r   r   PCA._compute_using_nipals  s   
 !!;;?CA#yy$..$++uxx$xxT[[12uA))AEE!H-Kq;-'(FED*!1ccggfof)=>BGGCEEIIcN33$ssuuyy~6V12U6]B
 *!1 {'')DGDQCLqyZZ&& " r   c                    [         R                  " [         R                  " U R                  5      5      n[         R                  " U5      (       a  U R                  $ [         R
                  " U R                  5       5      =o l        U R                  n[         R                  " US5      n[         R                  " US5      n[         R                  " XC:  5      (       d  [         R                  " XS:  5      (       a  [        S5      e[         R                  " U5      n[         R                  " US5      n[         R                  " U R                  S45      U-  nX   n	XU'   Sn
SnXR                  :  a  XR                   :  a  U	nX l        U R#                  5         U R%                  5         [         R
                  " U R'                  SSS95      nX   n	XU'   X-
  n[)        U5      [)        U	5      -  n
US-  nXR                  :  a  XR                   :  a  M  U R*                  S-   n[         R
                  " U R'                  5       5      nX   X&'   U$ )z%
EM algorithm to fill missing values
r   r   z\Implementation requires that all columns and all rows have at least ncomp non-missing valuesr   F)r   unweightr*   )r   rn   rp   r   r{   r   r^   rP   rD   r   ro   r6   r   r?   r=   r9   r8   r   r   r   r   rL   )rc   non_missingr   rd   col_non_missingrow_non_missingmaskmur[   projection_maskedr   r   last_projection_maskeddeltas                 r   ry   PCA._fill_missing_em  s    nnRXXdii%89 66+99 (*zz$2D2D2F'GG$ &&a0&&a066/)**bff_5L.M.M O P P xx~ ZZa  WWdjj!_-2
&,&T
 \\!e.?.?&?%6"$(!&&(DLL5:? %1 %A BJ * 0*J*>E<%(9"::DQJE \\!e.?.?&?  ""S(ZZ/
%
r   c                    U R                   U R                  p![        R                  " U5      nUSSS2   nX   nUSS2U4   nUS:*  R	                  5       (       a  UR
                  S   US:*  R                  5       -
  nX@R                  :  a]  SSKnUR                  SR                  US9[        5        X@l        [        R                  " [        R                  5      R                  XS& USU R                   nUSS2SU R                  24   nXsU l         U l        U R                  R!                  U5      =U l        U l        X l        UR(                  U l        U R,                  (       aw  U R*                  R(                  [        R.                  " U5      -  R(                  U l        U =R$                  [        R.                  " U5      -  sl        U R$                  U l        gg)zB
Compute relevant statistics after eigenvalues have been computed
Nr   r   zgOnly {num:d} eigenvalues are positive.  This is the maximum number of components that can be extracted.)num)rY   rZ   r   argsortro   r<   r   rD   rE   rF   formatr   finfofloat64tinyrP   r   rU   rV   rW   r   rX   r3   r   )rc   r   r   indicesnum_goodrE   s         r   r   PCA._compute_pca_from_eig#  ss   
 ^^T^^d**T"$B$-}AwJAI??zz!}	'88H++% 77=v(v7K/1
 '"$((2::"6";";YLT[[!A||O$)-&%)%:%:%>%>t%DDdlVV
??**,,699DJLLBGGDM)L,,DK r   c                 p   U R                   nU R                  [        R                  " U5      -  n[        R                  " US-  S5      U l        [        R                  " U R
                  5      U l        [        R                  " U R                  S-   5      U l	        [        R                  " U R                  S-   U R                  45      U l        [        U R                  S-   5       Hr  nU R                  USSS9nUS-  R	                  SS9nUR	                  5       nU R                  U-
  U R                  U'   U R
                  U-
  U R                  USS24'   Mt     SU R                  U R                  -  -
  U l        U R                  nUS:*  nUR                  5       (       a,  [        R                   " U5      S   R#                  5       n	USU	 n[        R$                  " U5      n
[        R&                  " UR(                  S   5      nU R*                  U R                  pX-   X-  -  n[#        X5      n[        R,                  " U[        R$                  " SU-  5      -  U[        R$                  " U5      -  [        R$                  " U5      U-  /5      nUSS2S4   nXU-  -   nUR.                  U l        g)	z
Final statistics to compute
r   r   r   F)rd   r   r   r   Nr   )r"   rP   r   r   r   rT   rN   r   rD   rO   r>   rS   r   r   r\   ro   rw   rC   logrH   r<   r=   r@   r   r]   )rc   r"   ss_datar   r[   	indiv_rssrssessinvalidlast_obslog_essrnobsr   sum_to_prodrh   	penaltiesr]   s                     r   ra   PCA._compute_rsquare_and_icG  s    ,,''"'''*::&&Aq1FF4??+	HHT[[1_-	((DKK!OTZZ#@At{{Q'AAOJ#q--1-5I--/C99s?DIIaL$(OOi$?DOOAqD! ( TYY22ii(;;==)!,113Hix.C&&+IIciil#ZZd{t{3d/HHkBFF33D,EE)BFF7O; ffWo79 :	 ag&	9}$$$r   c                    Uc  U R                   OUnXR                   :  a  [        S5      e[        R                  " U R                  5      n[        R                  " U R
                  5      nUSS2SU24   R                  USU2SS24   5      nU(       d  U(       a#  U[        R                  " U R                  5      -  nU(       aO  U R                  (       a  X`R                  -  nU R                  (       d  U R                  (       a  X`R                  -  nU R                  b*  [        R                  " UU R                   U R                  S9nU$ )aN  
Project series onto a specific number of factors.

Parameters
----------
ncomp : int, optional
    Number of components to use.  If omitted, all components
    initially computed are used.
transform : bool, optional
    Flag indicating whether to return the projection in the original
    space of the data (True, default) or in the space of the
    standardized/demeaned data.
unweight : bool, optional
    Flag indicating whether to undo the effects of the estimation
    weights.

Returns
-------
array_like
    The nobs by nvar array of the projection onto ncomp factors.

Notes
-----
Nz=ncomp must be smaller than the number of components computed.r1   r0   )rD   r6   r   r   rV   rX   r   r   r"   r:   rR   r;   rQ   r+   r.   r/   r,   )rc   rd   r   r   rV   rX   r[   s          r   r   PCA.projectp  s    4  %}%;; 4 5 5**T\\*

4::&QY'++E&5&!),<=
"''$,,//J  kk)
  DLLhh&
;;"j.2mm,0KK9J r   c                 (   U R                   n[        R                  " [        R                  " U R                  5      5      nS[        [        U5      5      -   S-   n[        U R                  5       Vs/ s H  oCR                  U5      PM     nn[        R                  " U R                  XQS9nU=U l        U l        [        R                  " U R                  U R                  US9nX`l        [        R                  " U R                  UU R                  S9nX`l        [        R                  " U R                   U R                  US9nX`l        [        R"                  " U R$                  5      U l        SU R$                  l        UR)                  SS5      n[        U R*                  R,                  S   5       Vs/ s H  oGR                  U5      PM     nn[        R                  " U R*                  US	9U l        [        R"                  " U R.                  5      U l        S
U R.                  R0                  l        SU R.                  l        [        R                  " U R2                  / SQS	9U l        S
U R2                  R0                  l        gs  snf s  snf )z*
Returns pandas DataFrames for all values
z	comp_{0:0zd}r   )r0   r1   rY   compeigenvecr   )r1   rd   r\   )IC_p1IC_p2IC_p3N)r+   r   ceillog10rD   r   r   r   r   r.   r/   rV   rU   r[   r,   rX   rW   SeriesrY   namereplacerZ   r<   r\   r0   r]   )rc   r0   	num_zeroscomp_strr   rJ   dfvec_strs           r   rb   PCA._to_pandas  s    GGBHHT[[12	S^!44t;,1$++,>?,>q",>?\\$,,B%''dl\\$//"&-- %' \\$**D"&--1
\\$-- $t=4>>2)""6:6+01E1Ea1H+IJ+Iaq!+IJdnndCyy.")%,,tww0KL$; @* Ks   2J
7Jc           	         SSK Js  Jn  UR                  U5      u  pdUc  U R                  OUn[
        R                  " U R                  5      nUSU R                   nU(       a  [
        R                  " U5      nU(       a  UR                  S5        UR                  [
        R                  " U5      USU S5        UR                  SS9  [
        R                  " UR                  5       5      nUS   US   -
  n	US[
        R                  " U	* U	/5      -  -  nUR                  U5        [
        R                  " UR!                  5       5      n
SnU(       a  [
        R"                  " U
S   U
S   -  5      n	[
        R$                  " [
        R                  " [
        R"                  " U
S   5      X-  -
  [
        R"                  " U
S   5      X-  -   /5      5      n
O)U
S   U
S   -
  n	X[
        R                  " U	* U	/5      -  -  n
UR'                  U
5        UR)                  S	5        UR+                  S
5        UR-                  S5        UR/                  5         U$ )aD  
Plot of the ordered eigenvalues

Parameters
----------
ncomp : int, optional
    Number of components to include in the plot.  If None, will
    include the same as the number of components computed
log_scale : bool, optional
    Flag indicating whether to use a log scale for the y-axis
cumulative : bool, optional
    Flag indicating whether to plot the eigenvalues or cumulative
    eigenvalues
ax : AxesSubplot, optional
    An axes on which to draw the graph.  If omitted, a new figure
    is created

Returns
-------
matplotlib.figure.Figure
    The handle to the figure.
r   Nr   boT)tightr   g{Gz?z
Scree Plot
EigenvaluezComponent Number)statsmodels.graphics.utilsgraphicsutilscreate_mpl_axrD   r   r   rY   cumsum
set_yscaleplotrH   	autoscaler@   get_xlimset_xlimget_ylimr   expset_ylim	set_title
set_ylabel
set_xlabeltight_layout)rc   rd   	log_scale
cumulativeaxgutilsfigr   xlimspylimscales               r   
plot_screePCA.plot_scree  s   0 	43&&r*$}%zz$..)LT[[!99T?DMM% 
		% $w-6
4 xx&!WtAwrxx"b	***
Dxx&Q$q')*B66"((BFF47Oej$@$&FF47Oej$@$B C DD a47"BBHHrc2Y///D
D
\"
l#
()
r   c                 ^   SSK Js  Jn  UR                  U5      u  pBUc  SOUn[	        XR
                  5      nSU R                  U R                  -  -
  nUSS nUSU nUR                  UR                  5        UR                  S5        UR                  S5        UR                  S5        U$ )	a  
Box plots of the individual series R-square against the number of PCs.

Parameters
----------
ncomp : int, optional
    Number of components to include in the plot.  If None, will
    plot the minimum of 10 or the number of computed components.
ax : AxesSubplot, optional
    An axes on which to draw the graph.  If omitted, a new figure
    is created.

Returns
-------
matplotlib.figure.Figure
    The handle to the figure.
r   N
   r   r   zIndividual Input $R^2$z$R^2$z'Number of Included Principal Components)r  r  r  r	  rC   rD   rS   rT   boxplotr   r  r  r  )rc   rd   r  r  r  r2ss         r   plot_rsquarePCA.plot_rsquare  s    $ 	43&&r*mE;;'DOOdoo55!"g&5k


355
-.
g
?@
r   )%rL   r,   r;   rO   rS   r2   r+   r8   r7   rG   rK   rQ   rD   r=   r3   r>   rR   r:   r4   r5   r9   rN   rT   rX   rJ   r   rY   rZ   rV   r]   rW   r[   rI   r\   rU   rP   r"   )NTTTFNr'   NHj>i  r(  d   F)NTT)NTFN)NN)__name__
__module____qualname____firstlineno____doc__ri   rM   r`   r_   r   r   r^   r   r   r   r   ry   r   ra   r   rb   r   r&  __static_attributes__ r   r   r   r      s    kZ CGAF?C49fP3*j2)&,$0D@7r"'H'R.`%%N 04(,:x!r   r   c                     [        XX#XEXgS9nUR                  UR                  UR                  UR                  UR
                  UR                  UR                  4$ )az	  
Perform Principal Component Analysis (PCA).

Parameters
----------
data : ndarray
    Variables in columns, observations in rows.
ncomp : int, optional
    Number of components to return.  If None, returns as many as the
    smaller of the number of rows or columns of data.
standardize : bool, optional
    Flag indicating to use standardized data with mean 0 and unit
    variance.  standardized being True implies demean.
demean : bool, optional
    Flag indicating whether to demean data before computing principal
    components.  demean is ignored if standardize is True.
normalize : bool, optional
    Indicates whether to normalize the factors to have unit inner
    product.  If False, the loadings will have unit inner product.
gls : bool, optional
    Flag indicating to implement a two-step GLS estimator where
    in the first step principal components are used to estimate residuals,
    and then the inverse residual variance is used as a set of weights to
    estimate the final principal components
weights : ndarray, optional
    Series weights to use after transforming data according to standardize
    or demean when computing the principal components.
method : str, optional
    Determines the linear algebra routine used.  'eig', the default,
    uses an eigenvalue decomposition. 'svd' uses a singular value
    decomposition.

Returns
-------
factors : {ndarray, DataFrame}
    Array (nobs, ncomp) of principal components (also known as scores).
loadings : {ndarray, DataFrame}
    Array (ncomp, nvar) of principal component loadings for constructing
    the factors.
projection : {ndarray, DataFrame}
    Array (nobs, nvar) containing the projection of the data onto the ncomp
    estimated factors.
rsquare : {ndarray, Series}
    Array (ncomp,) where the element in the ith position is the R-square
    of including the first i principal components.  The values are
    calculated on the transformed data, not the original data.
ic : {ndarray, DataFrame}
    Array (ncomp, 3) containing the Bai and Ng (2003) information
    criteria.  Each column is a different criterion, and each row
    represents the number of included factors.
eigenvals : {ndarray, Series}
    Array of eigenvalues (nvar,).
eigenvecs : {ndarray, DataFrame}
    Array of eigenvectors. (nvar, nvar).

Notes
-----
This is a simple function wrapper around the PCA class. See PCA for
more information and additional methods.
)rd   r    r!   r   r   r"   re   )r   rV   rW   r[   r\   r]   rY   rZ   )	r   rd   r    r!   r   r   r"   re   pcs	            r   pcar3  '  sQ    | 
TK 7
KB JJR]]BJJLL",,( (r   )NTTTFNr'   )r.  numpyr   pandasr.   statsmodels.tools.sm_exceptionsr   r   statsmodels.tools.validationr   r   r   r   r	   r   r   r3  r0  r   r   <module>r8     sD     @, ,"L L^ DH(-B(r   