What I learned today at Pickle Research Campus

Every Friday, no matter how tired I am from the week or how much homework I have to finish, I look forward to my routine trip to Pickle Research Campus. What do I do there? There are weekly research seminars given by PhDs, advisors, and practitioners in geophysics. Most of the time I cannot follow the content or the questions being addressed, especially when the geophysics gets very technical, but I still try my best to understand the math, and I enjoy it very much.

I have been constantly moved by the researchers' attitudes toward their careers, by how devoted they are to knowledge, and by how passionate they are about the problems they want to solve. I know there are a lot of things to learn and books to read, and that I should not be satisfied with just finishing homework and keeping a high GPA; I should also pursue knowledge beyond the classroom.

A study list for data science

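Data science: put simply, don't draw conclusions off the top of your head; base them on data and let the facts speak.

The skill set in three words: statistics, programming, communication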


A PhD Data Scientist: Jack of All trades, master of one.

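In more detail: statistics (explore data, build models, design experiments),
programming (pull data, clean data, at least prototype your own data solution, and understand the basics of how big data systems work (MapReduce)),
communication (make the complex simple, present orally, write reports and papers, make plots (static and web))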


If your resume (and your head) contains the following, you should basically have no trouble finding a job:
Ttest, Regression, ANOVA, Logistic Regression, DOE, Machine Learning, Data Mining, MapReduce, SQL, R/Matlab, Python, Java

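=========================================
This post is aimed mainly at data science in the IT industry. It does not define a data engineer. Rather, it is close to a "full-stack data scientist". Master this list and you will be able to work not only for established firms, but for startups too.
For roles focused on traditional industries, the bar for communication is somewhat higher and the bar for the other areas somewhat lower.
Before an interview, be sure to spend a week learning the basics of the other party's industry; wikipedia is enough. At a minimum, get familiar with the industry's common keywords.
If your goal is simply a decent job, settle down and work through the list.
If you want to do really well, stand out in at least one of the three areas.
Learning all of this takes a lot of time. If you hope to become a data scientist without much effort, OK, dream on!

Please don't treat bookmarking a study list as if it .Equals(study task completed). Let's start studying together!
[Strongly recommended: post your study plan so everyone can keep each other accountable and discuss. The moderators will drop by with advice when they have time, and those who stick with it earn forum points.]
=========================================
If anything is unclear, please google it.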

=========================================
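About a year ago the job market still looked like a jumble. I browsed again today and, probably because of year-end budget settlement, many companies have posted many new positions. Here is my reading of the job requirements.
Data scientist/researcher positions now seem more targeted, so you can see more clearly what kind of person they want: a generalist who knows a bit of everything, a programmer who knows some statistics, a Machine learning person, someone in optimization or logistics/supply chain, or a statistician who can code a little.
(A data business person is generally not called a data scientist.) BI analysts who mainly use SQL to produce reports are not included here either.

The study list is first for interview preparation, and second, it is what you use day to day anyway. Items I have finished are marked as green.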
=========================================
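I plan to summarize some of what I have studied here; additions are welcome. I will periodically consolidate them into the first post.
If you want to bookmark this thread, click "Favorite" below the first post -> Confirm -> the post will then appear under "Quick Navigation" -> Favorites.
If you have nothing specific to add, there is no need to reply. Give points if you like; it's fine if you don't.

Please don't ask me how some school's Data Science program is, or whether your stats (GPA and test scores) can get you into some school. I have no idea.
=========================================
Basically must-haves: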

Statistics: statistics and machine learning
hypothesis testing, point/interval estimation
pvalue, power, (type 1/2 error)
clt, delta method, derive coef and var(coef) etc
t-test: assumptions, remedy, range of applicable problems. For the basics listed above, see this course: http://onlinestatbook.com/2/index.html
glm (lm, logistic regression, anova etc): assumptions, model selection and validation, diagnostics, remedy, range of applicable problems

time series: Forecast with R
Time Series Analysis and Its Applications: With R Examples (Springer Texts in Statistics)
and its Upitt course

bayesian
Bayesian for hackers (python)
Coursera Graphical Model (VERY nicely explained)
Bayesian reasoning and machine learning book (quite difficult to read)
Introductory: A first course in Bayes (a quick read, very good)

longitudinal, mixed model
doe:all kinds of design, response surface
(?)survival

Machine Learning: Coursera Andrew Ng
Stanford Statistical Learning (Tibshirani & Hastie)
        (the book also has an undergraduate edition that emphasizes hands-on practice, with lots of R; very easy to read. Recommend starting from here.)
I could not keep up with Caltech's Learning from Data

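Statistical Computing software: R/Matlab/Python. SAS(?)
Industry largely treats R and Matlab as equivalent. However, Matlab is not free, and Octave is free but not as pleasant to use. Please consider teaching yourself R. If you already know Matlab, picking up R takes no time anyway.
If you know no other language, only SAS Base/Stat, and you don't want to learn anything else, then perhaps data science is not for you. If you absolutely must use SAS, you should at least have written macros. SAS is indeed very useful for modeling on big data, but it differs a lot from what other industries use; if everyone else on your team uses R/Py/Java, communicating with them will be extremely hard. Also, the software is expensive, and many places may not want to buy it.
To be clear, what I am saying is that knowing SAS is a good thing, but you cannot know only SAS.
Python: Data Analysis with Python (book), pandas
R: data.table, or plyr, lubridate, reshape2, build an R package. There are now lots of such courses on both udacity and coursera. Start from any.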
know how to get data from any source (DB, web, xml, plain text, etc)
EDA (exploratory) – Descriptive stats udacity
Inference – udacity
Plot/explain
read code from your favorite packages

—————————————————–
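Programming: A compiled language, and a scripting language
Python
I prefer Udacity's approach of teaching and quizzing as you go; with exercises alone and no lectures (codecademy) I don't seem to learn things properly
    Udacity CS101
    Udacity CS 215 (Algorithm; easier than the Coursera Princeton and Stanford courses, good for a quick pass)
    Udacity (Peter Norvig) CS212 Design of a Computer Program: excellent, highly recommended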

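Java data structures and algorithms
1. Udacity java (I spent 40 hours finishing this course): suitable for people who don't even know what a function or an assignment is.
2. Data structure: strongly recommended as required study (python: Problem Solving with Algorithms and Data Structures)
Java:  Berkeley 61B http://www.cs.berkeley.edu/~jrs/61b/
Textbooks are Head First Java & Data Structures and Algorithms in Java
my progress bar: week 5, lab1, hw1.
3. Algorithm: Udacity Algo in Python is fairly laid back; if you don't want to work too hard you can take this course, but it's better to be serious...
Java: Coursera Algo I&II (Princeton), if you are interested in this topic,
Language-agnostic: Stanford Algo I&II is also very good; the two cannot substitute for each other.

Very few people learn C# as their first language, so there really are no particularly introductory C# books; not recommended. If you have never studied C, java, or C++, going straight to a C# book is simply incomprehensible.
C++ is harder, and for a data scientist it is not as widely used as java. Of course, if you are an expert, plz pretend I said nothing.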

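Based on interviewing candidates for my team and interviewing elsewhere myself, let me quantify: what level of programming does data science actually need?
Assuming you have all the other foundations above, unless the position specifically emphasizes a statistician role, or is called Data scientist, statistics/analytics, and the job description barely mentions code, you can assume that some coding ability is required.
The specific level is:
Data science at IT companies: you should be able to solve Leetcode Medium problems. So, grind problems.
Traditional companies: I don't know

If you come from a software engineering background, or do work closer to data engineering, the bar will be higher

Topics involved include but are not limited to:
floating-point overflow
edge cases
improving MapReduce algorithms (beyond brute force)
if big data is involved, the demands on time complexity are fairly high
(I'll add others as I think of them)

Small odds and ends picked up along the way: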
Regex (a couple of hours) http://deerchao.net/tutorials/regex/regex.htm



SQL (a week) http://www.w3schools.com/sql/    Coursera: Intro to DB

Big data:
MapReduce: some knowledge
    Udacity series: http://blog.udacity.com/2013/11/sebastian-thrun-launching-our-data.html
    Coursera: intro to Data Science
    Coursera: Big data and web intelligence
    learning by doing: yes! wrote my very first reducer for real life projects!
MongoDB (udacity)
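
For readers who have not yet written a reducer, here is a minimal, single-machine sketch of the map, shuffle, and reduce steps using word counting. It only imitates the programming model in plain Python; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are made up for illustration and are not part of any MapReduce framework.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


# Hypothetical input: each string stands in for one line of a large file.
lines = ["big data is big", "data beats opinion"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the mapper and reducer run on different machines and the shuffle is handled by the framework, so only the two small functions are usually written by hand.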

If you want to be a DS for IT firms, then maybe:
jquery/ajax (start from codecademy very simple js and jquery intro, then find books)
—————————————————–
web services   get basic idea of how browsers work (udacity – Website Performance optimization).
udacity web development (build a blog) (40 hours)
—————————————————–
SE
   Software Development Life Cycles (udacity, mostly videos, as a quick intro only), amazingly, this one filled lots of holes in my knowledge base. Highly recommend
Also a book is mentioned here, worth a quick flip through, unfortunately, no ebook that I found works. Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts-Refactoring_ Improving the Design of Existing Code

— this is helpful not only for working in IT, but helps overall coding style/efficiency as well. Wished I’d known earlier.
—————————————————–
Linux
Many servers run Linux. At least familiarize yourself with the command line stuff. There’s a not so good course on Edx.
—————————————————–
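Synthesis/analysis/communication/soft skills
    Soft skills are hard to put into words,
Technique is not the most important thing; thinking clearly before you speak is the key. I suddenly noticed that my advisor's lab page actually opens with these questions, and it really resonated with me.

The ability to simplify and communicate from a high vantage point: hide complex formula/engineering details and convey the big picture as much as possible
    In my experience, the best way to acquire these skills is to go and speak. Don't just talk to yourself: keep checking whether the audience understands, encourage them to ask questions right away, and answer using keywords that fit their background rather than keywords you happen to be familiar with. Avoid abbreviations and niche jargon. Explain the intuition clearly and pile on fewer formulas.
    1. Teach an introductory course in your own field, e.g. a statistics student teaching intro statistics to people from other majors. Example: explain to someone with no statistics background what pvalue, power, false positive, randomization, inference etc. are.
    2. Consulting: some schools have sessions like this. Don't think it is a waste of time. Go and make others understand, see what problems others tackle with your technical expertise, where their thinking differs from yours, how you can understand them, and how to get them to understand you.
    3. Give presentations: don't present as you would at a professional academic conference; present as if teaching a 101 class. The purpose is not to show how complex and profound your field is, nor to impress others with your technical prowess, but to make the audience understand and ultimately take your advice.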
    Data Journalism (course, starting early 2014) — it was not as good as I expected. I do not recommend it. 

Plotting: for static plots, ideally know ggplot (a few hours); for dynamic ones, d3. If you know javascript, also great! Recommended reading:
     Nathan Yau: books visualize this & Data points, and his flowing data blog
     for d3: Interactive Data Visualization for the Web . free online tutorial by author: http://alignedleft.com/tutorials/d3/about It really is not that hard
    Whether a plot looks pretty is not the point; choosing the right chart to help explain the idea is what matters
html (a few hours, w3c)
css (a few hours, w3c), or codecademy, or the d3 book mentioned above
javascript (codecademy as a start, a book to follow later)

Rcharts/highcharts
Udacity now also has a newly launched vis course

Prototype your data products:
mean stack. https://thinkster.io/angulartutorial/mean-stack-tutorial/
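At the very least learn AngularJS; it is useful beyond data science.
R open CPU. R Shiny (limited usage with free version).

Although we are not aiming to do front-end development, it seems you need at least half-baked front-end skills. Please learn from this poster's experience, it's superb: http://www.1point3acres.com/bbs/thread-104335-1-1.html
Design:  (optional but nice to know) If you are not interested, please at least look at (color combinations that look good together). If you are interested in making plots look good, spend a weekend browsing these books: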
    1. Before and After
    2. Nondesigner’s design book
    3. Don’t make me think
    4. The Wall Street Journal Guide to Information Graphics

Research/publication:
sharelatex (invite enough users to get free versioning) /writelatex.com
Go to conferences, see what people are working on. Read their papers.
If you are looking for a certain type of job, find the team members on linkedin and skim their papers

Domain Knowledge: google/wikipedia is your friend
=========================================
Overall approach:
Doing Data science (book)
Data Science in Business
=========================================
other: small things that I feel don't take much time but will be useful
excel, power pivot etc
Popular-science books (all simple and easy to read):

What exactly is big data??? http://www.amazon.com/Big-Data-Revolution-Transform-Think-ebook/dp/B009N08NKW/ref=sr_1_1?ie=UTF8&qid=1384931538&sr=8-1&keywords=big+data
and a very similar one http://www.amazon.com/Automate-This-Algorithms-Markets-World-ebook/dp/B0064W5UAS/ref=sr_1_8?ie=UTF8&qid=1384931546&sr=8-8&keywords=algorithms
Just skim them
And then of course there is Nate Silver http://www.amazon.com/The-Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=pd_sim_kstore_1
=========================================
Case study:  Twitter data analytics http://tweettracker.fulton.asu.edu/tda/
=========================================
A recommended MS data science study curriculum: http://datasciencemasters.org/
=========================================
Tools people have recommended to me for organizing my thinking and doing things the right way: It’s more important than you think!!
http://software-carpentry.org/lessons.html
coursera reproducible research: learn knitr well, do not copy paste anything

Udacity Git Course (the best, bar none)

Learning from Prof Sergey Fomel’s paper: “Adaptive multiple subtraction using regularized non stationary regression”

stationary process:

Definition

Formally, let \left\{X_t\right\} be a stochastic process and let F_{X}(x_{t_1 + \tau}, \ldots, x_{t_k + \tau}) represent the cumulative distribution function of the joint distribution of \left\{X_t\right\} at times t_1 + \tau, \ldots, t_k + \tau. Then, \left\{X_t\right\} is said to be stationary if, for all k, for all \tau, and for all t_1, \ldots, t_k,

 F_{X}(x_{t_1+\tau} ,\ldots, x_{t_k+\tau}) = F_{X}(x_{t_1},\ldots, x_{t_k}).

Since \tau does not affect F_X(\cdot), F_{X} is not a function of time.
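
To get a feel for the definition, here is a minimal Python sketch that compares two simulated processes. The choice of Gaussian white noise (stationary) and a random walk (non-stationary), and the helper names `simulate` and `ensemble_variance`, are my own assumptions for illustration. It checks only one necessary consequence of stationarity, namely that the variance of X_t across many realizations does not depend on t; it does not test the full joint distribution.

```python
import random
import statistics


def simulate(n_paths, n_steps, cumulative, seed=0):
    """Draw n_paths realizations of length n_steps.

    cumulative=False gives Gaussian white noise; cumulative=True gives a
    random walk (running sum of the same kind of noise).
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        if cumulative:
            total, walk = 0.0, []
            for step in values:
                total += step
                walk.append(total)
            values = walk
        paths.append(values)
    return paths


def ensemble_variance(paths, t):
    """Variance across realizations of the value at time index t."""
    return statistics.pvariance([path[t] for path in paths])


white = simulate(2000, 101, cumulative=False)
walk = simulate(2000, 101, cumulative=True)

# Ratio of the variance at a late time to the variance at an early time.
white_ratio = ensemble_variance(white, 100) / ensemble_variance(white, 10)
walk_ratio = ensemble_variance(walk, 100) / ensemble_variance(walk, 10)
print("white noise:", white_ratio)
print("random walk:", walk_ratio)
```

For white noise the ratio should stay near 1, while for the random walk the variance grows with time, so shifting by \tau visibly changes the distribution.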

Regression:

In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.

The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.

match filtering:?

Adaptive subtraction: Standard adaptive subtraction methods use the well-known minimum energy criterion, stating that the total energy after optimal multiple attenuation should be minimal.

Adaptive subtraction

The goal of adaptive subtraction is to estimate the non-stationary filters ${\bf f}$ that minimize the objective function 

 \begin{displaymath}
g({\bf f})=\Vert{\bf Pf}-{\bf d}\Vert^2, \qquad (71)
\end{displaymath}

where ${\bf P}$ represents the non-stationary convolution with the multiple model obtained with SRMP (i.e., Chapter [*]) and ${\bf d}$ are the input data. These filters are estimated in a least-squares sense for one shot gather at a time. Note that in practice, a regularization term is usually added in equation ([*]) to enforce smoothness between filters. This strategy is similar to the one used in Chapter [*]. The residual vector ${\bf Pf}-{\bf d}$ contains the estimated primaries.
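
A heavily simplified Python sketch of this least-squares estimate follows. It makes assumptions that are not in the text above: the filter is a single stationary two-tap filter rather than a bank of non-stationary filters, the regularization is a plain damping term `eps` on the diagonal rather than a smoothness constraint between filters, and all signal values and names (`delay`, `estimate_two_tap_filter`) are invented for illustration. The estimated primaries are taken as d - Pf, which is the residual above up to sign.

```python
def delay(signal):
    """Return the signal delayed by one sample (zero-padded at the start)."""
    return [0.0] + list(signal[:-1])


def estimate_two_tap_filter(model, data, eps=0.0):
    """Minimize ||P f - d||^2 + eps * ||f||^2 for f = (f0, f1).

    The columns of P are the multiple model and its one-sample delay, so
    (P f)[n] = f0 * model[n] + f1 * model[n - 1].
    """
    shifted = delay(model)
    # Entries of the 2x2 normal equations (P^T P + eps I) f = P^T d.
    a00 = sum(m * m for m in model) + eps
    a11 = sum(s * s for s in shifted) + eps
    a01 = sum(m * s for m, s in zip(model, shifted))
    b0 = sum(m * d for m, d in zip(model, data))
    b1 = sum(s * d for s, d in zip(shifted, data))
    det = a00 * a11 - a01 * a01
    f0 = (b0 * a11 - a01 * b1) / det
    f1 = (a00 * b1 - a01 * b0) / det
    return f0, f1


# Hypothetical traces: the primaries do not overlap the multiple model.
model = [0.0, 0.0, 1.0, 0.0, -0.5, 0.0, 0.0, 0.0]
primaries = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
shifted = delay(model)
data = [p + 0.8 * m + 0.3 * s for p, m, s in zip(primaries, model, shifted)]

f0, f1 = estimate_two_tap_filter(model, data)
estimated_primaries = [d - (f0 * m + f1 * s)
                       for d, m, s in zip(data, model, shifted)]
print(f0, f1)
print(estimated_primaries)
```

The minimum-energy criterion works in this toy case because the primaries are orthogonal to the shifted copies of the model; when primaries and multiples overlap, the fit can remove primary energy as well.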

Multiple subtraction

In this section, the multiple model computed in the preceding section is subtracted from the data with two techniques. The model is obtained after shot interpolation with the sparseness constraint. The first technique is a pattern-based method introduced in Chapter [*] that separates primaries from multiples according to their multivariate spectra. These spectra are approximated with prediction-error filters. The second technique adaptively subtracts the multiple model from the data by estimating non-stationary matching filters (see Chapter [*]).

Shaping regularization: http://www.reproducibility.org/RSF/book/jsg/shape/paper_html/

Least squares sense:

Least Squares Fitting

A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets (“the residuals”) of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.
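
As a small worked example, the Python sketch below fits a straight line y = slope * x + intercept with the standard closed-form least-squares solution, then repeats the fit after moving one point far away to show the outlier sensitivity mentioned above. The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
clean = fit_line(xs, [1.0, 3.0, 5.0, 7.0])          # points exactly on y = 2x + 1
with_outlier = fit_line(xs, [1.0, 3.0, 5.0, 17.0])  # last point moved up by 10
print(clean)
print(with_outlier)
```

Moving a single point changes the fitted slope from 2 to 5, because its squared offset dominates the sum being minimized.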

Matched filter:

In signal processing, a matched filter (originally known as a North filter) is obtained by correlating a known signal, or template, with an unknown signal to detect the presence of the template in the unknown signal. This is equivalent to convolving the unknown signal with a conjugated time-reversed version of the template. The matched filter is the optimal linear filter for maximizing the signal to noise ratio (SNR) in the presence of additive stochastic noise. Matched filters are commonly used in radar, in which a known signal is sent out, and the reflected signal is examined for common elements of the out-going signal. Pulse compression is an example of matched filtering. It is so called because the impulse response is matched to the input pulse signal. Two-dimensional matched filters are commonly used in image processing, e.g., to improve SNR for X-ray. Matched filtering is a demodulation technique with LTI filters to maximize SNR.

Derivation of the matched filter impulse response

The following section derives the matched filter for a discrete-time system. The derivation for a continuous-time system is similar, with summations replaced with integrals.

The matched filter is the linear filter, h, that maximizes the output signal-to-noise ratio.

\ y[n] = \sum_{k=-\infty}^{\infty} h[n-k] x[k].

Though we most often express filters as the impulse response of convolution systems, as above (see LTI system theory), it is easiest to think of the matched filter in the context of the inner product, which we will see shortly.

We can derive the linear filter that maximizes output signal-to-noise ratio by invoking a geometric argument. The intuition behind the matched filter relies on correlating the received signal (a vector) with a filter (another vector) that is parallel with the signal, maximizing the inner product. This enhances the signal. When we consider the additive stochastic noise, we have the additional challenge of minimizing the output due to noise by choosing a filter that is orthogonal to the noise.

Let us formally define the problem. We seek a filter, h, such that we maximize the output signal-to-noise ratio, where the output is the inner product of the filter and the observed signal x.

Our observed signal consists of the desirable signal s and additive noise v:

\ x=s+v.\,

Let us define the covariance matrix of the noise, reminding ourselves that this matrix has Hermitian symmetry, a property that will become useful in the derivation:

\ R_v=E\{ vv^\mathrm{H} \}\,

where v^\mathrm{H} denotes the conjugate transpose of v, and E denotes expectation. Let us call our output, y, the inner product of our filter and the observed signal such that

\ y = \sum_{k=-\infty}^{\infty} h^*[k] x[k] = h^\mathrm{H}x = h^\mathrm{H}s + h^\mathrm{H}v = y_s + y_v.

We now define the signal-to-noise ratio, which is our objective function, to be the ratio of the power of the output due to the desired signal to the power of the output due to the noise:

\mathrm{SNR} = \frac{|y_s|^2}{E\{|y_v|^2\}}.

We rewrite the above:

\mathrm{SNR} = \frac{|h^\mathrm{H}s|^2}{E\{|h^\mathrm{H}v|^2\}}.

We wish to maximize this quantity by choosing h. Expanding the denominator of our objective function, we have

\ E\{ |h^\mathrm{H}v|^2 \} = E\{ (h^\mathrm{H}v){(h^\mathrm{H}v)}^\mathrm{H} \} = h^\mathrm{H} E\{vv^\mathrm{H}\} h = h^\mathrm{H}R_vh.\,

Now, our \mathrm{SNR} becomes

\mathrm{SNR} = \frac{ |h^\mathrm{H}s|^2 }{ h^\mathrm{H}R_vh }.

We will rewrite this expression with some matrix manipulation. The reason for this seemingly counterproductive measure will become evident shortly. Exploiting the Hermitian symmetry of the covariance matrix R_v, we can write

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }.

We would like to find an upper bound on this expression. To do so, we first recognize a form of the Cauchy-Schwarz inequality:

\ |a^\mathrm{H}b|^2 \leq (a^\mathrm{H}a)(b^\mathrm{H}b),\,

which is to say that the square of the inner product of two vectors can only be as large as the product of the individual inner products of the vectors. This concept returns to the intuition behind the matched filter: this upper bound is achieved when the two vectors a and b are parallel. We resume our derivation by expressing the upper bound on our \mathrm{SNR} in light of the geometric inequality above:

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
             \leq
             \frac{ \left[
             			{(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h)
             		\right]
             		\left[
             			{(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s)
             		\right] }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }.

Our valiant matrix manipulation has now paid off. We see that the expression for our upper bound can be greatly simplified:

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
             \leq s^\mathrm{H} R_v^{-1} s.

We can achieve this upper bound if we choose,

\ R_v^{1/2}h = \alpha R_v^{-1/2}s

where \alpha is an arbitrary real number. To verify this, we plug into our expression for the output \mathrm{SNR}:

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
           = \frac{ \alpha^2 | {(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { \alpha^2  {(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s) }
           = \frac{ | s^\mathrm{H} R_v^{-1} s |^2 }
                  { s^\mathrm{H} R_v^{-1} s }
           = s^\mathrm{H} R_v^{-1} s.

Thus, our optimal matched filter is

\ h = \alpha R_v^{-1}s.

We often choose to normalize the expected value of the power of the filter output due to the noise to unity. That is, we constrain

\ E\{ |y_v|^2 \} = 1.\,

This constraint implies a value of \alpha, for which we can solve:

\ E\{ |y_v|^2 \} = \alpha^2 s^\mathrm{H} R_v^{-1} s = 1,

yielding

\ \alpha = \frac{1}{\sqrt{s^\mathrm{H} R_v^{-1} s}},

giving us our normalized filter,

\ h = \frac{1}{\sqrt{s^\mathrm{H} R_v^{-1} s}} R_v^{-1}s.

If we care to write the impulse response of the filter for the convolution system, it is simply the complex conjugate time reversal of h.

Though we have derived the matched filter in discrete time, we can extend the concept to continuous-time systems if we replace R_v with the continuous-time autocorrelation function of the noise, assuming a continuous signal s(t), continuous noise v(t), and a continuous filter h(t).
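
The result can be tried out with a minimal Python sketch, under simplifying assumptions of my own: the signal is real-valued and the noise is white, so R_v = noise_var * I and the normalized filter reduces to h = s / sqrt(noise_var * s^T s). The template, the received samples, and the function names are invented for illustration.

```python
import math


def matched_filter_white(template, noise_var):
    """Normalized matched filter h = R_v^{-1} s / sqrt(s^T R_v^{-1} s)
    for the special case R_v = noise_var * I and a real template s."""
    energy = sum(s * s for s in template)
    scale = 1.0 / math.sqrt(noise_var * energy)
    return [scale * s for s in template]


def sliding_output(h, x):
    """Inner product of h with every length-len(h) window of x."""
    n = len(h)
    return [sum(h[k] * x[i + k] for k in range(n))
            for i in range(len(x) - n + 1)]


template = [1.0, -1.0, 2.0]
noise_var = 0.01
# Hypothetical received samples: the template sits at offset 3,
# surrounded by small disturbances.
received = [0.1, -0.2, 0.0, 1.0, -1.0, 2.0, 0.1, 0.0]

h = matched_filter_white(template, noise_var)
outputs = sliding_output(h, received)
peak = max(range(len(outputs)), key=lambda i: outputs[i])
print("detected offset:", peak)
```

With this normalization the expected output noise power h^T R_v h equals 1, matching the constraint above, and the largest output marks where the template best aligns with the received samples.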

Monthly Summary (Jan 21st-Feb 21st)

School at UT Austin started over a month ago. I am taking 18 credit hours in my third semester at UT, as I plan to apply for graduate school this fall and graduate next spring. That means five math classes and one CS class, which implies a heavy workload every day. Life is tough: this sentence has resounded in my head many times recently. Right at this moment, as I'm writing this blog post, it just struck me that I haven't carried out my plan to prepare for the GRE and GRE subject tests that I will take this April and this summer. You can imagine how uncomfortable I feel now. There are tons of things to accomplish and I'm acting like a robot every single day. But to be honest, I cannot say I have tried my best. February is the month of Chinese Lunar New Year, a time when my friends in China and abroad post pictures of wonderful food, happy travels, time with family…… Whenever I get a chance to check out the pictures I'm usually in the middle of a coding project, an assignment, or preparation for an interview. Sometimes I wish I could quit and do whatever I like, without deadlines and without the restrictions of assignments, homework, and lectures…

But when there is no pain, how can you know there is a harvest waiting for you? How can you know you won't feel bored again, or lost? The same thing may happen, and you will wish you could switch back to the life you're experiencing now.

I want to analyze why I feel tired. I have set goals that are too high for myself: I want to get a good summer internship, I want to keep my 3.9 GPA, I want to be praised as awesome in every interview I get, I want to do graduate studies at Stanford University, I want to be a strong, tough, smart, and adorable girl, I want to be respected, and I want to contribute to people and society. I have so many ideas in mind and such grand ambition, but when I examine myself closely I must say: yes, you're hardworking and capable, but no, that is far from sufficient if the goals listed above are yours! I hope to reach my targets at once, as if I could live the next 5 or 10 or more years in a week. That is definitely far-fetched. Whenever I fail myself, I feel guilty and maybe hopeless. This is perhaps the so-called imposter syndrome.

I feel lucky in the sense that I can always reorient myself in a better, more promising direction after I feel down. This is a really precious capability. Everybody has times when they don’t feel like doing anything, but only those who can pull themselves out of the mess can succeed by their own definitions. I believe I’m one of them. So let me note down what I will do and definitely will have done in the coming weeks.

At the end of February, I will have four midterms. That is a lot. BUT, yeah Anna, you can definitely do them well. Remember that this semester and the next are critical ones for your life. You are at a point where you will make lifelong decisions: where to go and what to do after graduation. All the effort you put in during this time will be the effort that pays off the most, and you gotta remember, if you slack off and stay laid back these days, you will regret it for the rest of your life!!! So get fully prepared for the midterms: do whatever you can to review and practice. March is an important month. I plan to go to Stanford to visit ICME and other departments during spring break, and I will also be hearing from ICES about my summer research internship. Prepare for the GRE whenever you can!!!

Life is beautiful, and trust me, you will say this sentence a lot in the very near future. Be a happy, optimistic, capable, whole person. I believe you can, Anna! YOU ARE THE BEST, AREN’T YOU?

When can I say I tried my best? When will I realize from the bottom of my heart that I’m not satisfied with being mediocre? When will I comprehend the essence of the phrase “no pain, no gain”? When will I save myself from meaningless self-doubt? Alas, it should be today. There is no better timing. It is today that you focus on yourself and on what you crave to reach.

II. How Should I Live in This World: from Anxiety to Peacefulness

I’ve now been studying at UT and living in Austin for a whole year. I cannot help reminiscing about the beginning of the spring semester of 2014, when everything happening to me on the other side of the world seemed new, unfamiliar, and sometimes hard to handle.

I was looking down from the window of a small interstate airplane. Texas’s land is mainly dark yellow and dark green. It was January; nevertheless I seemed to feel heat waves hanging over this vastness. “Is this really the city, the soil, I will live on for the next year and a half before graduating from college? Could I possibly like it?” I murmured to myself. Texas was too sparse, big, and “barren” for a student who had spent fairly long stretches living and studying in places like DC and California. “Anyway, this is my choice, and it was the optimal choice at that moment.”

I was anxious about my life in a new environment. Life showed me its complex and varied sides, which were no longer restricted to grades but included issues like which apartment to pick for next semester, what to buy to cook for meals, how to protect myself when walking home alone from the library at midnight, and how to make friends whose outlooks differ implicitly from mine and whose languages differ explicitly…… Part of me was so uneasy about all the miscellaneous things popping up every day that I was very careful to tackle them one by one: I moved twice in two months, from the west of Austin to the east and finally to an apartment near school…… I was on my own at the young age of 19. I like this sentence: “the quickest way of learning new skills is by expanding your comfort zone.”

In the first semester, I met kind and friendly peers and seniors who greatly helped me transition smoothly to regular life. I encountered a great but disputed presence in this world, through whose love I have received selfless love and care from strangers, and I have been trying to give the same care to others. I got a 4.0 GPA in the first semester and an offer from an NGO to campaign for water conservation nationwide. I got an offer to study abroad in Botswana for a month during the summer through the department of geography. I got to know Sergey Fomel, a well-known, very kind, and approachable professor of geophysics at UT, and attended his software workshop at Rice University. During the summer, I worked with a research team from Columbia University's department of political science, surveyed Spanish people living in Austin, and used the data to test several hypotheses. However, what I did not do very well was the three summer courses: US Government, Texas History, and US History. The possible reasons are:

1. It was summer, and I was working part time while studying. What I should strengthen is the skill of not only handling multiple tasks but, more importantly, handling each of them as well as expected.

2. I thought the courses were not relevant to my interests, they were taken not at UT but at Austin Community College, and after a long semester I felt tired. What I should improve is, first, my sense of responsibility, to myself and to my parents’ money, and second, my respect for knowledge: no matter where it is imparted to me, I should treat it equally.

3. I had no close friends around, or my friends were traveling to places I wanted to visit as well. What I should avoid is the negative emotion provoked by others’ happiness at doing things I want to do. If I long to travel, I should make plans and learn to travel safely on my own. I should also learn to have fun even when I’m alone in a new environment and drive out the unhappiness and isolation, for example by using the internet to find events that are happening or about to happen in town.

During the second semester at UT, I took five classes, four in math and one in CS, and at the same time audited Professor Sergey’s class. I had my first research project, small as it was, with my mentor Sona through the Directed Reading Program in the math department, and gave a 15-minute presentation in front of my peers and professors. I got an interview with Dell for a Software Engineer position. I got an interview for a tutor position at Sanger Learning Center. I got an interview for an Outreach Assistant position at Sanger Learning Center. I got to the second round for membership in Texas Undergraduate Computational Finance. I got As in four of my classes and a B+ in Probability. I applied for the Math Honors Program. I registered for the Austin Half Marathon this February.

What I have not done well is as follows:

1. I made several attempts to read Professor Sergey’s papers and was hoping to start undergraduate research with him, but I haven’t made any progress on my independent research.

2. I was auditing the class but I was not able to take the final of the class.

3. I failed all the interviews I got, and didn’t get up in time to go to the Outreach Assistant interview in the morning.

4. I was kind of slacking off during the first half of the probability class, so even though I ranked pretty high on the cumulative final exam, I was not able to pull up my grade with that one exam alone.

5. I was not running or working out regularly as expected.

How to improve:

1. Since I’m now pretty sure that my interest is geophysics, I should focus the majority of my time and energy on it instead of spreading myself across too wide a field, from Software Development to Sociology. The more you put in, the more you get out.

2. I should let go of myself and not be too self-conscious: don’t focus so much on the result that I become afraid of failure.

3. I should talk to Sergey more about what I’m concerned about, what I did not understand, how to work for him, and his standards for PhD students. I should cherish this great resource. I may go to Pickle once every two weeks to talk with him.

4. Refine my resume and my interview skills. Get as many interviews as I can through websites like Indeed/Linkedin. Remember, I’m young, and I’ve got nothing to lose.

5. Learn to adapt quickly to classes of different styles and forms. Don’t make excuses for laziness. Keep up the good work from beginning to end.

6. Make regular workouts part of my life.

However unsatisfied I am with my own life, I still thank life for bringing so many interesting and admirable people into it. Jun Zheng is a tough guy who comes from a farming family and has been entirely on his own ever since he started university; he is now a 2L student at UT Law School, studying intellectual property law. Heidi Zhang, now applying to a UT graduate program, graduated from the same high school as I did and has led an energetic, adventurous life on wild trails through mountains and woods. Lyra Hao is now a PhD student in geophysics at Stanford; she is a free spirit who roams across much of the world and captures countless wonderful moments with her professional-level photography. Paul and Judith are an old but young-at-heart couple who taught me how to face the twists and turns of life and who fill me with peace and composure. (CONTINUED)

Now I’m in China with my parents. I am not at peace here. I can only regain that peace, brief as it may be, back at UT, where some part of me, be it the heart or the soul, is on its way to the truth and freedom of life, since time never stops for my regret and idleness.

CONTINUED……

I. How Should I Live in This World: From Shanghai to Austin, US

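On this day last year, I was in Shanghai, celebrating the New Year with my friends from the Overseas Chinese Foundation, Jojo and Veronique. Specks of neon light decorated the Bund. On the top floor of the Guofeng Building in Hongkou District, in the foundation’s conference room, I ate snacks and sang karaoke, singing away the fatigue, rushing, and anxiety of the past four months spent in Shanghai, in Hanoi, Vietnam, in Washington, D.C., in Warsaw, Poland, and in Shanghai and Chengdu in China, while feeling grateful for the intellectual growth and the more complete set of values those four months had given me. I was looking forward to flying, a week later, to another city: Austin, Texas.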

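From early July to mid-August 2013, I was at Georgetown University in Washington, D.C. Through the interviews and exams offered by my former university, Shanghai International Studies University, I won a scholarship to study at GU as an exchange student for a month. In that short month or so, I had to adapt quickly to a new language environment, and my two humanities courses, Foundations of the American Constitution and Introduction to Philosophy, were not easy for a non-native English speaker like me. I had long been interested in British Parliamentary debate, and I chose these two courses privately believing they would raise both my debating technique and my content. But once I began studying them rigorously, I did not seem to enjoy it very much. The American culture and daily life the teachers touched on in class, such as major American events they mentioned or the names of people and places, well known across the US at the time, that they joked about, always left me completely lost, even sitting in the front row with my ears pricked up. Even though I could not fully understand every sentence the teachers said in class, my legal writing still earned grades from A- to A. This atmosphere, which challenged me and made me uncomfortable, excited and encouraged me. I thought back to myself at university in Shanghai, always at the top of my class, always winning the top-tier scholarship every semester. And I thought about my major, English interpreting and translation: did I really want to work in this field in the future?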

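On August 10, I returned home to Chengdu. I knew the idea of transferring had long been stirring in my heart, so why waver? If you think of something, go and do it. “It’s too late” is only an excuse for avoiding things and fearing failure. At that point I had twenty days left to prepare for the TOEFL; no, actually fifteen, because for five days I would be working as an assistant coach at a British Parliamentary debate training camp. Time was tight, but I told myself that I could certainly get the score I wanted.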

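On September 1, I returned to Shanghai to begin the fall 2013 semester. This semester meant a great deal to me: in order to transfer, I had to maintain a high GPA while setting aside enough time to prepare for and take the SAT. Beyond these standardized tests, I needed to meet different people and experience different things, and to write application essays that would show my thinking and personality. Perhaps because I was picturing my studies and life after transferring, I was full of strength and energy: I was also preparing to go to Warsaw, Poland, a week after the SAT, to attend the 19th Conference of the Parties to the United Nations Framework Convention on Climate Change.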

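Starting in September, I got up at 6:30 every morning; apart from regular classes, the library was my second dorm room. Although the SAT is a test for American high school students, it was no piece of cake for me. I searched for all kinds of SAT techniques and resources, made a plan every day, and did my best to complete it. I remember clearly that only after the ten days of the National Day holiday (the campus was empty, I went nowhere, the library was closed, and every day I went to a classroom to study alongside the older students preparing for the graduate entrance exam) had I truly gotten through the vocabulary and started working through practice tests. By then it was exactly one month until my exam in Hanoi, Vietnam, and a month and two weeks until I left for the forum in Warsaw. I had no one to confide in, and every day I was like a madwoman cut off from the world. I remember once, at my most miserable, I called my parents from the library hall and cried out loud, not caring at all about the students reading around me. I said I did not like the state I was in, the atmosphere around me, or my major. I wanted to transfer, to study what I wanted to study and live the life I wanted to live. My dad said: figure it out yourself. If you can get out, we will pay for it; if you can’t, clean up the mess yourself. The world seemed to be against me, and I had no way back. (Because I had joined the Party in high school, I was assigned duties in a Party constitution study group, but I was stretched too thin and fulfilled almost none of my responsibilities; my Party membership was hanging by a thread, and my parents were putting on all kinds of pressure...)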

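I did not know much about choosing schools, but I knew my general direction: if the Ivies did not work out (in hindsight, how one-sided that was), I would go to a school strong in science and engineering. One afternoon, I was in my teacher Gu Yue’s office talking about my plan to transfer, and he said offhandedly: try UT Austin; I was a visiting scholar there, and it is a school that would suit you well. I went back, looked it up, and casually applied for its spring 2014 transfer admission, thinking of it as a safety school at least.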

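On November 1, I flew from Shanghai to Hanoi and completed the five-hour SAT at the Crown Hotel. On November 14, which I remember was a Friday, I finished my two morning classes and rushed to Pudong Airport to fly to Warsaw. I won’t go into all the drama and uncertainty along the way, or missing my flight at Chopin Airport.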

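Everything proceeded in an orderly way. I could no longer feel tired; there was only boundless hope and unwavering determination in my heart.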

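At the end of November, I received my admission letter from UT Austin. I was admitted to the sociology and mathematics majors, exactly what I had wanted: I wanted to become a well-rounded person during my undergraduate years.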

So the saying is true: wherever you want to go, the whole world will make way for you.

My Experience in the Directed Reading Program at the Math Department at UT Austin

It has been several weeks since the perfect ending of the great Directed Reading Program sponsored by the Department of Mathematics at UT Austin. I was one of the final 20 presenters on 24/12/2014 who showed through their presentations that they had made the most of this educational resource.

Under the patient guidance of Sona Akopian, I carried out an independent mini project on the analysis and theory of wave equations, the numerical algorithms for solving them, and their visualization: making a movie of a 2-D wave equation in Matlab.

Here is what I produced. Want further details? Stay tuned!

With the CFL condition satisfied. Initial condition: a sine function.

With the CFL condition satisfied. Initial condition: a parabola.

Without the CFL condition: the solution blows up.
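
For readers curious about what the CFL condition means in practice, here is a minimal Python sketch. It is not the Matlab code from my project, and it is 1-D rather than 2-D to keep it short. It assumes the standard explicit leapfrog finite-difference scheme for u_tt = c^2 u_xx on [0, 1] with fixed ends and zero initial velocity; the function name `simulate_wave` and its parameters are made up for illustration. For this scheme the CFL condition is C = c*dt/dx <= 1.

```python
import math


def simulate_wave(courant, n_cells=50, n_steps=200, shape="sine"):
    """Leapfrog scheme for u_tt = c^2 u_xx on [0, 1] with u = 0 at both ends.

    courant is the Courant number C = c * dt / dx (hypothetical parameter name).
    Returns the largest |u| seen at any grid point during the run.
    """
    dx = 1.0 / n_cells
    xs = [j * dx for j in range(n_cells + 1)]

    # Initial displacement: a sine arch or a parabola, both with height 1.
    if shape == "sine":
        prev = [math.sin(math.pi * x) for x in xs]
    else:
        prev = [4.0 * x * (1.0 - x) for x in xs]
    prev[0] = prev[-1] = 0.0  # fixed ends

    c2 = courant ** 2

    # First time step, assuming zero initial velocity (Taylor expansion).
    curr = prev[:]
    for j in range(1, n_cells):
        curr[j] = prev[j] + 0.5 * c2 * (prev[j + 1] - 2.0 * prev[j] + prev[j - 1])

    peak = max(max(abs(v) for v in prev), max(abs(v) for v in curr))

    # Remaining steps: the new value depends on the two previous time levels.
    for _ in range(n_steps - 1):
        nxt = [0.0] * (n_cells + 1)
        for j in range(1, n_cells):
            nxt[j] = (2.0 * curr[j] - prev[j]
                      + c2 * (curr[j + 1] - 2.0 * curr[j] + curr[j - 1]))
        prev, curr = curr, nxt
        peak = max(peak, max(abs(v) for v in curr))

    return peak


if __name__ == "__main__":
    for c in (0.9, 1.1):
        print("Courant number", c, "-> peak |u| =", simulate_wave(c, shape="parabola"))
```

With a Courant number of 0.9 the peak should stay close to the initial height of 1, while with 1.1 it should grow to an enormous number, which is the numerical version of “it blows up”.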

What is Reproducible Research and why it is important.

“A community involvement in actively maintaining reproducibility of previously published results assures that a body of knowledge in a computational field stays alive and can be scientifically extended though continuing contributions.”

–Sergey Fomel

“It is a big chore for one researcher to reproduce the analysis and computational results of another. I discovered that this problem has a simple technological solution: illustrations (figures) in a technical document are made by programs and command scripts that along with required data should be linked to the document itself.”

–Jon Claerbout

I had little research experience, particularly with large-scale computational projects, until I attended the second Madagascar Working Workshop, held by “The Rice Inversion Project” this summer in Houston. That is also where I first learned about a concept that is old yet newly prominent in the natural sciences and, more recently, in the social sciences: Reproducible Research.

So what is it, how has it been attracting the interest of researchers and professionals, and why is it of increasing importance?

As the word “reproducibility” denotes, it is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently (Wikipedia, “Reproducibility”). To put it more vividly, I think of reproducibility as the recipe you attach to a freshly made, creative, savoury dish, with which people can reproduce the dish at home and, over time, even add new ingredients or spices. Reproducible research, then, is research in which the authors, particularly but not only in computational science, deliberately “serve” the scientific community by providing the complete development environment and the complete set of instructions.

How did reproducibility come to concern professionals? I want to mention a recent event in academic biology, and also a situation I ran into just this afternoon with my mentor Sona in our small computational research project on the wave equation.


In January 2014, two papers describing a simple new method for creating stem cells, whose lead author was Haruko Obokata, a young Japanese researcher in her early thirties, were published in the prestigious journal Nature to much fanfare. They caused a stir in the media in Japan and beyond. However, the two papers were retracted in July, after a researcher based at UC Berkeley concluded that the studies were dubious because his team had tried many times and was still unable to reproduce the results. This shows the paramount importance of reproducibility in disciplines such as biology, physics, and mathematics, where the quality of the detailed derivation or procedure determines how persuasive the authors’ claims are.

In scientific computing and software development, which emphasize final results and figures rather than the process, the significance of reproducible-research frameworks stems, first of all, from the need for long-term maintenance of reproducible results. “Simple storage of software codes is useful but insufficient, because typical scientific codes have multiple dependencies (libraries, compilers, operating systems, etc.). With time, different parts of the software environment change and cause the reproducibility of previously published results to break down.” (“Reproducible research as a community effort: Lessons from the Madagascar project”, Sergey Fomel). Further, it is all too common for colleagues or reviewers to question part of a paper and ask the authors for supporting details. Tracing back the exact code used in that paper, not to mention the exact parameters chosen to produce the published results, takes researchers a considerable amount of time and thus reduces their research productivity and academic competitiveness.

This is true even for an undergraduate doing her own independent project, like me. This afternoon, I was debugging the code my mentor had sent me as a reference, which makes a 2-D movie of a stable moving wave. It is Matlab code, and I tried many times but still could not find what was going wrong. I decided to check it on Google one command at a time. After a tedious round of copying and pasting into Google, I finally found the cause: the “avifile” function has been removed from Matlab, and I should use the “VideoWriter” class instead. My experience shows that inevitable changes in the software environment easily cause breakdowns, and that without dedicated and continuous maintenance, computational results easily lose their credibility and practicality.
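
One small habit that might have saved me that afternoon is saving a record of the software environment next to the results. Below is a minimal Python sketch of that idea, using only the standard library. It is not how Madagascar does it, and the function name `environment_snapshot` is my own invention for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=()):
    """Collect interpreter, OS, and package versions as a plain dict.

    packages is a list of installed distribution names to look up
    (a hypothetical parameter chosen for this sketch).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored alongside figures or movies.
    print(json.dumps(environment_snapshot(["numpy"]), indent=2))
```

A file like this does not keep old code running, but it tells whoever reruns the project later (possibly me) which versions the results were produced with.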

To close this post, I’d like to share two very useful and detailed websites for anyone interested in specific tools for implementing reproducibility:

http://reproducibleresearch.net/events/

http://www.ahay.org/wiki/Houston_2014

You’re also welcome to join the Madagascar open community, a great resource made up of three levels: programs, workflow scripts, and papers.

What I learned from Saranya Murthy’s talk at AWM

If you ask me to summarize what I took away from Saranya Murthy’s talk, I will give you but three words: “FORESIGHT”, “DILIGENCE”, “STEADFASTNESS”. She spoke at UT AWM (Association for Women in Mathematics), which hosts weekly talks by successful and influential women in STEM from academia and industry. Saranya Murthy is an International Product Support Associate Engineer at Dell. She is a former employee of Workbrain (now Infor), which provides web-based workforce management software for large enterprises. She graduated from the honors computer science program in the Faculty of Mathematics at the University of Waterloo. After working for several years, she completed a part-time MBA at York University in Canada. Miss Murthy appears to be of Indian origin: her parents, traditionally dressed in a sari (an Indian women’s garment) and a lungi, were in the audience while she gave the talk. She maintained a graceful demeanour from beginning to end. Her logic was very clear: she gave us useful suggestions (I list them later in this post) about how to approach personal development and explained why those approaches are efficient and helpful. She was also very precise in her diction, which indirectly reflects her work ethic and lifestyle. After the presentation she told me that she loves reading and holds book clubs with her friends (recently they have been reading “Lean In” by Sheryl Sandberg, Chief Operating Officer at Facebook. Coincidentally, I am reading this book too. I’m currently on the first chapter, about the gap between women’s ambition and leadership).

I want to note down the advice Saranya Murthy gave during her talk for the benefit of my readers. It is as follows:

KNOW THYSELF

1. 360º feedback: http://www.reachcc.com/360reach (click on “360º Reach Basic”; it’s free!)

2. MBTI test (apply the test results to interviews and similar occasions)

3. “Please Understand Me” by Dr. David Keirsey

4. “Now, Discover Your Strengths” by Marcus Buckingham and Donald O. Clifton

HOW TO BE A “CARE-FOR-YOUR-FUTURE” UNDERGRADUATE STUDENT

1. When deciding whether to register for a class, ask the professor, and yourself, “How can I use this in the real world?”;

2. Connect with like-minded professionals through informational interviews and relevant social events;

3. Thoughtfully craft a 30-second elevator pitch that you can use later to pique people’s interest and let them quickly experience your charisma;

4. Develop a professional social media persona:

-LinkedIn (employers will usually check LinkedIn for candidates’ information, and they pay particular attention to recommendations and comments from your coworkers, mentors, and professors)

-Twitter (thought leader)

-WordPress blogging (which is why I’m blogging now :=)

Before I conclude, I also want to point out that Miss Murthy is a role model not only in her work ethic and academic excellence but, more importantly for me, in how respectful and obedient she is towards her parents. I approached Saranya with my specific questions right after she concluded her talk, while her elderly parents were waiting for her nearby. She kindly and politely suggested that we find a place to sit down, so that we could talk in detail and her parents could sit while they waited for her. These small details of her bearing touched my heart. I recalled how I yell at my parents when I’m unhappy and how I unconsciously ignore them when I’m with friends…… I’d love to end my first blog post on WordPress with Miss Murthy’s self-summary on LinkedIn, which will constantly remind me of the precious character my role model has and of what I need to do to acquire that kind of character and charisma in the near future.

“• 9 years of experience in Quality Assurance (QA) for enterprise software

• Demonstrated leadership skills – motivated individual who can take charge and drive change

• Strong understanding of Software Development Life Cycle and relevant QA concepts

• Excellent technical and analytical skills – experienced at troubleshooting software issues and tracing defect triggers

• Strong experience in defining and improving software test processes

• Proven Project Management skills – responsible for overseeing QA team to deliver thorough and timely test efforts for 3-month release cycle

• Enthusiastic team player with effective interpersonal and leadership skills

• Experience working in Cross-Functional Teams – representing QA and liaising with Developers, Product Managers and Technical Writers

• Client-oriented approach: well-honed ability to test software from the “user’s point of view”

• Excellent written and verbal communication skills”