Data Science Study Note1
Prerequsite: As we all know that data science is heat in recent year, thus we all should learn knowledge and skills about data analysis and data mining.
- First we should study knowledge about Numpy, Pandas, SciPy and SciKit-learn.
- Then, we start learning about Tensorflow, Keras, and Torch.
- Also, we need be familar with basic math concepts like probability, linear algebra. So, let’s start it.
1. Basics of Python
I skip some terms of this chapter beacasue I have used Python before, hence no need to repeat them again. I just share tips about Python.
Python has data set formats like these: set, list, dictionary, tuple
- About zip Operation
a = range(10) b = range(10, 20) c = zip(a, b) c
<zip at 0x10b91af08>
for m in c: print(m)
(0, 10) (1, 11) (2, 12) (3, 13) (4, 14) (5, 15) (6, 16) (7, 17) (8, 18) (9, 19)
- List comprehensions
This is a kind of useful tools to deal with data set, like this:
x = [1,23,32,8,33,97,123]
result = [m if m <30 else 0 for m in x] result
[1, 23, 0, 8, 0, 0, 0]
- all and any Operation
This is an useful method to judge whether the elements in a data set match the requirements or not.
conda = [item < 30 for item in x]
we can check the value of variables like this:
Variable Type Data/Info ----------------------------- a range range(0, 10) b range range(10, 20) c zip <zip object at 0x10b91af08> conda list n=7 m tuple n=2 result list n=7 x list n=7
- Generate Data Set
Note: In Python, funtion can return multi-results
def cal(x, y): return (x + y), (x*y), (x/y)
a,b,c = cal(10, 5) a
- Counter Collection
This is an useful collection, it inherited from collections library, that can be used to count the numbers of elements in a data set, like this:
x = [1,20, 10, 2, 20, 2, 2, 20]
from collections import Counter result = Counter(x) for key in result: print(str(key) + " : " + str(result[key]))
1 : 1 20 : 3 10 : 1 2 : 3
Actually, I do not often use that, but I must admit that this is useful tool to generate data. It is shown as follows:
def generate_odd(n): i = 2 while i < n: yield i i+=2 generate_20 = generate_odd(20) for m in generate_20: print(m)
2 4 6 8 10 12 14 16 18
- How to use *args and **kwargs
def cal(*args, **kwargs): sum = 0 for m in args: sum +=m for n in kwargs: sum+=kwargs[n] return sum cal(1,2,3,5,k=12,m=34)
Numpy is one of the efficient Library to operate datas, I have studied this before, and I have a study note on my blog. So, I just skip this.
Here is a simple example as follows:
import numpy as np import matplotlib.pyplot as plt %matplotlib inline x = np.linspace(0,5,500) y = np.cos(x) mplot = plt.plot(x,y)
x = np.linspace(0, 10, 1000) y = np.sin(x) mplot = plt.plot(x, y)