I've heard pandas offers powerful, flexible and fast tools for advanced data analysis and manipulation. Today I was able to tick off the intermediate pandas module @DataCamp and I was given just a taste of its power.
Reaching the stage where you learn actionable insights and begin to code off-platform is exciting. Today's lesson covered dictionaries and the pandas DataFrame. According to DataCamp, this is 'the de facto standard to work with tabular data in Python'.
I can now import, create, call, access and manipulate a DataFrame. I'm actually flying through this course much faster than expected. Yesterday I estimated 4-6 weeks. I'll probably be done on Sunday. I've been waking up earlier and earlier, and the mornings before work are the perfect time.
As yesterday, here are my rudimentary notes on pandas code structure and syntax:
import pandas as pd
pd.read_csv('INSERT_DOCUMENT_PATH_HERE', index_col=1)
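To see what index_col actually does, here is a minimal sketch. The CSV text, column names and variable names are all made up for illustration, and io.StringIO stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk.
csv_text = """country,capital,population_m
Brazil,Brasilia,212
India,New Delhi,1380
Japan,Tokyo,126
"""

# index_col=0 uses the first column (position 0) as the row labels.
by_country = pd.read_csv(io.StringIO(csv_text), index_col=0)

# index_col=1 uses the second column instead, because positions count from zero.
by_capital = pd.read_csv(io.StringIO(csv_text), index_col=1)

print(by_country.index.tolist())
print(by_capital.index.tolist())
```

Whichever column becomes the index drops out of the regular columns, so it is worth checking that index_col points at the column you really want as row labels.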
I dug around my applications and found PyCharm which I was using last year. There was one small coded project where I was importing .csv files and calling data. Today's lesson was very similar and a great recap.
A few things I had to clarify along the way once I was introduced to .loc and .iloc:
.loc is used for label-based access, e.g. .loc['insert text here']
.iloc is used for integer-position-based access, e.g. .iloc[2]
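A small sketch of the difference, assuming a made-up DataFrame with string row labels (the data and names are mine, not from the course):

```python
import pandas as pd

# Hypothetical data: row labels are country codes.
df = pd.DataFrame(
    {
        "capital": ["Brasilia", "New Delhi", "Tokyo"],
        "population_m": [212, 1380, 126],
    },
    index=["BR", "IN", "JP"],
)

# .loc looks rows (and columns) up by their labels.
row_by_label = df.loc["JP"]
cell_by_label = df.loc["IN", "capital"]

# .iloc looks them up by integer position, counting from zero.
row_by_position = df.iloc[2]
cell_by_position = df.iloc[1, 0]

print(row_by_label)
print(cell_by_label, cell_by_position)
```

Here "JP" is the third row, so .loc["JP"] and .iloc[2] point at the same data.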
Series VS DataFrame: single brackets ['La la la'] return a Series VS double brackets [['La la la']] return a DataFrame
A pandas Series is a one-dimensional data structure made up of index-value pairs. It is similar to a Python dictionary, except it provides more freedom to manipulate and edit the data.
A pandas DataFrame is a two-dimensional data structure that can be thought of as a spreadsheet. A DataFrame can also be thought of as a combination of one or more Series that share an index.
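Here is a quick sketch of those definitions in practice. The data and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data, reused across the examples below.
df = pd.DataFrame(
    {
        "capital": ["Brasilia", "New Delhi", "Tokyo"],
        "population_m": [212, 1380, 126],
    },
    index=["BR", "IN", "JP"],
)

# Single brackets select one column as a one-dimensional Series.
single = df["capital"]

# Double brackets pass a list of column names and return a DataFrame.
double = df[["capital"]]

# A Series can be built straight from a dictionary: keys become the index.
capitals = pd.Series({"BR": "Brasilia", "IN": "New Delhi", "JP": "Tokyo"})
populations = pd.Series({"BR": 212, "IN": 1380, "JP": 126})

# Combining Series that share an index gives a two-dimensional DataFrame.
rebuilt = pd.DataFrame({"capital": capitals, "population_m": populations})

print(type(single).__name__, type(double).__name__)
print(rebuilt)
```

The double-bracket form is handy when you want to keep the tabular shape, even for a single column.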
Thanks to educative.io for the definitions.
Peace out!