Brainstorming – Week of 01/11/2021

Note: I'm still getting used to a workflow for this weekly post, so expect some delays over the first couple of posts.

Python tidbits

Make things small

Python very often consumes a stupid amount of memory, which makes processing scripts or mass processing annoying. To summarize the tips from the blog post:

  1. Class instances (data classes included) are smaller than dicts. Quote:
    Starting in Python 3.3, the shared space is used to store keys in the dictionary for all instances of the class. This reduces the size of the instance trace in RAM.
    This is the key-sharing dictionary from PEP 412. It seems a bit magical, but I will take it.
  2. __slots__ removes __dict__ and __weakref__ from a class's instances, and the post notes that it is the main memory-saving technique for pure Python classes. Quote:
    This reduction is achieved by the fact that in the memory after the title of the object, object references are stored — the attribute values, and access to them is carried out using special descriptors that are in the class dictionary.
  3. The recordclass library offers some additional classes built on tuple-like objects that are not tracked by the cyclic garbage collector. This can shave a little more off the size of pure Python objects.
  4. Cython is not just for speed; it helps with memory footprint too.
  5. NumPy for arrays (as usual).
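Points 1 and 2 are easy to check for yourself with `sys.getsizeof`. This is my own sketch, not the post's benchmark: the `Point*` classes and the `footprint` helper are hypothetical, `footprint` only counts the object plus its `__dict__` (not the floats it points to), and exact byte counts vary between CPython versions, so only the ordering is meaningful.

```python
import sys
from dataclasses import dataclass


@dataclass
class PointDC:
    # Regular dataclass: attributes live in a per-instance __dict__
    # (a key-sharing dict on CPython 3.3+, per PEP 412).
    x: float
    y: float
    z: float


class PointSlots:
    # __slots__ stores attribute references directly in the instance,
    # so there is no per-instance __dict__ or __weakref__.
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def footprint(obj):
    """Shallow size of obj plus its __dict__, if it has one (CPython-specific)."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


as_dict = {"x": 1.0, "y": 2.0, "z": 3.0}
as_dc = PointDC(1.0, 2.0, 3.0)
as_slots = PointSlots(1.0, 2.0, 3.0)

for label, obj in [("dict", as_dict), ("dataclass", as_dc), ("slots", as_slots)]:
    print(f"{label:>10}: {footprint(obj)} bytes")
```

On Python 3.10+, `@dataclass(slots=True)` generates the `__slots__` for you, which combines points 1 and 2.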

Make things fast

Bulk-inserting into a database from Python is quite a fascinating topic, and this post is probably a pretty good primer on psycopg2 and Postgres. The takeaways are:

  1. Iterators matter. Use yield a lot, so rows are generated lazily instead of being materialized in memory.
  2. Prefer the built-in helpers for complex data types and small data volumes: in general, psycopg2.extras.execute_batch and psycopg2.extras.execute_values.
  3. Page size matters: bigger pages are faster, but they can blow up memory usage.
  4. A COPY approach that streams rows through a custom text-stream class is speedy, but it is not type safe, since we are directly injecting strings.
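The COPY idea can be sketched as a read-only, file-like object that pulls text lazily from a generator, so the whole payload never sits in memory at once. This is my own minimal version under stated assumptions, not the post's exact class: `StringIteratorIO`, `copy_value`, `rows_to_copy_lines` and `staging_table` are hypothetical names, and the actual load would go through psycopg2's `cursor.copy_from` on a live connection, which is only shown in a comment. The escaping in `copy_value` is exactly the "not type safe" part: every value becomes a string in PostgreSQL's COPY text format, and Postgres parses it back into the column type.

```python
import io
from typing import Any, Iterable, Iterator, Optional


class StringIteratorIO(io.TextIOBase):
    """Read-only text stream that lazily pulls chunks from an iterator of strings."""

    def __init__(self, chunks: Iterator[str]):
        self._chunks = iter(chunks)
        self._buffer = ""

    def readable(self) -> bool:
        return True

    def _fill(self, predicate) -> None:
        # Pull chunks until predicate(buffer) holds or the iterator is exhausted.
        while not predicate(self._buffer):
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk

    def read(self, size: Optional[int] = -1) -> str:
        if size is None or size < 0:
            self._fill(lambda buf: False)  # drain everything that is left
            size = len(self._buffer)
        else:
            self._fill(lambda buf: len(buf) >= size)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def readline(self, size: Optional[int] = -1) -> str:
        self._fill(lambda buf: "\n" in buf)
        newline = self._buffer.find("\n")
        end = newline + 1 if newline != -1 else len(self._buffer)
        if size is not None and size >= 0:
            end = min(end, size)
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


def copy_value(value: Any) -> str:
    # PostgreSQL COPY text format: \N is NULL; backslash, tab, CR and LF must be escaped.
    if value is None:
        return r"\N"
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def rows_to_copy_lines(rows: Iterable[tuple]) -> Iterator[str]:
    # One tab-separated, newline-terminated line per row, produced lazily.
    for row in rows:
        yield "\t".join(copy_value(v) for v in row) + "\n"


rows = ((i, f"name {i}", None) for i in range(3))  # a generator: nothing materialized yet
stream = StringIteratorIO(rows_to_copy_lines(rows))
# With a live psycopg2 connection (not run here), the load would look roughly like:
#   with connection.cursor() as cursor:
#       cursor.copy_from(stream, "staging_table", sep="\t", null="\\N")
print(stream.read())
```

Only the current chunk is buffered, so memory stays flat regardless of how many rows the generator yields; the trade-off is that a malformed value only fails on the Postgres side.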

As an aside, the post also came with a very nice profiler.
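A profiler in that spirit can be sketched with the standard library alone: time a call with `time.perf_counter` and record peak memory with `tracemalloc`. This is a stand-in under assumptions, not the post's code: `profile` and the `last_stats` attribute are hypothetical names, `tracemalloc` only sees allocations made through Python's allocator, and the decorator is not meant to be nested.

```python
import functools
import time
import tracemalloc


def profile(func):
    """Print wall time and peak traced memory for each call of `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: {elapsed:.3f}s, peak {peak / 1024:.1f} KiB")
        # Keep the numbers around so they can be compared programmatically.
        wrapper.last_stats = {"seconds": elapsed, "peak_bytes": peak}
        return result

    return wrapper


@profile
def build_list(n):
    return [str(i) for i in range(n)]


build_list(100_000)
```

Wrapping each insertion strategy (row by row, `execute_values`, COPY) in the same decorator gives a like-for-like comparison of time and memory.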

Make things fast again

Very simply put:

ADVANTAGES OF NUMBA:

  • Ease of use
  • Automatic parallelization
  • Support for numpy operations and objects
  • GPU support

DISADVANTAGES OF NUMBA:

  • Many layers of abstraction make it very hard to debug and optimize
  • There is no way to interact with Python and its modules in nopython mode
  • Limited support for classes

ADVANTAGES OF CYTHON:

  • Control over Python API usage
  • Easy interfacing with C/C++ libraries and C/C++ code
  • Parallel execution support
  • Support for Python classes, which gives object-oriented features in C

DISADVANTAGES OF CYTHON:

  • Learning curve
  • Requires expertise both in C and Python internals
  • Inconvenient organization of modules

By and large, I don't find data science problems that need anything beyond Numba. Great library. Though I do want to write more Cython as a way to learn, which is always fascinating but, um, challenging.
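To give a feel for the "ease of use" and "automatic parallelization" points, here is a small sketch assuming Numba is installed; it falls back to plain Python otherwise, so it still runs, just without compilation. The functions are hypothetical toy examples: `njit` compiles in nopython mode, and with `parallel=True` the `prange` loop is split across threads, with the `total +=` accumulation treated as a reduction.

```python
try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without Numba (no compilation, no parallelism).
    prange = range

    def njit(*args, **kwargs):
        # Bare @njit: return the function unchanged.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        # @njit(...) with options: return an identity decorator.
        return lambda func: func


@njit
def sum_of_squares(n):
    # A plain Python loop; Numba compiles it to machine code on first call.
    total = 0
    for i in range(n):
        total += i * i
    return total


@njit(parallel=True)
def parallel_sum_of_squares(n):
    total = 0
    # prange splits iterations across threads; `total +=` becomes a reduction.
    for i in prange(n):
        total += i * i
    return total


print(sum_of_squares(10_000), parallel_sum_of_squares(10_000))
```

The first call of each function includes JIT compilation time, so benchmark the second call. In nopython mode these loops use 64-bit integers, so a very large `n` would overflow silently instead of promoting to Python's arbitrary-precision `int`.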

Jupyter Notebooks eat the world

Have we gone too far? I have seen people say the future is Jupyter as a book, which makes sense in my view, although it could do with some formatting niceness beyond Markdown. But even for me, Jupyter Notebooks as a Product is pretty radical.

Effectively, the solution proposed here is based on nbconvert and Plotly. The notebook structure is pretty normal: background, ToC, setup and data loading, analysis, with interactivity and dashboarding coming from Plotly. nbconvert then nicely exports it to HTML for hosting or handover.

This allows everything to be done in the same "stack" (even extending to the papermill/Airflow world!), but from my perspective it still feels a bit raw.
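The export step might look roughly like this, assuming `nbformat` and `nbconvert` are installed. The in-memory notebook and its cells are hypothetical placeholders for a real, already-executed report (in the papermill flow, papermill would run it first); Plotly figures would arrive as cell outputs of that execution and are left out here. `exclude_input=True` hides the code so the HTML reads like a report rather than a notebook.

```python
from nbconvert import HTMLExporter
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Hypothetical report notebook built in memory; normally you would load an
# executed one with nbformat.read("report.ipynb", as_version=4).
nb = new_notebook(
    cells=[
        new_markdown_cell("# Background\nWhy this analysis exists."),
        new_markdown_cell("## Analysis"),
        new_code_cell("summary = {'rows': 3}\nsummary"),
    ]
)

# Hide code cells so the output reads like a report, not a notebook.
exporter = HTMLExporter(exclude_input=True)
body, resources = exporter.from_notebook_node(nb)

# For hosting or handover, write `body` out, e.g.:
#   with open("report.html", "w", encoding="utf-8") as fh:
#       fh.write(body)
print(len(body), "characters of HTML")
```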

Reviewing "Machine Learning in X"

Machine Learning in a week/year

This is a classic Medium post; in fact, I think it is the quintessential data science Medium post: some useful information, a clickbait-y title, and definitely some over-promising. Still, the post is pretty good for structuring a self-learning path. I think there are two major things of note:

  1. The path is too aggressive. It's fairly odd to me that it takes such a deep-end approach to learning. The idea is to apply things as soon as possible, which is admirable, but the lack of review, mentorship and collaboration seems off.
  2. The path is very light on statistics and mathematical foundations. That is fine if you view doing data science and machine learning as a purely engineering endeavour (which it often can be), but I would

As an aside, I have started an unscheduled post on the blog compiling learning paths and resources; hopefully I will update it from time to time.

A good data science profile

Speaking of the "learning data science" side of things, I have heard about a framework for portfolios (more on this some other time):

  1. Demonstration
  2. Narrative
  3. Product

This is just a fantastic example of a demonstrative profile: readable, clear and concise.

Fintech, SaaS and work

Fintech, a guide to enlightenment

A quick guide to how we can think of fintech as a business prospect:

  1. Doing what banks (or other financial institutions) have done, but better? The author mentioned risk scoring in insurance, and whilst I do feel personally attacked, it is true. In my experience, the data moat is very, very real, especially a data moat that comes from private sources (IoT, existing customers, etc.).
  2. Transaction (or other exotic) data is valuable? I think a lot of founder/manager/director types get very enchanted by this kind of exotic data. My counter would be: you had better have a good statistician on the team; if not, you are going to spend a lot of time wheel-spinning on bad results or, worse, releasing something misleading into the wide world (see $Z).
  3. People will trust you instead. As the author pointed out, being a good middleman or platform is mostly about building trust and relationships (B2B), and that's more FIN than TECH.
  4. Someone else's network is nice, and I am a bit more hardworking than the incumbent. This often means paying marginal costs to be a third competitor, or running on the incumbent's networks. But see 7 for how it actually ends up.
  5. Ignoring regulators until you can't anymore. And indeed it is true, start-ups can go under the radar and hope bad things only befall them when it's too late!
  6. Cheap but useless, directly to customers. I see this as the flashy "app-based" movement. Apps are actually cheap compared to real customer support and physical presence. As we get more comfortable with being completely virtual, we can do "cheap but bare-minimum useful" better. Personally, I am not very bullish on customer interaction systems, as they seem to barely work even for larger internet platforms.
  7. Getting your act together with respect to an industry standard where the industry has conspicuously failed to do so. Again though, as the author states, this is all good until incumbents decide to move their butts, and that's when the founder will inevitably sell.
  8. Fucking around with new assets. Whatever the new hype cycle is, build a platform to sell that! Do it fast and maybe you can become Coinbase/OpenSea/{Next DeFi Platform}!

Anyway, a fun, perhaps too cynical way to think of what's viable and possible in modern fintech.

Working alone

Some thoughts on working alone:

  • God bless productivity nerds. I don't totally agree, but I think remote work puts an emphasis on documentation and writing as a form of communication. Meetings are cool for collaborative problem solving, but very often a meeting results in wheel-spinning and nothing concrete gets written down.
  • Open offices are, like, 25% of the reason I personally don't ever want to go back to a true office 100% of the time. Give me my space.
  • I wouldn't argue that productivity is a main factor for WFH; the author is correct in that regard.
  • New starters do have it rough in fully remote situations like this.
  • If you're gonna build a product, especially alone or in a small team, B2B is the way to go.
  • Business > Code.
  • Share things with people, don't fear people "stealing" ideas.
  • Tool and playbook building is good! It's never wasted time, because it's always learning and it's reusable!
  • JFS. Just let it go; there's no perfect, just let it happen.

Scalable Offices with Raspberry Pi?

Raspberry Pi is so Cool Cool Cool Cool Cool.

I think my ambition now is the following:

  1. Raspberry Pi smart home office. Gonna keep normal home things out of the way, but a form of automated, smart personal office is very cool. (Productivity might suffer, but it's cool!)
  2. Build a NAS and processing blade using Raspberry Pi. It's a kind of cheapo-neato server room, which seems like a great thing to have. Between this and ngrok, I would have a pretty cool personal-ish cloud going.
  3. Get a 3D printer and start making custom cases and tiny custom products, like the retro handheld kit.

Might take years to get going, but this is what I am thinking of now as a fun hobby. More on this eventually.

Other things

SpaceX Starships

Bullish on SpaceX. It does make me semi-uncomfortable that perhaps the key to true space exploration and colonisation is in the hands of a private company, but it is what it is. If Starship delivers true mass-lifting capabilities, we will be living in the future already. Sadly, our aesthetic of the future is way, way less cool than what we imagined.

Visual story telling gold

The best thing an analysis data scientist can do is befriend or become a designer and a web dev. Histograms might be good, but this is what's truly compelling.

Tokyo, walkability and urban planning

Walkability is a gift. More and more, I think we must see cities as villages squished together: it's the driving idea behind good public transport networks, and the driving idea behind walkable European city blocks. Zoning must allow a walkable zone to be self-sufficient; if not, is the energy and resources it takes to shift people in and out really worth people's time, and is it good for the environment?

“You must orient yourself in it not by book, by address, but by walking, by sight, by habit, by experience.”

And that's a very nice sentiment for building a place where people actually try to build out a community.

How demagnetizers work

Space Britannia

ELASTIC KHAOS SERVICE

Making friends with ML

Hope to see this series through to the end. I think it's such a great intro series.

ASMR

So, when am I getting my new 60% keyboard?

Hi, do you know about Hatsune Miku?

Every so often, a creator you follow finally hunts down his white whale. I don't think this is a perfect video; it might even be narratively kind of weak? But it's always fantastic to see the conclusion of a saga.

All Tomorrows

I stayed up a whole night reading it. A review of All Tomorrows and its brethren soon.

SARDAUKAR

Throat-singing