This post is about crawling an entire domain in Node.js. You can find the first post of the series here: Web Scraping / Web Crawling Pages with Node.js.
For testing purposes I have created a simple set of HTML pages that should resemble a generic website. It has a handful of pages, and we want our crawler to go through them and find all of them, wherever they are linked. That means when our crawler hits a page, it should keep track of the links it finds and then only proceed to pages it has not crawled yet.
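As a minimal sketch of that idea in Node.js 18+ (the link extraction below is a naive regex and the start URL is just a placeholder, not the code from the post):

```javascript
// Sketch of the "only visit pages we haven't crawled yet" idea.
// Assumes Node.js 18+ (global fetch); link extraction is a naive regex here,
// a real crawler would use a proper HTML parser.
const visited = new Set();

async function crawl(url, base) {
  if (visited.has(url)) return; // already crawled, skip it
  visited.add(url);

  const res = await fetch(url);
  const html = await res.text();

  // Collect href values and resolve them against the current page.
  const links = [...html.matchAll(/href="([^"]+)"/g)]
    .map((match) => new URL(match[1], url))
    .filter((link) => link.hostname === base.hostname); // stay on the same domain

  for (const link of links) {
    await crawl(link.href, base);
  }
}

const start = new URL('http://localhost:8080/'); // placeholder test site
crawl(start.href, start).then(() => console.log('Crawled:', [...visited]));
```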
Continue reading Crawling an entire Domain / Website
Elixir is a language that runs on the Erlang VM (BEAM) and looks a bit like Ruby. It caught my interest because I wanted to learn something different and not very mainstream, a little inspired by the 12 resolutions for programmers post.
Continue reading Accessing Structs in Elixir
I’m aiming to automate things I don’t want to do. Reinstalling my operating system is one of them. Now I have Dogmeat (named after the Fallout 4 companion dog).
It all started when I had to say goodbye to my old work computer and go back to my old private laptop, which meant installing a recent Linux Mint on it.
Continue reading Dogmeat
This is just a short post about exceptional efforts (not mine), motivation and excuses.
Reading and writing are essential parts of my life, and recently I had a song stuck in my head from Hamilton, a musical about U.S. history. It’s pretty great and I can recommend it!
Continue reading Write like tomorrow won’t arrive
Welcome to part 2 of the series on crawling the web with Node.js. In this article we’re going to look at what valuable content we can grab from a page. The most important part when writing a crawler is obviously the links, because our crawler wouldn’t know where to go next without them.
The data I’m going to extract from a page is not necessarily what you’ll want; it all depends on what you want to do with your project. Maybe you only want the content of specific tags, or the status codes. I’ll just put up some examples, and from there you can see what’s possible and what would make sense for your purpose.
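To give a rough idea of what such a page object could look like, here is a sketch using Node.js 18+ and the cheerio package; the fields and selectors are illustrative examples, not necessarily the ones from the post.

```javascript
// Sketch of a simple "page object": fetch a URL and collect the pieces
// we might care about (status code, title, links). Assumes Node.js 18+
// (global fetch) and cheerio; the field names are illustrative only.
const cheerio = require('cheerio');

async function buildPageObject(url) {
  const res = await fetch(url);
  const html = await res.text();
  const $ = cheerio.load(html);

  return {
    url,
    status: res.status,                                        // e.g. 200, 404
    title: $('title').text(),
    links: $('a[href]').map((i, el) => $(el).attr('href')).get(),
  };
}

buildPageObject('http://localhost:8080/').then((page) => console.log(page));
```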
Continue reading Web Crawling with Node.js #2: Building the Page Object
When working with code, especially front end code, you might want to see a diff of two files. Maybe you have a build tool doing something with them, or just two different versions. The point is: you want to know exactly whether two files are the same, or have all the differences listed. I’ll just share some of my favourite tools for that.
Continue reading Show the diff(erence) between two files [free GUI client]
A phrase I’ve come to think about recently is “making it”, as in “Wow, she/he really made it.” For me, the problem is not that people say it wrong, but that they think it wrong. For me there is no such thing anymore, or at least there will never be enough. It’s like that insatiable, greedy devil in our heads that drives us further.
The drive is important: it makes us chase a job or clients, pick up the phone late at night, learn an extra skill, read an extra article or an extra book on top of what we’re expected to do.
The sad part is: You’ll never make it.
Continue reading How to know that you’ve made it
Bootstrap is a great CSS framework, but what if we only want to use the grid and not all the other features? You can do this if you use either the SASS or the LESS version of the Bootstrap framework. I’ll quickly demonstrate how to take only the necessary parts. I dug into this because I was creating a landing page that only featured parts of the Bootstrap framework, to improve the page speed.
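As a rough sketch of the direction this takes: a small SCSS entry file that imports only Bootstrap’s functions, variables, mixins and grid partials, compiled by a Gulp task. The file names, paths and gulp-sass setup below are my assumptions, not necessarily what the post uses.

```javascript
// gulpfile.js — minimal sketch, assuming the gulp and gulp-sass packages
// and a local src/grid-only.scss that imports only Bootstrap's
// functions, variables, mixins and grid partials.
const gulp = require('gulp');
const sass = require('gulp-sass')(require('sass'));

gulp.task('css', () =>
  gulp
    .src('src/grid-only.scss')        // entry file with the grid-only imports
    .pipe(sass({ outputStyle: 'compressed' }).on('error', sass.logError))
    .pipe(gulp.dest('dist/css'))
);
```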
Continue reading Bootstrap 4 Grid only and SASS with Gulp
For starters, you’ll need to clone the GitHub repository by running:
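The command itself is cut off in this excerpt; presumably it is the standard clone of Facebook’s Flux repository, so something along these lines (the exact URL is my assumption):

```
git clone https://github.com/facebook/flux.git
```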
Continue reading Trying out the Facebook Flux Examples
This post series is going to discuss and illustrate how to write a web crawler in Node.js. Some of the posts will be database agnostic, and the database part will be split up by the different databases you could imagine using.
Continue reading Web Scraping / Web Crawling Pages with Node.js