I don't think any of this is wrong. Learning subnetting on paper, in binary, is a great way to understand the theory, and I am not, by any means, suggesting that it shouldn't be the foundation a network professional builds on. That being said, I don't think many network folks who carve subnets on a daily basis are picturing all the ones and zeros, AND'ing the address with the mask to get the network ID and AND'ing it with the inverted mask to get the host ID. Part of subnetting becoming easier is finding a way to do the math in your head.
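If you want to see the AND operations behind this, here is a minimal Python sketch using the standard library's ipaddress module. The function name split_address and the sample address 192.168.1.77/24 are hypothetical and only for illustration. The sketch shows how the mask, and its inverse, pull the network and host portions out of a single address.

```python
import ipaddress


def split_address(cidr: str) -> tuple[str, str]:
    """Return (network ID, host ID) for an address in CIDR form, e.g. '192.168.1.77/24'."""
    iface = ipaddress.ip_interface(cidr)
    addr = int(iface.ip)
    mask = int(iface.netmask)
    network_id = addr & mask               # AND with the mask keeps only the network bits
    host_id = addr & (~mask & 0xFFFFFFFF)  # AND with the inverted mask keeps only the host bits
    return str(ipaddress.IPv4Address(network_id)), str(ipaddress.IPv4Address(host_id))


print(split_address("192.168.1.77/24"))  # ('192.168.1.0', '0.0.0.77')
```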
Some may just use a subnet calculator tool, as there are many available online. However, I feel that relying on such a tool for anything other than sanity-checking can be time-consuming. When building out or checking networks, I can't imagine having to stop and look up where my subnet boundaries are, or how many hosts (or networks) I should expect to be able to fit inside.
After spending some time thinking about how I actually subnet in my head, I think I have a real answer. Over time, I've settled into the same basic method for calculating subnets mentally, and it works well enough for me. I am sure this comes down to personal preference -- but hopefully my experience will be helpful to others. Also, the old adage of "if you do it enough, it becomes easy" still applies! Even mental shortcuts like this require practice before they become easy and repeatable.
Step 1: Think CIDR
I find it easier to think in CIDR notation and convert to a traditional dotted-decimal subnet mask after the fact. I'm not sure why, but it just works better in my head. CIDR notation is a forward slash followed by the number of "high" (one) bits in the subnet mask, also called the prefix length. For instance, the typical subnet on most home networking gear is 192.168.1.0/24. This means the subnet mask is 255.255.255.0 (in binary: 11111111.11111111.11111111.00000000). If you count the ones, there are 24 of them.
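To make "count the ones" concrete, here is a small Python sketch that builds a dotted-decimal mask from a prefix length by placing that many one bits at the top of a 32-bit value. The function name prefix_to_mask is hypothetical.

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Convert a CIDR prefix length (0-32) into a dotted-decimal subnet mask."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    # prefix_len ones at the top of a 32-bit value, zeros in the remaining host bits
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    # Slice the 32-bit value into four 8-bit octets, most significant first
    octets = [(mask >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


print(prefix_to_mask(24))  # 255.255.255.0
```

Python's standard library gives the same answer via ipaddress.ip_network("192.168.1.0/24").netmask, which is handy for sanity-checking.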
Step 2: Commit the main boundaries to memory
In the example above, we used /24, which I'll call a main boundary. I consider the main boundaries to be /8, /16, /24, and I suppose /32: the points where the mask ends exactly on an octet boundary. Using these boundaries as reference points makes it easy to move up or down from a subnetting perspective.
To explore the concept, let's look at the number of IP's per subnet, and number of subnets per /24, for each longer mask:
/25: 128 IP's, 2 networks per /24
/26: 64 IP's, 4 networks per /24
/27: 32 IP's, 8 networks per /24
/28: 16 IP's, 16 networks per /24
/29: 8 IP's, 32 networks per /24
/30: 4 IP's, 64 networks per /24
/31: 2 IP's, 128 networks per /24
/32: 1 IP, 256 hosts per /24 (a /32 identifies an individual host, not a subnet; its mask is called the host mask)
Note that as the subnet gets smaller (each one is half the size of the one before it), twice as many fit into the /24.
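The halving and doubling pattern in the list can be generated rather than memorized. This Python sketch, with a hypothetical helper named per_slash24_table, rebuilds the /25 through /32 rows. It also makes it easy to see that addresses per subnet times subnets per /24 always equals 256.

```python
def per_slash24_table() -> list[tuple[int, int, int]]:
    """For /25 through /32, return (prefix, addresses per subnet, subnets per /24)."""
    rows = []
    for prefix in range(25, 33):
        size = 2 ** (32 - prefix)   # each extra mask bit halves the block...
        count = 2 ** (prefix - 24)  # ...and doubles how many blocks fit in a /24
        rows.append((prefix, size, count))
    return rows


for prefix, size, count in per_slash24_table():
    print(f"/{prefix}: {size} IPs, {count} per /24")
```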
As shown above, I find it easier to only "think" one level up or down when possible. By this, I mean it is easier to stay in the context of the main boundary directly above or below. For instance, when I think of a /24, I think of it as 256 IP's (not all assignable, of course!). In the same respect, I think of a /16 as holding 256 /24's. At that point, if someone were to ask what size subnet is needed to accommodate 900 /24's, I can divide 900 by 256 (roughly 3.5) to find that I need more than three, but no more than four, /16's to cover that.
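The "divide and round up" step is just a ceiling division. Here is a tiny Python sketch; the helper name blocks_needed is hypothetical.

```python
import math


def blocks_needed(count: int, per_block: int) -> int:
    """Smallest whole number of larger blocks that covers `count` smaller ones."""
    return math.ceil(count / per_block)


# Each /16 holds 2 ** (24 - 16) = 256 /24's
print(900 / 256)                # 3.515625
print(blocks_needed(900, 256))  # 4
```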
Using this technique, it becomes easier to work within some large subnetting windows and land in the right ballpark.
Step 3: Split the difference
If you look at the list above showing the number of IP's and networks (per /24) for /25 through /32, does any one of those look easier to remember than the others? To me, /28 sticks right out. Halfway between /24 and /32, it's a good midpoint. Importantly, also note that a /28 holds the same number of IP's as there are /28's in a /24: 16.
The same is true for the other midpoints: /20, /12, and even /4. Each one contains 16 subnets of the next longer main boundary, and it takes 16 of them to fill the next shorter main boundary. In other words, a /20 contains 16 /24's, and there are 16 /20's in a /16. This makes it relatively simple to move up or down from anywhere without having to keep track of too much in your head.
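The "16 either way" property falls out of one rule: a /outer holds 2 to the power of (inner minus outer) /inner networks. Here is a Python sketch of that rule, using a hypothetical helper named fits, applied to each midpoint, which sits 4 bits from a main boundary on either side.

```python
def fits(inner: int, outer: int) -> int:
    """How many /inner networks fit inside one /outer network (requires inner >= outer)."""
    if inner < outer:
        raise ValueError("inner prefix must be at least as long as outer prefix")
    return 2 ** (inner - outer)


# Midpoints are 4 bits away from the main boundaries on both sides, so both answers are 2**4
for mid in (4, 12, 20, 28):
    print(f"a /{mid} holds {fits(mid + 4, mid)} /{mid + 4}'s, "
          f"and {fits(mid, mid - 4)} /{mid}'s fill a /{mid - 4}")
```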
Step 4: Shift up or down as needed
Now that we have a solid number of well-understood reference points to work from, subnetting can be done in much more digestible chunks.
Take the earlier example: suppose you were asked to provide a supernet, possibly for route summarization, from which 900 /24's could be carved out as end-user subnets, starting after 10.25.0.0. We know that there are 256 /24's in a single /16. Dividing 900 by 256, we know the answer is "3 point something"; in other words, counting in /16's, we need to round up to the next power of two above 3 to accommodate this. Thanks to the simplicity of binary, we can count upward until we pass 3: 1 /16 (/16), 2 /16's (/15), 4 /16's (/14). To hold more than three /16's worth of /24's, we had to move to a /14, which gives us 4 * 256 /24's, or 1024 /24's.
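The "count upward until we pass 3" step is a doubling loop. Each bit removed from the mask doubles the block. In this Python sketch, the function name smallest_supernet is hypothetical. It can count either in /24's directly or in /16's, and both land on /14.

```python
def smallest_supernet(count: int, unit_prefix: int) -> int:
    """Longest prefix whose block holds at least `count` networks of size /unit_prefix."""
    if count > 2 ** unit_prefix:
        raise ValueError("that many networks will not fit in the IPv4 address space")
    prefix, capacity = unit_prefix, 1
    while capacity < count:
        prefix -= 1    # give up one mask bit...
        capacity *= 2  # ...and the block holds twice as many networks
    return prefix


print(smallest_supernet(900, 24))  # 14: counting in /24's directly
print(smallest_supernet(4, 16))    # 14: counting in /16's (900 / 256 rounded up to 4)
```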
Alternatively, another way to solve this would be to again figure out that more than three /16's are necessary, and immediately jump to the next shorter memorized reference point: /12. Knowing that a /12 contains 16 different /16's, we can then work through longer prefixes until we hit one that holds only four /16's. A /13 holds eight /16's, and a /14 holds four /16's. Bingo!
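The top-down approach can be sketched the same way in reverse. Start at the /12 reference point and keep lengthening the prefix as long as the block still holds enough /16's. The function name walk_down is hypothetical. The 4 comes from 900 / 256 rounded up.

```python
def walk_down(start: int, unit: int, needed: int) -> int:
    """From /start, lengthen the prefix while the block still holds `needed` /unit networks."""
    if 2 ** (unit - start) < needed:
        raise ValueError("starting block is already too small")
    prefix = start
    # Stop before the next longer prefix would hold fewer than `needed` networks
    while prefix < unit and 2 ** (unit - (prefix + 1)) >= needed:
        prefix += 1
    return prefix


for p in range(12, 15):
    print(f"/{p} holds {2 ** (16 - p)} /16's")
print(walk_down(12, 16, 4))  # 14
```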
Step 5: Figure out network ID and subnet mask
Following the same example, we now know that we need to assign a /14 somewhere after 10.25.0.0. Since a /14 holds four /16's, the /14 boundaries fall on multiples of four in the second octet. Starting with 10.0.0.0, the next /14 would start at 10.4.0.0, then 10.8.0.0, and so on.
Thinking in multiples of four, we know that 10.24.0.0 is a /14 boundary, but that /14 spans 10.24.0.0 through 10.27.255.255, which includes 10.25.0.0, and we need to start after it. This means the first /14 after 10.25.0.0 must be 10.28.0.0/14.
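Finding the next aligned block is "round down to a boundary, then step up one block." This Python sketch uses the standard ipaddress module. The function name first_block_after is hypothetical. It follows the interpretation above, so the block that contains the given address is skipped.

```python
import ipaddress


def first_block_after(address: str, prefix_len: int) -> ipaddress.IPv4Network:
    """First /prefix_len block that starts after `address` (the block containing it is skipped)."""
    addr = int(ipaddress.IPv4Address(address))
    size = 2 ** (32 - prefix_len)        # addresses per block; /14 -> 262,144
    start = (addr // size + 1) * size    # round down to a block boundary, then move up one block
    return ipaddress.IPv4Network((start, prefix_len))


print(first_block_after("10.25.0.0", 14))  # 10.28.0.0/14
```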
Now we know our network ID is 10.28.0.0, and the block spans through 10.31.255.255. However, we still need that dotted-decimal subnet mask!
Remember that a /16 is 255.255.0.0 (16 high bits). Taking away one high bit would leave 255.254.0.0 (/15), and then one more would leave 255.252.0.0 (/14). Now, another mental shortcut: since we're being contextual to the /16 -- knowing that four /16's fit into a /14, we can glean the subnet mask by subtracting 4 from 256 in the second octet (the octet we are cutting into by taking away "ones"): 255.(256-4).0.0 = 255.252.0.0. Play with this - it works. For instance, a /20 holds 16 /24's -- so 255.255.(256-16).0 = 255.255.240.0, which is the same as a /20. Neat, eh?
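The "256 minus n" shortcut can be checked against the standard library. In this Python sketch, the function name mask_octet is hypothetical. It takes how many main-boundary blocks the prefix holds and returns the mask value for the octet being cut into. ipaddress then confirms the full masks for /14 and /20.

```python
import ipaddress


def mask_octet(blocks_inside: int) -> int:
    """Mask value for the octet being cut into, given how many boundary blocks the prefix holds."""
    # blocks_inside must be a power of two from 1 to 256 (x & (x - 1) == 0 for powers of two)
    if not 1 <= blocks_inside <= 256 or blocks_inside & (blocks_inside - 1):
        raise ValueError("blocks_inside must be a power of two from 1 to 256")
    return 256 - blocks_inside


print(mask_octet(4))   # 252: a /14 holds four /16's, so 255.252.0.0
print(mask_octet(16))  # 240: a /20 holds sixteen /24's, so 255.255.240.0
print(ipaddress.ip_network("0.0.0.0/14").netmask)  # 255.252.0.0
print(ipaddress.ip_network("0.0.0.0/20").netmask)  # 255.255.240.0
```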
To summarize now, we know that to provide a supernet of 900 /24's, starting after 10.25.0.0, we will provide:
CIDR: 10.28.0.0/14
Network ID: 10.28.0.0
Subnet mask: 255.252.0.0
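This is exactly where a tool is useful for sanity-checking. Here is a short Python sketch using the standard ipaddress module to confirm the answer: the mask, the last address in the block, that 10.25.0.0 falls outside it, and that it holds 1024 /24's.

```python
import ipaddress

net = ipaddress.ip_network("10.28.0.0/14")
print(net.network_address)                     # 10.28.0.0
print(net.netmask)                             # 255.252.0.0
print(net.broadcast_address)                   # 10.31.255.255
print(ipaddress.ip_address("10.25.0.0") in net)  # False: the block starts after 10.25.0.0
print(len(list(net.subnets(new_prefix=24))))   # 1024
```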
Step 6: Practice!
Writing all of this out, it is obvious that this is not a simple, easy answer to mental subnetting. However, with practice, these rules help make subnetting a manageable task that can be done without calculators or tools. Again, this is not a substitute for understanding the theory of how to subnet; it is simply a way I find easier for running the calculations without scratching out a bunch of notes on paper.
Whether you follow this method of making the math more bite-sized, or another method, it is important to practice it thoroughly. When the time comes to complete subnetting tasks, being able to do it in your head on the fly will not only impress others but will also save you time.